A High-Acceptance-Rate VxWorks Fuzzing Framework Based on Protocol Feature Fusion and Memory Extraction

Wang, Yichuan; Han, Jiazhao; Deng, Xi; Hei, Xinhong

doi:10.3390/fi17080377


¹ School of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710048, China

² Shaanxi Key Laboratory for Network Computing and Security Technology, Xi’an 710048, China

* Author to whom correspondence should be addressed.

Future Internet 2025, 17(8), 377; https://doi.org/10.3390/fi17080377

Submission received: 14 July 2025 / Revised: 9 August 2025 / Accepted: 19 August 2025 / Published: 21 August 2025

(This article belongs to the Special Issue Secure Integration of IoT and Cloud Computing)


Abstract

With the widespread application of Internet of Things (IoT) devices, the security of embedded systems faces severe challenges. As an embedded operating system widely used in critical mission scenarios, the security of the TCP stack in VxWorks directly affects system reliability. However, existing protocol fuzzing methods based on network communication struggle to adapt to the complex state machine and grammatical rules of the TCP. Additionally, the lack of a runtime feedback mechanism for closed-source VxWorks systems leads to low testing efficiency. This paper proposes the vxTcpFuzzer framework, which generates structured test cases by integrating the field features of the TCP. Innovatively, it uses the memory data changes of VxWorks network protocol processing tasks as a coverage metric and combines a dual anomaly detection mechanism (WDB detection and heartbeat detection) to achieve precise anomaly capture. We conducted experimental evaluations on three VxWorks system devices, where vxTcpFuzzer successfully triggered multiple potential vulnerabilities, verifying the framework’s effectiveness. Compared with three existing classic fuzzing schemes, vxTcpFuzzer demonstrates significant advantages in test case acceptance rates (44.94–54.92%) and test system abnormal rates (23.79–34.70%) across the three VxWorks devices. The study confirms that protocol feature fusion and memory feedback mechanisms can effectively enhance the depth and efficiency of protocol fuzzing for VxWorks systems. Furthermore, this approach offers a practical and effective solution for uncovering TCP vulnerabilities in black-box environments.

Keywords: IoT; fuzzing; TCP; VxWorks; system security; vulnerability detection

1. Introduction

The Internet of Things (IoT), defined as a collection of objects with embedded systems [1], enables interconnection or wireless communication and has become an indispensable part of our lives. It is projected that the number of connected IoT devices will reach 38 billion by the end of 2025 and rise to 50 billion by 2030 [2]. Meanwhile, IoT technologies have been widely applied in critical infrastructure, industrial sectors, and smart home domains. Critical infrastructures such as power plants, water resources, and transportation systems are vital to national operations, while smart home devices bring convenience to most people. Most of these IoT devices are embedded system objects with firmware and various applications.

However, security threats posed by software vulnerabilities in embedded systems continue to grow. For example, the Mirai malware [3] infected millions of IoT devices and directed them to launch large-scale cyberattacks, which caused denial of service on hundreds of thousands of web servers around the world. An attacker can also use vulnerable devices to move laterally toward critical targets. For example, in the work-from-home scenarios during COVID-19, Trend Micro reported that introducing vulnerable IoT devices into the household exposes employees to malware and attacks that could slip into a company’s network [4]. According to [5,6], more than 1.5 billion cyberattacks were detected in the first half of 2021, targeting 50 billion embedded devices, including pacemakers, cars, and various IoT devices.

As a representative operating system of embedded systems, VxWorks is widely used in various mission-critical scenarios with its high reliability and real-time capabilities. VxWorks is renowned for its unparalleled deterministic performance. It is designed for a scalable, safe, secure, and reliable operating environment ideal for mission-critical computing systems with the highest demands [7]. However, even well-designed VxWorks systems are not invulnerable. Historical research has shown that high-risk exploitable vulnerabilities (such as buffer overflow vulnerabilities [8]) have existed in the core network components of VxWorks systems, which may lead to system crashes or even more severe consequences.

With the extensive integration of Internet technologies within embedded systems, particularly the widely used and network-capable VxWorks system, the security assurance of its network system has become particularly crucial. In the network framework of VxWorks, the transmission control protocol (TCP) stack serves as the core module for network communication. Consequently, the security of the TCP stack is directly linked to the stability and protection capability of the entire system. Especially in practical application scenarios, VxWorks is often employed in environments with extremely high security requirements, which further highlights the significance of research on the security of its network protocol stack. Therefore, this paper primarily focuses on the vulnerability detection methods of the TCP protocol in VxWorks. Due to the rich functionality of the TCP protocol (such as reliable transmission and congestion control), complex state model, and various possible exception handling mechanisms, correctly implementing them is challenging [9,10]. As a result, developers may inadvertently introduce serious bugs when implementing the TCP stack [11]. Owing to its complexity and wide application, the TCP has long been a key target for attackers. In recent years, research on vulnerabilities in the TCP has shown that its implementation may have serious security risks such as remote code execution and denial-of-service.

In the research on detecting errors in the TCP stack, some studies [12,13,14] have adopted model checking or static and dynamic analysis based on source code to detect errors in TCP implementations. However, these approaches require specific TCP expertise and complex configuration, and they rely heavily on the source code of the TCP stack, which makes them difficult to apply to the closed-source VxWorks system. In contrast, network communication-based protocol fuzz testing has shown promising prospects in overcoming these issues and has become one of the mainstream methods for automated network protocol vulnerability discovery. Fuzzing is “an automated testing method that uses random data (from files, network protocols, API calls, etc.) as software input to generate a large number of test cases in order to find exploitable vulnerabilities” [15], first proposed by Miller et al. [16] in 1990. Although fuzzing is an effective technique for automatically detecting software vulnerabilities, applying it directly to embedded devices that lack visibility and have strong hardware dependencies is challenging [17,18]. First, the methods in [19,20,21,22,23,24,25,26,27] are all current fuzzing approaches targeting various application-layer protocols. They cannot directly control the implementation details or state machine of TCP, which belongs to the transport layer, so they are difficult to apply directly to TCP fuzz testing. Second, TCP packets adhere to strict syntactic specifications. Most random mutation strategies employed in traditional fuzzing violate these rules, so the test cases are rejected during the syntax validation that precedes any deeper processing. This highlights the inefficiency of generic mutation strategies for stateful transport-layer protocols. Finally, because internal execution information cannot be obtained from embedded devices, most existing IoT network protocol fuzzers [19,22,24,25] work in a black-box manner. As a result, seed selection and mutation become random and blind, making the entire fuzzing process resemble a brute-force attack.

In this paper, we focus on detecting vulnerabilities in the TCP stack by sending messages to VxWorks devices over the network in a black-box environment. To develop an efficient fuzz testing framework for the VxWorks TCP stack, the following core challenges must be addressed:

Inherent Specificity of TCP. As a connection-oriented transport-layer protocol, TCP fundamentally differs from application-layer protocols in implementation. Existing network protocol fuzzing tools typically overlook critical TCP aspects, including connection states, sequence number synchronization, retransmission mechanisms, and complex option fields. These limitations render most traditional protocol test case generation methods inapplicable to TCP directly.
The black-box testing process lacks effective guidance. In network communication-based fuzz testing, the closed-source nature of VxWorks results in a lack of visibility during the fuzzing process. Consequently, it is almost impossible to obtain the system’s internal execution information to guide the fuzz testing process, as is the case for most typical black-box fuzzers. Therefore, a lightweight solution is needed in the black-box environment to acquire feedback information from VxWorks during fuzz testing. This information can then serve as a new coverage metric to guide and optimize subsequent testing.

To address the aforementioned challenges, we propose a high-acceptance-rate network fuzz testing framework for VxWorks, which integrates protocol features and memory extraction and is named vxTcpFuzzer. vxTcpFuzzer consists of three key technologies. First, we implement a fuzzing method based on protocol feature fusion. This method extracts TCP field features and integrates them into the test case generation process to produce highly structured test cases with a higher acceptance rate. Meanwhile, it takes into account the multiple states of TCP and state transitions and sequentially performs coverage testing on all server states. Second, we extract memory data of the network protocol processing tasks in VxWorks and use changes in task memory data during testing as feedback to guide the direction of fuzzing. Specifically, we can obtain the content of the memory area being executed by the network protocol processing task (tNet0) according to the value of its program counter (PC) register. This enables us to detect changes in the memory data executed by the task before and after each round of testing. The changes in task memory data are then used as feedback information during testing to form a new coverage metric, which guides and optimizes subsequent fuzzing. Finally, we implement a dual anomaly detection mechanism to detect whether anomalies occur in VxWorks during fuzzing. By improving the Wind River DeBug (WDB) detection mechanism and skillfully combining it with a heartbeat detection mechanism, a more comprehensive anomaly detection mechanism is achieved. Our main contributions are summarized as follows:

A novel fuzzing framework. We propose a network communication-based fuzzing framework, vxTcpFuzzer, specifically designed for TCP in VxWorks systems under black-box environments. vxTcpFuzzer can bypass the encapsulation logic of the local kernel protocol stack, construct TCP packets with arbitrary data, and perform fuzzing.
A new method. We adopt a new method to implement an automated, multi-state coverage TCP fuzzing framework, vxTcpFuzzer. vxTcpFuzzer includes a test case generation method that integrates protocol features, a feedback guidance method that extracts memory data, and a dual anomaly detection mechanism that detects the state of the test system from multiple aspects.
Implementation and vulnerability discovery. We implement the designed fuzzing framework vxTcpFuzzer and evaluate it on three types of VxWorks devices. During testing, six crashes were successfully triggered, verifying the effectiveness of the proposed framework. Meanwhile, a comparison with three advanced fuzzing schemes was conducted, revealing their inapplicability.

The remaining parts of this paper are structured as follows: Section 2 introduces the background and motivation of this work. Section 3 elaborates on the implementation details of the proposed fuzzing framework vxTcpFuzzer. Section 4 presents our experimental results and comparisons with three fuzzing schemes. Section 5 discusses the existing limitations of this work and prospects for future work. Finally, Section 6 provides a summary of this work.

2. Background and Motivation

2.1. Overview of the TCP Protocol

TCP is a connection-oriented, reliable, byte-stream-based transport layer communication protocol defined by the IETF’s RFC 793 [28]. In the simplified OSI model of computer networks, it performs the functions specified by the fourth layer, the transport layer. In practical applications, the TCP has different implementations, forming various TCP stacks. For example, there are kernel-level TCP stacks such as Linux TCP and FreeBSD TCP, as well as user-level TCP stacks like mTCP and TLDK. The VxWorks system employs a protocol stack modified from BSD4.4 TCP/IP, which has been optimized for real-time performance [29], including optimizations such as the addition of zero-copy technology at the TCP layer. However, all these TCP stacks follow the TCP/IP standard protocol and have the same protocol format and state machine.

The TCP packet has a fixed format. Figure 1 shows the format of the TCP header. The header consists of a fixed 20-byte part followed by an optional, variable-length options field, and its fields contain key information for reliable transmission control. These fields work together to ensure that TCP provides reliable, ordered, and error-free data transmission. Through the coordination of sequence numbers, acknowledgment numbers, and window sizes, TCP implements functions such as retransmission, flow control, and congestion control. To ensure that no data is lost, TCP assigns a sequence number to each packet, and the sequence numbers also ensure in-order reception by the receiving entity. The receiving entity then sends back a corresponding acknowledgment (ACK) for each successfully received packet. Additionally, TCP uses a checksum to verify whether there are errors in the data, calculating the checksum both when sending and when receiving. Therefore, there are inter-packet relationships in TCP: each pair of packets must carry the appropriate IP addresses and port numbers, logically sequential sequence numbers (i.e., within the window), and a correct checksum. For example, a correct acknowledgment number in the current packet equals the sum of the sequence number and data length of the previous packet. During a TCP connection, only packets with appropriate IP addresses, port numbers, logically sequential sequence numbers, and correct checksums are accepted by the other party.

Therefore, during TCP fuzzing, the generation or mutation of test cases must consider both the field features of TCP packets and the inter-packet correlation features. Otherwise, most generated test cases will be directly discarded by the tested system without any processing due to errors such as format violations or data verification failures, leading to a significant decline in fuzzing efficiency.

TCP is a stateful protocol, and its state model follows the basic state model defined in RFC 793. Figure 2 illustrates the TCP state machine model, which consists of 11 states and 20 state transitions. The 11 states can be categorized into client states and server states based on different entities. The server states include CLOSED, LISTEN, SYN_RCVD, ESTABLISHED, CLOSE_WAIT, and LAST_ACK.

As indicated by the TCP state machine model, when fuzzing a TCP stack, to improve code coverage, test cases should cover as many protocol states and state transitions as possible. Therefore, during network communication-based TCP fuzzing, it is necessary to perform coverage testing targeting different protocol states and state transitions.
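To make the server-side portion of the state machine concrete, the following minimal Python sketch encodes the six server states listed above and the passive-open path between them. It is an illustrative simplification, not the framework's implementation: the event names (`passive_open`, `recv_syn`, and so on) and the helper `events_to_reach` are hypothetical, and side branches such as reset handling are omitted.

```python
from enum import Enum


class ServerState(Enum):
    CLOSED = "CLOSED"
    LISTEN = "LISTEN"
    SYN_RCVD = "SYN_RCVD"
    ESTABLISHED = "ESTABLISHED"
    CLOSE_WAIT = "CLOSE_WAIT"
    LAST_ACK = "LAST_ACK"


# (current state, event) -> next state along the passive-open path of RFC 793.
# Event names are illustrative labels, not identifiers from the paper.
TRANSITIONS = {
    (ServerState.CLOSED, "passive_open"): ServerState.LISTEN,
    (ServerState.LISTEN, "recv_syn"): ServerState.SYN_RCVD,
    (ServerState.SYN_RCVD, "recv_ack"): ServerState.ESTABLISHED,
    (ServerState.ESTABLISHED, "recv_fin"): ServerState.CLOSE_WAIT,
    (ServerState.CLOSE_WAIT, "app_close"): ServerState.LAST_ACK,
    (ServerState.LAST_ACK, "recv_ack"): ServerState.CLOSED,
}


def events_to_reach(target):
    """Return the event sequence that walks from CLOSED to the target state."""
    state, path = ServerState.CLOSED, []
    while state != target:
        # Each state has exactly one outgoing edge in this simplified table.
        for (src, event), dst in TRANSITIONS.items():
            if src == state:
                path.append(event)
                state = dst
                break
    return path
```

A fuzzer that wants to test a given state can use such a table to decide which packets must be exchanged before mutation starts.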

2.2. Network Protocol Fuzzing Method

Currently, when fuzzing is used to mine vulnerabilities in network protocols, the fuzzer typically acts as a client, while the network protocol implementation runs on the server under test [30]. The fuzzer interacts with the network protocol implementation through a specific port. The client generates and sends data packets and receives response packets from the server under test. The server under test receives data packets from the client, updates its internal state after processing the request, and returns the processing results to the client.

At present, communication-based network protocol fuzz testing methods can mainly be divided into two categories. The first category is black-box protocol fuzz testing, which is relatively fast. Representative works include Boofuzz [19], Peach [24], and IoTInfer [25]. Customized templates are used to generate test data packets and send them to the designated test ports. These tools infer the presence of vulnerabilities either through side-channel indicators such as response latency or by reconstructing a finite-state machine from observed traffic. However, writing protocol primitives requires substantial expert experience and manual intervention. Additionally, black-box fuzzing cannot obtain internal feedback from the server to improve the quality of test case generation. The inherent randomness leads to significant time wastage on ineffective test cases.

The second category is gray-box protocol fuzzing. Some researchers have transplanted gray-box fuzzing technologies to network scenarios. In 2020, AFLNET [21], a stateful gray-box fuzzer for network servers, was proposed. AFLNET extracts the internal state of the server by analyzing the content of response packets from the network server, enabling fuzzing for specific states. Meanwhile, it can obtain the code coverage of the server under test to improve the effectiveness of test cases. CGFuzzer (2022) [26] employed a coverage-guided generative adversarial network to learn Industrial Internet of Things protocol grammars and synthesize high-acceptance test cases, achieving significant coverage improvements. MPFuzz (2024) [27] further extended this line of work with a parallel fuzzing architecture that synchronizes critical fields across instances using protocol-specific information and refines generated packets via a semantics-aware optimization module, markedly enhancing parallel fuzzing efficiency.

However, both black-box and gray-box network protocol fuzzing methods have three limitations for our purpose. First, they are primarily designed for application-layer protocols and cannot directly control the implementation details of TCP, which belongs to the transport layer. Second, they generally rely on sockets to send test data and cannot independently control the state changes of TCP. Third, TCP is fundamentally different from application-layer protocols, and these methods do not take its basic characteristics and functions into account. Consequently, their test-case generation strategies and feedback mechanisms are not directly applicable to fuzzing the TCP stack in VxWorks, leading to a substantial degradation in testing efficacy.

2.3. The Task Characteristics of VxWorks

VxWorks is a high-reliability, real-time embedded operating system developed by Wind River Systems in the United States. Due to its high reliability and excellent real-time performance, it is widely used in fields such as aviation, aerospace, medical, communication, and industry. Its representative customers include Boeing, Airbus, NASA, Samsung, Siemens, Huawei, and Cisco [31]. VxWorks is also a real-time multitasking operating system. Its kernel provides a basic multitasking environment, allowing a program to run as a series of independent tasks. Each task has its own thread and system resources. Therefore, in the VxWorks system, a task is the basic execution unit and the main object for resource allocation and scheduling. Each task has an independent execution environment, including a stack pointer (SP), a register set, and its own stack and data segment. VxWorks manages the state and attributes of each task through a task control block (TCB), which contains key information about the task, such as its priority, status, stack pointer, and program counter. In terms of memory management, VxWorks uses a partitioned memory model. All tasks share the same physical address space, and the privacy of local run-time data is ensured by allocating independent stack areas. In task scheduling, VxWorks uses a priority-based preemptive scheduling algorithm. The kernel supports task switching by saving the task’s context (including the program counter (PC) register, the stack pointer (SP), etc.). The PC register stores the address of the instruction currently being executed by the task and is the core pointer of the task’s running state. For multitasking systems, the PC register enables the operating system to restore the execution progress of a task during task switching. Therefore, by capturing the PC register values of tasks, the memory address ranges currently accessed by the tasks can be indirectly obtained, which allows for the precise positioning and extraction of memory data in the task’s running area.

Inspired by the characteristics of the PC register in VxWorks tasks, we can design a lightweight and non-intrusive task memory monitoring method. This method aims to address the challenge of acquiring internal execution information from the system under test during network fuzz testing, which is crucial for effectively guiding the direction of fuzz testing.
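As a rough illustration of how memory changes could act as feedback, the sketch below compares the bytes read around a task's PC before and after a test round and treats a previously unseen post-test snapshot as new coverage. This is an assumption made for illustration only: the class `MemoryFeedback`, the use of SHA-256 digests, and the idea that snapshots arrive as `bytes` are all hypothetical, and the framework's actual metric may be defined differently.

```python
import hashlib


class MemoryFeedback:
    """Treats each previously unseen post-test memory snapshot as new coverage.

    Hypothetical helper: how the snapshots are read from the target
    (for example, through a debug channel) is outside this sketch.
    """

    def __init__(self):
        self.seen_digests = set()

    def is_interesting(self, before: bytes, after: bytes) -> bool:
        # No change in the task's memory region: the test case had no
        # observable effect, so it contributes no new coverage.
        if before == after:
            return False
        digest = hashlib.sha256(after).digest()
        # A change that was already observed earlier is not new coverage.
        if digest in self.seen_digests:
            return False
        self.seen_digests.add(digest)
        return True
```

Test cases for which `is_interesting` returns `True` would be the natural candidates to keep as seeds for later mutation rounds.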

2.4. VxWorks Debugging Mechanism

VxWorks provides remote debugging capabilities through the WDB (Wind River Debug) RPC protocol. This protocol enables communication between a host and target devices, thereby facilitating task monitoring, memory access, and exception capture. In several existing studies [8,32], automated vulnerability detection methods specifically designed for VxWorks systems have been implemented. These studies utilize the WDB RPC protocol for target exception capture, a mechanism we refer to as the WDB detection mechanism.

The implementation of the WDB detection mechanism primarily relies on VxWorks’ inherent task exception handling. When VxWorks’ task exception handling mechanism detects an exception in a task, it proactively transmits relevant exception information to the connected host via the WDB RPC protocol. However, VxWorks’ task exception handling mechanism operates by jumping to corresponding exception handling routines based on the exception vector table. Consequently, certain unknown error types may fail to be correctly captured by VxWorks’ exception handling mechanism, and it is incapable of addressing exceptions involving complete network failures in VxWorks. Thus, the WDB detection mechanism has inherent limitations. It can only detect exception types that are capturable by VxWorks’ task exception handling, potentially resulting in missed exception cases.

3. Design and Implementation

This section elaborates on various methods for implementing the vxTcpFuzzer framework, primarily including a protocol feature fusion fuzzer, a memory feedback utilization method, and a dual anomaly detector.

3.1. Framework

This paper takes the TCP implementation of VxWorks as the research object and explores how to develop a lightweight, high-acceptance-rate, and practical fuzz testing method for TCP in a black-box environment. Figure 3 shows the overall workflow of the vxTcpFuzzer framework.

One of the core objectives of vxTcpFuzzer is to systematically cover and test all critical states and their transitions in the TCP server. Therefore, before each round of fuzz testing targeting a specific state (e.g., SYN_RCVD), a protocol status activation phase is essential to precisely drive and confirm that the system has reached the target state. First, based on the transition conditions defined in the TCP state machine model (Figure 2) as shown in Table 1, vxTcpFuzzer pre-constructs a set of specific, syntactically correct TCP packets (referred to as the “activation corpus”). These packets are specifically designed to reliably drive the VxWorks TCP server from its current state to the target state under test (e.g., sending a SYN packet to drive the system to the SYN_RCVD state). Then, prior to the start of each fuzz testing round, vxTcpFuzzer sends the corresponding packets from the activation corpus and analyzes the response packets from VxWorks (as shown in the “Response Packets” column in Table 1). By parsing information such as flag bits (e.g., SYN+ACK) in the response packets, vxTcpFuzzer confirms whether the system has successfully entered the expected target state. For instance, when testing the SYN_RCVD state of the TCP protocol, we first send a TCP packet with the SYN flag set (trigger condition). Once VxWorks responds with a {SYN, ACK} packet (state confirmation signal), it indicates that the system has successfully entered the SYN_RCVD state, allowing the initiation of fuzz testing for this specific state.
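The state confirmation step for SYN_RCVD can be sketched as a check on the flag bits of the response. The example below parses the flags from a raw TCP header and tests for SYN+ACK. The function names are hypothetical, and the flag constants follow the RFC 793 header layout; confirmation rules for the other states in Table 1 are not covered here.

```python
import struct

# Control bits as laid out in RFC 793 (low six bits of header byte 13).
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20


def tcp_flags(segment: bytes) -> int:
    """Return the six RFC 793 control bits of a raw TCP segment."""
    if len(segment) < 20:
        raise ValueError("a TCP header is at least 20 bytes long")
    # Byte 12 holds the data offset, byte 13 holds the control bits.
    _offset_byte, flags = struct.unpack_from("!BB", segment, 12)
    return flags & 0x3F


def reached_syn_rcvd(response: bytes) -> bool:
    """True if the response carries SYN and ACK and is not a reset."""
    flags = tcp_flags(response)
    return (flags & (SYN | ACK)) == (SYN | ACK) and not (flags & RST)
```

If the check fails, for example because the target answered with RST, the activation step would have to be repeated before fuzzing that state.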

Notably, CLOSED in Table 1 is not a genuine server state but a hypothetical starting/ending point. Therefore, our testing primarily focuses on the other five TCP server states.

The framework in our fuzz testing process is primarily divided into three components. The first is the Protocol Feature Fusion Fuzzer, which analyzes and extracts the feature attributes of each TCP field and integrates them with test case generation to implement a protocol feature fusion-based test case generation method (Section 3.2).

The second component is the Memory Feedback Utilization Method. During protocol fuzz testing, memory data of corresponding network tasks is extracted before and after each test round. The variation in network task memory data is used as coverage metrics for test cases during fuzz testing (Section 3.3).

The final component is the Dual Anomaly Detector. Throughout the entire fuzzing process, a dual anomaly detector specific to the VxWorks system is employed to detect system anomalies and implement post-anomaly recovery of the testing environment (Section 3.4).

3.2. Protocol Feature Fusion Fuzzer

The quality of initial test cases significantly impacts the overall effectiveness of fuzz testing. In Section 2.2, we discussed the unique characteristics of TCP and its fundamental differences from application-layer protocols. These differences render conventional protocol fuzzers incapable of generating valid TCP test cases. Test case generation methods targeting TCP require more sophisticated simulation of the protocol’s state machine, connection procedures, error handling, and interactions with the network environment to produce effective and feasible test cases. Therefore, we propose a more sophisticated TCP fuzzer that incorporates detailed implementations of connection state transitions, flow control mechanisms, and synchronization of sequence and acknowledgment numbers. Our approach focuses on individual TCP fields, extracting and analyzing their characteristic attributes and inter-field dependencies, which are then integrated into the test case generation process. By leveraging a protocol feature fusion-based fuzzing method, we generate structured test cases that adhere to TCP syntax and semantics, thereby enhancing the overall quality of initial test cases.

The protocol feature fusion-based fuzzing method integrates characteristic attributes of TCP fields into the test case generation process. By leveraging these protocol-specific features, we can select mutation strategies more effectively and generate high-quality test cases. The characteristic information of a TCP field primarily includes the field name, data type, field length, typical field values, and correlation relationships with other fields. Specifically, the feature vector of each field is defined as a five-tuple: V = (Name, Type, Len, Default, Constraints). The feature vector plays three roles. First, in strategy selection, mutation strategies are matched according to Type and Len. Second, in value constraints, Default and Constraints are used to ensure syntactic validity. Third, in maintaining correlation relationships, logical consistency between fields is preserved through Constraints. For example, the feature vector of the urgent pointer field is V_URG = (URG_PTR, uint16, 16, 0, {flag_U = 1}). This means that its mutation must satisfy the following rule: when the URG flag in the flags field is set, non-zero values are generated; otherwise, the default value 0 is maintained. In this paper, we extract all TCP field features, represent them as feature vectors, and then apply tailored mutation strategies based on the distinct feature vectors. Table 2 outlines the customized generation strategies for different TCP fields.
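The five-tuple and the urgent pointer rule can be written down as a small data structure. The sketch below is one possible encoding, assuming a dataclass named `FieldFeature` and a helper `urgent_pointer_value`; both names, and the choice to clamp candidates to the field width, are illustrative and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class FieldFeature:
    """V = (Name, Type, Len, Default, Constraints) for one TCP field."""
    name: str
    type: str
    length_bits: int
    default: int
    constraints: Dict[str, Any]


# Feature vector of the urgent pointer field as given in the text.
V_URG = FieldFeature("URG_PTR", "uint16", 16, 0, {"flag_U": 1})


def urgent_pointer_value(feature: FieldFeature, flag_u: int, candidate: int) -> int:
    """Apply the constraint: non-zero only when the URG flag is set."""
    if flag_u != feature.constraints["flag_U"]:
        return feature.default
    max_value = (1 << feature.length_bits) - 1
    value = candidate & max_value      # keep the candidate inside the field width
    return value if value != 0 else 1  # the rule requires a non-zero value
```

For instance, `urgent_pointer_value(V_URG, 0, 1234)` falls back to the default, while the same candidate with the URG flag set is kept as is.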

The specific content of the generation strategies implemented based on different features of TCP protocol fields is as follows:

For the source port field, a strategy of random acquisition after inspection is implemented. By randomly obtaining the port number of the current host and then inspecting whether the port is an idle port, the idle port number of the current host can be obtained. This value is then assigned to the source port field to change the value of the source port number on the basis of ensuring that the local host port is idle.
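A minimal sketch of "random acquisition after inspection" is shown below. It assumes that a port counts as idle if a TCP socket can be bound to it on the local host; the function name `pick_idle_port`, the port range, and the retry limit are hypothetical choices.

```python
import random
import socket


def pick_idle_port(rng=random, low=1024, high=65535, attempts=100):
    """Draw random ports until one can be bound, then return it."""
    for _ in range(attempts):
        port = rng.randint(low, high)
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind(("", port))  # bind fails if the port is in use
            except OSError:
                continue
            return port                 # the probe socket is closed on exit
    raise RuntimeError("no idle port found")
```

Because the probe socket is released before the port is used, another process could claim the port in between; a fuzzer built on this idea would need to tolerate that race.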

For sequence number and acknowledgment number fields, a strategy of timing dependency calculation is implemented. After using activation cases in the activation phase to drive the TCP service of the system to the state under test, the timing information contained in the system’s response to the activation cases is extracted. This timing information is then further processed using TCP message timing relationship calculation rules to derive the specific values of the sequence number and acknowledgment number acceptable to the test object in the next step. To more clearly demonstrate the implementation details of this strategy, we provide a basic example. Figure 4 illustrates the calculation process for the values of the sequence number and acknowledgment number fields in a test case during the testing of the SYN_RCVD state of the TCP.
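For the SYN_RCVD case, the timing dependency reduces to the standard handshake arithmetic of RFC 793: a SYN consumes one sequence number, so the next segment acknowledges the server's initial sequence number plus one. The sketch below illustrates that arithmetic; the function name and argument names are hypothetical, and the details of Figure 4 may differ.

```python
SEQ_MODULO = 2 ** 32  # sequence numbers wrap around at 32 bits


def next_seq_ack(client_isn: int, synack_seq: int, synack_ack: int):
    """Derive (seq, ack) for the segment that follows a SYN+ACK response.

    client_isn  -- sequence number used in the activation SYN
    synack_seq  -- sequence number in the server's SYN+ACK
    synack_ack  -- acknowledgment number in the server's SYN+ACK
    """
    expected_ack = (client_isn + 1) % SEQ_MODULO
    if synack_ack != expected_ack:
        raise ValueError("SYN+ACK does not acknowledge the activation SYN")
    seq = expected_ack                     # our SYN consumed one sequence number
    ack = (synack_seq + 1) % SEQ_MODULO    # the server's SYN consumed one as well
    return seq, ack
```

With an activation SYN of sequence number 1000 and a response carrying seq 5000 and ack 1001, the next test case would use seq 1001 and ack 5001.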

For the data offset, reserved, flags, window, and urgent pointer fields, a progressive assignment strategy is implemented. The progressive assignment strategy generates a series of values for a field by incrementally assigning values from zero up to the maximum allowable value within the field’s defined length constraints. This approach systematically explores the entire value range of the field to generate diverse test inputs.
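The progressive assignment strategy amounts to enumerating a field's value range in increasing order. The sketch below assumes the field widths of the RFC 793 header layout; the dictionary `FIELD_BITS`, the generator name, and the optional `step` parameter are illustrative, and the widths used in Table 2 may differ.

```python
# Field widths in bits, assumed from the RFC 793 header layout.
FIELD_BITS = {
    "data_offset": 4,
    "reserved": 6,
    "flags": 6,
    "window": 16,
    "urgent_pointer": 16,
}


def progressive_values(bits: int, step: int = 1):
    """Yield 0, step, 2*step, ... up to the largest value that fits in `bits`."""
    yield from range(0, 1 << bits, step)
```

For example, `progressive_values(FIELD_BITS["data_offset"])` walks through the 16 possible data offset values, while a larger `step` can thin out the 65,536 window values when the test budget is limited.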

For the options field, a two-layer composite mutation strategy is implemented. The so-called two-layer composite mutation is divided into a lower-layer option value mutation and upper-layer option tuple position mutation. A lower-layer mutation involves performing mutation operations such as bit flipping, insertion, replacement, or deletion based on the initial value of the option value within the same type of option tuple. To increase the complexity of fuzzing, after the initial length of the option tuple has been mutated based on the initial value, it is assigned to the original type of option. The source of the option tuple is a typical option set formed by combining TCP common options obtained from documents such as RFC 793 [28], RFC 1323 [33], and RFC 5925 [34]. An upper-layer mutation involves fuzzing the option tuple that has undergone a lower-layer mutation through copying, crossing, and position replacement of the option tuple and finally obtaining a series of option lists with different values. To more clearly illustrate the generation process of this strategy, we present a simple example. Figure 5 depicts a process of implementing a two-layer composite mutation on the MSS option. First, the initial option undergoes a lower-layer mutation based on its initial value to generate a series of option tuples. Subsequently, an upper-layer mutation is performed on these option tuples to obtain the final option list.
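The two layers can be illustrated with a short sketch that represents an option tuple as (kind, length, value). The lower layer mutates the value bytes, and the upper layer rearranges the resulting tuples. All function names are hypothetical, the declared length is deliberately left untouched here (one possible reading of the description), and the real mutation operators in the framework may be more elaborate.

```python
import random

# MSS option: kind 2, length 4, value 1460 (0x05B4).
MSS_OPTION = (2, 4, b"\x05\xb4")


def lower_layer(option, rng):
    """Mutate only the value bytes; kind and declared length stay as they are."""
    kind, length, value = option
    data = bytearray(value)
    op = rng.choice(["flip", "insert", "replace", "delete"])
    if op == "flip" and data:
        data[rng.randrange(len(data))] ^= 1 << rng.randrange(8)
    elif op == "insert":
        data.insert(rng.randrange(len(data) + 1), rng.randrange(256))
    elif op == "replace" and data:
        data[rng.randrange(len(data))] = rng.randrange(256)
    elif op == "delete" and data:
        del data[rng.randrange(len(data))]
    return (kind, length, bytes(data))


def upper_layer(options, rng):
    """Copy, cross, or swap whole option tuples inside a non-empty list."""
    out = list(options)
    op = rng.choice(["copy", "cross", "swap"])
    if op == "copy":
        out.append(rng.choice(out))
    elif len(out) >= 2:
        a, b = rng.sample(range(len(out)), 2)
        if op == "cross":  # exchange the value parts of two tuples
            out[a], out[b] = (out[a][:2] + (out[b][2],), out[b][:2] + (out[a][2],))
        else:              # exchange the positions of two tuples
            out[a], out[b] = out[b], out[a]
    return out


def mutate_options(seed_option, count, seed=0):
    """Lower-layer mutation first, then one upper-layer mutation."""
    rng = random.Random(seed)
    tuples = [lower_layer(seed_option, rng) for _ in range(count)]
    return upper_layer(tuples, rng)


def encode_options(options) -> bytes:
    """Serialize option tuples as kind, length, value bytes."""
    return b"".join(bytes([kind, length & 0xFF]) + value for kind, length, value in options)
```

Calling `encode_options(mutate_options(MSS_OPTION, 4))` yields the raw bytes of one candidate options field; padding to a 32-bit boundary is left out of this sketch.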

The above generation strategies do not include mutations for the destination port and checksum fields. For these two fields, this paper chooses not to perform mutations. The destination port is set to the TCP service port opened by the VxWorks system under test during experiments (e.g., port 21 for FTP services). Fixing this field ensures that test cases always act on the protocol stack of the target service, preventing test cases from being discarded by underlying network modules due to incorrect port settings. As for the checksum field, its correct value needs to be dynamically calculated after the packet is constructed. Random mutations on this field would cause the packet to be directly discarded by the protocol stack during the verification phase, making it impossible to trigger anomalies in deep-seated state machines or memory processing logic. The design is precisely based on the relevant characteristics of TCP fields: The checksum serves as the basic verification mechanism for TCP reliable transmission, and any invalid checksum will result in packet discarding. The destination port, on the other hand, is the first-layer filtering condition for the protocol stack to distribute packets. Maintaining the validity of these two fields can significantly improve the probability of test cases passing the initial verification of the protocol stack, thereby enabling more effective testing of potential vulnerabilities in core protocol logic.
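To make the checksum step concrete, the following sketch computes the standard TCP checksum over the IPv4 pseudo-header and the segment after all other fields have been set. The helper names and the example addresses are assumptions for illustration only.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit ones' complement sum with end-around carry."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum of a TCP segment (header with checksum field zeroed + payload)."""
    pseudo_header = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!BBH", 0, socket.IPPROTO_TCP, len(segment))
    )
    return ~ones_complement_sum(pseudo_header + segment) & 0xFFFF


# Hypothetical mutated segment: URG+ACK flags, urgent pointer 0, 3-byte payload.
header = struct.pack("!HHIIBBHHH", 40000, 21, 1001, 5001, 5 << 4, 0x30, 8192, 0, 0)
segment = header + b"abc"
checksum = tcp_checksum("192.168.0.2", "192.168.0.10", segment)
# The checksum field occupies bytes 16-17 of the TCP header.
final_segment = segment[:16] + struct.pack("!H", checksum) + segment[18:]
```

Recomputing the checksum over `final_segment` yields zero, which is the check a receiving stack performs before it processes the remaining fields.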

The field mutation strategies described above do not generate a single test case by assigning a value to each field independently. Instead, they consider factors such as TCP state transitions and synchronization of sequence/acknowledgment numbers. Test cases are generated by first determining values for critical fields (source port, destination port, sequence number, and acknowledgment number) and then combining these with values from other fields. This approach ensures that each test case passes the initial validation of the target system’s protocol stack, thereby enhancing the quality of individual test cases.

In this paper, a fuzzing method based on protocol feature fusion is used to generate the initial test cases for the fuzzing process, which are then placed into the test case pool.

After the generated initial test cases are executed, the Havoc mutation algorithm proposed in Section 3.3.2 utilizes the saved seeds to generate new test cases for continuous testing. This process iterates cyclically until the user halts the program or a timeout occurs.

3.3. Memory Feedback Utilization Method

To address the issue of insufficient effective feedback from VxWorks systems in a black-box environment, we drew inspiration from the characteristics of VxWorks tasks outlined in Section 2.3 and designed a feedback mechanism based on memory data changes of tasks. The core of this mechanism lies in monitoring memory data changes of network tasks, using them as a new coverage metric. Based on this, “interesting” test cases that can trigger new memory states are selected to be added to the seed queue, providing a foundation for subsequent heuristic Havoc mutations, thereby guiding the fuzz testing to explore potential new execution paths. This method mainly consists of two components: a task memory data extraction method (responsible for acquiring memory changes) and a heuristic Havoc mutation algorithm (responsible for generating new test cases through seed mutation).

3.3.1. Memory Data Extraction Method

In the VxWorks operating system, the task named tNet0 is responsible for executing network drivers and handling network protocols within the VxWorks network stack. When a TCP packet arrives, it triggers a state transition of tNet0, leading to a context switch of the task, and further causing corresponding changes in the contents of the task register set. Therefore, in the process of fuzz testing the TCP protocol, this paper monitors the PC register values of the tNet0 task. When the PC register value of the tNet0 task changes, the memory data of the task execution area is extracted. During fuzzing, we compare the task memory data extracted from two consecutive rounds of testing. When there are differences in the task memory data, we consider the test case of this round as “interesting” because this test case has altered the task memory data, which indirectly indicates that the test case may have covered a new task execution area. Therefore, we use the task memory data changes caused by test cases as a new coverage metric.

The task memory data extraction method implemented in this paper is shown in Algorithm 1. First, the task name is converted to a taskID; that is, the specific ID assigned by VxWorks to the tNet0 task is looked up (line 1). Then, the initial PC register value of the task is obtained through the tNet0 taskID (line 2). The PC register of tNet0 is then monitored for changes (lines 3–4). When the PC value differs from the initial value, 100 bytes of memory data starting from the PC address are extracted and written to a specified file (lines 5–7). If the PC register value has not changed, the monitor waits for a short delay before polling the register again (line 9).

Algorithm 1: Task memory data extraction algorithm

Input: task name, taskName
Output: 100 bytes of memory data, memData
1:   taskID ← taskNameToId(taskName);
2:   init_pc ← taskRegsGet(taskID);
3:   while True do
4:           cur_pc ← taskRegsGet(taskID);
5:           if cur_pc is not equal to init_pc then
6:                      memData ← memcpy(cur_pc, 100);
7:                      memfwrite(memData, memfile);
8:           else
9:                      taskDelay();
10:          end if
11: end while

We applied the task memory data extraction algorithm in vxTcpFuzzer to obtain feedback information for each round of testing. First, before and after each round of testing, the task memory data of tNet0 is read once respectively, and we compare whether the content of the memory data has changed. If a change occurs, it will be further compared with the memory data after the previous round of testing. When the memory data from two test rounds are inconsistent, the test case of the current round is considered as one that we are interested in (i.e., it triggers a new memory state). In brief, changes in memory data are obtained by comparing the memory data of the task execution area extracted before and after each test round; its direct purpose is to identify and select “interesting” test cases. Finally, all “interesting” test cases will be added to the seed queue, serving as the basic input for subsequent heuristic Havoc mutations.
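The selection rule described above can be sketched as follows. The class and attribute names are hypothetical, and the snapshots stand in for the 100-byte memory extracts produced by Algorithm 1.

```python
from typing import List, Optional


class MemoryFeedback:
    """Flag test cases whose execution leads to a new task-memory snapshot."""

    def __init__(self) -> None:
        self.previous_after: Optional[bytes] = None  # snapshot after the last round
        self.seed_queue: List[bytes] = []

    def record_round(self, test_case: bytes, before: bytes, after: bytes) -> bool:
        """Return True and enqueue the test case if it is 'interesting'."""
        changed_in_round = before != after
        differs_from_previous = after != self.previous_after
        interesting = changed_in_round and differs_from_previous
        if interesting:
            self.seed_queue.append(test_case)
        self.previous_after = after
        return interesting


feedback = MemoryFeedback()
r1 = feedback.record_round(b"case-1", b"\x00" * 4, b"\x01" * 4)  # new state
r2 = feedback.record_round(b"case-2", b"\x01" * 4, b"\x01" * 4)  # no change
r3 = feedback.record_round(b"case-3", b"\x00" * 4, b"\x01" * 4)  # same as last round
```

Only the first case enters the seed queue here: the second leaves memory unchanged, and the third changes it within the round but ends in the same state as the previous round.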

3.3.2. Heuristic Havoc Mutation

To perform continuous and efficient testing, we implement mutation operations on the seed cases in the seed queue. These seeds refer to the “interesting” test cases that have successfully triggered new changes in memory states, indicating that they may have explored new execution paths. In the phase where the protocol feature fusion fuzzer generates initial cases, the adopted generation strategy only mutates a certain field in the protocol each time. However, the conditions for triggering bugs may be complex. For example, it may require modifying different data fields in the same packet to trigger an exception. Therefore, the testing process needs the involvement of Havoc mutation. But the traditional Havoc mutation randomly selects some random fragments in a packet for mutation, which has strong blindness. To address this, this paper designs a heuristic Havoc mutation algorithm, which enables the Havoc mutation to focus on these valuable seeds selected through memory feedback and perform purposeful mutations based on them.

The overall workflow of the heuristic Havoc mutation algorithm is shown in Algorithm 2: First, traverse the seed queue, locate the abnormal fields and extract the abnormal values for each seed, and divide the abnormal values into corresponding lists according to the field types (line 1). Then map the field names to their corresponding abnormal value lists in a dictionary called Field_lists (line 2). Filter out the fields with non-empty abnormal value lists from Field_lists to form a new non-empty dictionary Noempty_fields (line 3). Extract all field names that need to be combined from Noempty_fields and store them in the list Fields (line 4). For each combination size (from 2 to the length of Fields), generate all possible field subset combinations (lines 5–6). For each field subset, obtain the corresponding abnormal value lists and calculate the Cartesian product of these abnormal value lists to generate all possible abnormal value combinations (lines 7–8). For each abnormal value combination, construct a new TCP packet and set the abnormal values for the field subset to generate a new test case (line 9). Finally, add the newly constructed test case to the test case set P (line 10).

Algorithm 2: Heuristic Havoc mutation algorithm

Input: Seed queue, S
Output: New test case set, P
1:   L ← excfield_position(S);
2:   Map the field name to the corresponding list L to the Field_lists
3:   Filter out the empty list in the Field_lists to get Noempty_fields
4:   Extract all field names from the Noempty_fields to Fields
5:   for each combination size comsize in between range 2 and len(Fields) do
6:             for each field combination fieldsubset in all subsequences of length comsize in Fields do
7:                      Get the list of exceptional values for each field in the current fieldsubset to Value_lists;
8:                      for each exceptional value combination values in Value_lists Cartesian product do
9:                               C ← build_newcase(fieldsubset, values);
10:                            add C to set P
11:                     end for
12:          end for
13: end for
14: return P
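Lines 3–13 of Algorithm 2 map naturally onto `itertools.combinations` and `itertools.product`. The sketch below assumes a test case can be represented as a dictionary of field values and that unselected fields keep their defaults; `heuristic_havoc` and the example field names are illustrative.

```python
from itertools import combinations, product


def heuristic_havoc(field_lists, defaults):
    """Combine abnormal values of two or more fields into new test cases."""
    # Line 3: keep only the fields that actually have abnormal values.
    non_empty = {name: vals for name, vals in field_lists.items() if vals}
    fields = list(non_empty)                           # line 4
    cases = []
    for size in range(2, len(fields) + 1):             # line 5
        for subset in combinations(fields, size):      # line 6
            value_lists = [non_empty[f] for f in subset]  # line 7
            for values in product(*value_lists):       # line 8
                case = dict(defaults)                  # line 9: start from defaults
                case.update(zip(subset, values))
                cases.append(case)                     # line 10
    return cases


defaults = {"flags": 0x10, "urgent_pointer": 0, "window": 8192, "options": b""}
field_lists = {
    "flags": [0x30],
    "urgent_pointer": [0xFFFF],
    "window": [0, 65535],
    "options": [],
}
new_cases = heuristic_havoc(field_lists, defaults)
```

With one, one, and two abnormal values for three fields, this produces five two-field cases and two three-field cases. The number of cases grows with the product of the list lengths, so long abnormal value lists may need to be capped in practice.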

Algorithm 2 begins with the identification and extraction of abnormal fields from the seed queue. The detailed procedure is presented in Algorithm 3. For every seed in the queue (Line 1), we read the value of each field and compare it with the corresponding default value (Lines 2–4). Whenever a mismatch is detected, the value is recorded as an anomaly and appended to the anomaly list of the corresponding field, and the remaining fields of the current seed are skipped (Lines 5–7).

Algorithm 3: Abnormal field identification and extraction algorithm

Input: Seed queue, S
Output: Lists of abnormal value, L
1:    for each seed in S do
2:              for each field in seed do
3:                      value ← seed[field];
4:                      default ← get_default(field);
5:                      if value != default then
6:                                add value to L[field]
7:                                break
8:                      end if
9:              end for
10: end for
11: return L
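Algorithm 3 translates almost line for line into Python. The sketch below assumes seeds and defaults are dictionaries keyed by field name; the function name is hypothetical.

```python
from collections import defaultdict


def extract_abnormal_fields(seeds, defaults):
    """Collect, per field, the first non-default value found in each seed."""
    abnormal = defaultdict(list)
    for seed in seeds:
        for field, value in seed.items():
            if value != defaults[field]:
                abnormal[field].append(value)
                break  # as in Algorithm 3: stop at the first mismatch in this seed
    return dict(abnormal)


defaults = {"flags": 0x10, "window": 8192, "urgent_pointer": 0}
seeds = [
    {"flags": 0x30, "window": 8192, "urgent_pointer": 0},
    {"flags": 0x10, "window": 0, "urgent_pointer": 0},
    {"flags": 0x10, "window": 8192, "urgent_pointer": 0},  # all defaults
]
abnormal_lists = extract_abnormal_fields(seeds, defaults)
```

Because of the `break`, each seed contributes at most one abnormal value, which matches the initial generation phase where only one field is mutated per test case.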

Specifically, the heuristic Havoc mutation targets the seeds preserved during the fuzzing campaign. The processing method for the obtained seeds involves first locating the abnormal fields of the seeds. By sequentially comparing the values of each field in the seed with the default values, the abnormal fields causing changes in the memory data of the tNet0 task are positioned. After locating the abnormal fields of the seed, the values of these fields are extracted and stored in the corresponding abnormal value lists. The abnormal value lists are created separately according to the field types. Then, Cartesian-product-style iterative combinations are performed on the contents of the abnormal value lists to generate new test cases. The so-called Cartesian-product-style iterative combination refers to extracting combinations of multiple fields from these abnormal value lists to generate new test cases. First, mutations of combinations of every two fields are generated, followed by combinations of three fields, and so on, until combinations of all abnormal fields are covered. In this way, we explore the interactions between abnormal fields to produce test cases that can cover multiple field domains simultaneously. After iterative combination of the seed queue, a large number of new test cases can be generated. These test cases can conduct comprehensive coverage testing for scenarios where abnormalities are jointly triggered in different field domains, potentially triggering more complex bugs.

3.4. Dual Anomaly Detector

The detection of abnormal conditions in the test object is a crucial part of fuzz testing. Only by effectively and comprehensively monitoring the state of the device under test (DUT) can potential vulnerabilities in the DUT be discovered in a timely manner. Given the limitations of the WDB detection mechanism analyzed in Section 2.4, we improved the WDB detection mechanism and combined it with a client heartbeat detection mechanism to implement the dual anomaly detection mechanism proposed in this paper.

As shown in Figure 6, the dual anomaly detection mechanism consists of two parts: the WDB detection mechanism and the client heartbeat detection mechanism. The WDB detection mechanism is implemented using the WDB RPC protocol of VxWorks and acts as a HOST agent that establishes a connection with VxWorks. It first uses the WDB_TARGET_CONNECT2 function in the WDB RPC protocol to create a connection request packet and establish the connection. Then, after each round of testing, it uses the WDB_EVENT_GET function to create an anomaly detection packet and send a detection request to VxWorks. The response to this request indicates whether a task exception has occurred in VxWorks. For the client heartbeat detection mechanism, we did not create a separate client to monitor the network communication of the test target. Instead, we reused the activation corpus from the protocol state activation phase described in Section 3.1. Specifically, before a given TCP state is tested in each round, the activation corpus is used to drive TCP to the state under test. If the test case of one round causes the network communication of the test target to crash, the protocol state activation before the next round will fail because no response is received. We exploit this behavior: if the protocol state activation fails three consecutive times, the network service of the DUT is considered to have crashed, and the crash information and the corresponding test case are recorded immediately.
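The heartbeat rule (three consecutive activation failures imply a crash) can be sketched as a small counter. The `activate` callable is a hypothetical stand-in for replaying the activation corpus and returning whether the target responded.

```python
from typing import Callable


class HeartbeatDetector:
    """Infer a crash after a run of consecutive activation failures."""

    def __init__(self, activate: Callable[[], bool], threshold: int = 3) -> None:
        self.activate = activate    # returns True if the target responded
        self.threshold = threshold  # consecutive failures that count as a crash
        self.failures = 0

    def probe(self) -> bool:
        """Attempt one activation; return True if a crash is inferred."""
        if self.activate():
            self.failures = 0       # any success resets the streak
            return False
        self.failures += 1
        return self.failures >= self.threshold


# Simulated target: responds, fails twice, recovers, then stays silent.
responses = iter([True, False, False, True, False, False, False])
detector = HeartbeatDetector(lambda: next(responses))
verdicts = [detector.probe() for _ in range(7)]
```

Resetting the counter on success avoids reporting a crash for isolated packet loss, at the cost of a delay of `threshold` rounds before a real crash is flagged.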

When the WDB detection mechanism or client heartbeat detection mechanism identifies an anomaly in the test target, a similar anomaly handling procedure is initiated. However, discrepancies exist in implementation details, with the anomaly handling process primarily categorized into two aspects: anomaly information preservation and test environment restoration.

For the WDB detection mechanism, the test case triggering the anomaly is first saved, accompanied by recording the corresponding anomaly information. Subsequently, the WDB_REGS_GET and WDB_MEM_READ functional APIs are employed to extract and save the register set and memory data at the breakpoint of the abnormal task, respectively, to facilitate subsequent failure analysis. Finally, the WDB_CONTEXT_KILL function is invoked to initiate a hot restart of VxWorks, enabling the system to restore the test environment via reboot.

For the client heartbeat detection mechanism, the anomaly information preservation process mirrors that of the WDB mechanism: the triggering test case is saved alongside the corresponding log records. However, the test environment restoration differs in implementation. For a VxWorks system installed on a virtual machine, the client heartbeat detection mechanism forcibly restarts the virtual machine. For a VxWorks system deployed on a development board, it power-cycles the board.

4. Implementation and Evaluation

In this section, we explain the fuzzing experiments conducted on multiple VxWorks devices using vxTcpFuzzer. Furthermore, we compare it with several benchmark tools to further evaluate the effectiveness of the framework.

4.1. Experimental Setup

The experimental setup included the following:

Environment Configuration. To better monitor network communications, all devices under test (DUTs) were directly connected to a local PC. Our fuzzing framework was executed on a Windows 10 desktop PC equipped with an AMD Ryzen 7 5700U with Radeon Graphics 1.80 GHz CPU and 16 GB of RAM.

Devices Under Test. First, we installed two versions of the VxWorks system, namely VxWorks6.6 and VxWorks6.9, on virtual machines with Pentium series CPUs. VxWorks 6.6 and 6.9 represent two major versions of VxWorks that are widely deployed in the embedded field, and these two versions have been confirmed to potentially have security vulnerabilities [35]. Additionally, to test a more realistic network environment, we ported VxWorks6.9 to a ZYNQ development board with a Cortex-A9 CPU, simulating a VxWorks device operating in a real-world environment. The Cortex-A9-based ZYNQ platform was chosen as the target for porting because VxWorks 6.9 provides support for this platform, and it is representative in terms of usage in the embedded field [36]. For convenience, we refer to VxWorks 6.9 installed on the virtual machine as VxWorks6.9 and to the version ported to the ZYNQ development board as VxWorks6.9_z7.

Test Service Target. The VxWorks operating system provides TCP services on multiple ports. In our testing experiments, we selected the FTP server available on port 21. The default number of concurrent connections for the VxWorks FTP server is eight. Therefore, after each test case, our fuzzer sends a TCP packet with the RST flag to actively release the connection, preventing subsequent test cases from failing due to exhausted connection resources.

Baseline Tool. To further verify the performance of the proposed framework in terms of crash detection and test case generation, we compared it with three other network communication-based fuzzing tools as benchmarks.

We performed fuzz testing on the three DUTs using a certain number of test cases and repeated the process five times to eliminate randomness. Each experiment was run independently without interference from others.

4.2. Runtime Testing

Table 3 presents the fuzzing results of vxTcpFuzzer on three VxWorks devices, including the number of memory changes, average test case acceptance rate (ATCAR), average test system abnormal rate (ATSAR), and the number and types of crashes detected. The ATCAR for VxWorks6.6 is over 44%, while, for VxWorks6.9 and VxWorks6.9_z7, it is above 50%. The specific definitions and calculations of test case acceptance rate (TCAR) and test system abnormal rate (TSAR) will be elaborated in Section 4.3, including the detailed TCAR and TSAR data obtained during the fuzz testing process. The other test results presented in Table 3 will be analyzed in detail from the following three aspects: memory data changes, vulnerability identification, and the performance of the dual anomaly detection mechanism.

4.2.1. Memory Data Changes

In vxTcpFuzzer, we propose the number of task-memory data changes as a novel coverage metric. The more memory data changes occur, the more likely it is that the test cases cover new task execution areas. Consequently, the growth trend of memory data changes reflects both the effectiveness of individual test cases and the overall efficiency of the fuzzing campaign. Figure 7 illustrates this trend for vxTcpFuzzer while fuzzing three VxWorks devices.

As can be seen from the figure, the number of memory changes for all three VxWorks devices continues to increase. As the number of test cases grows, the number of different task execution branches covered by vxTcpFuzzer also keeps rising. In Figure 7, the curve for VxWorks6.6 stops at 80,000 test cases. This is because vxTcpFuzzer generates just over 80,000 test cases on average when fuzzing VxWorks6.6, whereas it generates more than 100,000 test cases for VxWorks6.9 and VxWorks6.9_z7. The same convention applies in the subsequent analysis.

4.2.2. Vulnerability Identification

After conducting multiple fuzzing tests on the TCP stack of three VxWorks devices using vxTcpFuzzer, we detected a total of six crashes. As shown in Table 3, four crashes were detected in VxWorks6.6, with the corresponding vulnerability types being integer overflow and denial of service (DoS). One crash was detected in each of VxWorks6.9 and VxWorks6.9_z7, with the corresponding vulnerability type being DoS. We further manually verified these crashes and found that they all occurred when the TCP service of the test system was in a specific state and received test inputs with abnormal fields. Through analysis of the crashes, we discovered that the DoS-type vulnerabilities in VxWorks6.6, VxWorks6.9, and VxWorks6.9_z7 were caused by the same error. We first analyze the integer overflow crashes:

Integer Overflow. As shown in Table 3, during the testing process of VxWorks6.6, a total of four crashes were triggered, three of which were caused by integer overflow. The manifestation was that, after sending test cases to VxWorks6.6, the system task crashed and was unable to perform any TCP connection interactions. Figure 8 displays the system output of VxWorks6.6 when these three crashes were triggered. Figure 9 presents the test cases that triggered these three crashes. Through analyzing the system outputs and test cases, we found that the condition for triggering this crash is as follows: when the TCP is in the ESTABLISHED state, receiving a test case with the URG flag in the flag field, an urgent pointer field of 0, and carrying a large amount of payload, those conditions will cause the crash (corresponding to the part in red font in Figure 9).

Specifically, the urgent pointer field in the TCP protocol is used to identify the position of urgent data in the data stream. However, VxWorks6.6 triggered an integer underflow when processing the URG flag in the TCP packet due to the urgent pointer being equal to 0. During the testing of the ESTABLISHED state of the TCP protocol, vxTcpFuzzer’s heuristic Havoc mutation method generated test cases with the URG flag and an urgent pointer of 0, thereby triggering the corresponding crashes. Finally, by combining the information from the test system crashes, we confirmed that the integer overflow vulnerability causing these three crashes was the CVE-2019-12255 vulnerability. The vulnerability has been confirmed to permit remote code execution across VxWorks 6.5–6.9 and early releases of VxWorks 7, affecting more than two billion embedded devices deployed in industrial control, medical, and networking equipment. Given VxWorks’ ubiquity in mission-critical domains—including avionics, aerospace, and industrial automation—a successful exploit can crash core network tasks and render the entire system incapable of network communication. In scenarios that rely on real-time data delivery (e.g., flight control and industrial process monitoring), such a disruption may trigger cascading failures or lead to complete loss of system control.

Another type of crash was found:

Denial of Service (DoS). During the testing process of VxWorks6.6, VxWorks6.9, and VxWorks6.9_z7, each experienced one crash of the DoS type. Through analysis, it was determined that these crashes in the three VxWorks devices were caused by the same DoS vulnerability. Figure 10 presents the test case that caused this DoS incident. Through analyzing the content of the test case and the response messages from the VxWorks system, we identified that this DoS vulnerability is triggered when the VxWorks system receives a test case containing illegal TCP options after a normal TCP connection has been established. Specifically, when the test case received by the tested VxWorks system contains a WSOPT option with an empty data field in the option field, the test system determines that there is an illegal option. As a result, it actively resets and disconnects the current TCP connection, causing a DoS. During the testing of the TCP state after establishing a connection, vxTcpFuzzer’s dual-layer composite mutation strategy generates test cases with option content consisting of five groups of empty WSOPT options, thereby triggering the DoS vulnerability in the VxWorks test system. Finally, by analyzing the output information of the test system when the DoS vulnerability occurred, the DoS vulnerability was confirmed to be the CVE-2019-12258 vulnerability.

Although this vulnerability only causes a single connection to be reset (rather than the crash of an entire task), its impact is equally non-negligible. In VxWorks-powered embedded devices (e.g., medical equipment and network infrastructure), frequent triggering of such vulnerabilities can render the core TCP services provided by the devices (such as remote monitoring, data uploading, and firmware updating) unreliable. More alarmingly, attackers may exploit this vulnerability to launch low-cost DoS attacks. By continuously sending malicious packets to exhaust limited connection resources, they can completely block legitimate users from accessing device services, thereby achieving a DoS effect. In critical infrastructure or public service networks, this could serve as a prelude to or a component of larger-scale attacks, with an impact scope that may far exceed individual devices. This is consistent with attack patterns such as those of the Mirai botnet.

4.2.3. Performance of the Dual Anomaly Detector

We improved the WDB detection mechanism and integrated it with the client heartbeat detection mechanism to form a dual anomaly detection mechanism. As described in Section 3.4, the WDB detection mechanism may miss some abnormal situations. This paper therefore compensates for this defect by combining it with the client heartbeat detection mechanism, and the experimental results confirm that this measure is necessary. Table 4 shows the number of anomalies detected by the WDB detection mechanism and the client heartbeat detection mechanism during the fuzzing of three types of VxWorks devices.

As shown in Table 4, during the testing of the three types of VxWorks devices, the vast majority of crashes were successfully detected by the WDB detection mechanism. However, in the case of VxWorks6.6, one crash instance was missed by the WDB mechanism. This anomaly was successfully captured by the client heartbeat detection mechanism. The crash scenario illustrated in Figure 11 corresponds to the system output at the moment the anomaly was detected by the heartbeat mechanism. In this case, the WDB detection mechanism failed to identify the crash, even though the VxWorks network service had already crashed and was in an unresponsive state.

From the output information shown in Figure 11, it can be inferred that the root cause of the anomaly was memory corruption in the network task, which led to abnormal network communication. VxWorks’ internal exception handling mechanism was unable to capture this type of error, which in turn prevented the WDB detection mechanism from recognizing the anomaly. In contrast, the client heartbeat detection mechanism was able to detect the crash. Once the number of failed network communication attempts reached a predefined threshold, the heartbeat mechanism triggered the corresponding exception handling procedures, including logging the anomaly and restoring the test environment.

While the WDB detection mechanism is efficient in detecting anomalies in VxWorks, it is possible for it to miss some anomalies. Therefore, combining it with the client heartbeat detection mechanism is necessary and effective.

4.3. Comparison with Benchmark Tools

To further verify the performance of vxTcpFuzzer in terms of test case generation and crash detection, we selected three different fuzzing schemes as benchmarks:

Boofuzz-chksum: Boofuzz [19] is an excellent network protocol fuzzer, an improved version based on the Sulley framework. It supports manually defined protocol tree structures as input for continuous test generation. Therefore, we can utilize its protocol definition method to implement the definition of the TCP protocol, thereby generating test cases for fuzz testing targeting the TCP. However, when generating test cases, since Boofuzz does not provide a TCP checksum algorithm, all the test cases it generates will be discarded during the initial checksum verification phase, failing to achieve the actual testing effect. To enable the test cases generated by Boofuzz to pass the initial checksum phase, we added a TCP checksum algorithm module to the original Boofuzz, which is denoted as Boofuzz-chksum.
Netzob-generation: Netzob [37] is a protocol reverse analysis tool developed by Bossert et al. It can infer message formats and state machines through passive/active methods and generate test cases based on the inferred protocol model for fuzzing. Netzob-generation uses an active definition-based message format generation algorithm for fuzzing. Similar to Boofuzz, we can use the method of actively defining protocol messages provided by Netzob to implement the format definition of the TCP protocol, thus generating TCP-compliant test cases for fuzz testing targeting TCP.
Netzob-mutation: Netzob-mutation is a TCP fuzzing scheme we implemented using another method provided in Netzob, which passively infers message formats and state machines. First, Netzob is used to reversely infer message formats and state machines using captured TCP traffic. Then, mutation algorithms are applied to mutate the inferred results, thereby generating TCP test cases for fuzzing.

There are also many advanced fuzz testing tools capable of testing protocols through network communication, such as Peach and AFLNET. However, since they are gray-box fuzzers that require access to the system firmware, it is neither feasible nor fair to use these tools as baselines for black-box solutions.

We will compare the efficiency of vxTcpFuzzer with these three baseline approaches in the following three aspects: test case acceptance rate, test system abnormal rate, and found bugs.

4.3.1. Test Case Acceptance Rate

The Test Case Acceptance Rate (TCAR) is generally defined as the proportion of test cases successfully executed by the system under test (SUT) out of the total number of test cases. During the fuzzing process, a low TCAR often indicates that a large number of test cases are not executed by the SUT. This can result in insufficient coverage of the target system, failing to comprehensively test all functions and boundary conditions of the target system. During the fuzz testing of the network protocol in VxWorks, it is impossible to directly obtain information on whether the test cases are executed by VxWorks. Therefore, we regard the test cases with response messages as those successfully executed by VxWorks. Conversely, test cases without response messages are considered as not being successfully executed. However, this evaluation method is not comprehensive. This is because test cases that are successfully executed do not necessarily generate responses. The TCAR calculation formula used in this paper is shown in Equation (1), where N_response represents the total number of test cases with response messages, and N_all represents the total number of test cases received by VxWorks. In our fuzzing process, we calculate the TCAR every 5000 test cases.

$$\mathrm{TCAR} = \frac{N_{\mathrm{response}}}{N_{\mathrm{all}}} \times 100\% \tag{1}$$
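Equation (1) and the 5000-case sampling interval can be sketched as below. Whether the paper reports a cumulative rate or a per-interval rate at each checkpoint is not stated; this sketch assumes a cumulative rate, and the function names are hypothetical.

```python
from typing import Iterable, List


def tcar(n_response: int, n_all: int) -> float:
    """Test case acceptance rate in percent, per Equation (1)."""
    if n_all <= 0:
        raise ValueError("n_all must be positive")
    return n_response / n_all * 100.0


def tcar_checkpoints(responded: Iterable[bool], interval: int = 5000) -> List[float]:
    """Cumulative TCAR recorded after every `interval` test cases (assumed)."""
    rates = []
    n_response = 0
    for n_all, got_response in enumerate(responded, start=1):
        n_response += int(got_response)
        if n_all % interval == 0:
            rates.append(tcar(n_response, n_all))
    return rates


# Tiny illustration with interval=2: responses to test cases 1, 3, and 4.
rates = tcar_checkpoints([True, False, True, True], interval=2)
```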

As shown in Figure 12, Figure 13 and Figure 14, the TCAR of vxTcpFuzzer, Boofuzz-chksum, Netzob-generation, and Netzob-mutation during the fuzz testing of three VxWorks devices are recorded. Figure 12 represents the test case acceptance rate of VxWorks6.6. The average acceptance rates of Boofuzz-chksum, Netzob-generation, and Netzob-mutation are 4.74%, 8.69%, and 20.87%, respectively, while that of vxTcpFuzzer is 44.9%. Figure 13 shows the test case acceptance rate of VxWorks6.9. The average acceptance rates of Boofuzz-chksum, Netzob-generation, and Netzob-mutation are 4.96%, 8.82%, and 23.80%, respectively, and the average acceptance rate of vxTcpFuzzer is 53.9%. Figure 14 depicts the test case acceptance rate of VxWorks6.9_z7. The average acceptance rates of Boofuzz-chksum, Netzob-generation, and Netzob-mutation are 4.96%, 8.82%, and 23.80%, respectively, and the average acceptance rate of our vxTcpFuzzer is 54.9%.

Overall, vxTcpFuzzer achieves a higher average TCAR than the other three fuzz testing methods on all three devices. In the test case generation phase, vxTcpFuzzer extracts the field feature information of TCP into a feature vector and selects the mutation strategy that matches that vector. The generated test cases therefore retain a large number of syntactic and semantic features, which increases the probability that they are valid. Boofuzz-chksum can construct and generate test cases quickly, but its coarse-grained protocol definition rules and its random, blind generation method produce a large number of invalid test cases. Netzob-generation, which is based on manually defined protocol formats, has more refined protocol definition rules than Boofuzz-chksum, so its TCAR is slightly higher; its test case generation algorithm, however, is still random and blind. Netzob-mutation, which is based on reverse analysis, performs detailed segmentation and clustering of TCP traffic and obtains a more comprehensive protocol format and state machine, resulting in a higher TCAR. However, the reverse analysis results may contain errors, and the mutation algorithm is not tightly integrated with the protocol features, so its TCAR remains lower than that of vxTcpFuzzer.

4.3.2. Test System Abnormal Rate

A test system anomaly is the phenomenon in fuzzing where a test case that violates the TCP specification is incorrectly accepted as valid by VxWorks. Ideally, erroneous or anomalous test cases should not be processed by VxWorks; instead, they should trigger connection termination via RST-flagged TCP responses. However, when logical anomalies exist in the test system, some invalid test cases may still be executed, leading to potential security vulnerabilities. The Test System Abnormal Rate (TSAR) measures how often this happens, and a higher TSAR indicates greater fuzzing efficiency. In this study, we calculate the number of anomalous test cases by subtracting the count of RST-flagged responses from the total number of test cases that generate responses. The TSAR is computed as shown in Equation (2), where N_response represents the total number of test cases with response messages, N_rst represents the total number of test cases with RST responses, and N_all represents the total number of test cases received by VxWorks. In our fuzzing process, we calculate the TSAR every 5000 test cases.

TSAR = ((N_response − N_rst) / N_all) × 100%    (2)
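Equation (2) can be sketched as follows, assuming each test case is logged with the TCP flags byte of its response, or None when no response arrived. The helper names and the log format are hypothetical; the only protocol fact used is that RST is bit 0x04 of the TCP flags field.

```python
from typing import Iterable, Optional

TCP_FLAG_RST = 0x04  # RST bit in the TCP header flags field


def classify_response(flags: Optional[int]) -> str:
    """Label one test case by the flags of its response (None means no response)."""
    if flags is None:
        return "no_response"
    if flags & TCP_FLAG_RST:
        return "rst"
    return "anomalous"  # a non-RST response, counted in the TSAR numerator


def tsar(n_response: int, n_rst: int, n_all: int) -> float:
    """Equation (2): (N_response - N_rst) / N_all, in percent."""
    if n_all <= 0:
        raise ValueError("n_all must be positive")
    return 100.0 * (n_response - n_rst) / n_all


def tsar_from_flags(response_flags: Iterable[Optional[int]]) -> float:
    """Compute TSAR from a per-test-case log of response flags."""
    labels = [classify_response(flags) for flags in response_flags]
    n_all = len(labels)
    n_rst = labels.count("rst")
    n_response = n_all - labels.count("no_response")
    return tsar(n_response, n_rst, n_all)


if __name__ == "__main__":
    # SYN+ACK, RST+ACK, no response, ACK
    print(tsar_from_flags([0x12, 0x14, None, 0x10]))
```

The Netzob-mutation column of Table 5 is consistent with a cumulative reading of the checkpoints: 12.42% of 25,000 test cases corresponds to 3105 non-RST responses, and 3105/30,000 = 10.35%, which is the value reported at 30,000 test cases. This is an inference from the tabulated numbers, not a statement made in the text.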

During fuzzing, the test system abnormal rate serves as an intuitive metric for evaluating fuzzer performance. A higher rate indicates that the fuzzer exposes potential vulnerabilities in the system more effectively.

Table 5, Table 6 and Table 7 present the test system anomaly rate for vxTcpFuzzer, Boofuzz-chksum, Netzob-generation, and Netzob-mutation across three VxWorks devices. On the three VxWorks devices, the average test system anomaly rates are 23.79%, 31.83%, and 34.70% for vxTcpFuzzer, 0.57%, 0.72%, and 0.73% for Boofuzz-chksum, 2.99%, 3.03%, and 3.03% for Netzob-generation, and 7.61%, 10.26%, and 10.26% for Netzob-mutation. Overall, our approach (vxTcpFuzzer) demonstrates significantly higher TSAR compared to the other three fuzzing methods. This advantage is attributed to vxTcpFuzzer’s capability to conduct fuzz testing across all TCP server-side states. Concurrently, it enables real-time tracking of TCP state transitions during testing, facilitating comprehensive evaluation of state stability and transition integrity.

As shown in Table 5, Table 6 and Table 7, vxTcpFuzzer exhibits a decline in TSAR during the final testing phase for each VxWorks device. This is attributed to the fuzzer’s focus on the LAST_ACK state—the terminal state of TCP connection termination—during this phase. Since most test cases targeting this state yield no response messages, the TSAR metric naturally decreases. Nevertheless, this observation further validates the capability of our approach to perform multi-state fuzzing across the TCP lifecycle.

4.3.3. Found Bugs

To ensure a fair comparison of vulnerability detection capabilities, we compared the tools under identical testing conditions:

Test case scale: All tools tested 80,000 cases on VxWorks 6.6 and 100,000 cases each on VxWorks 6.9 and VxWorks 6.9_z7 (consistent with vxTcpFuzzer);
Anomaly detection mechanism: The benchmark tools uniformly adopted the dual anomaly detection mechanism of vxTcpFuzzer (WDB detection + heartbeat detection);
Testing targets: All targeted the TCP services exposed by the three types of VxWorks devices.

Under the same testing scale and anomaly monitoring mechanism, none of the three benchmark tools detected any crashes, while vxTcpFuzzer triggered crashes on all three devices (Table 3), corresponding to two potential vulnerabilities: an integer overflow and a DoS.

Boofuzz-chksum and Netzob-generation rely on manually defined protocol formats based on fixed rules, resulting in rigid test-case generation strategies. They also lack state-tracking capabilities, restricting testing to a single TCP state and yielding insufficient coverage.

Netzob-mutation leverages protocol-inference algorithms to model TCP message structures and state machines, enabling state-aware fuzzing. However, limitations in its reverse-engineering algorithms impede effective clustering of variable-length fields (e.g., TCP options), while the absence of feedback-driven optimization prevents iterative model refinement. Consequently, it overlooks numerous protocol formats and state transitions.

4.4. Evaluation of Memory Feedback Utilization

The vxTcpFuzzer we designed and implemented is capable of fuzzing each server state of the TCP independently. For each state, the fuzzing process is divided into two phases. The first phase is the initial cases testing phase, which utilizes the protocol feature fusion fuzzer to generate initial test cases for testing. The second phase is the Havoc mutation testing phase, which mutates the seeds retained from the initial testing phase using heuristic Havoc strategies and then performs further testing. We use the number of test cases triggering memory data changes during these two phases for each protocol state as the evaluation metric for the memory feedback utilization module. In other words, we compare the number of interesting test cases (i.e., those that induce memory data changes) across the two phases.

Because the interesting test cases generated in the first phase serve as seeds for the second phase Havoc mutations, an increase in the number of interesting test cases in the second phase demonstrates the effectiveness of the memory feedback module during fuzzing.
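The selection of interesting test cases can be sketched as a before/after comparison of the network task's memory. This is a simplified illustration: `send_case` and `read_task_memory` are hypothetical callables standing in for the packet sender and the memory extraction step, and comparing SHA-256 digests is an assumption made here, not the framework's documented comparison method.

```python
import hashlib
from typing import Callable, Iterable, List


def digest(memory: bytes) -> str:
    """Fingerprint a memory snapshot so snapshots can be compared cheaply."""
    return hashlib.sha256(memory).hexdigest()


def collect_interesting(
    cases: Iterable[bytes],
    send_case: Callable[[bytes], None],      # hypothetical: delivers one test case
    read_task_memory: Callable[[], bytes],   # hypothetical: dumps the task's memory
) -> List[bytes]:
    """Keep the test cases after which the task memory snapshot changed."""
    interesting = []
    before = digest(read_task_memory())
    for case in cases:
        send_case(case)
        after = digest(read_task_memory())
        if after != before:
            interesting.append(case)  # retained as a seed for the second phase
        before = after
    return interesting


if __name__ == "__main__":
    # Fake target: only cases starting with 0x02 modify its memory.
    memory = bytearray(8)

    def fake_send(case: bytes) -> None:
        if case.startswith(b"\x02"):
            memory[0] = (memory[0] + 1) % 256

    kept = collect_interesting([b"\x01a", b"\x02b"], fake_send, lambda: bytes(memory))
    print(kept)
```

Running the same function over the initial test cases and over the Havoc-mutated cases gives the two per-state counts that are compared in this section.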

Figure 15, Figure 16 and Figure 17 record, for each TCP state and for all three VxWorks devices, the number of interesting test cases preserved in the two phases. As shown, the second phase consistently produces more interesting test cases than the first phase for every state on all three devices. This is attributed to the heuristic Havoc mutation method employed in the second phase, which uses the interesting cases from the first phase as seeds. The method first locates the anomalous fields within these seeds and then generates new test cases through Cartesian-product-style iterative combinations of these fields. Consequently, each new test case contains at least two anomalous field values. These results further confirm the effectiveness of our memory feedback module.
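The Cartesian-product-style combination can be illustrated with a small sketch. It assumes that a test case is a mapping from field name to integer value and that a field counts as anomalous when it differs from a known-good baseline packet; both the representation and the restriction to pairs of fields are simplifications chosen for this example, not the framework's actual data model.

```python
from itertools import combinations, product
from typing import Dict, List

Packet = Dict[str, int]


def anomalous_fields(seed: Packet, baseline: Packet) -> Packet:
    """Return the fields of `seed` whose values differ from the baseline."""
    return {name: value for name, value in seed.items() if baseline.get(name) != value}


def havoc_combine(seeds: List[Packet], baseline: Packet) -> List[Packet]:
    """Build new cases that each carry two anomalous field values."""
    # Pool the anomalous values observed for each field across all seeds.
    pool: Dict[str, List[int]] = {}
    for seed in seeds:
        for name, value in anomalous_fields(seed, baseline).items():
            values = pool.setdefault(name, [])
            if value not in values:
                values.append(value)

    cases: List[Packet] = []
    # For every pair of distinct fields, take the Cartesian product of their values.
    for field_a, field_b in combinations(sorted(pool), 2):
        for value_a, value_b in product(pool[field_a], pool[field_b]):
            case = dict(baseline)
            case[field_a] = value_a
            case[field_b] = value_b
            cases.append(case)
    return cases


if __name__ == "__main__":
    baseline = {"window": 8192, "urgent_pointer": 0, "data_offset": 5}
    seeds = [
        {"window": 0, "urgent_pointer": 0, "data_offset": 5},
        {"window": 8192, "urgent_pointer": 65535, "data_offset": 5},
    ]
    for case in havoc_combine(seeds, baseline):
        print(case)
```

Because every generated case overrides two different baseline fields with pooled anomalous values, each case contains at least two anomalous fields, matching the property stated above. Combining three or more fields would only require iterating over larger subsets in `combinations`.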

5. Discussion

The vxTcpFuzzer has been successfully tested on three different VxWorks devices, revealing two potential security vulnerabilities. However, the efficiency and scalability of the framework still have certain limitations. In this section, we mainly discuss the limitations of the framework and propose possible solutions for future work:

Manual Protocol Feature Extraction. The protocol feature fusion fuzzer relies on manual feature extraction and on prior knowledge of TCP. The manual work consists mainly of analyzing each field of the TCP header and extracting its relevant features. It requires an understanding of the structure and function of every field and the ability to encode the feature information of each field as a vector in code. This step is the starting point of the entire fuzzing process and, to some extent, determines its efficiency. Manual extraction therefore increases the workload of the fuzzing framework and may affect the effectiveness of fuzz testing. To remove this limitation, we will consider using large language models (LLMs) to extract protocol field features automatically in future work.

Test Protocol Scope. The fuzzing framework implemented in this paper currently tests only the TCP implementation of the VxWorks system, so its protocol coverage is limited. VxWorks has a powerful networking system that provides users with a variety of network protocols, including IP, ICMP, RIP, FTP, Telnet, HTTP, and DNS. We therefore plan to expand the types of protocols that the framework can test, in order to provide a multi-protocol fuzzing framework for the VxWorks network stack.

Potential Risks and Balancing Strategies. In mission-critical real-time systems (such as VxWorks), aggressive fuzz testing strategies may introduce additional risks. For instance, high-frequency abnormal inputs could lead to system resource exhaustion or service interruptions, undermining the scheduling stability of real-time tasks; the triggering of certain vulnerabilities might cause irreversible system crashes, resulting in severe consequences in scenarios like aerospace and medical care. Thus, in practical deployment, it is necessary to balance testing depth against system reliability. This can be achieved through measures such as limiting testing rates, utilizing test environments isolated from production environments, or adopting progressive load injection strategies. Future work may explore dynamic adjustment mechanisms that adaptively modify testing intensity based on the real-time state of the system, aiming to balance security and availability.
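One way to picture the rate-limiting and dynamic-adjustment ideas above is an additive-increase, multiplicative-decrease (AIMD) rule driven by a health signal such as the heartbeat check. AIMD is borrowed here from congestion control as an illustrative stand-in; it is not part of vxTcpFuzzer, and the function name and all numeric defaults are hypothetical.

```python
def next_rate(
    current: float,
    target_healthy: bool,
    *,
    step: float = 10.0,    # additive increase per healthy interval
    backoff: float = 0.5,  # multiplicative decrease after a failed health check
    floor: float = 1.0,    # never drop below this rate
    cap: float = 200.0,    # never exceed this rate
) -> float:
    """Return the next send rate in test cases per second."""
    if target_healthy:
        return min(cap, current + step)
    return max(floor, current * backoff)


if __name__ == "__main__":
    rate = 10.0
    # Simulated health checks: the target misses one heartbeat in the middle.
    for healthy in [True, True, False, True]:
        rate = next_rate(rate, healthy)
        print(rate)
```

The slow ramp corresponds to progressive load injection, while the halving step reduces pressure on the real-time tasks as soon as the target shows signs of distress.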

Limitations and Future Work. The fuzzing framework in this paper can be enhanced in several respects. First, as described in Section 3.3, the coverage metric used in this framework only checks whether the task memory data differ before and after each round of testing. This is a relatively coarse and direct use of memory change information. A more fine-grained analysis could be conducted, such as analyzing program code through memory data to obtain specific coverage path information. Second, in the protocol feature fusion fuzzing stage, the extraction of protocol features requires manual analysis and is not fully automated. To address this issue, we plan to try methods such as [38,39] to achieve automated protocol feature extraction and fusion. Finally, the current framework targets only TCP in the VxWorks system. We therefore plan to extend it to other protocols in the VxWorks system, including the application layer protocols provided by VxWorks.

6. Conclusions

In this paper, we present vxTcpFuzzer, a TCP fuzzing framework specifically designed for VxWorks operating systems in black-box environments. Unlike conventional black-box network fuzzers, vxTcpFuzzer leverages memory data changes during VxWorks network task execution to establish a feedback-driven mechanism that guides the fuzzing process. Additionally, vxTcpFuzzer analyzes and extracts field features of the protocol and matches different generation strategies to these features, thereby generating test cases that closely conform to the syntactic rules. Moreover, vxTcpFuzzer can activate and track the state changes of TCP, performing fuzzing with multi-state coverage of the protocol. We tested vxTcpFuzzer on three VxWorks devices, and it successfully detected potential vulnerabilities in them, verifying the effectiveness of the method.

This work increases the efficiency of vulnerability discovery in the TCP stack of VxWorks and, more importantly, provides an immediate benefit to the security of widely deployed IoT devices. As the operating system in critical IoT devices—industrial control systems, medical equipment, and network infrastructure—VxWorks demands a robust network stack. vxTcpFuzzer is an efficient, lightweight tool that actively uncovers deep bugs such as CVE-2019-12255 and CVE-2019-12258. Exploitation of these flaws can trigger denial-of-service, enable remote control, or turn devices into stepping stones for botnets like Mirai. By furnishing a practical technique for hardening IoT infrastructure, vxTcpFuzzer strengthens resilience against cyber attacks.

Author Contributions

Conceptualization, Y.W. and J.H.; methodology, Y.W.; software, J.H. and X.H.; validation, Y.W., J.H., and X.H.; formal analysis, J.H.; investigation, J.H. and X.D.; resources, Y.W. and X.H.; data curation, Y.W. and J.H.; writing—original draft preparation, J.H.; writing—review and editing, J.H. and X.D.; visualization, J.H.; supervision, Y.W.; project administration, Y.W.; funding acquisition, Y.W. and X.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research work is supported by the National Natural Science Foundation of China (U2468206, 62302389) and the Key Research and Development Program of Shaanxi Province (2022CGKC-09).

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Kostas, K.; Just, M.; Lones, M.A. IoTDevID: A behavior-based device identification method for the IoT. IEEE Internet Things J. 2022, 9, 23741–23749.
2. Chui, M.; Collins, M.; Patel, M. The Internet of Things: Catching up to an Accelerating Opportunity; McKinsey & Company: New York, NY, USA, 2021.
3. Affinito, A.; Zinno, S.; Stanco, G.; Botta, A.; Ventre, G. The evolution of Mirai botnet scans over a six-year period. J. Inf. Secur. Appl. 2023, 79, 103629.
4. Micro, T. Smart Yet Flawed: IoT Device Vulnerabilities Explained. Secur. News, Trend Micro Inc., Irving, TX, USA, Tech. Rep 2020. Available online: https://www.trendmicro.com/vinfo/hk-en/security/news/internet-of-things/smart-yet-flawed-iot-device-vulnerabilities-explained (accessed on 8 August 2025).
5. Nordrum, A. Popular internet of things forecast of 50 billion devices by 2020 is outdated. IEEE Spectr. 2016, 18, 223–236.
6. Travis, F.J.M. Secure Interface Improvements Internet of Things (IoT) Vendors Need to Protect Smart Home IoT Devices from Cyber Attacks. Ph.D. Thesis, University of the Cumberlands, Williamsburg, KY, USA, 2023.
7. More, S.; Mukhede, S.; Deshmukh, M.M. Comparative Analysis of Embedded Operating Systems: A Criteria-Based Evaluation. Int. J. Eng. Technol. Manag. Sci. 2024, 1, 34–41.
8. Formaggio, Y. Attacking VxWorks: From Stone Age to Interstellar. 44CON Cyber Security 2015. Available online: https://www.youtube.com/watch?v=T6N-87GlmsI (accessed on 8 August 2025).
9. Bishop, S.; Fairbairn, M.; Norrish, M.; Sewell, P.; Smith, M.; Wansbrough, K. Rigorous specification and conformance testing techniques for network protocols, as applied to TCP, UDP, and Sockets. In Proceedings of the 2005 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, Philadelphia, PA, USA, 22–26 August 2005; pp. 265–276.
10. Edwards, A.; Muir, S. Experiences implementing a high performance TCP in user-space. ACM SIGCOMM Comput. Commun. Rev. 1995, 25, 196–205.
11. Zou, Y.-H.; Bai, J.-J.; Zhou, J.; Tan, J.; Qin, C.; Hu, S.-M. TCP-Fuzz: Detecting memory and semantic bugs in TCP stacks with fuzzing. In Proceedings of the 2021 USENIX Annual Technical Conference (USENIX ATC 21), Santa Clara, CA, USA, 14–16 July 2021; pp. 489–502.
12. Lockefeer, L.; Williams, D.M.; Fokkink, W. Formal specification and verification of TCP extended with the Window Scale Option. Sci. Comput. Program. 2016, 118, 3–23.
13. Chen, Q.A.; Qian, Z.; Jia, Y.J.; Shao, Y.; Mao, Z.M. Static detection of packet injection vulnerabilities: A case for identifying attacker-controlled implicit information leaks. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, Denver, CO, USA, 12–16 October 2015; pp. 388–400.
14. Kothari, N.; Mahajan, R.; Millstein, T.; Govindan, R.; Musuvathi, M. Finding protocol manipulation attacks. In Proceedings of the ACM SIGCOMM 2011 Conference, Toronto, ON, Canada, 15–19 August 2011; pp. 26–37.
15. Oehlert, P. Violating assumptions with fuzzing. IEEE Secur. Priv. 2005, 3, 58–62.
16. Miller, B.P.; Fredriksen, L.; So, B. An empirical study of the reliability of UNIX utilities. Commun. ACM 1990, 33, 32–44.
17. Muench, M.; Stijohann, J.; Kargl, F.; Francillon, A.; Balzarotti, D. What You Corrupt Is Not What You Crash: Challenges in Fuzzing Embedded Devices. In Proceedings of the NDSS, Montreal, QC, Canada, 3–8 December 2018.
18. Zheng, Y.; Davanian, A.; Yin, H.; Song, C.; Zhu, H.; Sun, L. FIRM-AFL: High-Throughput greybox fuzzing of IoT firmware via augmented process emulation. In Proceedings of the 28th USENIX Security Symposium (USENIX Security 19), Santa Clara, CA, USA, 14–16 August 2019; pp. 1099–1114.
19. JTPEREYDA. Boofuzz: Network Protocol Fuzzing for Humans. Available online: https://github.com/jtpereyda/boofuzz (accessed on 28 June 2025).
20. Luo, Z.; Zuo, F.; Jiang, Y.; Gao, J.; Jiao, X.; Sun, J. Polar: Function code aware fuzz testing of ics protocol. ACM Trans. Embed. Comput. Syst. (TECS) 2019, 18, 1–22.
21. Pham, V.-T.; Böhme, M.; Roychoudhury, A. Aflnet: A greybox fuzzer for network protocols. In Proceedings of the 2020 IEEE 13th International Conference on Software Testing, Validation and Verification (ICST), Porto, Portugal, 24–28 October 2020; pp. 460–465.
22. Chen, J.; Diao, W.; Zhao, Q.; Zuo, C.; Lin, Z.; Wang, X.; Lau, W.C.; Sun, M.; Yang, R.; Zhang, K. IoTFuzzer: Discovering memory corruptions in IoT through app-based fuzzing. In Proceedings of the NDSS, Montreal, QC, Canada, 3–8 December 2018; pp. 1–15.
23. Luo, Z.; Yu, J.; Zuo, F.; Liu, J.; Jiang, Y.; Chen, T.; Roychoudhury, A.; Sun, J. Bleem: Packet sequence oriented fuzzing for protocol implementations. In Proceedings of the 32nd USENIX Security Symposium (USENIX Security 23), Anaheim, CA, USA, 9–11 August 2023; pp. 4481–4498.
24. Eddington, M. Peach Fuzzing Platform. Available online: https://gitlab.com/gitlab-org/security-products/protocol-fuzzer-ce (accessed on 28 June 2025).
25. Shu, Z.; Yan, G. IoTInfer: Automated Blackbox Fuzz Testing of IoT Network Protocols Guided by Finite State Machine Inference. IEEE Internet Things J. 2022, 9, 22737–22751.
26. Yu, Z.; Wang, H.; Wang, D.; Li, Z.; Song, H. CGFuzzer: A Fuzzing Approach Based on Coverage-Guided Generative Adversarial Networks for Industrial IoT Protocols. IEEE Internet Things J. 2022, 9, 21607–21619.
27. Luo, Z.; Yu, J.; Du, Q.; Zhao, Y.; Wu, F.; Shi, H.; Chang, W.; Jiang, Y. Parallel Fuzzing of IoT Messaging Protocols Through Collaborative Packet Generation. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2024, 43, 3431–3442.
28. RFC 793: TCP (Transmission Control Protocol). Available online: https://www.rfc-editor.org/rfc/rfc793 (accessed on 28 June 2025).
29. Liu, P.; Lu, J.; Huang, S.; Lu, P.; Wang, J. Real-time performance analysis of network buffer under multi-core scheduling platform. Multimed. Tools Appl. 2023, 82, 34653–34677.
30. Feng, X.; Sun, R.; Zhu, X.; Xue, M.; Wen, S.; Liu, D.; Nepal, S.; Xiang, Y. Snipuzz: Black-box fuzzing of iot firmware via message snippet inference. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, New York, NY, USA, 15–19 November 2021; pp. 337–350.
31. Li, R. Computer embedded automatic test system based on VxWorks. Int. J. Embed. Syst. 2022, 15, 183–192.
32. Zheng, W.; Zhou, Y.; Wang, B. Design and Implementation of VxWorks System Vulnerability Mining Framework Based on Dynamic Symbol Execution. In Proceedings of the 9th International Conference on Computer Engineering and Networks, Hefei, China, 17–19 October 2020; pp. 801–811.
33. RFC 1323: TCP Extensions for High Performance. Available online: https://www.rfc-editor.org/rfc/rfc1323 (accessed on 28 June 2025).
34. RFC 5925: The TCP Authentication Option. Available online: https://www.rfc-editor.org/rfc/rfc5925 (accessed on 28 June 2025).
35. 11 Zero Day Vulnerabilities Impacting Billions of Mission-Critical Devices. Available online: https://www.armis.com/research/urgent-11 (accessed on 28 June 2025).
36. Zynq-7000 SoC Data Sheet: Overview. Available online: https://docs.amd.com/v/u/en-US/ds190-Zynq-7000-Overview (accessed on 28 June 2025).
37. Bossert, G.; Guihéry, F.; Hiet, G. Netzob: Protocol Reverse Engineering, Modeling and Fuzzing. Available online: https://github.com/netzob/netzob (accessed on 28 June 2025).
38. Yan, H.; Li, X.; Dai, R.; Li, H.; Zhao, X.; Li, F. MARS: Automated protocol analysis framework for internet of things. IEEE Internet Things J. 2022, 9, 18333–18345.
39. Zhao, S.; Yang, S.; Wang, Z.; Liu, Y.; Zhu, H.; Sun, L. Crafting Binary Protocol Reversing via Deep Learning With Knowledge-Driven Augmentation. IEEE/ACM Trans. Netw. 2024, 32, 5399–5414.

Figure 1. TCP packet header.

Figure 2. TCP state machine model.

Figure 3. Workflow of vxTcpFuzzer.

Figure 4. An example of implementing the timing correlation calculation strategy for sequence numbers and acknowledgment numbers when testing the SYN_RCVD state (In the figure, the “…” indicates the omitted content of other fields in the TCP message).

Figure 5. An example of the process for implementing the two-layer composite mutation strategy on the MSS option.

Figure 6. Dual anomaly detection mechanism structure.

Figure 7. Trend of memory data changes in the testing process of three VxWorks devices.

Figure 8. System output of VxWorks6.6 during integer overflow.

Figure 9. Test cases triggering three integer overflows in VxWorks 6.6 (The text in red indicates the key content that triggers the crash).

Figure 10. Test case triggering DoS in VxWorks (The text in red represents the key content that causes the DoS).

Figure 11. System output when anomaly detected by client heartbeat detection mechanism.

Figure 12. TCAR on VxWorks6.6.

Figure 13. TCAR on VxWorks6.9.

Figure 14. TCAR on VxWorks6.9_z7.

Figure 15. Number of interesting cases for each protocol state during the testing process of VxWorks6.6.

Figure 16. Number of interesting cases for each protocol state during the testing process of VxWorks6.9.

Figure 17. Number of interesting cases for each protocol state during the testing process of VxWorks6.9_z7.

Table 1. Transition conditions and response packets for TCP server states.

TCP Server States	Conversion Conditions	Response Packets
LISTEN	-	-
SYN_RCVD	SYN	{SYN, ACK}
ESTABLISHED	SYN, ACK	-
CLOSE_WAIT	SYN, ACK, FIN	ACK
LAST_ACK	SYN, ACK, FIN	FIN
CLOSED	SYN, ACK, FIN, ACK	-

Table 2. Generation strategies for TCP protocol fields.

TCP Fields	Formation Strategy
Source port	Random acquisition
Sequence, Acknowledgment number	Timing dependency calculation
Data offset, Reserved, Flags, Window size, Urgent pointer	Progressive assignment
Options	Two-layer composite mutation

Table 3. The fuzzing results of vxTcpFuzzer on three VxWorks devices.

Test Devices	Number of Memory Changes	ATCAR	ATSAR	Number of Crashes	Vulnerability Type
VxWorks6.6	1661	44.94%	23.79%	4	Integer overflow, DoS
VxWorks6.9	3629	53.93%	31.83%	1	DoS
VxWorks6.9_z7	2427	54.92%	34.70%	1	DoS

Table 4. Number of anomalies detected by the dual anomaly detection mechanism during fuzzing.

Test Devices	WDB Detection Mechanism	Client Heartbeat Detection Mechanism
VxWorks6.6	3	1
VxWorks6.9	1	0
VxWorks6.9_z7	1	0

Table 5. TSAR of vxTcpFuzzer and three fuzzing schemes on VxWorks6.6.

Number of Test Cases	vxTcpFuzzer TSAR	Boofuzz-Chksum TSAR	Netzob-Generation TSAR	Netzob-Mutation TSAR
5000	33.9200%	0.6000%	3.2600%	11.8800%
10,000	43.3400%	0.6000%	3.1400%	12.1400%
15,000	28.9000%	0.6133%	3.2533%	12.4200%
20,000	38.3000%	0.6150%	3.1900%	12.3750%
25,000	30.6400%	0.6080%	3.1720%	12.4200%
30,000	25.5333%	0.6100%	3.1767%	10.3500%
35,000	24.9228%	0.6142%	3.1485%	8.8714%
40,000	27.4575%	0.6075%	3.1300%	7.7625%
45,000	24.4066%	0.6089%	3.1489%	6.9000%
50,000	21.9660%	0.6120%	3.1640%	6.2100%
55,000	20.2727%	0.6090%	3.1581%	5.6454%
60,000	19.1033%	0.6100%	3.1617%	5.1750%
65,000	18.0384%	0.6123%	3.1569%	4.7769%
70,000	17.0157%	0.6114%	3.1785%	4.4357%
75,000	15.8813%	0.6066%	3.1880%	4.1400%
80,000	14.8887%	0.6075%	3.2087%	3.8812%

Table 6. TSAR of vxTcpFuzzer and three fuzzing schemes on VxWorks6.9.

Number of Test Cases	vxTcpFuzzer TSAR	Boofuzz-Chksum TSAR	Netzob-Generation TSAR	Netzob-Mutation TSAR
5000	33.9200%	0.7600%	3.2600%	18.3800%
10,000	56.9700%	0.7500%	3.1400%	18.6500%
15,000	55.7466%	0.7666%	3.2533%	18.7066%
20,000	45.1200%	0.7700%	3.1900%	18.6300%
25,000	36.0960%	0.7600%	3.1720%	18.6360%
30,000	30.4133%	0.7633%	3.1766%	15.5300%
35,000	26.0714%	0.7685%	3.1485%	13.3114%
40,000	26.6700%	0.7600%	3.1300%	11.6475%
45,000	27.6800%	0.7644%	3.1489%	10.3533%
50,000	24.9120%	0.7680%	3.1640%	9.3180%
55,000	22.6472%	0.7636%	3.1581%	8.4709%
60,000	20.7600%	0.7650%	3.1616%	7.7650%
65,000	22.1107%	0.7661%	3.1569%	7.1676%
70,000	27.5914%	0.7657%	3.1785%	6.6557%
75,000	32.3786%	0.7640%	3.1880%	6.2120%
80,000	34.8900%	0.7662%	3.2087%	5.8237%
85,000	36.8011%	0.7658%	3.2235%	5.4811%
90,000	37.8422%	0.7633%	3.1988%	5.1766%
95,000	35.8926%	0.7673%	3.2063%	4.9042%
100,000	34.1110%	0.7760%	3.1920%	4.6590%

Table 7. TSAR of vxTcpFuzzer and three fuzzing schemes on VxWorks6.9_z7.

Number of Test Cases	vxTcpFuzzer TSAR	Boofuzz-Chksum TSAR	Netzob-Generation TSAR	Netzob-Mutation TSAR
5000	33.9200%	0.7600%	3.2600%	18.3800%
10,000	44.1400%	0.7500%	3.1400%	18.6500%
15,000	40.4000%	0.7666%	3.2533%	18.7066%
20,000	38.7050%	0.7700%	3.1900%	18.6300%
25,000	30.9640%	0.7600%	3.1720%	18.6360%
30,000	25.8066%	0.7666%	3.1766%	15.5300%
35,000	30.3485%	0.7714%	3.1485%	13.3114%
40,000	27.6700%	0.7625%	3.1300%	11.6475%
45,000	24.5955%	0.7666%	3.1489%	10.3533%
50,000	30.4000%	0.7700%	3.1640%	9.3180%
55,000	36.4090%	0.7654%	3.1581%	8.4709%
60,000	39.0333%	0.7667%	3.1616%	7.7650%
65,000	37.3661%	0.7692%	3.1569%	7.1676%
70,000	41.8400%	0.7685%	3.1785%	6.6557%
75,000	45.7173%	0.7667%	3.1880%	6.2120%
80,000	43.0600%	0.7687%	3.2087%	5.8237%
85,000	40.5270%	0.7694%	3.2235%	5.4811%
90,000	41.4000%	0.7667%	3.1988%	5.1766%
95,000	39.2210%	0.7705%	3.2063%	4.9042%
100,000	37.2600%	0.7800%	3.1920%	4.6590%

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
