BSFuzz: Branch-State Guided Hybrid Fuzzing
Round 1
Reviewer 1 Report
Internal validity:
- The experimental setup compares BSFuzz against an established hybrid fuzzer, QSYM, using the same benchmark programs and the same time budget. This provides a fair comparison for evaluating the effectiveness of the proposed techniques.
- Repeating each 24-hour experiment 5 times and averaging the results helps account for the variability inherent in fuzzing. This strengthens the internal validity.
- The benchmarks chosen are real-world programs commonly used in fuzzing research, making them representative subjects for evaluation.
- The coverage metrics focus on branch coverage, which directly relates to the techniques BSFuzz aims to improve. This is an appropriate measure for the goals.
Potential limitations:
- More implementation details could be provided for how BSFuzz was integrated with QSYM and AFL. This would give better insight into internal validity.
- The results can vary substantially across different program benchmarks. More investigation into program characteristics that impact effectiveness could be beneficial.
External validity:
- The benchmark programs cover a diverse set of real-world applications including file utilities, document processors, network tools, and multimedia libraries.
- The generally positive results across this set suggest the techniques may transfer well to other subjects, supporting external validity.
- The authors appropriately limit claims to the context of hybrid fuzzing. The ideas may not be as applicable for pure fuzzing or concolic execution.
- Testing for 24 hours follows conventions for fuzzing evaluations, but longer-term impacts are still uncertain.
Overall, I would say the methodology is reasonably rigorous and the validity is decent but could be strengthened with the additional context and investigations suggested. The paper presents promising initial evidence that broader application of these techniques may be beneficial, but further generalization requires more research.
Additionally, the manuscript presents a novel approach called BSFuzz for improving hybrid fuzz testing by tracking branch coverage state and constraint solving history. The idea is interesting and has the potential to advance the state of the art in software testing. The methodology is sound, leveraging lightweight instrumentation to track coverage and selectively solving constraints. The two main strategies of filtering unsolvable branches and high-frequency branches are reasonable. The experimental methodology comparing BSFuzz to QSYM on real-world benchmarks over 24 hours follows standard practice. The results generally validate the effectiveness of BSFuzz in increasing branch coverage and solving speed. The writing is clear and well-structured. The background provides sufficient context and the authors clearly explain the motivation behind their approach.
Recommendations:
- While the experimental results are positive overall, the improvements vary across benchmarks. It would be good to provide more insight into why the results on some programs like pdftops and pngfix are less strong.
- The limitations around instrumentation granularity and accuracy in tracking branches could be expanded on. How big is this problem in practice? Some examples where it causes inaccuracies would be useful.
- The related work section covers the key areas, but could go into more technical depth contrasting how existing hybrid fuzzers handle scheduling and synchronization. This would better highlight the novelty of your techniques.
- There are a few typos and minor grammar issues that should be corrected in a final edit.
Overall, I would recommend accepting this paper with minor revisions to address the points above. The core idea is solid and experimental results demonstrate the potential of BSFuzz. Addressing the limitations and expanding the related work should further strengthen the manuscript. This is an excellent contribution that will interest researchers and practitioners in software testing and cybersecurity.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
The authors propose a system that supports automated software testing by generating test cases through fuzzing, with the aim of improving test-case coverage of particular branches.
The following suggestions are offered to the authors for revising the submitted manuscript:
1. Introduction (third paragraph, line 48): every statement that is not part of the authors' own contribution should be supported by an in-text reference from the literature list. For example, the third paragraph of the Introduction opens with "For test case generation, random mutation is much faster than constraint solving, which causes a delay in coverage state synchronization between the fuzzer and the concolic executor." This statement should be accompanied by a [reference number] from the literature list so that it is supported by the results of other research. Otherwise, it could be read as the authors' own conclusion, even though it appears in the Introduction rather than in a discussion section that treats the results presented in this manuscript. In general, the authors should be more careful with such statements.
2. It is unusual to have tables and diagrams in the Motivation section. These elements (tables with data and diagrams) belong in the results section, not in the Motivation section.
3. The Motivation section contains some confusing statements, for example "We conduct tests on a set of real-world benchmark programs using QSYM" (line 109). It is not clear whether the authors ran these experiments for this paper or presented them previously at a conference or in a journal.
4. Details of experimental research do not belong in the Motivation section but in the Research methodology and Results and discussion sections. For example, the first line of Section 2.1 reads "A significant number of test cases are generated during the fuzzing process." Section 2.1 belongs to the Motivation section, and this sentence is confusing because the verb "are generated" implies that the authors did this as part of the present manuscript. There are many more such examples in the Motivation section. Another example: "Through research, we discovered that the concolic executor will continually solve the constraints of some branch positions that are not actually solvable, wasting time and resources."
5. Figure 1 presents a diagram with experimental results and should not appear in the Motivation section. The right place for diagrams such as these is the Results and discussion section.
6. Figure captions should contain only a brief explanation of the content. All details about the content of an image (explanation of the diagram axes and results) should be placed in the text after the figure, not in the figure caption (i.e. the figure name).
7. Table 3 is too wide and does not fit on the page; its layout should be tidied.
8. Section 2.3. "High-frequency Branches" starts with a code listing. It is usual to have at least several sentences before any listing or diagram...
9. Figure 4 is unnecessary where it stands, since it is placed in the Motivation section.
10. General comment: the authors should provide a better organized manuscript, with sections ordered as follows: Introduction, Motivation and Background, Related work, Proposed system, Research methodology (with details on the software tools used and the source of the experimental data), Results and discussion, Conclusion.
The English needs correction, particularly the verb tenses.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
electronics-2618007 : BSFuzz: Branch-State Guided Hybrid Fuzzing
Content
----------
The goal of this paper is to propose BSFuzz, a branch-state guided hybrid fuzzer that tracks the coverage state and the solving state in a lightweight branch state map.
Firstly, BSFuzz promptly synchronizes the current coverage state of all test cases in the fuzzer's queue with the concolic executor to reduce constraint solving for high-frequency branches.
Secondly, it records the branch-solving state during concolic execution to reduce repeated solving of unsolvable branches.
Guided by the coverage state and the historical solving state, BSFuzz can efficiently discover and solve more branches.
BSFuzz has been tested with its two strategies applied separately, as BSFuzz-uns and BSFuzz-fre, on six popular benchmark programs.
The experimental results on real-world programs show that BSFuzz can effectively increase the speed of the concolic executor and improve branch coverage.
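To make the two strategies summarized above concrete, the following is a minimal Python sketch of what a branch state map could look like. It is an illustration only and not the authors' implementation: the class names, the `freq_threshold` cutoff, and the decision rule in `should_solve` are all assumptions introduced here, since the manuscript's actual data structure is not reproduced in this report.

```python
from dataclasses import dataclass, field
from enum import Enum


class SolveState(Enum):
    """Hypothetical solving history of a branch in the concolic executor."""
    UNTRIED = 0
    SOLVED = 1
    UNSOLVABLE = 2


@dataclass
class BranchState:
    hits: int = 0                          # queue test cases that cover this branch
    solve: SolveState = SolveState.UNTRIED


@dataclass
class BranchStateMap:
    freq_threshold: int = 100              # assumed cutoff for "high-frequency"
    states: dict = field(default_factory=dict)

    def _get(self, branch_id):
        # Create an empty entry the first time a branch is seen.
        return self.states.setdefault(branch_id, BranchState())

    def sync_coverage(self, covered_branches):
        """Record the branches covered by one test case from the fuzzer queue."""
        for branch_id in covered_branches:
            self._get(branch_id).hits += 1

    def record_solve(self, branch_id, solved):
        """Remember whether the solver managed to flip this branch."""
        state = SolveState.SOLVED if solved else SolveState.UNSOLVABLE
        self._get(branch_id).solve = state

    def should_solve(self, branch_id):
        """Skip branches known to be unsolvable or already covered often."""
        st = self._get(branch_id)
        if st.solve is SolveState.UNSOLVABLE:
            return False
        return st.hits < self.freq_threshold


if __name__ == "__main__":
    m = BranchStateMap(freq_threshold=2)
    m.sync_coverage([1, 2])
    m.sync_coverage([1])
    m.record_solve(3, solved=False)
    for b in (1, 2, 3, 4):
        print(b, m.should_solve(b))
```

In this sketch the concolic executor would call `should_solve` before sending a branch constraint to the solver, so the unsolvable filter and the high-frequency filter correspond loosely to the BSFuzz-uns and BSFuzz-fre configurations mentioned in the summary.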
Major comments
--------------
1. Concolic testing is a new concept that comes from:
Towards Optimal Concolic Testing. In ICSE '18: 40th International Conference on Software Engineering, May 27-June 3, 2018.
2. Listings 1 and 2 should be formatted as pseudocode.
3. The format of Table 3 is broken. It should be a three-line table.
Evaluation
--------------
Given the above, I recommend minor revision.
Minor editing of English language required
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 4 Report
This paper proposes BSFuzz (Branch-State Fuzzing), which tracks the coverage state and solving state in a lightweight branch state map.
The paper needs the following additions and corrections:
- Complete Figure 3 by adding the state map to it.
- Please present the structure of the state map.
- Show the differences from, and performance evaluation results against, similar tools such as SHFuzz and DigFuzz.
Nothing
Author Response
Please see the attachment.
Author Response File: Author Response.pdf