BugMiner: Mining the Hard-to-Reach Software Vulnerabilities through the Target-Oriented Hybrid Fuzzer

Abstract: Greybox fuzzing is one of the most reliable and powerful techniques for automated software testing. Nevertheless, most greybox fuzzers are not effective at directed fuzzing, for example, towards complicated patches or towards suspicious and critical sites. To overcome these limitations, Directed Greybox Fuzzing (DGF) approaches were recently proposed. Current DGFs are powerful and efficient approaches that can compete with coverage-based fuzzers. However, DGFs fail to strike a balance between effectiveness and efficiency, and random mutations make it hard for them to handle complex paths. To alleviate this problem, we propose a target-oriented hybrid fuzzing tool that combines a fuzzer with a dynamic symbolic execution (also referred to as concolic execution) engine. Our method aims to generate inputs that quickly reach the target sites in each sequence and trigger potential hard-to-reach vulnerabilities in the program binary. To demonstrate the capability of our implementation, named BugMiner, we evaluated it comprehensively on bug hunting and crash reproduction. The evaluation results show that BugMiner not only triggers hard-to-reach bugs 3.1, 4.3, 2.9, 2.0, 1.8, and 1.9 times faster than Hawkeye, AFLGo, AFL, AFLFast, QSYM, and ParmeSan, respectively, but also scales to several real-world programs.


Introduction
The development of modern information technology is accompanied by adverse events such as industrial spyware, computer crime, unauthorized access, and modification or loss of confidential information. The main cause is the presence of vulnerabilities in software. According to Edgescan's 2019 vulnerability statistics report [1], the share of discovered vulnerabilities located in the network layer rose slightly from 73% to 81%, while the share located in the application layer fell from 27% to 19%. Although the network layer has a high vulnerability density, the majority of high and critical risk exposures reside in the application layer.
Software bugs have emerged as an underlying cause of threats to the security of digital life. As defined in RFC 2828 [2], a vulnerability is a flaw or weakness in a system's design, implementation, or operation and management that could be exploited to violate the system's security policy. Attacks that exploit such bugs, especially 0-day vulnerabilities, can cause serious damage. Therefore, finding bugs early is vital to the vulnerability management process.
Fuzzing is one of the most reliable techniques for automated software testing: it feeds programs with random inputs and detects security-critical vulnerabilities [3]. The growing adoption of fuzzing [4] in both academia and industry, for example in Microsoft's Springfield [5] and Google's OSS-Fuzz [6], demonstrates its ability to discover numerous kinds of vulnerabilities in real-world programs. Fuzzing is regarded as efficient and scalable: it supplies many inputs to the Program Under Test (PUT) and monitors it for abnormal behavior, such as that caused by 0-day vulnerabilities or buffer and heap overflows [7]. It has also evolved into various forms of fuzzers for different testing scenarios. Fuzzers can be categorized as blackbox, whitebox, or greybox [8], according to their awareness of the internal structure of the PUT. Lately, greybox fuzzers have been widely used and have proven effective. In particular, American Fuzzy Lop (AFL) [9] and its derivatives [10][11][12][13] have received significant attention.
While fuzzing can be extraordinarily powerful, it also has drawbacks. It has been ineffective at detecting hard-to-reach vulnerabilities deep inside programs [14], because the majority of randomly mutated seeds fail to satisfy the constraints that characterize well-formed inputs and therefore never execute the rest of the code. Motivated by the need to find hard-to-reach bugs, researchers have developed directed fuzzing [14][15][16].
Directed Greybox Fuzzing (DGF) [11,17] is designed to fuzz selected suspicious target locations, with applications in a variety of security scenarios: (1) bug reproduction [17,18]: for example, if a bug is discovered in MJS [19], a JavaScript engine for embedded systems, on the MSP432 ARM platform, the same bug may arise in the corresponding code for other platforms, and fuzzing must be directed to detect the vulnerability at those places; (2) patch testing [20,21]: when a bug is fixed, the programmers need to check whether the patch fixes the bug completely, which requires the fuzzer to focus its effort on the patched code. In both situations, fuzzing needs to be directed at reaching particular user-specified target sites in the PUT. Depending on the application, the target sites originate from static analysis reports, bug stack traces, or patches. Nevertheless, in DGF tools, generating and parsing graphs and calculating distances at the instrumentation stage is very time-consuming. For users who are sensitive to time consumption or have restricted computing resources, this cost can be prohibitive. Such users will prefer a lightweight analysis approach that reduces the computing resource requirements and the overall analysis time.
We present BugMiner, a new Target-Oriented Hybrid Fuzzer (TOHF) that attempts to overcome the issues described above. BugMiner combines and directs two of the most efficient testing techniques, state-of-the-art fuzzing and Concolic Execution (CE), through input synchronization. The primary goal of BugMiner is to dive deep into the program binary and mine hard-to-reach vulnerabilities faster than existing software testing tools. More specifically, the first step of BugMiner is to identify unsafe functions and extract them from Common Vulnerabilities and Exposures (CVE) [22] reports. To extract unsafe functions automatically, BugMiner uses Natural Language Processing (NLP) [23], a text mining technique from artificial intelligence, to mine bug reports and collect the vulnerable targets. The collected data are used in the subsequent static analysis step to calculate the distance between the entry point of the program and the target function. Furthermore, we implemented a BranchPruner module to overcome the path explosion issue and enhance hybrid fuzzing performance. After this process, the dynamic analysis fuzzes the program with fuzzing and CE. In addition, to dive deep into the program faster, we designed the dynamic analysis of BugMiner with an input prioritization module that categorizes the test-inputs.
The main contributions of this research are as follows:
• This research presents a novel target-oriented hybrid fuzzing tool named BugMiner, which combines greybox fuzzing with a concolic execution engine. It solves constraints to execute uncovered, complex nested branches and generates more effective test-inputs that dive deep into the program.
• A set of novel methods is proposed to increase the efficiency of bug hunting and reduce time-consuming preprocessing. More specifically, we implemented a Bug Report Analyzer built on NLP techniques. The bug report analyzer identifies vulnerable functions and extracts them from bug reports, then builds a target database that is used in further steps. In addition, BugMiner provides three test-case pools (T-Pool) that allow newly generated inputs to be placed into different categories based on their priority.
• To enhance target-oriented hybrid fuzzing and overcome the path explosion problem of CE, we designed BugMiner with a BranchPruner module that gathers the set of branch addresses not related to the specified target. This allows BugMiner to trigger hard-to-reach bugs without excessive effort. We also significantly reduced the time budget required for bug hunting compared with existing approaches.
• We validated the effectiveness of BugMiner through several experiments on different datasets, including LAVA-M, Binutils, LibPNG, OpenSSL, and real-world programs. We compared BugMiner against popular software testing tools, namely AFL, AFLFast, AFLGo, Hawkeye, QSYM, and ParmeSan. Comprehensive experimental results demonstrate that the proposed implementation triggers hard-to-reach bugs faster than the baseline tools and scales to several real-world programs. Furthermore, we provide dataset statistics for the convenience of readers.
The rest of this paper is organized as follows. Section 2 introduces the background knowledge of this research. Section 3 presents our motivation. Section 4 explains the proposed methodology for improving the efficiency of software testing. Section 5 describes the implementation of the proposed method. Section 6 reports various comparative experiments and examines the evaluation results. Finally, Section 7 concludes this research work.

Background
Fuzzing. Fuzzing is one of the most efficient techniques for detecting software vulnerabilities. Software testing distinguishes blackbox, whitebox, and greybox fuzzing approaches. Blackbox fuzzing, in its simplest form, produces random inputs to detect software vulnerabilities. Its advantage is that it is readily compatible with any program. At the other end of the spectrum is whitebox fuzzing [30,31], which uses heavyweight analysis, for example Symbolic Execution (SE), to produce test cases that trigger vulnerabilities, rather than blindly trying a massive variety of inputs. In practice, whitebox fuzzing suffers from compatibility and scalability problems in real-world applications. Recently, greybox fuzzing, which occupies a middle position between whitebox and blackbox fuzzing, has become the most scalable and practical fuzzing technique. It retains much of the adaptability of blackbox fuzzing, while the lightweight mutation of seeds keeps testing scalable and effective [32]. The well-known state-of-the-art fuzzing tool American Fuzzy Lop (AFL) [9] uses execution profiling information to mutate seeds. Meanwhile, fuzzers such as VUzzer [33] and Angora [12] rely on dynamic Data-Flow Analysis (DFA) to rapidly produce inputs that reach uncovered branches in the program binary, with the purpose of expanding code coverage.
Hybrid Fuzzing. Hybrid fuzzing has attracted considerable attention and has made noteworthy contributions to bug hunting. Hybrid fuzzing [28][34][35][36] combines fuzzing with an SE or CE engine to compensate for the shortcomings of both methodologies. The goal of CE in a hybrid fuzzer is to help the fuzzer pass narrow branch conditions and better understand application logic. Driller [36] alternates fuzzing with CE during the testing process: CE starts automatically when the fuzzer gets stuck. Another hybrid fuzzer, DigFuzz [37], estimates path probabilities and selects suitable paths using a Monte Carlo model. To recognize nested checksums, TaintScope [15] uses taint analysis and then analyzes the target with SE to generate test cases. QSYM [28] improved the performance of hybrid fuzzing by integrating CE with native execution and, as a result, showed high efficiency when testing real-world programs.
Coverage-based fuzzing and hybrid fuzzing are excellent general-purpose approaches for exploring more paths and covering more basic blocks, but discovering deep target bugs and analyzing patched bugs can be time-consuming with these undirected approaches. BugMiner tackles these issues by combining the fuzzer with Target-Oriented Concolic Execution (TOCE). To achieve high testing performance, TOCE takes inputs from the fuzzer and, based on their execution paths, generates efficient inputs that can dive deep and trigger hard-to-reach vulnerabilities.
Directed fuzzing. Directed fuzzing is designed to guide fuzzing towards weak, vulnerable points in programs [38]. The intuition is that, compared with coverage-based fuzzing, directing the fuzzer towards a particular target point within the program can expose specific vulnerabilities more quickly. As stated above, traditional directed fuzzing relies on symbolic execution, which suffers from compatibility and scalability problems.
Directed symbolic execution. SE plays a vital role in directed fuzzing because of its path exploration method. Therefore, the majority of proposed directed fuzzers [39][40][41][42] are whitebox fuzzers based on SE. Basically, they collect symbolic path constraints while executing the program and run constraint solving modules to produce inputs that can dive deeper into uncovered paths. As an example, Do et al. [39] apply an extended chaining method and data dependence analysis to construct event sequences leading to the target locations, and use directed concolic execution to produce goal-oriented test cases. To reach a specified target site and analyze program patches, KATCH [20] combines the SE tool KLEE [30] with numerous innovative heuristics. Another approach, BugRedux [18], takes a sequence of program statements as input and produces seeds that trigger the bug. Directed symbolic execution recasts the reachability problem as a constraint satisfiability problem and devotes the vast majority of execution time to heavyweight program analysis and constraint solving; for this reason, the method is considered powerful [43]. Although directed SE is effective, analyzing heavyweight real-world programs with it consumes significant time. In contrast, BugMiner uses TOCE as a support for the fuzzing process, which mitigates the inefficiency of SE.
Directed Greybox Fuzzing. In the time that SE takes to produce a single test case, a greybox fuzzer such as AFL [9] can create several different test cases, so building on greybox fuzzers can significantly improve the performance of directed fuzzers. Hawkeye [17], Lolly [44], and AFLGo [11] are DGFs that focus on reaching the target sites in a program. AFLGo [11] is a DGF that showed high efficiency in reaching targets by adopting a meta-heuristic to prioritize inputs. Specifically, it treats target reachability as the core problem and tries to minimize the distance of inputs to the target sites. Unfortunately, AFLGo moves the majority of program analysis to the instrumentation stage in return for efficiency at runtime. Because the instrumentation phase of AFLGo calculates the distance between each node and the target position, which is later used to measure the seed distance to the target position, AFLGo needs to analyze the PUT's call graph and Control Flow Graph (CFG). Parsing the graphs and measuring the distances at the instrumentation phase takes a long time. Hawkeye [17] proposed a treatment of indirect function calls that eases the distance calculation, and improved on AFLGo by decreasing the instrumentation time. One of the lightweight DGFs is Lolly [44], which analyzes targets according to sequence coverage. However, the seed scheduling of Lolly is not as efficient as that of AFLGo. ParmeSan [29] proposed a novel sanitizer-guided fuzzer that increases bug coverage automatically. To improve the accuracy of distance evaluation, ParmeSan uses DFA at CFG construction time. It also uses DFA information in the bug detection stage to achieve a low Time-To-Exposure (TTE). Zong et al. [45] proposed a directed fuzzing tool named FuzzGuard that can trigger hard-to-reach vulnerabilities. More precisely, FuzzGuard is a deep learning based DGF that filters out inefficient test-inputs that cannot arrive at a specified destination point; the fuzzer then executes the PUT with only the productive test-inputs. As a result, the target software vulnerability can be triggered more quickly.

Motivation
The simplified example in Listing 1 illustrates the problems with recent fuzzing tools that motivate our work. The program contains a Use-After-Free (UAF) vulnerability caused by a missing exit() call, a typical root cause of such vulnerabilities (for example, CVE-2014-9296). The program reads a file and copies its contents into a buffer. Precisely, after the memory chunk pointed to by m is allocated (Line 17), m_alias and m become aliases (Line 20). The memory pointed to by both pointers is freed in the function vuln_func (Line 15). The UAF vulnerability arises when the freed memory is dereferenced again through m (Line 24).
Bug-triggering states. When the first three bytes of the seed are 'SNU', the UAF vulnerability is triggered. To expose this vulnerability quickly, the fuzzer has to take the correct path through the if conditional statements in lines 8, 18, and 23, which correspond to the three UAF events free, alloc, and use, respectively. It should be emphasized that this UAF vulnerability does not cause the program to crash; thus, current greybox fuzzing tools without sanitization will not discover this kind of vulnerability.
Coverage-based greybox fuzzing. Beginning with an empty input, AFL rapidly produces three new seeds, such as 'SSS', 'NNN', and 'UUU', that separately activate the free, alloc, and use events. None of these inputs causes a memory error. Since the probability of producing a seed starting with 'SNU' from an empty input is extremely small, coverage-based fuzzing is not efficient at exercising a chain of UAF events, even though each individual event is activated.
Directed Greybox Fuzzing. Given a bug trace produced by a lightweight instrumentation tool such as ASan [46], a DGF prevents the fuzzer from exploring irrelevant paths, such as the else part at line 10 in function func, in case the condition at line 8 is complicated. However, directed fuzzing tools have their own limitations. For instance, typical DGF seed selection methods favor a seed whose trace covers numerous target sites, rather than trying to reach these sites in the specified order. As an example, the standard DGF distances [11,17] for the targets (S, N, U) do not distinguish between a seed S1 with the path S-N-U and another seed S2 with the path U-S-N. As another example, Hawkeye proposed a power function that can allocate a large amount of energy to seeds whose traces do not reach the target location. This means that it might get lost in the else part at line 10 of the toy example. AFLGo and Hawkeye cannot detect this kind of bug within 2 h, whereas our proposed BugMiner detects it in less than 20 min.
Moreover, AFLGo is effective at patch testing and achieves high crash reproduction results, but it has a critical problem caused by its distance calculation in the static analysis stage: AFLGo users suffer from excessive instrumentation time during preprocessing. As an example, in our comprehensive evaluation, AFLGo spent almost two hours compiling the Libarchive program and more than three hours compiling and instrumenting the Poppler program. These observations encouraged us to develop a new, highly efficient target-oriented hybrid fuzzer called BugMiner.
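The ordering issue for the targets (S, N, U) can be illustrated with a minimal sketch. The two helper functions below are hypothetical and are not taken from AFLGo, Hawkeye, or BugMiner; they only contrast an order-insensitive view of target coverage with an order-aware one, under the assumption that an execution trace is a list of visited site labels.

```python
from typing import Sequence


def covers_targets(trace: Sequence[str], targets: Sequence[str]) -> bool:
    """Order-insensitive check: every target appears somewhere in the trace."""
    return set(targets) <= set(trace)


def reaches_in_order(trace: Sequence[str], targets: Sequence[str]) -> bool:
    """Order-sensitive check: the targets occur in the trace as a subsequence."""
    remaining = iter(trace)
    # `t in remaining` consumes the iterator up to the first match, so each
    # target has to be found after the previous one.
    return all(t in remaining for t in targets)


targets = ["S", "N", "U"]
seed_s1 = ["S", "x", "N", "U"]  # visits the targets in the required order
seed_s2 = ["U", "S", "y", "N"]  # visits the same targets in another order

print(covers_targets(seed_s1, targets), covers_targets(seed_s2, targets))
print(reaches_in_order(seed_s1, targets), reaches_in_order(seed_s2, targets))
```

Under the order-insensitive check both seeds look equally promising, whereas only the first seed passes the order-aware check, which is the distinction a sequence-sensitive target such as a UAF needs.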

Proposed Methodology
This section discusses the workflow design and methodology of BugMiner in detail. First, we give an overview of BugMiner in Section 4.1. Next, in Section 4.2, we explain the Bug Report Analyzer approach, which extracts unsafe functions from bug reports. Section 4.3 details the Static Phase of BugMiner. Finally, Section 4.4 describes the Dynamic Phase of BugMiner, which combines up-to-date, fast, and powerful software testing approaches.

BugMiner Overview
This section describes the workflow of the proposed approach, BugMiner. Figure 1 depicts the overview of BugMiner, which includes three key components: the bug report analyzer, static analysis, and dynamic analysis. The bug report analyzer demonstrates the feasibility of using NLP tools to automatically identify and extract unsafe functions by analyzing CVE descriptions. The extracted vulnerable functions are placed in the Targets database for use in the static analysis process. The static analysis takes the program's source code and the specified target as input, and outputs the instrumented program binary, the CFG, and the BranchPruner data, which provide basic block level distance. In the dynamic analysis, the hybrid fuzzer, which includes AFL, TOCE, and Input Prioritization, takes the target binary, an initial seed, the target sites, and the BranchPruner information as inputs. The main result of the dynamic analysis is the set of test-inputs that cause the program to crash.

Bug Report Analyzer
The bug report analyzer identifies and extracts vulnerable functions from bug reports. Newly discovered software vulnerabilities are usually shared with the wider community. Since CVE bug reports are well-formatted, it is feasible to use NLP tools to extract vulnerability-related information (e.g., unsafe function names) for other target programs with some extra effort. CVE [22] provides a reference method for publicly known security exposures and vulnerabilities, publishing information such as the vulnerability type, unsafe functions, and affected versions.
As mentioned above, we implemented the bug report analyzer module based on NLP techniques. The working process of this module is straightforward and consists of information retrieval and vulnerable function extraction. Figure 2 illustrates the workflow of the bug report analyzer. To analyze CVE bug reports and extract unsafe function names automatically, the method first retrieves the information using Sentence Segmentation, Tokenizing, Syntactic Parsing, Part-of-Speech (PoS) tagging, and Chunking techniques. The output of this first step is a parse tree that records the PoS tag of each word. Next, in the vulnerable function name extraction step, we reanalyze the parse tree: we convert it to a string and extract the vulnerable function names. To do this, we implemented three extraction methods. Although all three approaches serve the same purpose, their accuracy and extraction time differ.

[Figure 1. Overview of BugMiner. Labeled components: Bug Report Analyzer, Static analysis (BranchPruner, instrumented binary), Dynamic analysis.]

Neighbor approach. When one looks through CVE reports, it is easy to see that CVE descriptions follow some fixed patterns. One possible way to exploit this is to summarize or learn the patterns of the CVE descriptions and then extract the candidate function name for each category of description. To explain this approach clearly, we use one bug report (CVE-2018-11237) as an example. Its vulnerability description is as follows: "An AVX-512-optimized implementation of the mempcpy() function in the GNU C Library (aka glibc or libc6) 2.27 and earlier may write data beyond the target buffer, leading to a buffer overflow in mempcpy_avx512_no_vzeroupper" [47]. In this example, the vulnerable function mempcpy is located between the two words "the" and "function". A careful look at the CVE descriptions shows that 80% of CVE bug reports are written in this same structure. To extract an unsafe function name with this method, we first find the word "function". Second, we check whether the article "the" appears two words before it. If the first word is "the" and the third word is "function", the unsafe function name is the word located between them.
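The neighbor rule can be sketched in a few lines. This is a minimal illustration that substitutes a regular expression for the parse-tree processing described in the text; the pattern and the function name `extract_by_neighbors` are assumptions made for this example and are not part of BugMiner.

```python
import re
from typing import List

# Matches "the <name> function", where <name> may carry a trailing "()".
NEIGHBOR_PATTERN = re.compile(r"\bthe\s+([A-Za-z_]\w*)(?:\(\))?\s+function\b")


def extract_by_neighbors(description: str) -> List[str]:
    """Return the words found between "the" and "function" (hypothetical helper)."""
    return NEIGHBOR_PATTERN.findall(description)


cve_2018_11237 = (
    "An AVX-512-optimized implementation of the mempcpy() function in the "
    "GNU C Library (aka glibc or libc6) 2.27 and earlier may write data beyond "
    "the target buffer, leading to a buffer overflow in "
    "mempcpy_avx512_no_vzeroupper"
)
print(extract_by_neighbors(cve_2018_11237))
```

On the quoted CVE-2018-11237 description, the only word sandwiched between "the" and "function" is `mempcpy`, so that is the single candidate the sketch reports.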
Punctuation marks approach. This method is slightly different from the other approaches and is easier to use. In 15% of the CVE descriptions, function names are marked with punctuation. Because the unsafe function name and the punctuation mark appear side by side, for example, getcwd() and realpath(), it is easy to extract the unsafe function name with this method. We first search for the punctuation mark "()". If this pair of round brackets occurs in the CVE description, the vulnerable function name is usually the word located immediately before it.
Dictionary approach. This method differs significantly from the other methods, although it also uses NLP techniques. Ordinary words in a CVE bug description have meanings in English, but function names, such as clntudp_call, posix_memalign, printf, and strcpy, do not. In this third method, we extract unsafe function names with the help of an English dictionary that includes more than 20,000 words. The process is also straightforward. After retrieving the CVE bug description, the NLP tool tokenizes and parses it and generates a parse tree. The program then automatically separates all NP and NN (noun) nodes of the parse tree and compares the separated words with the words in the English dictionary. Words that do not match any dictionary word are treated as function names and placed in the database.
In addition, Table 1 lists a part of the extracted unsafe functions, which can enable attacks and vulnerabilities such as format string, buffer overflow, multiple command injection, and DOS overflow. For instance, the gets() function is always vulnerable, so it is reasonable for a static analysis tool to report all uses of gets(). The strcpy() function can be used safely but is often the source of buffer overflow vulnerabilities [48].
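A minimal sketch of the punctuation rule follows. It assumes that a function name is an identifier immediately followed by an empty pair of round brackets; the pattern and the helper name are hypothetical and only illustrate the idea.

```python
import re
from typing import List

# An identifier immediately followed by "()".
PAREN_PATTERN = re.compile(r"\b([A-Za-z_]\w*)\(\)")


def extract_by_parentheses(description: str) -> List[str]:
    """Return identifiers written directly before "()", without duplicates."""
    names = PAREN_PATTERN.findall(description)
    return list(dict.fromkeys(names))  # keeps the first occurrence of each name


print(extract_by_parentheses("getcwd() and realpath() may overflow; see getcwd()"))
```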
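The dictionary comparison can be sketched as follows. This is a simplified illustration: it skips the parse-tree and noun-node filtering described above and compares every token against a tiny, hypothetical word set that stands in for the 20,000-word English dictionary.

```python
import re
from typing import Iterable, List, Set

# Hypothetical stand-in for the English dictionary used in the text.
TOY_DICTIONARY: Set[str] = {
    "a", "an", "the", "in", "of", "to", "buffer", "overflow", "call",
    "allows", "attackers", "cause", "crash", "via",
}

TOKEN_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def extract_by_dictionary(description: str, dictionary: Iterable[str]) -> List[str]:
    """Return tokens that are not English dictionary words (candidate names)."""
    known = {word.lower() for word in dictionary}
    candidates = []
    for token in TOKEN_PATTERN.findall(description):
        if token.lower() not in known and token not in candidates:
            candidates.append(token)
    return candidates


print(extract_by_dictionary("A buffer overflow in the strcpy call", TOY_DICTIONARY))
```

With a realistic dictionary, product names and version strings would also survive this filter, which is presumably why the approach described in the text restricts the comparison to noun nodes of the parse tree.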

Static Analysis
Static analysis detects software vulnerabilities by examining the program without executing it. It can examine the application in a variety of ways; analyzing the source code is straightforward and is possible even when the program contains defects. In this way, static analysis lets us make assertions about nearly all possible program executions rather than only the executions that were actually tested. From a security perspective, this is a significant advantage [48].
Graph Extractor. Directed fuzzers depend on statically generated CFGs for distance calculation. To obtain accurate distances, we first instrument the program and construct the CFG. The CFG is a graph of basic code blocks, and it helps to find the path to the bug location. We build the CFG with the assistance of the lightweight angr [49] framework. angr generates the CFG by statically analyzing the target program, starting from every function block and searching for the jump edges in the graph.
Branch Pruner. In target-oriented testing, it is pointless to analyze paths other than those that lead to the specified target location; doing so makes the software testing process inefficient. It also complicates the bug hunting process and causes path explosion. To overcome these challenges, we implemented a straightforward method named BranchPruner that includes the ShortestPathFinder and BranchPruner modules. To obtain the branch pruning locations, we first need to identify the shortest path from the PUT's entry point to the specified target. The ShortestPathFinder takes the control flow information and the specified target sites and calculates the inter-procedural distance for each node. The target sites allow the ShortestPathFinder module to search for them by name in the recovered CFG. Specifically, it disassembles the PUT binary to identify the address of the specified target function. Once it obtains this address, it marks it as the target destination in the recovered CFG. For example, if a vulnerable function such as gets() is found, the branch address of this function becomes a target destination point. After that, ShortestPathFinder obtains the addresses of the basic blocks that call the target function and repeatedly gathers the addresses of parent basic blocks until it reaches the PUT entry point. When the ShortestPathFinder process is complete, the BranchPruner process begins. It collects the addresses of all branches not related to the shortest path. The collected branch pruning data are passed to the dynamic analysis, where TOCE uses them to guide the fuzzer along the right path to the target location.
The workflow of the branch pruner is illustrated in Algorithm 1. It takes the target program binary PUT, the CFG of the target program, and the specified target function S_TF from the target database as inputs. It returns the branch pruning addresses B_PA, the specified target branch address S_TB, and the entry point of the PUT (E_P) as outputs. The branch pruning process starts by loading the PUT and identifying the program's entry point (lines 2-3). Then, it loads the CFG to extract all branch addresses (lines 4-5). Next, it disassembles the target program T_P to get the branch address of S_TF. After the S_TB address is identified (line 7), it finds the shortest path and collects all branch addresses related to the path (line 8). Note that we used Dijkstra's algorithm [50] to find the shortest path. The basic blocks that are not related to the ShortestPathList are put into the branch pruning addresses (B_PA). Otherwise, the process continues (lines 9-15).
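The shortest-path and pruning steps described above can be sketched on a toy graph. This is a minimal illustration, not the BugMiner implementation: the CFG is assumed to be a plain dictionary from block address to weighted successors rather than a graph recovered by angr, and the function names and addresses are hypothetical.

```python
import heapq
from typing import Dict, List, Set, Tuple

# Assumed CFG encoding: block address -> list of (successor address, edge weight).
Graph = Dict[int, List[Tuple[int, int]]]


def shortest_path(cfg: Graph, entry: int, target: int) -> List[int]:
    """Dijkstra's algorithm; returns block addresses from entry to target, or []."""
    dist = {entry: 0}
    prev: Dict[int, int] = {}
    heap = [(0, entry)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for succ, weight in cfg.get(node, []):
            nd = d + weight
            if nd < dist.get(succ, float("inf")):
                dist[succ] = nd
                prev[succ] = node
                heapq.heappush(heap, (nd, succ))
    if target not in dist:
        return []
    path = [target]
    while path[-1] != entry:
        path.append(prev[path[-1]])
    return path[::-1]


def branch_pruning_addresses(cfg: Graph, entry: int, target: int) -> Set[int]:
    """Every block address in the CFG that is not on the shortest path."""
    all_blocks = set(cfg) | {s for succs in cfg.values() for s, _ in succs}
    return all_blocks - set(shortest_path(cfg, entry, target))


toy_cfg: Graph = {
    0x1000: [(0x1010, 1), (0x1020, 1)],
    0x1010: [(0x1030, 1)],
    0x1020: [(0x1040, 1)],
    0x1030: [(0x1050, 1)],
    0x1040: [],
    0x1050: [],
}
path = shortest_path(toy_cfg, entry=0x1000, target=0x1050)
pruned = branch_pruning_addresses(toy_cfg, entry=0x1000, target=0x1050)
print([hex(a) for a in path], sorted(hex(a) for a in pruned))
```

In this toy graph the blocks at 0x1020 and 0x1040 lie off the only path to the target, so they form the pruning set that a later concolic stage could ignore.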

Dynamic Analysis
Compared with static analysis, dynamic analysis analyzes a program while it is running. Generally speaking, dynamic analysis allows the tester to see much more about the program, and it captures the concrete behavior of the software during a natural execution. There are different dynamic analysis approaches, such as fuzzing, SE, CE, and hybrid fuzzing. Owing to their effectiveness, hybrid fuzzing tools are currently widely used to test software applications.
Hybrid Fuzzer. Hybrid fuzzing as a research topic has gained considerable popularity and made great contributions to software vulnerability detection. For instance, almost all the winners of the DARPA Cyber Grand Challenge [51] employed hybrid fuzzing tools. In comparison with plain fuzzing, hybrid fuzzing features an extra symbolic or concolic execution component, which provides an opportunity to analyze the fuzzed paths, solve the path conditions, and attempt to discover new uncovered paths. BugMiner combines AFL++ [52] with the TOCE engine, which lets us dive deeper into the program binary. Together, these components use three seed queues with different priorities and add new efficient test inputs to the seed pool. In the following paragraphs, we explain each component of the BugMiner dynamic analysis.

Fuzzing approach.
Although there are a number of effective fuzzing tools, such as AFL [9], AFLFast [10], and LibFuzzer [53], we chose the AFL++ [52] fuzzer. The reason for our preference is that AFL++ is efficient and integrates features from many outstanding bug hunting techniques [54]. With these features, AFL++ can be a more capable fuzzer than other modern fuzzers. The fuzzer receives the initial inputs to fuzz the PUT and stores each newly generated input in the T-Pool 1 queue. The TOCE, in turn, analyzes the PUT with the inputs from the T-Pool 1 queue, generates new effective inputs by solving constraints, and inserts them into the T-Pool 2 queue. As a result, the fuzzer can cover deeply hidden paths by using inputs from the T-Pool 2 queue. Algorithm 2 illustrates the greybox fuzzing process of BugMiner and the seed prioritization method. The fuzzer takes the instrumented program PUT and an initial input I, which can be built manually. It outputs the crashing inputs that trigger the bug and the newly generated test-inputs. The software testing process continues until the budget is exceeded or the abort signal is triggered. An initial input I is selected (line 4). Next, energy is assigned to the input (line 5), and then the mutation process of the fuzzer starts for the test-input (line 7). The fuzzer executes the PUT with the mutated test-input T_c'. If the PUT crashes on the generated input T_c', this test-input is stored in the crash inputs C_I (lines 8-9). Otherwise, if the generated test-input achieves new branch coverage, it is put into T-Pool 1 (lines 10-11). If a test-input T_c' neither crashes the PUT nor achieves any new branch coverage, it is stored in T-Pool 3 (line 13).
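The loop described above can be sketched as follows. This is a toy model under several assumptions, not the AFL++-based implementation: `run_put` is a hypothetical callback that reports whether an input crashed and which branch ids it covered, the mutation is a single random byte replacement, the energy is a fixed constant, and T-Pool 2 stays empty because the concolic engine that would fill it is not modeled. Inputs are taken from T-Pool 1 first, then T-Pool 2, then T-Pool 3.

```python
import random
from collections import deque
from typing import Callable, Deque, List, Optional, Set, Tuple

RunResult = Tuple[bool, Set[int]]  # (crashed?, ids of covered branches)


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Replace one random byte (a stand-in for a real mutation stage)."""
    if not data:
        return bytes([rng.randrange(256)])
    i = rng.randrange(len(data))
    return data[:i] + bytes([rng.randrange(256)]) + data[i + 1:]


def select_input(pools: List[Deque[bytes]]) -> Optional[bytes]:
    """Take the next input from the highest-priority non-empty pool."""
    for pool in pools:
        if pool:
            return pool.popleft()
    return None


def fuzz(run_put: Callable[[bytes], RunResult], seed: bytes,
         budget: int, energy: int = 8, rng_seed: int = 0):
    rng = random.Random(rng_seed)
    t_pool1, t_pool2, t_pool3 = deque([seed]), deque(), deque()
    crashes: List[bytes] = []
    seen: Set[int] = set()
    for _ in range(budget):
        current = select_input([t_pool1, t_pool2, t_pool3])
        if current is None:
            break
        for _ in range(energy):
            candidate = mutate(current, rng)
            crashed, coverage = run_put(candidate)
            if crashed:
                crashes.append(candidate)      # crash inputs
            elif not coverage <= seen:
                seen |= coverage
                t_pool1.append(candidate)      # new branch coverage
            else:
                t_pool3.append(candidate)      # nothing new
    return crashes, (t_pool1, t_pool2, t_pool3)


def toy_put(data: bytes) -> RunResult:
    """Toy program: "crashes" only when the input starts with b"SNU"."""
    covered: Set[int] = set()
    if data[:1] == b"S":
        covered.add(1)
        if data[1:2] == b"N":
            covered.add(2)
            if data[2:3] == b"U":
                return True, covered
    return False, covered


crashes, pools = fuzz(toy_put, seed=b"AAA", budget=50)
print(len(crashes), [len(p) for p in pools])
```

Whether the toy crash is found within the budget depends on the random mutations, which mirrors the point made in the motivation: purely random mutation rarely satisfies a chain of nested checks without help from constraint solving.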
Input Prioritization. Not all inputs are equally important for software vulnerability testing tools. The fuzzer also generates useless seeds that fail to cover new paths, which wastes time and reduces its efficiency. To mitigate this problem, BugMiner provides three test-case pools (T-Pool) into which newly generated inputs are sorted according to their priority. The inputs in T-Pool 1 are taken first, followed by the inputs in T-Pool 2, and lastly by those in T-Pool 3.
Target-oriented concolic execution. CE plays a key role in the software testing process. It takes partially suitable seeds as input and aims to produce effective seeds with high coverage. The fuzzer, in contrast, produces inputs by random mutation; even when it mutates effective inputs that cover more paths, nothing ensures that the newly produced inputs reach the target site. CE collects the execution trace of an input and then creates inputs that execute uncovered paths by solving the collected constraints. Figure 3 illustrates sample execution paths covered by the CE approach. All the covered basic blocks build a concolic execution tree. More specifically, the CE approach tries to cover all the basic blocks of the program and to construct the complete path tree. However, if the program is heavyweight or has a significant number of branches to cover, a tremendous number of concolic runs is required, or the testing process stops without success. This issue, one of the biggest limitations of CE, is called the path explosion problem [55]. For instance, the grep program contains more than 15K LOC and 8,200 basic blocks, and trying to cover all of them leads to path explosion. To tackle this issue, we use the information produced by the BranchPruner module. Specifically, in target-oriented testing, executing the branches and paths that cannot lead to the specified target location is meaningless. The goal is to reach the deeply hidden target branch, solve the constraints along the execution path, and generate efficient test inputs for the fuzzer. Therefore, pruning the branches that cannot lead to the specified target, and thus avoiding these unnecessary basic blocks, makes the bug hunting process effective. By using the BranchPruner information, we not only overcome the path explosion issue but also decrease the software testing time.
Algorithm 3 demonstrates BugMiner's target-oriented concolic execution. TOCE takes as input the target program PUT, a test input T_I from T-Pool 1, the branch pruning addresses B_PA, the specified target branch address S_TB, and the entry point E_P of the PUT. It returns new target-oriented test-inputs as output. The PUT is executed with T_I (line 2), and the entry point E_P is set (line 3). TOCE then loops from E_P until it reaches the specified target S_TB (line 4). If the address of the current node is not contained in B_PA (lines 5-6), the collected constraints are sent to the constraint solver engine to generate a new test-input NewTestInput, and this continues until the specified target location S_TB is reached (lines 7-9). Finally, each generated NewTestInput is put into the T-Pool 2 queue.
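The control flow of Algorithm 3 can be illustrated with a toy model. The sketch below is an assumption-laden simplification and does not use angr: `Node`, `solve`, and `toce` are hypothetical names, each branch constraint is reduced to "input byte `index` must equal `value`", and `solve` merely patches bytes instead of calling an SMT solver.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    address: int
    index: int   # hypothetical constraint: input[index] == value
    value: int


def solve(constraints, seed: bytes) -> bytes:
    """Toy stand-in for a constraint solver: patch the seed so that every
    (index, value) equality constraint holds."""
    size = max([len(seed)] + [index + 1 for index, _ in constraints])
    buf = bytearray(seed.ljust(size, b"\x00"))
    for index, value in constraints:
        buf[index] = value
    return bytes(buf)


def toce(trace, test_input, pruned_addresses, target_address):
    """Walk the trace from the entry point towards the target, skip pruned
    nodes, and emit one solved input per remaining node into T-Pool 2."""
    t_pool2 = deque()
    constraints = []
    for node in trace:
        if node.address in pruned_addresses:
            continue  # do not spend solver time on pruned branches
        constraints.append((node.index, node.value))
        t_pool2.append(solve(constraints, test_input))
        if node.address == target_address:
            break     # specified target reached
    return t_pool2
```

In this toy model the pruned addresses would come from the static phase, and the returned queue is what the fuzzer later consumes as its second-priority pool.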

Implementation
BugMiner comprises three main components: the Bug Report Analyzer, the Static Phase, and the Dynamic Phase.
This section describes the implementation of these components.
Bug report analyzer. We implemented this module to extract the unsafe functions named in bug reports. The module employs an NLP machine learning tool, which enables the machine to understand, process, and analyze human language. NLP allows us to take bug reports and extract the unsafe functions from them.
Static phase. The static phase involves the Instrumentation, the Graph Constructor, and the BranchPruner. For instrumentation, LLVM instrumentation was employed with afl-clang-fast++. Specifically, the LLVM compiler translates the target program's source code to LLVM IR (Intermediate Representation). The compiler environment variable is set to afl-clang-fast++, which builds the instrumented binary. The Graph Constructor module of BugMiner produces the CFG. To do this, BugMiner utilizes lightweight alias analysis, data flow tracking, and the pre-defined strategies provided by angr. In the ShortestPathFinder module, BugMiner takes each control flow and target site and calculates the inter-procedural distance for every basic block. After the shortest path is found, we gather the addresses that will be pruned in further steps. We implemented the BranchPruner module in Python based on the Dijkstra [50] algorithm, and we used the networkx library to parse the CFG.
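As a rough illustration of the pruning idea, the sketch below runs Dijkstra's algorithm over the reversed CFG to obtain the distance from every basic block to the target, and treats every block with no path to the target as prunable. It is a sketch under stated assumptions: the function names and the dictionary-based CFG are hypothetical, the exact pruning criterion used by BranchPruner may differ, and the standard-library `heapq` is used here in place of networkx so that the example has no dependencies.

```python
import heapq


def distances_to_target(cfg, target):
    """Shortest distance from every block to `target`.

    `cfg` maps a block address to {successor address: edge weight}.
    Dijkstra is run on the reversed graph, starting at the target."""
    reverse = {}
    for src, succs in cfg.items():
        reverse.setdefault(src, {})
        for dst, weight in succs.items():
            reverse.setdefault(dst, {})[src] = weight

    dist = {target: 0}
    heap = [(0, target)]
    while heap:
        d, block = heapq.heappop(heap)
        if d > dist.get(block, float("inf")):
            continue  # stale heap entry
        for pred, weight in reverse.get(block, {}).items():
            nd = d + weight
            if nd < dist.get(pred, float("inf")):
                dist[pred] = nd
                heapq.heappush(heap, (nd, pred))
    return dist


def prune_addresses(cfg, target):
    """Blocks from which the target is unreachable (assumed criterion)."""
    reachable = set(distances_to_target(cfg, target))
    blocks = set(cfg) | {s for succs in cfg.values() for s in succs}
    return blocks - reachable
```

The set returned by `prune_addresses` plays the role of the branch pruning addresses that the dynamic phase later skips.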
Dynamic Phase. The dynamic phase contains the fuzzer and the TOCE engine. More specifically, AFL++ 2.60d [54] was utilized as the fuzzer. We also designed an input priority mechanism with three different T-Pools, from which the fuzzer selects an input according to its priority. Finally, to generate highly efficient target-oriented test inputs, we designed and implemented TOCE based on angr.

Performance Evaluation
This section evaluates the prototype implementation of BugMiner and examines how efficiently our approach resolves bottlenecks.

Evaluation Setup
Research questions. In order to highlight the effectiveness of our approach, we address four practical research questions:

1. Is it beneficial to employ the bug report analyzer and branch pruner components in our approach?
2. Do our suggested methods successfully improve the software bug hunting process?
3. How does our suggested dynamic strategy affect BugMiner's performance?
4. What is the role of BugMiner in reaching the deeply hidden target sites?
Evaluation dataset. To assess the effectiveness and usage of our TOHF, we carry out several experiments with different real-world programs.
• The LAVA-M dataset [24] is regarded as an effective dataset of vulnerable programs for detecting hard-to-reach bugs in the PUT. Most software security researchers use it to evaluate bug hunting tools.
• Binutils [25] is a collection of binaries used in the GNU/Linux system. This dataset was also tested in a variety of well-known research studies [10,11,17].
• LibPNG [26] is the official PNG reference library. It supports almost all PNG features, has been widely used for over 23 years, and serves as a common target for evaluating software testing tools.
• OpenSSL [27] is a library for programs that secure communication over computer networks against eavesdropping.
• Eight popular real-world applications that help in evaluating software bug detection tools.

Experimental setup.
All the experiments were run on a machine with 16 GB of RAM and a 2.7 GHz Core i5-6400 processor, running 64-bit Ubuntu 16.04.
Evaluation tools. We compared BugMiner with the following fuzzing tools:
• AFL [9] is a commonly utilized state-of-the-art fuzzer.
• AFLFast [10] is an efficient greybox fuzzing tool that optimizes AFL with a new power schedule algorithm.
• AFLGo [11] is a modern, efficient DGF tool that relies on AFL. In comparison with AFL, it provides information about node distance.
• Hawkeye [17] is also a DGF tool that instruments the program to measure the distance of a given test input to the target sites. Moreover, Hawkeye advances AFLGo by handling indirect function calls, which improves the distance calculation of AFLGo.
• QSYM [28] is a recent, efficient undirected hybrid fuzzing tool whose CE engine reduces unnecessary computation in symbolic emulation and improves the efficiency of constraint handling.
• ParmeSan [29] is a "Sanitizer-Guided Greybox Fuzzing" (SGGF) tool that applies tripwire sanitization to direct the fuzzer and reveal violations earlier.

Bug Report Analyzing
We present the evaluation results of the bug report analysis approaches that extract vulnerable functions from CVE bug reports. To do this, we collected the CVEs reported over the last seven years and obtained 450 CVEs related to the GNU C library (glibc), Binutils [25], LibPNG [26], and OpenSSL [27]. The 450 CVEs also include vulnerabilities detected by well-known bug hunting tools [10,11,17,28]. These bug reports cover the most common vulnerability types, such as buffer overflow, buffer over-read, use-after-free, DoS, exec code overflow, and others. The Bug Report Analyzer of BugMiner successfully extracted the unsafe functions of all 450 CVEs. Figure 4 compares the bug report analysis approaches. Figure 4a shows their accuracy. The Dictionary method reached the highest accuracy, 94%. However, this method is likely to produce more false positives than the other methods. The average elapsed time of the Neighbor, Punctuation mark, and Dictionary methods for bug report analysis was 1.7, 1.9, and 1.69 s, respectively.
Figure 4b illustrates the number of unsafe functions extracted from the 450 CVE bug reports. The Neighbor approach extracted 409 unsafe functions from the 450 CVEs; the descriptions of the other 41 CVEs are structured differently. This approach identifies and extracts unsafe function names via pattern recognition only when the bug description follows the structure: the article "the" + vulnerable function name + the word "function". With the Punctuation mark approach, we extracted the vulnerable function names of 41 of the 450 CVEs. Like the Neighbor approach, the Punctuation mark approach depends on the structure of the CVE's bug description. Because the other 409 CVE reports do not contain the punctuation marks it relies on, this method ranked lowest.
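The Neighbor pattern described above ("the" + function name + "function") maps naturally onto a regular expression. The following is a minimal sketch, assuming that function names are C-style identifiers; `NEIGHBOR` and `extract_neighbor` are hypothetical names, and the sample description is invented for illustration rather than taken from a real CVE.

```python
import re

# "the" + identifier + "function", e.g. "the parse_chunk function".
NEIGHBOR = re.compile(r"\bthe\s+([A-Za-z_]\w*)\s+function\b", re.IGNORECASE)


def extract_neighbor(description: str) -> list:
    """Return every identifier that appears between "the" and "function"."""
    return NEIGHBOR.findall(description)


# Invented description, used only to exercise the pattern.
sample = ("Heap-based buffer overflow in the parse_chunk function in foo.c "
          "allows remote attackers to cause a denial of service.")
print(extract_neighbor(sample))
```

A pattern this simple also matches phrases such as "the vulnerable function", so a real analyzer would need additional filtering; descriptions that do not follow the structure yield an empty list, which is consistent with the reports that the Neighbor approach could not handle.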

Bug Reproduction
Software users often suffer from vulnerable programs, which can cause serious damage. Most software applications ship with bug reporting mechanisms that notify developers when a crash happens. However, such bug reports rarely provide the input data that caused the crash; they mainly contain call stacks. Developers must then fix the bugs in the program based on the reported call stacks.
We evaluate the ability and efficiency of BugMiner's guidance by comparing it with AFL, AFLFast, and QSYM. In this evaluation, we measure and compare the average time that BugMiner and the baseline approaches need to reproduce triggered bugs in the LAVA-M dataset. The LAVA-M dataset consists of four buggy programs: md5sum, uniq, who, and base64.
We selected this dataset because it contains many complicated bugs and because most efficient fuzzing tools evaluate the performance of their approaches on it. Table 2 gives the details of the LAVA-M dataset. We selected two bugs in each PUT to reproduce. We then executed BugMiner and the above-mentioned baseline fuzzers for three hours with the same seeds. The numbers reported in Table 3 are the averages of eight measurements. The first column lists the dataset programs, and the second column shows the TTE, i.e., the fuzzing time that elapses until the specified vulnerability is triggered. As can be seen from Table 3, BugMiner detected all target bugs faster than the competing fuzzers, except bug 1 and bug 2 of who. In short, BugMiner is more time-efficient than other state-of-the-art fuzzers because it generates test inputs quickly.

BugMiner vs. Directed Fuzzers
In this evaluation, we compare our implementation with the state-of-the-art guided greybox fuzzing tools [11,17,29]. We reproduce a number of benchmarks covered by AFLGo and Hawkeye to show how BugMiner fares in a traditional directed setting. The CVE vulnerabilities utilized in this evaluation are given in Table 4. Note that the source code of Hawkeye is not publicly available; hence, we compare against the results reported by the Hawkeye authors. Furthermore, we also measured the time spent on preprocessing analysis in each DGF to demonstrate the efficiency of BugMiner's static analysis. Table 5 compares AFLGo, Hawkeye, ParmeSan, and BugMiner on the reproduction of known vulnerabilities in OpenSSL, LibPNG, and Binutils. The first column gives the CVE identification number, the second column shows the fuzzing tool, and the third column gives the number of fuzzing runs that successfully triggered the bug. The TTE values are listed in the fourth column, and the last column presents the time consumed by preprocessing analysis. We repeated the experiment for each CVE 20 times and report the average results. Each experiment ran for eight hours.
As indicated in Table 5, BugMiner performs better than the state-of-the-art directed fuzzers in all bug reproduction cases. It reproduced the bugs 3.1, 4.3, and 1.9 times faster than Hawkeye, AFLGo, and ParmeSan, respectively. As shown in the last column, BugMiner also spent less time on preprocessing analysis than the other fuzzers. To sum up, the proposed BranchPruner and ShortestPathFinder methods considerably decrease the preprocessing time and improve the bug hunting performance.

Vulnerabilities Exposure
Directed fuzzing is the most prevalent technique for detecting bugs at suspicious, potentially vulnerable locations. To evaluate BugMiner's ability to expose vulnerabilities in real-world programs, we chose the most common software, listed in Table 6 [35]. We allotted 8 h to each experiment and repeated each one 10 times. Figure 5 illustrates the TTE results for eight real-world programs. In this evaluation, we compare BugMiner's bug detection ability with AFL, AFLGo, AFLFast, QSYM, and ParmeSan. The figure shows that BugMiner is more effective than the other fuzzers: it detected a vulnerability in the cert-basic program that the other software testing tools failed to find. In addition, our implementation spent less time reaching the specified target than the other fuzzers.
To highlight the accuracy of BugMiner, we calculated the false positive rate (FPR) and the false negative rate (FNR) using the calculation method of FuzzGuard [45]. Figure 6 illustrates the accuracy results of BugMiner on eight real-world programs. The average FPR and FNR of BugMiner over all specified targets are 0.89% and 0.021%, respectively. The reachability accuracy of the test-inputs produced by BugMiner is 99.18% on average. According to the FuzzGuard [45] paper, FuzzGuard achieved 98.7% average accuracy by filtering out unreachable test-inputs. Comparing the two techniques, the accuracy of our method is 0.48 percentage points higher than that of FuzzGuard.
The reasons for the high accuracy and the low FPR and FNR of BugMiner are as follows. In general, the more constraints or complex nested checks lie on the path from the entry point to the specified target site, the lower the reachability accuracy of the generated test-inputs. In BugMiner, solving constraints and complex nested checks with TOCE ensures high accuracy. More precisely, the proposed TOCE generates productive test-inputs through a constraint solving engine and provides the fuzzer with these test-inputs. In addition, using the BranchPruner data from static analysis, TOCE avoids exploring branch addresses that do not lead to the specified target site and focuses only on the path on which the specified target is located. As a result, TOCE generates reachable test-inputs.
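For readers who want to recompute such rates from raw counts, the sketch below uses the standard confusion-matrix definitions. This is an assumption: the text only states that the FuzzGuard calculation method was followed, and the helper name `rates` and the counts in the example are hypothetical, not measurements from the evaluation.

```python
def rates(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard confusion-matrix rates (assumed definitions).

    tp/fn: reachable test-inputs classified correctly/incorrectly,
    tn/fp: unreachable test-inputs classified correctly/incorrectly."""
    return {
        "fpr": fp / (fp + tn),                        # false positive rate
        "fnr": fn / (fn + tp),                        # false negative rate
        "accuracy": (tp + tn) / (tp + fp + tn + fn),  # overall accuracy
    }


# Hypothetical counts, chosen only to show the call.
print(rates(tp=90, fp=1, tn=9, fn=0))
```

The function assumes that both classes are non-empty; with no reachable or no unreachable inputs, the corresponding rate is undefined and the division fails.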

Replies to Research Questions
The experiments reported in Tables 3 and 5 and in Figure 5 enable us to answer the research questions given above.

1. We believe that the Bug Report Analyzer and BranchPruner methods are worth applying. As Table 5 shows, the BranchPruner method is the better option in terms of time cost while fuzzing. To be more precise, AFLGo, ParmeSan, and BugMiner spent an average of 28.6 min, 17.2 min, and 9.4 min, respectively, on preprocessing analysis. The proposed ShortestPathFinder reduces preprocessing time. Moreover, retrieving vulnerable functions from bug reports automatically with machine learning, instead of building a specified target database manually, not only improves the performance of directed testing but also reduces preprocessing time.
2. The dynamic analysis methods utilized in BugMiner are highly effective. BugMiner outperformed the other fuzzing tools in most of the experiments we conducted.
3. The effect is particularly visible in the comparison with the DGFs. Table 5 and Figure 5 indicate that the combination of fuzzing, TOCE, and input prioritization increases BugMiner's speed, which gives it an advantage over DGFs, the hybrid fuzzer, and greybox fuzzers. In addition, BugMiner achieved 99.18% accuracy on average.
4. Based on the results in Figure 5, we believe that BugMiner makes it possible to reach specified targets rapidly. In addition, our implementation is scalable and can analyze both coreutils programs and real-world programs.

Conclusions
In this research, we propose a novel target-oriented hybrid fuzzing tool named BugMiner that combines fuzzing and TOCE to enhance the directed fuzzing process. We also propose a set of novel approaches to overcome the path explosion problem of CE and improve the effectiveness of the bug hunting process. More precisely, we implemented the machine-learning-based Bug Report Analyzer module to build a specified target database automatically, and it successfully extracted the unsafe functions of all 450 CVEs. Using the BranchPruner method, we reduced the preprocessing time by 19.2 and 7.8 min compared to AFLGo and ParmeSan, respectively. We also proposed the Input Prioritization module, which categorizes the test-inputs to trigger deeply hidden vulnerabilities.
We validated the scalability of BugMiner by carrying out several experiments with different datasets and real-world programs. The experimental results show that BugMiner is more effective than state-of-the-art DGFs, such as Hawkeye, AFLGo, and ParmeSan, in bug reproduction and preprocessing analysis. In particular, BugMiner detects software vulnerabilities with considerably lower TTE than state-of-the-art fuzzers. In bug reproduction, BugMiner performed 3.1, 4.3, 2.9, 2.0, 1.8, and 1.9 times faster than Hawkeye, AFLGo, AFL, AFLFast, QSYM, and ParmeSan, respectively. We believe that the approaches proposed in BugMiner, such as the Bug Report Analyzer, BranchPruner, Input Prioritization, and TOCE, can improve the performance of the software testing process. In the future, we aim to optimize BugMiner with deep learning methods to generate more efficient seeds that increase code coverage.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: