1. Introduction
Fuzzing is an efficient and effective testing method that generates numerous inputs to reveal vulnerabilities in the software-under-test (SUT). Recent efforts have ported AFL [1], one of the most popular fuzzing tools, to Java programs. Unlike binary programs, Java programs run on the Java Virtual Machine (JVM), and every public method can be tested directly with a driver class that provides a basic runtime environment. Normally, software developers write driver classes to test certain functions in the method-under-test (MUT). These hand-written driver classes are filled with constant inputs that exercise only a limited number of paths, leaving a large part of the program untested. AFL-based Java fuzzing tools [2,3] address the coverage problem by using fuzzing techniques to generate inputs that exercise more paths. However, these tools do not address automatic driver class generation. Both Kelinci [2] and JQF [3] rely on driver classes written by testers to direct testing, which makes them inconvenient for testing large-scale software. In addition, AFL-based fuzzing tools store input in files. This hinders driver class generation for ordinary methods that do not process files, because additional statements are required to convert the input file into correctly typed variables.
Our goal is to build driver classes automatically for AFL-based Java fuzzing tools. Besides providing a basic runtime environment for the MUT, the generated driver class should be able to mutate the status of class instances as well as method parameters, so as to exercise all paths in the MUT with the input file generated by the fuzzing tool. The status of a class instance is determined by the fields of the class, and some fields can only be modified by invoking methods that change them. Thus, for each MUT, the driver class should contain method sequences that change the instance’s status, statements that prepare the runtime environment for the method sequences, and statements that parse the data from the input file.
We face the following challenges. The first challenge is how to build method sequences for a target method. In Java, both member fields and method parameters can affect branch statements. Member fields declared with the keywords private or protected can only be modified by member methods. We need to know which member fields are accessed by the MUT and which methods can modify them, so that we can build method sequences that are able to change the status of the instance. The second challenge is how to build the instances that make the method sequences work. Instances are needed to access member fields and invoke member methods. To instantiate classes defined in the SUT as well as built-in classes properly, we need to know which methods can be used to create instances. The last challenge is how to handle the input file. AFL-style fuzzing tools store input in a file, so the methods in the driver class should process the input file and prepare basic runtime instances for the target method with data extracted from the file.
We design and implement JDriver [4], an automatic driver class generation framework for AFL-based fuzzing tools. It employs dependency analysis to build method sequences that can modify method parameters as well as field values. It collects knowledge about instance creation and uses it to instantiate classes. JDriver supports building driver classes for general methods through input-file oriented driver class assembling, which handles different method parameters properly. To summarize, we make the following contributions:
- (1) First study on automatic driver class generation for AFL-based Java fuzzing tools. To the best of our knowledge, we are the first to study how to build driver classes for AFL-based fuzzing tools.
- (2) A novel approach to automatic driver class generation based on dependency analysis. The approach consists of a dependency analysis based method to build method sequences, a knowledge assisted method to generate class instances, and an input-file oriented method to assemble driver classes.
- (3) An open framework for driver class generation. We implement JDriver, an open framework that aims to support driver class generation for different purposes. Evaluation results show that we are able to generate 99 driver classes containing 422 driver methods for commons-imaging.
The remainder of the paper is organized as follows: Section 2 introduces related work, Section 3 describes our approach, Section 4 depicts the implementation, and Section 5 shows our evaluation results. We discuss future work in Section 6, and conclusions are given in Section 7.
3. Approach
3.1. Overview
Beyond providing a basic runtime environment to invoke the target method and determine the execution result, AFL-based fuzzing tools require their driver class to change the instance status as well as the method parameters, so that more paths are exercised with the files they generate. To reach this goal, we need to build method sequences that are able to change the instance status, and to create class instances for the method sequences with data resolved from the input file generated by the fuzzing tool.
The method sequences are designed to explore all the branches in the method under test. In Java, branch statements may contain values derived from method parameters as well as fields. Thus, we need to know which fields are accessed by the method, and the method sequences should contain methods that modify the accessed fields. We put forward dependency analysis based method sequence generation, which employs static analysis to extract dependency information (Section 3.2) and builds method sequences according to the dependency analysis results (Section 3.3).
The method sequences require class instances to work. Instances can be generated from various sources, of which constructors and factory methods are the most common. However, for some special classes it is not easy to find proper methods to obtain instances, e.g., built-in classes that require additional helper methods. We put forward knowledge assisted instance generation, which builds knowledge by collecting method information in the SUT as well as the user’s programming knowledge. A Helper Class is also generated to store the methods used for creating class instances (Section 3.4).
AFL-based fuzzing tools generate files to store input data. This requires our driver method to interpret the file and create instances with the interpreted data. We propose input-file oriented driver class assembling, which operates differently on methods that process files and ordinary methods that do not. For ordinary methods, we assemble all the method parameters and generate statements that recover typed values for the different method parameters (Section 3.5).
3.2. Dependency Information Extraction
In Java, public member fields can be modified directly, while private/protected member fields can only be modified by member methods. In addition, some methods access fields directly, while others access fields indirectly by invoking other methods. Thus, method calls should be taken into consideration for dependency information extraction. In our approach, we extend the method call graph and define two directed graphs, Access Graph and Modify Graph, to store the dependency information. The vertexes are either methods or fields, and the edges are either method calls or accessing/modifying operations. Specifically, in the Access Graph, an edge from method $m_1$ to method $m_2$ indicates that $m_1$ invokes a call to $m_2$, and an edge from method $m$ to field $f$ indicates that $m$ accesses $f$. In the Modify Graph, an edge from field $f$ to method $m$ indicates that $f$ is modified by $m$, and an edge from method $m_1$ to method $m_2$ indicates that $m_1$ is invoked by $m_2$.
We use static analysis to extract dependency information. Algorithm 1 illustrates how the Access Graph and Modify Graph are built. The algorithm begins by initializing accessGraph and modifyGraph with the methods and fields in the class (lines 2–11). Afterwards, it loops over all the methods and walks over all the instructions in each method to build the graphs (lines 12–29). If instruction inst is a method call instruction, we resolve its call target callee and add an edge from method to callee in accessGraph. Conversely, we add an edge from callee to method in modifyGraph. If instruction inst is a field related instruction, we retrieve its target field. An edge from field to method is added to modifyGraph if it is a field-write instruction (lines 21–22), while an edge from method to field is added to accessGraph if it is a field-read instruction (lines 24–25). In this way, both the method call relationships and the relationships between methods and fields are written to the two graphs.
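Algorithm 1 Analyzing dependency
 1: procedure analyze(c) ▷ c is the class under test
 2:   accessGraph ← ∅
 3:   modifyGraph ← ∅
 4:   for m in c.methods do
 5:     accessGraph.addVertex(m)
 6:     modifyGraph.addVertex(m)
 7:   end for
 8:   for f in c.fields do
 9:     accessGraph.addVertex(f)
10:     modifyGraph.addVertex(f)
11:   end for
12:   for method in c.methods do
13:     insts ← method.instructions ▷ insts stores all the instructions in the method
14:     while insts is not empty do
15:       inst ← insts.first()
16:       if inst is method invoke then
17:         callee ← resolve(inst) ▷ resolve the call target of inst
18:         accessGraph.addEdge(method, callee)
19:         modifyGraph.addEdge(callee, method)
20:       else if inst is field-write operation then
21:         field ← inst.target ▷ field is the target of inst
22:         modifyGraph.addEdge(field, method)
23:       else if inst is field-read operation then
24:         field ← inst.target
25:         accessGraph.addEdge(method, field)
26:       end if
27:       insts.remove(inst) ▷ remove inst from the instruction set
28:     end while
29:   end for
30: end procedure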
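The graph construction in Algorithm 1 can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not JDriver's implementation: the `Inst` class and `Kind` enum are hypothetical stand-ins for decoded bytecode instructions, nodes are identified by plain strings, and method and field names are assumed not to collide. A real analysis would obtain the instructions from a bytecode analysis framework.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch of Algorithm 1 over a simplified, hypothetical instruction model. */
final class DependencyGraphs {
    /** Hypothetical instruction kinds; real bytecode has many more. */
    enum Kind { INVOKE, FIELD_READ, FIELD_WRITE }

    /** Hypothetical decoded instruction: its kind and the callee or field it targets. */
    static final class Inst {
        final Kind kind;
        final String target;
        Inst(Kind kind, String target) { this.kind = kind; this.target = target; }
    }

    // Adjacency lists keyed by node name (method and field names are assumed distinct).
    final Map<String, Set<String>> accessGraph = new LinkedHashMap<>();
    final Map<String, Set<String>> modifyGraph = new LinkedHashMap<>();

    private static void addVertex(Map<String, Set<String>> g, String v) {
        g.computeIfAbsent(v, k -> new LinkedHashSet<>());
    }

    private static void addEdge(Map<String, Set<String>> g, String from, String to) {
        addVertex(g, to);
        g.computeIfAbsent(from, k -> new LinkedHashSet<>()).add(to);
    }

    /** methods maps each method name to its instructions; fields lists the field names. */
    void analyze(Map<String, List<Inst>> methods, Collection<String> fields) {
        // Initialise both graphs with every method and field as a vertex.
        for (String m : methods.keySet()) { addVertex(accessGraph, m); addVertex(modifyGraph, m); }
        for (String f : fields) { addVertex(accessGraph, f); addVertex(modifyGraph, f); }
        for (Map.Entry<String, List<Inst>> entry : methods.entrySet()) {
            String method = entry.getKey();
            for (Inst inst : entry.getValue()) {
                switch (inst.kind) {
                    case INVOKE:       // caller -> callee for access, callee -> caller for modify
                        addEdge(accessGraph, method, inst.target);
                        addEdge(modifyGraph, inst.target, method);
                        break;
                    case FIELD_WRITE:  // field -> writing method
                        addEdge(modifyGraph, inst.target, method);
                        break;
                    case FIELD_READ:   // reading method -> field
                        addEdge(accessGraph, method, inst.target);
                        break;
                }
            }
        }
    }

    public static void main(String[] args) {
        Map<String, List<Inst>> methods = new LinkedHashMap<>();
        methods.put("getX", Arrays.asList(new Inst(Kind.FIELD_READ, "x")));
        methods.put("setX", Arrays.asList(new Inst(Kind.FIELD_WRITE, "x")));
        DependencyGraphs g = new DependencyGraphs();
        g.analyze(methods, Arrays.asList("x"));
        System.out.println("access: " + g.accessGraph);
        System.out.println("modify: " + g.modifyGraph);
    }
}
```

Storing the call edges in opposite directions is what lets a single forward traversal answer both questions: which fields a method reaches, and which methods reach a field.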
From these graph definitions, we obtain the following two theorems:
Theorem 1. Method $m$ accesses field $f$ if and only if the two nodes $m$ and $f$ are connected in the Access Graph.
Proof of Theorem 1. There are two situations in which method $m$ accesses field $f$: direct and indirect access. In the direct situation, according to our definition of the Access Graph, the direct access is represented as an edge from $m$ to $f$, which means that $m$ and $f$ are connected directly. In the indirect situation, method $m$ accesses field $f$ through method calls, which means that there is a call sequence from method $m$ to some method $m'$ and method $m'$ accesses field $f$. Method $m'$ accesses field $f$ directly, so $m'$ and $f$ are connected (1). The call sequence from $m$ to $m'$ indicates that there is a path from $m$ to $m'$, so $m$ and $m'$ are connected (2). Combining (1) and (2), $m$ and $f$ are connected. Thus, if method $m$ accesses field $f$, the two nodes are connected in the Access Graph. Conversely, if $m$ and $f$ are connected, there is a path between $m$ and $f$. If the length of the path is 1, method $m$ accesses field $f$ directly. If the length is greater than 1, there are more than two vertexes in the path, namely $m, m_1, \ldots, m_k, f$. The vertexes $m_k$ and $f$ are connected directly, meaning that method $m_k$ accesses field $f$ directly (3). The path from $m$ to $m_k$ indicates that there is a call sequence from method $m$ to $m_k$ (4). Combining (3) and (4), method $m$ accesses field $f$ indirectly. Thus, if $m$ and $f$ are connected, method $m$ accesses field $f$. ☐
Theorem 2. Method $m$ can modify field $f$ if and only if the two nodes $m$ and $f$ are connected in the Modify Graph.
Theorem 2 can be proved in the same way as Theorem 1. Theorem 1 explains how to find the member fields accessed by a given target method, while Theorem 2 provides a way to find the member methods that can modify a target member field. We use $\mathit{accessSet}(m)$ to represent the set of fields that are accessed by method $m$ and $\mathit{modifySet}(f)$ to represent the set of methods that can modify field $f$. Concretely, $\mathit{accessSet}(m)$ is made up of all the field nodes that are connected with the specified method $m$ in the Access Graph, and $\mathit{modifySet}(f)$ is made up of all the method nodes that are connected with field $f$ in the Modify Graph.
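Theorems 1 and 2 reduce both sets to graph reachability. The following Java sketch shows one way to compute them with a breadth-first traversal, assuming the graphs are stored as adjacency maps from node name to successor names. The class name `Reachability`, the string-based node encoding, and the explicit set of field names are illustrative assumptions rather than part of JDriver.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Sketch: the access and modify sets as reachability queries on directed graphs. */
final class Reachability {
    /** Returns every node reachable from start by following directed edges. */
    static Set<String> reachable(Map<String, Set<String>> graph, String start) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.add(start);
        while (!work.isEmpty()) {
            String node = work.poll();
            for (String next : graph.getOrDefault(node, Collections.<String>emptySet())) {
                if (seen.add(next)) {   // visit each node once, so cycles terminate
                    work.add(next);
                }
            }
        }
        return seen;
    }

    /** Fields reachable from method in the Access Graph (callees on the way are filtered out). */
    static Set<String> accessSet(Map<String, Set<String>> accessGraph, String method, Set<String> fields) {
        Set<String> result = reachable(accessGraph, method);
        result.retainAll(fields);
        return result;
    }

    /** Methods reachable from field in the Modify Graph; every node on such a path is a method. */
    static Set<String> modifySet(Map<String, Set<String>> modifyGraph, String field) {
        return reachable(modifyGraph, field);
    }
}
```

No filtering is needed in `modifySet` because, by the definition of the Modify Graph, no edge ever points into a field node.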
3.3. Method Sequence Building
The method sequences are used to modify the status of the instance. Apart from invoking methods to change target fields, public fields can also be changed by assigning values directly. Thus, we extend method sequences to include field items, each of which indicates that the field can be modified directly.
We build method sequences on the dependency information. For the MUT, we obtain its set of accessed fields from the Access Graph, and the Modify Graph lets us retrieve the set of methods that can modify a target field. Algorithm 2 illustrates how we build method sequences with the Access Graph and Modify Graph. An empty array ms is initialized to store method sequences. For the given method mut, we first resolve the mut’s accessSet (line 2). Then, we build the method sequence incrementally by iterating over the accessSet (lines 4–10). For every field in the accessSet, we retrieve its modifySet and add the chosen item to ms (lines 5–9). The item is returned by the select procedure, which defines the policy for selecting methods to build method sequences.
Policy to select method. As static analysis is conservative, the extracted dependency information may not be accurate. Thus, different policies may be needed to get better performance. In our implementation, we design a policy that prioritizes field items and selects the method whose parameters are simplest to create. In the procedure select, we first check whether the field is public and its type is primitive. If it is, we return the field directly. If not, we examine whether a method already in the method sequences can modify the target field (lines 16–20). If such a method exists, null is returned to avoid duplicate modification. If not, we sort the methods in the modifySet (line 21) and return the first method (line 22). In our case, the methods are sorted by the simplicity of their parameters, which is measured by the number of primitive parameters in the method.
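The selection policy described above can be sketched in Java as follows. This is an illustration under assumptions, not JDriver's code: the `Field` class is a hypothetical summary of a field, sequence items are plain names, and the ranking by parameter simplicity is passed in as a `Comparator` because this section does not fix the exact ordering. The extra check for an empty modify set is also an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of the build and select procedures; items are field or method names. */
final class MethodSequenceBuilder {
    /** Hypothetical field summary: name, visibility, and whether the type is primitive. */
    static final class Field {
        final String name;
        final boolean isPublic;
        final boolean isPrimitive;
        Field(String name, boolean isPublic, boolean isPrimitive) {
            this.name = name; this.isPublic = isPublic; this.isPrimitive = isPrimitive;
        }
    }

    /** Returns the item to append for field f, or null when nothing should be added. */
    static String select(Field f, List<String> modifySet, List<String> ms,
                         Comparator<String> simplestFirst) {
        if (f.isPublic && f.isPrimitive) {
            return f.name;                      // the field can be assigned directly
        }
        for (String chosen : ms) {
            if (modifySet.contains(chosen)) {
                return null;                    // an already chosen method modifies f
            }
        }
        if (modifySet.isEmpty()) {
            return null;                        // assumption: no known modifier, skip the field
        }
        List<String> sorted = new ArrayList<>(modifySet);
        sorted.sort(simplestFirst);
        return sorted.get(0);                   // the method ranked simplest
    }

    /** Builds the sequence for the fields in accessSet, given each field's modify set. */
    static List<String> build(List<Field> accessSet, Map<String, List<String>> modifySets,
                              Comparator<String> simplestFirst) {
        List<String> ms = new ArrayList<>();
        for (Field f : accessSet) {
            List<String> modifySet = modifySets.getOrDefault(f.name, Collections.<String>emptyList());
            String item = select(f, modifySet, ms, simplestFirst);
            if (item != null) {
                ms.add(item);
            }
        }
        return ms;
    }

    public static void main(String[] args) {
        Map<String, List<String>> modifySets = new HashMap<>();
        modifySets.put("name", Arrays.asList("setName", "init"));
        List<Field> accessSet = Arrays.asList(
                new Field("count", true, true), new Field("name", false, false));
        // Alphabetical order stands in for a real parameter-simplicity ranking.
        System.out.println(build(accessSet, modifySets, Comparator.<String>naturalOrder()));
    }
}
```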
3.4. Knowledge Assisted Instance Generation
Normally, instances are created by the constructor of the specified class. In addition, factory methods that return an instance can also be used to generate instances. However, for built-in classes provided by the Java Platform, Standard Edition (Java SE), e.g., String, it is not easy to find proper constructors or factory methods, and additional methods are required to create instances. We call these instance creation methods knowledge. Our knowledge assisted instance generation method builds knowledge by collecting related methods from the SUT as well as by writing methods based on the users’ knowledge. As instances are frequently used in driver classes, we build a Helper Class to store all the instance generation methods.
Algorithm 2 Building method sequences
 1: procedure buildMethodSequence(mut, accessGraph, modifyGraph) ▷ mut is method under test, accessGraph and modifyGraph are used to store dependency information
 2:   accessSet ← accessGraph.getAccessSet(mut)
 3:   ms ← []
 4:   for f in accessSet do
 5:     modifySet ← modifyGraph.getModifySet(f)
 6:     item ← select(f, modifySet, ms)
 7:     if item is not null then
 8:       ms.add(item)
 9:     end if
10:   end for
11:   return ms
12: end procedure
13: procedure select(f, modifySet, ms)
14:   if f is public and primitive typed then return f
15:   end if
16:   for m in ms do
17:     if m ∈ modifySet then
18:       return null
19:     end if
20:   end for
21:   sort(modifySet)
22:   return modifySet[0]
23: end procedure
Collecting instance generation methods. We define a type table to store the factory methods and class constructors of the SUT. The type table uses the class type as the key, and the value of each key is a set of methods. We build the type table by walking over all the methods in the SUT, as illustrated in Algorithm 3. For each method in the SUT, we first resolve its return type returnType (line 5). Afterwards, we check whether returnType already exists in the type table. If it does, we add the method to the corresponding method set (line 7). If it does not, we create a new set, put the method into the new set, and add an item to the type table (lines 9–11). Apart from the SUT, we also build a type table for its dependent libraries.
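Algorithm 3 Building type table
 1: procedure buildTypeTable(sut) ▷ sut is the software under test
 2:   typeTable ← ∅
 3:   for class in sut do
 4:     for method in class.methods do
 5:       returnType ← method.returnType
 6:       if returnType ∈ typeTable then
 7:         typeTable[returnType].add(method)
 8:       else
 9:         methodSet ← ∅
10:         methodSet.add(method)
11:         typeTable.put(returnType, methodSet)
12:       end if
13:     end for
14:   end for
15: end procedure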
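The type table can be approximated with the standard Java reflection API, as in the sketch below. It assumes the classes of the SUT are already loaded and passed in directly; the class name `TypeTableBuilder`, treating a constructor as a method that returns its declaring class, and skipping primitive and void return types are illustrative choices rather than details taken from JDriver.

```java
import java.io.File;
import java.lang.reflect.Constructor;
import java.lang.reflect.Executable;
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Sketch of Algorithm 3: maps each class type to the executables that yield an instance. */
final class TypeTableBuilder {
    static Map<Class<?>, Set<Executable>> buildTypeTable(Class<?>... sut) {
        Map<Class<?>, Set<Executable>> typeTable = new LinkedHashMap<>();
        for (Class<?> cls : sut) {
            // A constructor is treated as a method whose return type is its declaring class.
            for (Constructor<?> ctor : cls.getDeclaredConstructors()) {
                add(typeTable, cls, ctor);
            }
            for (Method method : cls.getDeclaredMethods()) {
                Class<?> returnType = method.getReturnType();
                if (returnType.isPrimitive()) {
                    continue;   // assumption: skip primitives and void, which need no instance
                }
                add(typeTable, returnType, method);
            }
        }
        return typeTable;
    }

    /** Adds e to the method set of type, creating the set on first use. */
    private static void add(Map<Class<?>, Set<Executable>> table, Class<?> type, Executable e) {
        Set<Executable> methodSet = table.get(type);
        if (methodSet == null) {
            methodSet = new LinkedHashSet<>();
            table.put(type, methodSet);
        }
        methodSet.add(e);
    }

    public static void main(String[] args) {
        Map<Class<?>, Set<Executable>> table = buildTypeTable(File.class);
        System.out.println("types with known producers: " + table.keySet());
    }
}
```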
Knowledge for built-in classes. We add knowledge for classes provided by the Java Platform. Specifically, we cover most of the classes defined in the java.util package, which contains container classes such as Set; the java.lang package, which defines classes that are fundamental to the design of the Java programming language, such as String; and the java.io package, which contains classes to handle system input/output [26].
Figure 1 shows two sample knowledge methods. Method get_String returns a String instance, which comes from the input parameter arg0. Method get_File returns a File instance, which is created by the new expression with the method parameter arg0.
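Based on the description above, the two knowledge methods could look like the following Java sketch. The method names get_String and get_File and the parameter name arg0 come from the text; the enclosing class name `KnowledgeSketch` and the String parameter type of get_File are assumptions made for illustration.

```java
import java.io.File;

/** Sketch of two knowledge methods for built-in classes. */
final class KnowledgeSketch {
    /** A String needs no construction: the value recovered from the input is returned as is. */
    public static String get_String(String arg0) {
        return arg0;
    }

    /** A File is created with the new expression from a path string (assumed parameter type). */
    public static File get_File(String arg0) {
        return new File(arg0);
    }
}
```

Both methods take only a String, so a driver can feed them directly from data recovered from the input file.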
Building Instance Helper Class. Class instances are created frequently during testing. To avoid generating the same creation code repeatedly, we build an Instance Helper Class to handle the generation of instances. Algorithm 4 shows the building process for the Instance Helper. It starts by initializing typeSet to include all the buildable classes (lines 2–7). Afterwards, it builds instance helper methods with buildInstanceHelper (line 8). When the processing finishes, InstanceHelperClasses assembles all the methods and adds miscellaneous code to build a compilable Helper Class (line 9). Procedure buildInstanceHelper initializes unprocessed as a copy of typeSet, and then walks over all the types in typeSet to build helper methods with buildType (lines 13–18). Procedure buildType builds helper methods for each type and returns the number of generated helper methods. If buildType builds at least one helper method successfully, the type is removed from unprocessed (line 16). Since some class constructors rely on other classes, it is necessary to build the helper class recursively to cover these classes (lines 19–20). Procedure buildType first resolves the methodSet for the given type. Then, it iterates over all the methods in the methodSet to build helper methods (lines 28–34). Each time a helper method is generated, it is appended to InstanceHelperClasses. Note that, in order to simplify the method inputs, our helper methods only employ primitives or String as method parameters.
3.5. Input-file Oriented Driver Class Assembling
For a class-under-test (CUT), we build a driver method separately for each public method and assemble the driver methods into a driver class. AFL-based fuzzing tools generate files to store the input data. If the MUT processes files directly, we can pass the file directly as a method parameter. However, most methods do not do so. For these methods, the driver method needs to process the input file and use the data to create variables for the method. Our input-file oriented driver class assembling method therefore works differently on ordinary methods that do not process files and on methods that process files.
Testing if method processes file. As file processing methods use built-in classes such as File to handle files, we design the following heuristic to determine whether the method processes files directly: (1) the method parameters contain file related class instances such as File; and (2) there is a String typed method parameter which flows to a file opening method.
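Condition (1) of the heuristic can be checked with reflection alone, as the following Java sketch suggests. The list of file related types is an assumption made for illustration, since the text names only File. Condition (2) requires data-flow analysis and is not covered here.

```java
import java.io.File;
import java.io.InputStream;
import java.lang.reflect.Method;
import java.nio.file.Path;

/** Sketch of condition (1): does any parameter have a file related type? */
final class FileParamCheck {
    /** Assumed list of file related types; only File is named in the text. */
    private static final Class<?>[] FILE_TYPES = { File.class, Path.class, InputStream.class };

    static boolean hasFileParameter(Method method) {
        for (Class<?> param : method.getParameterTypes()) {
            for (Class<?> fileType : FILE_TYPES) {
                // True when param is fileType itself or one of its subtypes.
                if (fileType.isAssignableFrom(param)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```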
Ordinary methods not processing files. Our driver method starts by reading the input file into a byte array, and then resolves the values for the method parameters sequentially from the byte array. Algorithm 5 shows the building process. Method makeStatements begins by declaring a variable position to mark the position in the byte array, and then iterates over the items in the method sequences to generate statements. If the item is a field, it generates an assignment statement directly with makeField (lines 35–40). If it is a method, it generates statements to declare variables as well as the statement to invoke the method with makeMethod (lines 11–34). For each method parameter, makeMethod applies different rules according to the parameter type. There are two categories of types in Java: primitive types and reference types. (1) Primitive types. Primitive typed data has a fixed size, so we can create it directly from the input bytes. Method makeVariableStatement generates statements such as int a = Helper.getInt(inputs, position). (2) Reference types. Class types and array types are two kinds of reference types. For class types, we first get a helper method from InstanceHelperClasses (line 18). If helperMethod is not null, we generate statements for the helper method (line 20). For array types, if the array is a primitive array, we resolve its element type (line 25) and its array length (line 26). If the length is not specified, we use a random number instead. We generate statements with its element type etype (line 27). If we cannot get a proper helper method, or the array is not a primitive array, we generate statements with built-in knowledge (lines 21, 28). Method makeMethod ends by generating the statement that invokes the method (line 32) and returning the position, so that later statements do not retrieve bytes from the same position (line 33). Method makeField works in the same way as the primitive type case in makeMethod.
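To make the shape of such a driver concrete, the following Java sketch shows a hand-written example of what a generated driver method for an ordinary method might look like. Only the call form Helper.getInt(inputs, position) comes from the text. The `Account` class, the `AccountDriver` class, the big-endian decoding, and returning 0 when too few bytes remain are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical driver: recovers typed values from the input bytes, then calls the MUT. */
final class AccountDriver {
    /** Driver method for withdraw; deposit is invoked first because it modifies balance. */
    static boolean driveWithdraw(byte[] inputs) {
        int position = 0;
        Account target = new Account();
        int a = Helper.getInt(inputs, position);   // parameter of the field-modifying method
        position += Integer.BYTES;
        target.deposit(a);
        int b = Helper.getInt(inputs, position);   // parameter of the method under test
        position += Integer.BYTES;
        return target.withdraw(b);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: AccountDriver <input-file>");
            return;
        }
        // The fuzzing tool supplies the path of the generated input file.
        byte[] inputs = Files.readAllBytes(Paths.get(args[0]));
        driveWithdraw(inputs);
    }
}

/** Hypothetical helper in the spirit of Helper.getInt(inputs, position). */
final class Helper {
    /** Reads a big-endian int at position; returns 0 if fewer than 4 bytes remain (assumed policy). */
    static int getInt(byte[] inputs, int position) {
        if (position < 0 || position + Integer.BYTES > inputs.length) {
            return 0;
        }
        return ByteBuffer.wrap(inputs, position, Integer.BYTES).getInt();
    }
}

/** Hypothetical class under test with a private field that only methods can change. */
final class Account {
    private int balance;

    void deposit(int amount) {
        balance += amount;
    }

    boolean withdraw(int amount) {
        if (amount > balance) {
            return false;
        }
        balance -= amount;
        return true;
    }
}
```

Advancing position after every read keeps each parameter tied to its own slice of the input, so a mutation of one region of the file changes exactly one recovered value.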
Algorithm 4 Building instance helper
 1: procedure build(typeTable) ▷ typeTable is the type table generated for SUT
 2:   typeSet ← ∅
 3:   for type in typeTable do
 4:     if type is buildable then ▷ test if the type buildable
 5:       typeSet.add(type)
 6:     end if
 7:   end for
 8:   buildInstanceHelper(typeTable, typeSet)
 9:   InstanceHelperClasses.write() ▷ save the generated helper methods to file
10: end procedure
11: procedure buildInstanceHelper(typeTable, typeSet) ▷ typeSet is the set of class to build
12:   unprocessed ← copy of typeSet
13:   for type in typeSet do
14:     n ← buildType(typeTable, type)
15:     if n > 0 then
16:       unprocessed.remove(type)
17:     end if
18:   end for
19:   if unprocessed is not empty then ▷ test if future test is necessary
20:     buildInstanceHelper(typeTable, unprocessed) ▷ building unprocessed types
21:   else
22:     return
23:   end if
24: end procedure
25: procedure buildType(typeTable, type) ▷ type is the type for building
26:   methodSet ← typeTable.get(type)
27:   rtn ← 0
28:   for method in methodSet do
29:     helperMethod ← build a helper method for method ▷ build a helper method for method
30:     if helperMethod is not null then
31:       InstanceHelperClasses.add(helperMethod) ▷ save helperMethod to InstanceHelperClasses
32:       rtn += 1
33:     end if
34:   end for
35:   return rtn
36: end procedure
Methods processing files. For these methods, we identify which method parameter specifies the filename, and then pass the file path to the method directly. If there are other method parameters, we use random generators to generate their values.
Algorithm 5 Making statements to recover method parameters
 1: procedure makeStatements(ms) ▷ ms are the generated method sequences.
 2:   position ← 0 ▷ position is used to mark the position in the byte array
 3:   for item in ms do
 4:     if item is method then
 5:       position = makeMethod(item, position)
 6:     else
 7:       position = makeField(item, position)
 8:     end if
 9:   end for
10: end procedure
11: procedure makeMethod(method, position)
12:   params ← method.parameters
13:   for i in params do
14:     if i is primitive then
15:       makeVariableStatement(i, position)
16:       position += Type.getSize(i)
17:     else if i is class type then
18:       helperMethod ← InstanceHelperClasses.get(i)
19:       if helperMethod is not null then
20:         position = makeMethod(helperMethod, position) ▷ make statements for helperMethod
21:       else makeStatementWithKnowledge(i);
22:       end if
23:     else if i is array type then
24:       if i is primitive array then
25:         etype ← i.elementType ▷ etype is the element type of the array
26:         length ← i.length
27:         position = makeArray(etype, length, position)
28:       else makeStatementWithKnowledge(i);
29:       end if
30:     end if
31:   end for
32:   makeMethodStatement(method)
33:   return position
34: end procedure
35: procedure makeField(field, position)
36:   type ← field.type
37:   makeVariableStatement(field, position)
38:   position += Type.getSize(type)
39:   return position
40: end procedure
6. Discussion and Future Work
Instance Generation. Although JDriver has generated hundreds of driver methods for commons-imaging, it fails to create correct instances for the following classes: (1) Interface classes. Class ImageFormat is an interface, which should be initialized through classes that implement this interface. However, JDriver has no knowledge for generating instances of an interface, so it fails on interface classes. (2) Classes containing types not covered in the knowledge base. The constructor of ByteSourceArray takes a String and a byte[] as parameters, but we missed the byte[] type in our helper class. This makes JDriver fail to generate instances of class ByteSourceArray. (3) Classes whose constructor involves multiple String objects. The constructor of class ByteSourceInputStream takes an InputStream typed and a String typed parameter. Our algorithm detects that the helper method get_InputStream can be used to create an InputStream instance and that method get_String can be used to create a String instance. However, it skips methods that use multiple String objects because the driver code only has one String object as input, so it fails to create ByteSourceInputStream instances. In addition, some instances we create are meaningless. For example, method getBufferedImage in class JpegImageParser accepts a HashMap instance and uses it to store items. Normally, we need to initialize a non-empty HashMap. If we assign null to that parameter, execution may continue but will not reach our target branch. To summarize, we need smarter knowledge to build correct instances. In the future, we will continue working on: covering advanced Java features such as subclassing and interfaces in the knowledge base, covering more built-in classes, restructuring helper methods, and learning knowledge from the source code of the SUT.
Fuzzing Scheduling. Fuzzing every method in a Java program is neither feasible nor necessary. (1) Indirectly accessible methods should be skipped. Java attaches an access modifier to every method, and methods declared private or protected cannot be accessed directly by a driver class. In addition, abstract classes cannot be instantiated. Such code should be skipped. (2) Methods that do not produce errors should be excluded. In Java, some methods do simple work: they have a single branch and never throw exceptions. For example, getter methods only contain a single statement that returns a field value, so they can never produce exceptions. (3) Methods that have already been exercised need not be tested alone. Some methods are executed repeatedly when fuzzing other methods; testing them separately is redundant and wastes time. A proper way to schedule methods for fuzzing is to use program analysis techniques to identify the methods listed above. Static analysis can help identify the methods that have no branches and will not produce exceptions. Dynamic analysis techniques could track the execution of methods and find out which methods have already been exercised.