1. Introduction
Automated program repair (APR) is a rapidly developing field that is drawing increasing attention. Academia and industry agree that program defects should be eliminated as early as possible. The automatic repair process consists of defect localization, candidate patch generation, and patch validation. A repair method obtains the suspected error locations through defect localization and generates candidate patches. The patched program is then verified against the test set, and a developer manually decides whether to accept the patches. Program defects are unavoidable by-products of the software development process. According to statistics, a program still contains an average of 3–10 defects per thousand lines of code after compilation and traditional software testing. Static analysis tools are an effective means of detecting program bugs, but the alarms they raise include both real defects and false positives. The false-positive rate is between 35% and 91%, and confirming the alarms and repairing the defects requires a lot of manpower. According to the repair strategy, automatic repair is divided into two categories: semantic-based automatic program repair [1,2,3,4] and search-based automatic program repair [5,6,7,8,9]. Nguyen et al. [1] proposed SemFix, which uses Tarantula to locate defects and program synthesis to generate repair statements. However, SemFix cannot repair errors that span multiple lines of code. Le et al. [5] proposed GenProg, which uses a test suite to encode defects. The results show that it fixes 55 of 105 defects, but Qi et al. [6] found that only two of the 55 patches are semantically correct.
Defect reports generated by defect-detection tools contain a large number of false positives. A high false-positive rate consumes considerable effort in manual confirmation and increases the probability of introducing new errors when false positives are eliminated, which greatly increases the difficulty of automated program repair. Alarms include real bugs and false positives. For real bugs, we want the repair tool to repair the program automatically, without human involvement, and restore the desired function. For false positives, we want the defect-detection tool to ignore them and continue with the rest of the report without changing the function or semantics of the statements related to the false positives. False alarms from defect-detection tools can cause incorrect repairs [10], which introduce unnecessary and unexpected semantic changes and can make the program unable to complete the specified behavior within the specified time. Moreover, repairing a false positive may change the execution path of the program, so that the program violates the developers' original intent. New bugs may also be introduced during the repair process, resulting in rework by developers, significantly increasing development time and delaying the project's completion date.
To avoid manual determination of alarms and fully automate repair, we propose a strategy for automatic program-semantic defect repair and false-positive [11] elimination without side effects. Semantic defect patches for Java programs are generated based on the safe constraint mechanism, and DTSFix, a program defect repair tool without side effects, is developed. The tool does not require manual confirmation of the defect report and completes the repair before the real defect is triggered. For false positives, the tool eliminates them without manual confirmation. The results show that the success rate of the tool for alarm repair is 58.13%; it can correct real defects, reduce the risk posed by false positives, and fully automate what was previously manual repair.
Section 2 introduces the research progress of automatic repair methods. Section 3 describes program-semantic defects, specifies the relevant state specifications of program points, and concretizes program point states in combination with the control flow graph. Section 4 summarizes the impact of side effects [6,12] and proposes automatic program repair without side effects. Section 5 describes patch synthesis based on safe constraints. Section 6 uses the side-effect-free repair method to repair semantic defects in five projects and evaluates the performance of the method on real defects and false positives. Section 7 summarizes the article.
3. Program-Semantic Defect
This paper mainly studies the repair of the most typical and common semantic defects related to program state, such as Null Pointer Dereference (NPD), Illegal Calculation (IAO), Resource Leak (RL), Uninitialized Variable (UVF), and Out of Bound (OOB).
Figure 1 shows an example of a null pointer dereference taken from the BorderArrangement.java file of the open-source project JFreeChart. The variable content is declared on line L1 and may be null after line L3; it is then dereferenced directly on lines L5 and L6, causing an NPD.
To fix defects and eliminate false positives more accurately, a deeper understanding of the program is necessary. Below we describe the program symbolically. A program P can be represented by a six-tuple <V, S, L, →, entry, exit>. V represents the set of program variables, S the program statements, L the program points, → the state transitions of the program, entry the entry of the program, and exit the exit of the program.
It is reliable to represent the migration of each state under the program-semantic system of abstract interpretation. The descriptions of safe constraints and program state are given below.
Definition 1. (Control Flow Graph, CFG): A CFG is a directed graph that reflects the logical control flow of a program, i.e., the possible order in which blocks execute. It is written G = <N, E, entry, exit>, where N is the set of nodes (each basic block corresponds to a node in the graph), E = {<n1, n2> | n1, n2 ∈ N and n2 may be executed immediately after n1} is the set of edges, and entry and exit represent the program's unique entry and exit, respectively.
Each statement in the program corresponds to a node in the CFG, and a node represents a straight-line block of code with no jumps or jump targets. Program points are different from nodes: there are program points before and after each node in the control flow graph, and the edges from one node to another can be represented as a path.
Definition 2. (Path): Paths can be represented by node-to-node edge sets in the CFG, represents the path from node to node , represents the set of paths from node to node .
For example, in
Figure 1, L2 to L3 is a true branch path judged by
if, which is expressed as
. The path of the L5 to L6, expressed as
. The set of paths from L2 to L6 is expressed as
.
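Definitions 1 and 2 can be made concrete with a small sketch. The graph below is a hypothetical CFG written for illustration (it is not a transcription of Figure 1), and the function name `paths` is an assumption; the sketch only shows how the set of paths between two nodes follows from the edge set.

```python
from typing import Dict, List

# Hypothetical CFG given as node -> successor nodes. The shape (an `if`
# at L2 whose true branch goes through L3) is assumed for illustration.
CFG: Dict[str, List[str]] = {
    "entry": ["L1"],
    "L1": ["L2"],
    "L2": ["L3", "L4"],  # true branch -> L3, false branch -> L4
    "L3": ["L5"],
    "L4": ["L5"],
    "L5": ["L6"],
    "L6": ["exit"],
    "exit": [],
}


def paths(cfg, src, dst, prefix=None):
    """Return every acyclic path from src to dst as a list of nodes."""
    prefix = (prefix or []) + [src]
    if src == dst:
        return [prefix]
    found = []
    for nxt in cfg.get(src, []):
        if nxt not in prefix:  # do not revisit a node already on this path
            found.extend(paths(cfg, nxt, dst, prefix))
    return found


single_edge = paths(CFG, "L2", "L3")  # the true-branch path
all_paths = paths(CFG, "L2", "L6")    # the set of paths from L2 to L6
```

In this assumed graph, the path set from L2 to L6 contains one path per branch of the `if`.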
Definition 3. (Program State): program state is a function from program variable to value. The set of active variables at program point l is , l ∈ L, = <v,> represents the binary of active variables v and the variable value , :, ,⋯ represents the states of program point l , i.e., the set of all the active variables and their values at the program point.
A program has one start state, one error state, and several intermediate states. and while represents the intermediate state. Except for , the rest of the program states can be called security state , i.e., the program in a safe state does not have defects. The start state of program . For example, the possible state of a program at the program point L1 in the example above is . The automatic detection tool reports an error (which is actually a false positive) at the point L7, the possible program state is
Definition 4. (Safe Constraint): We propose the safe constraint with the help of the definition of a domain. A safe constraint defines the safe range and constraints of a variable, function, or expression, that is, a limited set of constraints on them in a program. A variable value that falls within the safe constraint means that the current value of the variable is meaningless. Consider the safe constraint of an active variable v at program point l. If the value set of v and the safe constraint do not intersect, the program will not generate illegal memory access defects [15]. Conversely, if the value set and the safe constraint intersect, the program will cause a defect at l. For example, in the above false-positive example, the safe constraint of the variable content on line L6 is {content | content = null}, and the value set is {content | content ≠ null}. Because line L5 adds a non-null assertion on content, the value set of content on line L6 cannot contain null.
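The intersection test in Definition 4 can be sketched as follows. This is a minimal illustration that models abstract values as strings; the names `NULL`, `NON_NULL`, and `may_trigger_defect` are assumptions introduced here, not part of DTSFix.

```python
# Abstract values a reference variable may take at a program point
# (hypothetical encoding chosen for this sketch).
NULL, NON_NULL = "null", "non-null"


def may_trigger_defect(value_set, safe_constraint):
    """A defect is possible when the value set of the variable
    intersects the safe constraint of that variable."""
    return bool(set(value_set) & set(safe_constraint))


# Safe constraint of `content` at the dereference: {content | content = null}.
constraint = {NULL}

# Without a guard, content may or may not be null at the dereference.
unguarded = may_trigger_defect({NULL, NON_NULL}, constraint)

# After a non-null assertion, the value set no longer contains null.
guarded = may_trigger_defect({NON_NULL}, constraint)
```

Under this encoding, an alarm whose value set is disjoint from the constraint corresponds to a false positive.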
Definition 5. (Program-Semantic Defect): A program-semantic defect occurs when the program state does not satisfy the safety properties of the program, causing abnormal execution or even abnormal termination. A program-semantic defect occurs at program point l if the value of a defect-related variable intersects with the safe constraint of that variable: the program then migrates to the error state, and the defect-detection tool reports a defect at l. For example, an NPD is triggered when the defect-related variable is null before it is dereferenced.
4. Automatic Program Repair without Side Effects
The existence of defects seriously affects program quality, but existing automatic program repair methods and tools [16,17,18,19,20,21,22,23,24,25,26,27] cannot guarantee that the repaired program is correct. For example, the automatic repair tool GenProg [5] repairs the defect in Figure 2 incorrectly.
Figure 2a is a buggy program, and
Figure 2b is side-effect repair from GenProg. In
Figure 2a, when
, line 206 will report
array index out of bounds. Genprog updated line 206 to
. If
ars is not defined, it will be assigned the default value of the Scriptable class. However, the original program needs to get the corresponding subscript value of the String array, so the modified semantics are inconsistent with the target semantics of the original program. This repair therefore has side effects. Side effects not only defeat the goal of automatic repair but also increase the amount of manual patch confirmation.
We summarize four situations of side effects below:
The repair method uses conditional constraints extracted from test sets as the program specification that determines the quality of the candidate patches, which results in semantic bias and produces a large number of suspected-correct patches. However, because the program specification lacks discriminating power, the correct patch cannot be generated, and potentially erroneous patches cannot be further distinguished. Particularly complex repair operations are more likely to modify the correct semantics of the original program, increasing the probability of introducing new bugs.
Figure 2a was repaired with the side-effect-free method, and the result is shown in
Figure 3. In
Figure 3, the value space of
getshort(icode,pc+1) is limited. The safe constraint of
getshort() is
. Following the path <206,207> through the
if statement, the state on line 206 is
, which value is within the safe constraint, so
, and the program does not migrate to
. The program does not violate the semantics of the original program because it still gets the same value as the original program expected at line 208, so
Figure 3 is a correct repair. Moreover, the executed function is consistent before and after repair, and the state change is within the safe range of equal program semantics, so this is a repair without side effects.
We use test-set-based test-equivalence [28] to verify the repair effect. If two programs produce the same results for the same test set, they are test-equivalent with respect to that test set.
Definition 6. (Test-Equivalence Relation): Let ∏ represent a set of programs and T a set of tests. An equivalence relation (reflexive, symmetric, transitive) on ∏ is a test-equivalence relation for T if any two programs it relates either both pass or both fail T.
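As an illustration of the pass/fail agreement behind test-equivalence, the sketch below checks two programs against a test set. It assumes that programs are plain callables and that each test is an (input, expected output) pair, and it compares verdicts test by test; these choices and all names are assumptions made for the example.

```python
def passes(program, test):
    """Return True if the program produces the expected output."""
    arg, expected = test
    try:
        return program(arg) == expected
    except Exception:
        return False  # a crash counts as a failed test


def is_test_equivalent(p, q, tests):
    """p and q agree on every test: both pass it or both fail it."""
    return all(passes(p, t) == passes(q, t) for t in tests)


# Hypothetical programs under comparison.
def double(x):
    return x * 2


def add_self(x):
    return x + x


def square(x):
    return x ** 2


tests = [(2, 4), (3, 6)]
same = is_test_equivalent(double, add_self, tests)
different = is_test_equivalent(double, square, tests)
```

Note that `square` agrees with `double` on the first test only, so a single-test set would not distinguish them; the relation is always relative to the chosen test set.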
Consider three different programs in
Figure 4, and insert the highlighted statements into different locations in the program corresponding to
Figure 4a–c. Our algorithm determines that
Figure 4a is test-equivalence to
Figure 4b. The test equivalent relationship is
. First, they differ only in the right-hand side of the assignment. The highlighted statements take the same value during test execution. Secondly, it can be determined that
Figure 4b,c are also test-equivalent, because they insert the same assignment at different program locations, both locations are executed by the test, and the variables
b and a are not overwritten or modified during test execution. We call this kind of test-equivalence relation
Finally, merging the two analysis results, by transitivity
Figure 4a,c are test-equivalent.
Test-equivalence relations can be divided into two categories: value-based test-equivalence and dependency-based test-equivalence. This paper is concerned only with the former.
Definition 7. (Value-Based Test-Equivalence Relation ):Two programs p and differ only in expressions, during the executions of p and with test T,
the expressions e and are evaluated into the same values. The relationship between p and is value-based test-equivalence, denoted as . We use definition function containing a set of expressions to represent the value-based test-equivalence relation: represents input state and represents output state.
To get a clearer understanding of value-based test-equivalence relation, the following examples are given. Consider a program p1 defined as:
And a test
, These programs are value-based test-equivalence relation for the test
t, since they differ only in the
if-condition and the following relation:
where the input state and the output state .
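Value-based test-equivalence can be sketched by recording the values that the differing expression takes during a test run. The program body and both if-conditions below are hypothetical stand-ins chosen for illustration; they are not the programs of the example above.

```python
def run(condition, x):
    """Run a shared program body with a pluggable if-condition.

    Returns (output, trace), where trace lists every value the
    condition evaluated to during this execution.
    """
    trace = []
    value = condition(x)
    trace.append(value)
    output = x + 1 if value else x - 1
    return output, trace


def value_based_equivalent(cond_a, cond_b, test_input):
    """The two variants are value-based test-equivalent for this input
    if the differing expressions evaluate to the same values."""
    _, trace_a = run(cond_a, test_input)
    _, trace_b = run(cond_b, test_input)
    return trace_a == trace_b


def positive(x):
    return x > 0


def nonzero(x):
    return x != 0


equal_on_5 = value_based_equivalent(positive, nonzero, 5)
equal_on_minus_3 = value_based_equivalent(positive, nonzero, -3)
```

The two conditions are interchangeable for the input 5 but not for -3, which is why the relation is stated per test.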
Based on the above equivalence relation, we verify the effect of repair without side effects. Because the repair strategies for real defects and false positives differ, the side-effect-free repair method is described separately for each case below.
When a real defect is repaired and no test can trigger the defect, the same input must produce the same output, i.e., there is no functional difference. The specific description is:
p and are original program and patched program, respectively, i.e., the same input has the same output. Meanwhile, test-equivalence relation is satisfied, .
When a real defect is repaired and a test triggers defects, the test case caused an error in the program before repair, and the error of executing the same test no longer occurs after repair.
On the basis that defects do not occur again, is satisfied.
When a false positive is eliminated, the tool reports a defect before repair and no longer reports it after repair.
First, the elimination of false positives must satisfy Equation (1) and the test-equivalence relation; secondly, it must satisfy semantic consistency. Semantic consistency in this paper means that the state of the corresponding program point is the same before and after repair. If two programs are test-equivalent and semantically consistent, this paper considers them equivalent programs. The statement changes are recorded. After repair, the program point closest to the defect line that does not change is monitoring point
, and the corresponding position of this point is
a; the program point closest to defect line that does not change after statement is monitoring point
. Similarly, the position of
b is determined.
represents states of before repair, and
represents states after repair, comparing states of two monitoring points, if
=
&&
=
, after patched, does not change monitoring point state. Program records the execution path from
along data flow
, analysis point state. On the basis that the semantics of the monitoring point have not changed, if
, there is no semantic difference. At the same time,
, we call this repair no side effect of false positives.
Algorithm 1 describes the functional equivalence verification algorithm for false-positive elimination. It first verifies whether the functions are consistent before and after repair. If not, the repair satisfies neither functional equivalence nor program equivalence. Otherwise, the control flow graph is generated, and execution proceeds along the control flow path from the start state. Each execution step updates the state of the relevant program points until the end of the program, checking that the next node on the path and the program state at that node are consistent. If
is consistent and the function is equivalent before and after repair, it is an automated program repair without side effects.
Algorithm 1. The equivalent verification algorithm of false positives elimination function without side effects
Input: Programs before and after repair
Output: Program equivalent or not
1 ;
2 if (p(output) != (output)) then
3 return;
4 Generate the corresponding control flow graph
5 if
6 while (n != exit) do
7 if ( == && != ) then
8 k = a;
9 if ( != && == ) then
10 m = b;
11 while (n != exit) do
12 if then
13 if then
14 flag = true;
15 flag = false;
16 update along the data stream
17 if (flag == true) then
18 return true;
19 return false;
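The two checks that Algorithm 1 combines, functional consistency and unchanged monitoring-point states, can be sketched as below. This is a simplified illustration and not the authors' implementation: it assumes each program is instrumented to return its output together with a dictionary of states recorded at the monitoring points "a" and "b", and every function name is hypothetical.

```python
def equivalent_without_side_effects(run_before, run_after, inputs, points):
    """Check functional equivalence first, then compare the recorded
    states at each monitoring point before and after repair."""
    for x in inputs:
        out_before, states_before = run_before(x)
        out_after, states_after = run_after(x)
        if out_before != out_after:  # functions differ: not equivalent
            return False
        for point in points:
            if states_before.get(point) != states_after.get(point):
                return False  # state at a monitoring point changed
    return True


def original(values):
    states = {"a": {"values": list(values)}}
    total = sum(values)
    states["b"] = {"values": list(values), "total": total}
    return total, states


def guarded(values):
    """Patched variant: adds a guard but leaves the states untouched."""
    states = {"a": {"values": list(values)}}
    total = 0
    if values is not None:
        total = sum(values)
    states["b"] = {"values": list(values), "total": total}
    return total, states


def reordering(values):
    """Patched variant with a side effect: same output, changed state."""
    states = {"a": {"values": list(values)}}
    values = sorted(values, reverse=True)
    total = sum(values)
    states["b"] = {"values": list(values), "total": total}
    return total, states


inputs = [[1, 2], [3]]
ok = equivalent_without_side_effects(original, guarded, inputs, ["a", "b"])
bad = equivalent_without_side_effects(original, reordering, inputs, ["a", "b"])
```

The `reordering` variant returns the same output as the original yet alters the state at point "b", which is the kind of change that an output-only comparison would miss.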
5. Defect Repair and False-Positive Elimination Based on Safe Constraints
To achieve the goal of automated program repair without side effects, we propose automatic repair based on safe constraints. In this section, we formalize defects. On this basis, we combine safe constraints to synthesize repair conditions and add safe transition statements to repair the program.
An instance of a defect state machine is created according to the defect characteristics, and a state migration operation is then performed on the defect state machine. Our repair goal is to add statements so that the program state never migrates to the error state. If the value of a variable does not intersect with the restricted interval, it will not cause bugs, and the defect-detection tool will not raise an alarm. Repair methods based on safe constraints are finer-grained and avoid many suspected-correct patches, so the resulting patches are more targeted.
When a patch is synthesized, the defect reports of the static detection tool are read to determine the defect-related variables. The synthesis conditions are determined according to the safe constraint rules in Table 1. Then, safe patches are generated according to the synthesis conditions, as shown in Algorithm 2.
Algorithm 2. Patch Synthesis Algorithm Based on Safe Constraints
Input: Defect file, Defect report
Output: Patch for defect program
1 ResultSet ← Defect Information from Defect Report order by File
2 ResultSet// : Defect file; : Defect report; : Safe constraint.
3 while (. hasDefect()) do{
4 ;//: Defect-related variables;
5 ;// : Safe constraint rules;
6 ;//: Patch synthesis conditions
7 }
8 return patch;
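The overall flow of Algorithm 2 (read the report, look up a rule for the defect type, instantiate a guard for the defect-related variable) can be sketched as follows. The rule table here is a hypothetical stand-in for Table 1, whose actual rules are not reproduced in this sketch; the report field names and the guard templates are likewise assumptions.

```python
# Hypothetical rule table: defect type -> guard template (Java syntax).
# These two rules are assumed for illustration only.
SAFE_CONSTRAINT_RULES = {
    "NPD": "if ({var} != null)",
    "IAO": "if ({var} != 0)",
}


def synthesize_patches(defect_report):
    """Produce one guard per reported defect that has a known rule."""
    patches = []
    # Process defects ordered by file, then by line.
    for defect in sorted(defect_report, key=lambda d: (d["file"], d["line"])):
        template = SAFE_CONSTRAINT_RULES.get(defect["type"])
        if template is None:
            continue  # no rule for this defect type in the sketch
        patches.append({
            "file": defect["file"],
            "line": defect["line"],
            "guard": template.format(var=defect["variable"]),
        })
    return patches


# Hypothetical defect report entries.
report = [
    {"type": "IAO", "file": "B.java", "line": 7, "variable": "d"},
    {"type": "NPD", "file": "A.java", "line": 10, "variable": "c"},
    {"type": "RL", "file": "A.java", "line": 20, "variable": "in"},
]
patches = synthesize_patches(report)
```

Keeping the rules in a table separates the per-defect-type knowledge from the traversal of the report, which mirrors the lookup step of the algorithm.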
The repair for NPD is shown in
Table 2. We first determine the state at the line where the defect occurs. The defect occurs in
Class c = (Class) stream.readObject(), the state is
:{
:c = null || c! = null}. It is found that the point state is
. If the reference is dereferenced at this point, it will migrate to
. To avoid the state migration to
:{
content = null}, we need to add a safe transition condition. Then we determine the defect correlation variable by defect report, it is
c. The defect correlation operation is a reference to a potentially empty variable,
c corresponds to
in the security constraint rule in
Table 1.
. The repair synthesis condition is
, then safe constraint of
c is
, its synthetic condition is
. Adding repair statement
before the migration condition is satisfied, so that it does not satisfy conditions for migrating to
.
6. Experiment
To verify whether the method in this paper can achieve defect repair and false-positive elimination without side effects, DTSFix is implemented on top of the static detection tool DTS (Defect Test System) [29,30]. DTSFix uses DTS to locate suspicious statements and produce defect reports [31]. Partial defect reports of JFreeChart detected by DTS are shown in Table 3.
The defect report produced by DTS contains the defect type, the defect ID, the file where the defect is located, the variables associated with the defect, the start line of the defect, the line where the defect occurs, and a description of the defect.
Figure 5 illustrates an overview of the DTSFix approach. First, the defect reports and the buggy program are input. Then, according to the safe constraint rules, patch conditions are synthesized and inserted into candidate patch locations. Finally, candidate patches are generated and retested by DTSFix. If a patch does not produce new bugs and the original defect has been repaired, it is considered a correct patch. Otherwise, it is skipped and the next patch is validated.
Our evaluation aims to answer the following research questions:
To answer the two RQs, we use the Defects4J dataset, which is widely used as a benchmark for Java-targeted APR research. The dataset contains 393 bugs and their corresponding developer fixes. DTSFix attempts to fix five projects (JFreeChart, commons-lang, Log4j, commons-math, and commons-rhino), and we manually confirmed the numbers of real defects, false positives, and successful repairs. The defect report of JFreeChart detected by DTS is shown in Table 3. A total of 855 alarms from the five projects were taken as experimental objects, and detailed data such as the false alarm rate are given in Table 4. The number of alarms is the sum of the defects and false positives detected by DTS, and the number of successful fixes includes the repair of both false positives and real defects.
Alarms is the number of alarms detected by DTS, including real defects and false positives. FP and RD are the numbers of false positives and real defects, respectively. FPA is the percentage of false positives in the defect reports. NS is the number of successful repairs. DTSFix repaired 497 of 855 alarms correctly, a successful repair rate of 58.13%. Manual validation took 17.0 h. Manual validation of alarms therefore takes a lot of time, greatly increasing the workload of developers and seriously delaying project progress.
6.1. Performance in Repairing Alarms
To answer RQ1, we verify the correctness of the candidate patches for each bug. The patched program first executes the test cases that failed before; if they pass, regression testing is performed. If the regression tests also pass, we consider that the patch does not introduce any new bugs. Otherwise, the patch is skipped and the next patch is validated. DTSFix continues to generate and validate candidate patches for a specified period of time. After all patches have been validated, we collect the plausible patches, i.e., those that make the buggy program pass all test cases, and then manually analyze them to see whether their semantics are equivalent to those of the patches submitted by the developers. If the semantics are the same, the candidate patch is considered a correct patch.
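The validation procedure just described (previously failing tests first, then regression tests) can be sketched as a filter over candidate patches. The sketch assumes each candidate is a callable patched program and each test is a predicate over a program; all names and the toy candidates are hypothetical.

```python
def plausible_patches(candidates, failing_tests, regression_tests):
    """Keep the candidates that pass the previously failing tests and
    then the regression tests; skip a candidate at the first failure."""
    plausible = []
    for program in candidates:
        if not all(test(program) for test in failing_tests):
            continue  # original defect still triggers
        if all(test(program) for test in regression_tests):
            plausible.append(program)  # no new bug observed by the tests
    return plausible


# Toy candidates for a function that crashed on a None argument.
def candidate_guard(s):
    return len(s) if s is not None else 0


def candidate_constant(s):
    return 0  # silences the crash but breaks normal behavior


failing_tests = [lambda p: p(None) == 0]
regression_tests = [lambda p: p("abc") == 3]

kept = plausible_patches(
    [candidate_guard, candidate_constant], failing_tests, regression_tests
)
```

A patch that survives this filter is only plausible; deciding whether it is correct still requires the manual semantic comparison with the developer patch.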
The IDs in Table 5 are obtained from the defect reports generated by DTS, and each of them is unique. DTSFix generated plausible patches for 213 of the 393 real bugs, and the successful repair of real bugs accounted for 52.16% of the total number of alarms. The details of some bugs are listed in Table 5. The total number of patches generated is affected by the size of the erroneous source files, and in some cases DTSFix generates multiple correct patches. For example, Math79 needs one inserted statement to be repaired, and DTSFix generates two correct patches for it. Both patches are the same as those submitted by the developers, because the required statement can be inserted at either of two adjacent locations.
Time(s) shows that the time required for DTSFix to generate correct patches is between 2.88 and 32.59 s.
Table 5 shows that DTSFix generates three reasonable repairs for two defects.
In Figure 6, before repair refers to the number of real bugs, and after repair is the number of real bugs remaining after repair by DTSFix, as confirmed manually. The number of real bugs before repair in the five projects is 393, and DTSFix is unable to repair 47.84% (188/393) of them. Of the remaining 188 defects, 1/3 have overly complex calling relationships, such as a pointer referring to released complex data, and 2/3 are infinite loops.
To study the ability of DTSFix to repair false positives, we manually confirmed the effect of the repair. In Figure 7, before repair is the number of false positives before fixing, and after repair is the number of false positives after fixing. DTSFix eliminates 381 false positives, and 81 false positives are not eliminated. This is because some false positives are beyond the scope of DTSFix, such as cases in which a pointer refers to complex data that has been released. The successfully eliminated false positives were checked manually, and no side effects were found.
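Several of the figures quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the total number of false positives (462) is derived here as 381 + 81 and is not quoted directly, and the helper name `pct` is an assumption.

```python
def pct(part, whole):
    """Percentage rounded to two decimals, as reported in the text."""
    return round(100 * part / whole, 2)


alarms = 855                 # alarms reported by DTS
real_defects = 393           # real bugs before repair
fp_eliminated = 381          # false positives eliminated
fp_remaining = 81            # false positives not eliminated
successful_repairs = 497     # alarms repaired correctly
unrepaired_defects = 188     # real defects not repaired

# Derived value: total false positives implied by the two counts above.
false_positives = fp_eliminated + fp_remaining

repair_rate = pct(successful_repairs, alarms)
unrepaired_rate = pct(unrepaired_defects, real_defects)
alarms_accounted_for = false_positives + real_defects
```

The derived false-positive total and the real-defect count add up to the reported number of alarms, and both quoted percentages follow from the stated counts.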
Table 6 shows an instance of false-positive elimination by DTSFix. The defect report shows that
may have illegal calculation on
, whereas line 215
determines that the expression cannot have illegal calculation on line 241. When DTSFix repairs, the security constraint of
on line 241 is
=
, adding repair statement
. The regression test cases are executed; this alarm does not appear again in the defect report and no new bug is introduced, i.e., the repair is successful. Monitor point
=
,
=
,
=
,
=
.
The states of monitor point before repair:
:{ , .
The states of monitor points after repair:
:{ , .
Comparing the states before and after repair shows that the states at both monitoring points are unchanged, so the fix does not change the program semantics. We also compared the program execution paths before and after repair and found no semantic difference. From the above, the elimination of this false positive has no side effects.
6.2. Validity of Patches Generated by DTSFix
We evaluate whether the patches produced by DTSFix are valid for automatic repair. Once DTS produces a list of suspect statements, DTSFix repairs each suspect statement according to the safe constraint rules. After a candidate patch is generated, it is applied to the buggy program. If the patched program passes all test cases, the candidate patch is considered a plausible patch, and DTSFix stops looking for other candidate patches for that bug; otherwise, the patch generation steps are repeated until all defect reports are processed. Specifically, we only evaluate the correctness of the first plausible patch generated for each error. For all plausible patches, we further manually compare them with the patches provided in Defects4J. If a patch has the same semantics as the one provided by the developer, we call it a correct patch; otherwise, it remains plausible.
The results of DTSFix were compared with the seven latest APR tools, and the repair performance was evaluated on the Defects4J benchmark. The parts with a gray background color represent the results obtained by our method.
Table 7 shows the numbers of plausible and correct patches. Overall, DTSFix successfully generates correct patches for 26 bugs in the Defects4J benchmark. Results are reported as x/y, where x is the number of bugs fixed correctly and y is the number of bugs for which a plausible patch was generated. Accuracy (P) represents the accuracy of bug fixes. For example, 76.1% of the plausible patches of DTSFix are actually correct, while 63.8% and 62.9% of the plausible patches of ACS and SimFix, respectively, are correct. The data for GenProg and Nopol were extracted from the experimental results of Martinez et al. [32], and the results of the remaining tools were obtained from their papers: SimFix [1], ACS [7], CapGen [8] and SketchFix [9]. In fact, only CapGen achieves comparable performance, with a high proportion (84%) of correct patches. CapGen's high performance confirms its authors' intuition, and our results indicate that the side-effect-free safe constraints we provide are critical to improving patch correctness.