Incremental Formula-Based Fix Localization

Automatically fixing bugs in software programs can significantly reduce the cost of software development and improve productivity. Toward this goal, a critical and challenging problem is automatic fix localization, which identifies program locations where a bug fix can be synthesized. In this paper, we present AgxFaults, a technique that automatically identifies minimal subsets of program statements at which a suitable modification can remove the error. AgxFaults works by dynamically encoding the semantics of the program parts that are relevant to an observed error into an unsatisfiable logical formula and then manipulating this formula in an incremental, on-demand manner. We perform various experiments on faulty versions of the traffic collision avoidance system (TCAS) program in the Siemens Suite, programs in Bekkouche's benchmark, and several real bugs in the Defects4J benchmark. The experimental results show that AgxFaults outperforms single-path-formula approaches in terms of effectiveness in fix localization and fault localization. AgxFaults is better than program-formula-based approaches in terms of efficiency and scalability, while providing similar effectiveness. Specifically, the solving time of AgxFaults is 28% faster, and the running time is 45% faster, than the program-formula-based approach, while providing similar fault localization results.


Introduction
Debugging is an essential yet the most expensive task in software development [1,2]. It involves a labor-intensive process of locating and fixing faulty code in a buggy program. This process consumes about 50% of total software development costs, of which the majority is spent on fault localization [1]. Automatic techniques that reduce the manual effort of debugging can therefore significantly impact software costs and productivity [3].
Many automatic techniques have been proposed to support developers in various debugging activities (e.g., References [4-7]). Most fault localization approaches (e.g., spectrum-based or mutation-based methods [8,9]) focus on computing a statistical measure of suspiciousness to rank program statements by their likelihood of being faulty. However, to be useful, these methods require a test suite containing many passing and failing executions with high code coverage [10,11]. Such a high-quality test suite is often not available in practice. Moreover, these methods provide only a ranked list of suspicious statements without any explanation; thus, developers still need considerable inspection effort to examine these statements when localizing and fixing faults [3].
Formula-based fault localization (FFL) is a particularly promising approach, as it not only logically identifies possible fault locations but also provides additional information that helps to explain and fix the faults [5]. Consider a buggy program and a failing test case whose execution trace, called the error trace, demonstrates an error. FFL techniques work by constructing an unsatisfiable logical formula, called the error trace formula, that is a symbolic representation of the error trace, and using an automatic solver to find the causes of the formula's unsatisfiability. Based on the solution obtained from the solver, possible faulty statements in the program can be logically identified. Existing FFL techniques differ in how they construct the error trace formula and how they use the automatic solver to manipulate the formula for locating the fault.
In References [12-15], a static analysis technique is used to construct a logical formula, called the program-formula, that is semantically equivalent to the input program (with regard to a certain unwinding bound). Specifically, every satisfying assignment to the program-formula corresponds to a feasible execution in the program and, vice versa, every program execution (with regard to a certain bound) corresponds to a satisfying assignment of the formula. This program-formula is then conjoined with clauses encoding the input values and the assertions of a failing test case to form an unsatisfiable formula. The extended formula is fed into a pMaxSAT solver, which finds an assignment to the formula's variables that maximizes the number of satisfied clauses. The set of unsatisfied clauses, called a minimal correction subset (MCS) of the formula, indicates a corresponding minimal set of program statements that can be modified to correct the considered error execution. In addition, the obtained variable assignment corresponds to a feasible correct execution in the angelically fixed program, which replaces all statements in the MCS with suitable angelic values. As a result, these approaches can provide developers with potential minimal fix locations, together with a successful angelic execution, as an explanation.
A key limitation of these methods is that they are extremely computationally expensive and have scalability issues even with small programs. This is because they encode all possible execution paths of a program into a single formula, which can easily lead to a very large and complex formula that recent solvers cannot handle, or can handle only with difficulty. Jin [16] proposed an on-demand formula computation (OFC) technique to construct a smaller formula that encodes only the program parts relevant to a given test case. Their experimental evaluation showed that OFC formulas are much simpler than program-formulas but still sufficient to produce the same results. This method, however, requires computing all MCSs of multiple intermediate formulas before obtaining the final formula. Enumerating all MCSs of many formulas might outweigh the benefit of generating a simpler formula.
The approaches in References [17-19] work with a formula encoding the semantics of a single execution path; we refer to this formula as the single-path trace formula. A single-path trace formula is semantically equivalent to a straight-line program containing the statements whose execution produced the error. These single-path trace formulas are simple and, thus, easy to solve. However, because the formula does not contain information about the control dependences among statements in the original program, the MCSs obtained from these formulas may not correctly correspond to an angelic fix in the original program. As a result, these methods may fail to identify some angelic fix locations, and they may also report invalid angelic fix candidates.
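This limitation can be made concrete with a brute-force sketch in Python. We encode a hypothetical failing run of a small program (an assumption for illustration: inputs x = 3, y = 5; the executed path contains only an assignment a = x and an else-branch statement b = a + y; the assertion requires b = −2). Only the two executed statements appear as soft clauses; the branch guard and the unexecuted branch are absent, exactly as in a single-path trace formula:

```python
from itertools import combinations

# single-path encoding of a hypothetical failing run: only statements
# on the executed path are represented (assumed program, for illustration)
X0, Y0 = 3, 5
soft = {
    "c_line3": lambda a1, b1: a1 == X0,       # a = x (executed assignment)
    "c_else":  lambda a1, b1: b1 == a1 + Y0,  # executed else-branch (assumed)
}
hard = lambda a1, b1: b1 == -2                # failing test assertion

def single_path_mcses():
    # brute-force MCS enumeration over a small integer domain
    models = [(a, b) for a in range(-10, 11) for b in range(-10, 11)]
    out = []
    for k in range(len(soft) + 1):
        for S in combinations(sorted(soft), k):
            if any(set(f) <= set(S) for f in out):
                continue  # contains a known MCS: not minimal
            keep = [c for n, c in soft.items() if n not in S]
            if any(hard(*m) and all(c(*m) for c in keep) for m in models):
                out.append(S)
    return out
```

Because the guard and the unexecuted branch are missing from the formula, no enumerated MCS can correspond to a fix that flips the branch decision, illustrating how single-path formulas can miss angelic fix locations.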
In this paper, we present AgxFaults, an incremental formula-based fault localization method that overcomes the above limitations. Our method is based on two main components. First, instead of relying on a static formula that encodes the entire program (like the program-formula) or just a single execution path (like the single-path formula), AgxFaults is based on an error formula that is constructed and extended dynamically in an on-demand manner. This formula, called the angelic error trace formula, encodes only the program parts that are relevant to a specific failing test case and over-abstracts all unrelated program parts by angelic non-deterministic executions. This encoding results in compact error trace formulas that are easier to solve but still sufficient to identify both data- and control-related faults. Second, because the angelic formula is extended incrementally, instead of making multiple calls to a MaxSMT solver to solve multiple formulas separately (like the OFC approach), we adapt the incremental core-guided MaxSMT algorithm [20] to manipulate the formula and compute the MCSs incrementally.
We implemented our method in a tool named AgxFaults by extending Java PathFinder (a NASA model checking tool) to localize faults in Java programs. The input of AgxFaults is a buggy program and failing test cases (given either as a JUnit test case or as a pair of an input and an expected post-condition in a configuration file). AgxFaults outputs a set of minimal angelic fix candidates (MFCs); each MFC is a pair consisting of a minimal fix location set and an angelic execution path showing that modifying these statements can make the given failing test case pass.
We evaluated our method using various programs of different kinds and sizes. These programs include several sample programs provided by Bekkouche [19], 41 faulty versions of a commercial traffic collision avoidance system (TCAS), and several large and complex real-world programs in the Defects4J benchmark. The experimental results showed that AgxFaults succeeded in reporting actual fault localizations for all bugs in both the Bekkouche and the TCAS benchmarks. AgxFaults outperformed single-path-formula approaches both in terms of the success rate and the accuracy of fault localization. AgxFaults provided similar results to the program-formula approach with better efficiency and scalability. Specifically, AgxFaults has a 28% faster formula solving time and a 45% faster running time than the program-formula approach when applied to the TCAS programs. Furthermore, when the complexity of the program increased (e.g., as the loop unwinding bound increased), the formula solving time of the program-formula-based method grew exponentially, while that of AgxFaults grew much more slowly.
In summary, we have made the following contributions in this paper:
• We propose a technique to dynamically encode the semantics of the partial program that is related to a specific test input into a formula in an incremental, on-demand manner.
• We present an iterative algorithm for enumerating minimal angelic fix candidates by manipulating and solving the constructed error formula incrementally.
• We implement our proposed method in a tool, AgxFaults, which is publicly available as open-source software.
• We perform experiments on various public benchmarks and open-source projects to show the effectiveness of our proposed approach.
The rest of this paper is organized as follows. We first provide a basic background in Section 2. We then describe the detail of the proposed method in Section 3. We describe our experimental setup in Section 4 and discuss the experimental results in Section 5. We review related work in Section 6. Finally, we give our conclusions in Section 7.

Background
In this section, we describe the fault localization problem and provide a running example.
Then, we explain the basic background of maximal satisfiability-based fault localization.

Fault Localization Problem
Fault localization is the problem of identifying the program statements that are responsible for an observed failure in a software program. Without knowing the correct program in advance, it is impractical to automatically pinpoint faults with absolute accuracy. Instead, any program statement, or subset of program statements, that, if suitably modified, can remove the failure is considered possibly faulty [12,14,16]. Since checking the existence of an actual syntactic fix for a program is extremely computationally expensive [7,21], we check for the existence of an angelic fix, which removes the failure angelically by replacing program expressions with suitable non-deterministic values (i.e., angelic values).
In the context of this paper, we consider fault localization to be the problem of finding angelic fix candidates for an observed failure in a software program. An angelic fix candidate (AFC) consists of (1) a fix location set (i.e., a set of suspicious statements) and (2) angelic values (i.e., a set of values that, if substituted at these statements, would make the given failing execution succeed). Essentially, the angelic values represent an angelic execution that is diverted from the original program execution by replacing the values of specific variables at the fix locations with their corresponding angelic values. Because there may exist a large number of angelic fix candidates, providing the developer with a set containing only minimal angelic fix candidates (i.e., angelic fix candidates that have a minimal fix location set) is preferable. Thus, given a faulty program and a failing test case whose execution demonstrates a failure, we produce a set of minimal angelic fix candidates (MFCs) that make the given test execution succeed.

Figure 1 shows a method foo and a unit test method testFoo, which checks whether foo returns a certain value when called on a particular input. The unit test method testFoo calls foo with the input x = 3, y = 5 and asserts that the output is equal to −2. However, because of a fault at line 3, where the assignment (a = 2 * x) is accidentally written as (a = x), the method foo returns 8, violating the assertion, so the test fails. After running the test case testFoo, we know there is a fault in the program foo. However, without knowledge of the correct program, a program statement can only be considered possibly faulty if there exists a suitable replacement for it that makes the observed error disappear.
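The example can be sketched as a plausible Python reconstruction (Figure 1 itself is not reproduced in this text, so the else-branch body b = a + y is an assumption chosen to match the reported outputs, and line numbering differs from the paper's listing):

```python
def foo_buggy(x, y):
    a = x          # line 3, faulty: should be a = 2 * x
    if a >= y:     # line 6
        b = x - y  # line 7
    else:
        b = a + y  # assumed else-branch (not shown in the text)
    return b

def foo_fixed(x, y):
    a = 2 * x      # the intended assignment
    if a >= y:
        b = x - y
    else:
        b = a + y  # assumed else-branch (not shown in the text)
    return b
```

With the test input x = 3, y = 5, the buggy version takes the else-branch and returns 8, while the fixed version computes a = 6, takes the true-branch, and returns 3 − 5 = −2, satisfying the assertion.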
We use x = (L, V) to denote an angelic fix candidate x, where L = {l_1, ..., l_n} is its fix location set, and V = {"v_1" = val_1, ..., "v_m" = val_m} is the corresponding set of angelic values. In reality, the sizes of L and V may differ, e.g., a statement may have multiple instances in an execution. However, to simplify the presentation, we assume m = n; thus, the angelic value V[i] corresponds to the statement at fix location L[i]. Each item "v_i" = val_i in the angelic value set V maps the angelic value val_i to the variable v_i at fix location i.
Given the program and the failing test case in Figure 1, our approach produces five minimal angelic fix candidates. Consider the MFC mfc_3 = ({3}, {"a_1" = 5}). This MFC states that the failure can be removed by modifying the assignment (a = x) at line 3 so that it assigns the angelic value 5 to the variable a. Indeed, assigning 5 to the variable a at line 3 changes the value of the condition expression in the if-statement at line 6 (i.e., a >= y) from false to true; thus, the execution flow of the error trace is flipped into the true-branch. As a result, the statement (b = x − y) at line 7 is executed, and the final value of variable b at the return statement is −2; thus, the test assertion is satisfied.
An angelic fix candidate is said to be a feasible fix candidate if substituting the values of the variables at the fix locations with their corresponding angelic values actually results in a successful program execution, i.e., the corresponding angelic execution succeeds. Otherwise, it is said to be an invalid fix candidate, or an infeasible angelic fix candidate. All five MFCs above are feasible fix candidates because their corresponding angelic executions are feasible.
An angelic fix candidate is a correct fault location if all statements in its fix location set are actually faulty. For example, mfc_3 = ({3}, {"a_1" = 5}) is a correct fault location because all statements in its fix location set, i.e., the statement at line 3, are actually faulty. Consider, in contrast, the MFC mfc = ({3, 1}, {"a_1" = 5, "b_1" = 3}). This MFC is a feasible fix candidate because replacing the value of variable b at line 1 with the value 3 and the value of variable a with the value 5 actually makes the test execution succeed. It is not a correct fault location, however, because its fix location set contains the statement at line 1, which is not faulty.
An angelic fix candidate is said to be a correct fix candidate if it is both a feasible fix candidate and a correct fault location. For example, mfc_3 = ({3}, {"a_1" = 5}) is a correct fix candidate, as it is both feasible and a correct fault location. Consider, in contrast, the MFC mfc = ({3}, {"a_1" = 0}). This MFC is a correct fault location, but it is not a feasible fix candidate because replacing the value of variable a at line 3 with the value 0 does not make the test execution succeed. Thus, it is not a correct fix candidate.
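Feasibility of an angelic fix candidate can be checked by simply re-running the program while overriding the fix location with the angelic value. The sketch below (assuming a hypothetical reconstruction of foo in which the else-branch is b = a + y, which is not shown in the text) distinguishes the feasible candidate "a_1" = 5 from the infeasible candidate "a_1" = 0:

```python
def foo_angelic(x, y, angelic_a=None):
    # hypothetical reconstruction of foo; the value assigned at line 3
    # is replaced with the angelic value when one is supplied
    a = x if angelic_a is None else angelic_a  # line 3
    if a >= y:                                 # line 6
        b = x - y                              # line 7
    else:
        b = a + y                              # assumed else-branch
    return b

def is_feasible(angelic_a, x=3, y=5, expected=-2):
    # a candidate is feasible if its angelic execution passes the test
    return foo_angelic(x, y, angelic_a) == expected
```

Here is_feasible(5) holds because the branch flips and the then-branch yields −2, whereas is_feasible(0) fails: with a = 0 the guard stays false, so the result never matches the expected output.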

Formula Satisfiability and Solvers
Formula satisfiability (SAT or SMT) is the problem of determining whether there exists an assignment to the variables of a given logic formula such that the formula evaluates to true. If such an assignment (called a model) exists, the formula is satisfiable (SAT); otherwise, it is unsatisfiable (UNSAT).
Maximum satisfiability (MaxSAT or MaxSMT) is an optimization version of the SAT problem in which the goal is to find a model for a given formula that maximizes the number of clauses satisfied together. Such a maximal subset of clauses is called a maximal satisfiable subset (MSS). The complement of an MSS is called a minimal correction subset (MCS): a minimal subset of clauses that, if removed, makes the remaining formula satisfiable again. Partial MaxSAT (pMaxSAT) is an extension of MaxSAT in which clauses are marked as either "soft" or "hard". The goal in pMaxSAT is to find a model that satisfies all "hard" clauses and maximizes the number of satisfied "soft" clauses.
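These definitions can be illustrated with a brute-force enumerator over tiny propositional formulas (a didactic sketch only; real MaxSAT solvers are far more sophisticated). Clauses are Python predicates over a tuple of boolean variables, and a subset of soft clauses is an MCS if removing it, while keeping all hard clauses, restores satisfiability and no proper subset does:

```python
from itertools import product, combinations

def mcses(hard, soft, n_vars):
    """Enumerate the minimal correction subsets of `soft` by brute force."""
    def sat(clauses):
        # satisfiable iff some assignment satisfies every clause
        return any(all(c(m) for c in clauses)
                   for m in product([False, True], repeat=n_vars))
    found = []
    for k in range(len(soft) + 1):          # smaller subsets first
        for S in combinations(range(len(soft)), k):
            if any(set(f) <= set(S) for f in found):
                continue                    # contains a known MCS: not minimal
            rest = [c for i, c in enumerate(soft) if i not in S]
            if sat(hard + rest):
                found.append(S)
    return found
```

For the soft clauses x, ¬x, x (indices 0, 1, 2) and no hard clauses, the enumerator returns the two MCSs {1} and {0, 2}; their complements are exactly the two MSSs of the formula.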
Although the SAT problem is known to be NP-complete, recent automatic solvers can handle large SAT formulas encoding practical industrial problems. SAT solvers are programs that accept a logic formula in conjunctive normal form (CNF) and decide whether the formula is satisfiable. If it is, the solver returns "SAT" and provides a model for the formula. Otherwise, the solver returns "UNSAT" and may produce an unsatisfiable core, a subset of clauses of the formula that cannot be satisfied together, as an explanation for the unsatisfiability. Some recent solvers support incremental SAT solving, which facilitates solving a series of closely related formulas. Incremental SAT solvers remember the information learned while checking the satisfiability of one input formula and use it to avoid repeating redundant work in subsequent satisfiability checks.
Generally, MaxSAT algorithms perform a succession of SAT solver calls; after each call, they add cardinality constraints to move toward an optimal solution. State-of-the-art MaxSAT solvers are based on the core-guided MaxSAT algorithm, which leverages the unsatisfiable core produced by the SAT solver. Specifically, after each call to the SAT solver, they relax the clauses in the unsatisfiable core by associating a relaxation variable with each such clause. To reach optimal solutions, they add cardinality constraints bounding the number of relaxed clauses.

MaxSAT-Based Fault Localization
MaxSAT-based fault localization approaches [12,14,15,19,22] reduce fault localization to the maximum satisfiability problem, which finds a variable assignment for a logic formula such that the number of satisfied clauses is maximized. Given a buggy program P and a failing test case T(inp, as) that exposes a bug in the program, these approaches proceed as follows.
First, they use a bounded model checking or symbolic execution tool to construct a logical formula, called the error trace formula, that semantically represents the error execution of the buggy program on the failing test case. Essentially, this error trace formula is the conjunction Φ ≡ ϕ_inp ∧ ϕ_tf ∧ ϕ_as, where ϕ_inp encodes the test input, ϕ_tf, called the trace formula, encodes the semantics of the program execution trace induced by the given input, and ϕ_as encodes the test assertion that the program must satisfy (or the expected output that the program must produce for the given test input). Since the program fails the test, this error trace formula is logically unsatisfiable. Because the test input and the test assertion are correct by definition, the clauses encoding them are not responsible for the unsatisfiability of the error trace formula. Therefore, the causes of the unsatisfiability are accounted for by clauses of the trace formula ϕ_tf, exactly mirroring the situation in which faulty statements in the program are responsible for the test failure.
Second, they treat the constructed error trace formula as an instance of the partial MaxSAT problem, in which the clauses encoding the test input ϕ_inp and the test assertions ϕ_as are marked "hard", and the clauses encoding program statements in the trace formula ϕ_tf are marked "soft". They then feed the formula into a pMaxSAT solver to obtain the set of MCSs of the formula. Intuitively, the set of clauses in an MCS indicates a corresponding minimal fix location set, and the maximum satisfiability model of the MCS provides angelic values for these fix locations. As a result, they can produce a set of minimal fix candidates that can make the given test execution succeed.
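This construction can be sketched on a tiny error trace formula. The encoding below is an illustrative simplification with only four soft clauses over a small integer domain, not the paper's full SSA encoding (which yields five MCSs); the else-branch clause in particular is an assumption. The hard constraints fix the inputs x = 3, y = 5 and the assertion b = −2; the soft clauses encode the trace statements:

```python
from itertools import combinations

X0, Y0 = 3, 5  # hard input clauses: x0 = 3, y0 = 5

# soft clauses over a model (a1, g, b1): the line-3 assignment, the
# branch guard, the then-branch effect, and an assumed else-branch
soft = {
    "c_line3": lambda a1, g, b1: a1 == X0,                    # a = x
    "c_guard": lambda a1, g, b1: g == (a1 >= Y0),             # if (a >= y)
    "c_then":  lambda a1, g, b1: (not g) or (b1 == X0 - Y0),  # b = x - y
    "c_else":  lambda a1, g, b1: g or (b1 == a1 + Y0),        # assumed
}
hard = lambda a1, g, b1: b1 == -2                             # assertion

def trace_mcses():
    # brute-force pMaxSAT: hard clauses must hold; drop a minimal
    # subset of soft clauses so the rest becomes satisfiable
    models = [(a1, g, b1) for a1 in range(-10, 11)
              for g in (False, True) for b1 in range(-10, 11)]
    found = []
    for k in range(len(soft) + 1):
        for S in combinations(sorted(soft), k):
            if any(set(f) <= set(S) for f in found):
                continue
            keep = [c for n, c in soft.items() if n not in S]
            if any(hard(*m) and all(c(*m) for c in keep) for m in models):
                found.append(S)
    return found
```

On this simplified encoding the enumeration yields three singleton MCSs, {c_line3}, {c_guard}, and {c_else}; for each, a satisfying model of the relaxed formula supplies angelic values for the corresponding fix location, as described above.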
Consider our example in Figure 1. Program-formula-based approaches, such as BugAssist [12] and SNIPER [14,15], first inline all function calls and unwind all loops in the program foo up to a given bound to obtain a loop-free and function-call-free program. They then transform the flattened program into a semantically equivalent program in static single assignment (SSA) form [23], in which each variable is assigned at most once. Figure 2 shows the SSA form of the method foo with a loop unwinding bound of two. Each statement in the SSA program is then represented as a logic clause, and these clauses are conjoined to form a program-formula, the formula TF shown in Figure 3. This program-formula is semantically equivalent to the original program with respect to the specific unwinding bound. Specifically, every satisfying assignment to the program-formula corresponds to a feasible execution in the program, and, vice versa, every program execution (with regard to the unwinding bound) corresponds to a satisfying assignment of the program-formula. The program-formula is then conjoined with clauses encoding the test input values and clauses encoding the test assertions of the failing test case to form the error trace formula IN ∧ TF ∧ AS, shown in Figure 3. By applying a pMaxSAT solver to this error trace formula, they obtain five minimal correction subsets in total: mcs_1 = {c_9}, mcs_2 = {c_6}, mcs_3 = {c_3}, mcs_4 = {c_1}, mcs_5 = {c_2, c_4}. Each MCS indicates a corresponding minimal fix location set, and the maximum satisfiability model of the MCS provides angelic values for these fix locations. As a result, they identify and report to the developer the corresponding five minimal fix candidates.

Proposed Method
In this section, we provide details of our proposed fault localization method, AgxFaults.

Overview
AgxFaults takes as input a buggy program and a failing test case that demonstrates a program failure. It outputs a set of pairs, each consisting of a minimal set of suspicious statements and an angelic execution that explains how the failure can be removed by replacing these suspicious statements with angelic values. The fault localization process of AgxFaults is iterative, incremental, and on-demand. Figure 4 provides a high-level view of AgxFaults and its main components. Below, we first briefly describe the main components and then explain the overall fault localization process of AgxFaults.

• Angelic DCFG (Dynamic Control Flow Graph): The Angelic DCFG is the dynamic control flow graph of an angelic program [24]. This angelic program acts as an abstraction of the input buggy program such that only the program parts relevant to the error are represented precisely, while irrelevant parts are represented abstractly as angelic non-determinisms (i.e., executions that can produce non-deterministic values such that the program execution succeeds).
• Error Trace Formula: The error trace formula is essentially the formula IN ∧ TF_agx ∧ AS, where IN represents the test input, AS represents the assertions of the given failing test case, and TF_agx is semantically equivalent to the current version of the angelic program. The error trace formula encodes the fault localization problem of the current angelic program with the given failing test case. Each MCS of this angelic formula corresponds to an angelic execution in the angelic program.
• Angelic Execution: The angelic execution is a correct execution of the angelic program. This execution is obtained by diverting the original error trace such that the output of specific statements is dynamically replaced with proper values, i.e., angelic values, to make the test execution succeed.
• On-demand Program Explorer and Encoder: This component is responsible for refining the angelic program and the error trace formula in an on-demand manner.
• Incremental Formula Solver: This component is responsible for computing the minimal correction subsets (MCSs) of the angelic formula incrementally.
• MCS Analyzer: This component analyzes each obtained MCS of the angelic formula to determine possible faults in the program. In addition, it determines which abstract parts of the angelic program need further refinement to provide a more precise result.
The core idea of AgxFaults is to work incrementally with an angelic program instead of a program-formula encoding the entire semantics of the original buggy program, which may lead to very complex and expensive computation. In the beginning, only the statement instances executed in the original failing trace are represented precisely in the angelic program. The angelic program is then expanded dynamically, in an on-demand manner, after each iteration to provide more precise results.

Figure 4. Overview of AgxFaults.

Overall Fault Localization Algorithm
The overall fault localization process of AgxFaults is described in Algorithm 1. In the algorithm, P_agx represents the dynamic control flow graph of the angelic program, and solver is an instance of an incremental partial MaxSAT solver. Additional hard and soft constraints can be added to the solver via the methods addHard() and addSoft(), respectively. The method Check() of the solver returns True if it finds an MCS for the current formula; otherwise, it returns False.
In the beginning, the algorithm initializes P_agx as a pure angelic program, which does not contain any concrete statements (line 1). The formula solver, solver, is initialized with an empty set of soft constraints and a set of hard constraints containing the clauses encoding the test input and its assertion (lines 1 to 3).
The main process of the algorithm is the loop from line 4 to line 15. It is an iterative process comprising the following steps. At the beginning of each iteration, the algorithm asks the solver to find an MCS for the current formula (line 4). If there is no further MCS, the loop terminates. Otherwise, the iteration starts and the solution of the formula is stored in model (line 5). Then, based on model, the algorithm determines the corresponding possible faulty statements mcs and an angelic execution Π_agx in the angelic program (line 6). If the angelic execution Π_agx contains unspecified executions (i.e., the if-condition at line 7 is true), the angelic refinement process is performed to refine the angelic program and the angelic formula at these angelic-branches (line 8). Otherwise, Π_agx is a feasible angelic execution; thus, the possible fault location mcs, together with the angelic execution path Π_agx, is reported to the developer (line 12). In addition, a blocking constraint is added to the solver (line 13). This blocking constraint guarantees that the current MCS will not be encountered again in subsequent iterations. Specifically, the blocking constraint of the MCS is the disjunction ∨{c | c ∈ mcs}, which states that the program statements in mcs do not all simultaneously contain faults.
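The blocking step can be sketched on a toy propositional instance (a brute-force stand-in for the incremental pMaxSAT solver, with clauses as Python predicates over a tuple of booleans): each reported MCS is blocked by adding the disjunction of its clauses as a new hard constraint, so the next iteration must find a different correction subset.

```python
from itertools import product, combinations

def sat(clauses, models):
    # True if some model satisfies every clause
    return any(all(c(m) for c in clauses) for m in models)

def enumerate_mcses(hard, soft, n_vars):
    """Report one smallest correction subset per iteration, then block it."""
    models = list(product([False, True], repeat=n_vars))
    hard = list(hard)
    reported = []
    while True:
        found = None
        for k in range(len(soft) + 1):      # smallest subsets first
            for S in combinations(sorted(soft), k):
                keep = [soft[n] for n in soft if n not in S]
                if sat(hard + keep, models):
                    found = S
                    break
            if found is not None:
                break
        if found is None:                   # no further MCS: terminate
            return reported
        reported.append(found)
        # blocking constraint: some clause of the MCS must hold from now
        # on, i.e., these statements are not all simultaneously faulty
        hard.append(lambda m, S=found: any(soft[n](m) for n in S))
```

For the soft clauses c1: x, c2: ¬x, c3: x, the loop first reports {c2} and blocks it (forcing ¬x), then reports {c1, c3} and blocks that too, after which the hard constraints are unsatisfiable and the loop terminates.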

On-Demand Program Explorer and Incremental Formula Encoder
The On-demand Program Explorer and Encoder (OPEE) is responsible for dynamically constructing and refining the angelic program, as well as the angelic formula, in an incremental and on-demand manner. Algorithm 2 shows the details. It takes as input a program P, a concrete input inp, a set of angelic modifications mcs, and an angelic execution path Π that contains angelic-branches. Each angelic modification m ∈ mcs specifies a statement instance st and a value val, called the angelic value; mcs[st] denotes the angelic value of statement instance st. The abstract execution path Π contains a sequence of branch decisions; Π[st] denotes the decision at branch instance st.
The procedure AngelicRefinement() in Algorithm 2 describes how the on-demand program explorer works. The OPEE component runs the program with the given test input and dynamically substitutes the value produced by each statement st ∈ mcs with its corresponding angelic value mcs[st] in order to explore the program execution path specified in the abstract path Π. During program execution, each statement instance st is treated differently depending on whether it corresponds to a non-deterministic or a concrete statement instance in the angelic program P_agx. If st corresponds to a concrete statement instance of P_agx, the condition at line 21 is true. If st is an angelic location, the execution engine perturbs the program memory M such that the output of st is replaced with its corresponding angelic value (line 23). The execution engine also checks whether the current execution has diverted from the expected abstract path by comparing the actual branch decision with the angelic branch decision (line 25); if a diversion happens, the execution stops. If st corresponds to a non-deterministic statement in P_agx (i.e., the condition at line 21 is false), the angelic program is refined by making this statement instance concrete (line 31), and the angelic formula is simultaneously updated to encode this statement instance (line 32).
Let st be the statement instance being encoded into the angelic formula. Depending on the type of st, it is encoded differently; the procedure UpdateFormula() in Algorithm 2 shows the details. Specifically, if st is an assignment statement [v = expr], we represent st as an equivalence relation between the variable v on the left-hand side and the expression expr on the right-hand side (line 47). For a conditional statement [if cond], we add an extra variable guard to represent the branch predicate, and we represent the conditional statement as an equivalence relation between the guard variable and the conditional predicate expression (line 45). If st is a phi statement instance [v = Φ(guard_cs, expr)], it is represented as an implication constraint (guard_cs =⇒ (v = expr)). This implication constraint essentially states that, if the execution trace reaches statement st by going through the branch guard_cs of the conditional statement, then the value of variable v is equal to expr; otherwise, the value of v is unconstrained. Encoding phi statements as implication constraints allows the constraints on the joined variable to be tightened by additional constraints when other branches of the conditional statement are executed.
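The three encoding cases can be sketched as a small Python constraint builder (a hypothetical representation, not the tool's actual data structures: statements are tuples, and models are dicts from SSA variable names to values):

```python
def encode_statement(st):
    """Build a predicate over a model dict for one SSA statement instance."""
    kind = st[0]
    if kind == "assign":       # ("assign", v, expr): v == expr
        _, v, expr = st
        return lambda m: m[v] == expr(m)
    if kind == "cond":         # ("cond", guard, pred): guard == pred
        _, guard, pred = st
        return lambda m: m[guard] == pred(m)
    if kind == "phi":          # ("phi", v, guard, expr): guard ==> v == expr
        _, v, guard, expr = st
        return lambda m: (not m[guard]) or (m[v] == expr(m))
    raise ValueError("unknown statement kind: " + kind)
```

The phi case leaves v unconstrained when the guard is false, so a later refinement of the other branch can add its own implication over the same joined variable, as described above.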
Algorithm 2 On-demand Program Explorer and Formula Encoder (excerpt; some lines are elided)

16: procedure AngelicRefinement(mcs, Π_agx, P_agx)
    ...
19:   while st ≠ null do
20:     (st_ssa, π_ssa) = convertToSSA(st, π_ssa)
21:     if st_ssa ∈ P_agx then
22:       if st_ssa ∈ mcs then
          ...
27:         break                               // execution is diverted from the angelic path Π_agx
28:       end if
29:     end if
30:     else                                    // st_ssa ∉ P_agx
31:       P_agx ← P_agx ∪ {st_ssa}              // refine the angelic program
32:       updateFormula(st_ssa, Φ_hard, Φ_soft) // refine the angelic formula
33:     end if
34:     (st, M) ← executeStatement(P, M)
35:   end while
36:   return (P_agx, Φ_hard, Φ_soft)
37: end procedure

If a statement st is faulty, it would be replaced by a different statement; in that case, the constraints on the values of the variables in st are invalid and should be relaxed. To reason about the faultiness of program statements, we associate with each statement st in the original program a boolean variable ab_st as a fault predicate. Specifically, the variable ab_st indicates that the statement st is faulty if it evaluates to true (and correct if it evaluates to false). We encode each conditional and each assignment statement instance st into the angelic formula by adding the clause (ab_st ∨ ϕ_val) to the hard constraint set and adding the clause ¬ab_st to the soft constraint set (lines 49-51), where ϕ_val is the constraint representation of the statement instance. Since phi statements are artificial statements introduced to explicitly represent the dependence of variables on branch decisions in the execution trace, the faultiness of a phi statement instance may be accounted for by faults in its preceding assignment or conditional statement instances. Thus, we add the constraint representing a phi statement instance to the angelic formula as a hard constraint (line 41).
To summarize, the constructed angelic formula is an instance of a partial maximum satisfiability problem in which the hard constraints represent the semantics of the input and assertions of a test case together with the constructed angelic program, and the soft constraints contain the fault predicates of the statements of the buggy program that are represented in the angelic program. Each MCS of this angelic error trace formula indicates a minimal set of program statements whose fault predicates ab are evaluated to true.
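The partial MaxSAT view can be made concrete with a toy brute-force sketch (our own illustration, not the solver used by AgxFaults): an MCS is a minimal set of soft clauses (here, ¬ab literals) whose removal makes the remaining formula satisfiable together with the hard constraints.

```python
# Toy sketch of MCS extraction from a partial MaxSAT instance
# (brute force over truth assignments; illustration only).
from itertools import combinations, product

def satisfiable(clauses, n_vars):
    """Brute-force SAT check; literal k means variable k, -k its negation."""
    for bits in product([False, True], repeat=n_vars):
        def val(lit):
            return bits[abs(lit) - 1] if lit > 0 else not bits[abs(lit) - 1]
        if all(any(val(l) for l in c) for c in clauses):
            return True
    return False

def minimal_correction_sets(hard, soft, n_vars):
    """Enumerate all MCSs (indices into soft) by increasing size."""
    found = []
    for k in range(len(soft) + 1):
        for drop in combinations(range(len(soft)), k):
            if any(set(m).issubset(drop) for m in found):
                continue                       # superset of an MCS: not minimal
            kept = [c for i, c in enumerate(soft) if i not in drop]
            if satisfiable(hard + kept, n_vars):
                found.append(drop)
    return found

# Tiny instance in the spirit of the encoding above: variables
# ab1 = 1, ab2 = 2, x = 3.  Hard clauses (ab1 ∨ x) and (ab2 ∨ ¬x)
# conflict on x, so at least one fault predicate must be true.
hard = [[1, 3], [2, -3]]
soft = [[-1], [-2]]                            # soft: ¬ab1, ¬ab2
print(minimal_correction_sets(hard, soft, 3))  # [(0,), (1,)]
```

Each singleton MCS here names one statement whose fault predicate must hold, mirroring how an MCS of the angelic formula points at a minimal fix location set.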

Incremental Formula Solver
After each iteration of the Angelic Refinement Loop, a new set of constraints is added to the angelic formula. Instead of treating the angelic formula after each iteration as an independent MaxSAT problem and invoking a MaxSAT solver from scratch to find MCSs, we consider all angelic formulas generated so far as a sequence of similar MaxSAT problems, i.e., an instance of a sequential maximum satisfiability problem [20]. The Formula Solver uses a sequential MaxSAT solver to compute the MCSs of the angelic formula incrementally.
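The idea can be sketched with a toy stand-in for the sequential MaxSAT solver (the class name and the naive brute-force search are our assumptions; a real implementation would reuse solver state and cores incrementally, as in [20]): the solver object persists across refinement iterations, so newly added clauses and the blocking of previously reported MCSs accumulate instead of each iteration rebuilding the problem.

```python
# Toy sketch of a sequential MaxSAT solver: one persistent object to
# which the refinement loop keeps adding clauses; previously found MCSs
# stay blocked across calls (illustration only, brute force).
from itertools import combinations, product

class SequentialMaxSat:
    def __init__(self, n_vars):
        self.n_vars = n_vars
        self.hard, self.soft, self.blocked = [], [], []

    def add_hard(self, clause):
        self.hard.append(clause)

    def add_soft(self, clause):
        self.soft.append(clause)

    def _sat(self, clauses):
        for bits in product([False, True], repeat=self.n_vars):
            def val(lit):
                return bits[abs(lit) - 1] if lit > 0 else not bits[abs(lit) - 1]
            if all(any(val(l) for l in c) for c in clauses):
                return True
        return False

    def next_mcs(self):
        """Smallest correction set not blocked by an earlier answer, or None."""
        for k in range(len(self.soft) + 1):
            for drop in combinations(range(len(self.soft)), k):
                if any(set(b).issubset(drop) for b in self.blocked):
                    continue                   # blocked in an earlier iteration
                kept = [c for i, c in enumerate(self.soft) if i not in drop]
                if self._sat(self.hard + kept):
                    self.blocked.append(drop)  # block it for later calls
                    return drop
        return None

s = SequentialMaxSat(3)
s.add_hard([1, 3]); s.add_hard([2, -3])        # conflicting hard clauses
s.add_soft([-1]); s.add_soft([-2])             # soft fault predicates
print(s.next_mcs())                            # (0,)
print(s.next_mcs())                            # (1,)
print(s.next_mcs())                            # None: all MCSs reported
```

The key point is that the two calls after the first do not restart from nothing: the blocked list plays the role of the blocking constraints added in the main loop.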

MCS Analyzer and Report Writer
Given an MCS mcs and a satisfiable model of the angelic error trace formula, the MCS Analyzer is responsible for reconstructing the angelic execution trace Π agx in the angelic program that corresponds to the MCS in the angelic formula. The angelic execution trace Π agx is reconstructed by traversing the dynamic control-flow graph of P agx from the entry point as follows. The entry point of the program is the first element of Π agx, and all assignment statements on the path are added to Π agx. When the traversal reaches a conditional branch node cs, the guard expression of cs is evaluated using the model to determine the selected branch, which we call b. If the selected branch b is not included in the angelic program P agx (b is a non-deterministic branch), then the process moves to the corresponding phi statement instance of cs. Otherwise, it moves to the first statement in the selected branch and continues until it reaches the terminal node.
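The traversal can be sketched as follows (a hypothetical CFG encoding invented for illustration; the tool's actual data structures differ):

```python
# Sketch of angelic trace reconstruction: walk the CFG of the angelic
# program, resolving each conditional with the guard's value from the
# satisfying model, and detouring to the phi statement whenever the
# chosen branch is not materialized in the angelic program.

def reconstruct_trace(cfg, model):
    """cfg maps node id -> ('assign', next) | ('cond', guard, t, f, phi)
    | ('exit',); model maps guard variables to booleans.  Branches whose
    target node is absent from cfg are non-deterministic."""
    trace, node = [], "entry"
    while True:
        trace.append(node)
        entry = cfg[node]
        if entry[0] == "exit":
            return trace
        if entry[0] == "assign":
            node = entry[1]
        else:                                  # conditional node
            _, guard, t, f, phi = entry
            taken = t if model[guard] else f
            # branch not in the angelic program: jump to the phi
            # statement instance of this conditional instead
            node = taken if taken in cfg else phi

# Toy angelic program: the true-branch target "L7" is abstracted away.
cfg = {
    "entry": ("assign", "L1"),
    "L1":    ("assign", "L6"),
    "L6":    ("cond", "g2", "L7", "L8", "phi6"),
    "phi6":  ("assign", "L8"),
    "L8":    ("assign", "exit"),
    "exit":  ("exit",),
}
print(reconstruct_trace(cfg, {"g2": False}))  # deterministic path via L8
print(reconstruct_trace(cfg, {"g2": True}))   # detours through phi6
```

With g2 = True the walk reaches the missing branch and lands on phi6, which is exactly the situation that triggers another round of angelic refinement.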

Illustrative Example
We illustrate how our proposed method works using the running example in Figure 1. Figure 5 shows the progress of AgxFaults in the first four iterations. In the figure, the green nodes represent program parts that are assumed to be correct; thus, they are encoded into the formula as hard constraints.
After the initialization steps, the angelic program P agx contains only an entry point and a normal exit point (i.e., successful termination). The solver is initialized with an empty soft-constraints set, while the hard constraints encode the entry point and the normal exit point representing the successful terminator. In the first iteration of the loop, the algorithm simply invokes the AngelicRefinement method (line 8) with an empty angelic fix location set and an empty angelic path as input. Thus, the AngelicRefinement method just runs the given test case and encodes the executed statements into the formula. Figure 5b shows the state of the algorithm after the first iteration. Specifically, the angelic program precisely represents only the statements executed in the original error trace, while all other parts are abstracted. Figure 6 shows the set of constraints encoded in the first iteration (we eliminated the fault predicate variables ab from the clauses for simplicity). In the second iteration, the solver first solves the current angelic formula and returns an MCS φ mcs = { foo = b 3 }, which corresponds to the return statement at line 8. Then, the MCSAnalyzer component analyzes the MCS and the corresponding model to identify the fix location set and the corresponding angelic path (line 6 in Algorithm 1). The constructed angelic path Π agx is the path (s,1,2,6,8,9,OK) in the CFG of the angelic program. This path is feasible because all edges in the path are deterministic. Thus, the algorithm goes to line 12 to output the newly found minimal fix candidate, and then it adds a constraint to block this solution from recurring in subsequent iterations. In other words, from this point on, the solver considers the statement at line 8 correct.
In the third iteration, the solver returns an MCS mcs = {b 1 = x 0 + y 0 }, which corresponds to the assignment [b = x + y] at line 1. The constructed angelic path for this MCS is the path (s,1,2,6,8,9,OK), which is feasible in the angelic program. Thus, as in the second iteration, the algorithm outputs the newly found MFC, adds a blocking constraint for line 1, and goes to the next iteration. Figure 5c shows the current state of the angelic program and the angelic formula in the solver. Because of the blocking constraints added in previous iterations, the statements at lines 1 and 8 are no longer considered as possible faults; the solver thus only needs to search a smaller space for the next MCSs.
In the fourth iteration, the solver finds an MCS mcs = {g 2 = (a 1 >= y 0 )}, with the angelic value {"g 2 " = True}. This MCS indicates that the branch condition at line 6 is the fix location. The constructed angelic path for this MCS is the path (s,1,2,3,6,?,8,9,OK). Because the angelic path contains a non-deterministic branch, the AngelicRefinement method is invoked with input mcs = {line 6} and the corresponding angelic value True to explore the true-branch of the if-statement at line 6. After the refinement, the assignment [b = x − y] is represented in the angelic program, and the additional constraints in Figure 7 are added to the solver.
After the refinement, the process continues until a timeout occurs or all minimal fix candidates have been found (i.e., solver.Check() returns UNSAT).

Evaluation Setup
To evaluate our proposed method, we performed experiments on three different sets of benchmarks and compared against existing formula-based fault localization techniques, including techniques based on program formulas (e.g., BugAssist [12], Sniper [14]), on single-path control-flow-insensitive formulas (e.g., Reference [17,19]), and on single-path control-flow-sensitive formulas (e.g., Reference [18]). All experiments were performed on a computer with a 4.0 GHz Intel Core i7 CPU and 8 GB RAM. In this section, we describe the setup of our evaluation.

Implementation
We have implemented our approach in a prototype tool, named AgxFaults, that automatically localizes faults in Java programs. Unfortunately, all the formula-based fault localization techniques that provide tools or source code online target C programs. Thus, for comparison, we also implemented three existing formula-based fault localization techniques in AgxFaults: the program-formula (PF) approach used in Reference [12,14], the flow-insensitive trace formula (FI) approach used in Reference [17], and the flow-sensitive trace formula (FS) approach used in Reference [18].
We implemented the AgxFaults tool as an extension of NASA's model checker Java PathFinder (JPF). The inputs of AgxFaults are a buggy program and a failing test case (given as a JUnit test method or a pair of input and expected post-condition in a configuration file). The output is a list of minimal angelic fix candidates (MFCs), where each MFC contains (1) a set of suspicious statements, together with (2) the angelic values that these statements should have produced to turn the originally failing test execution into a passing one. AgxFaults includes a customizable formula builder and a constraint solver for solving formulas. We implemented the formula builder using the extension mechanisms of JPF, adapting the implementation of jDart [25], a concolic execution engine for Java programs. Specifically, we use the bytecode factories and listener extension mechanisms of JPF to (1) create fresh symbolic variables on-the-fly when executing assignment and branch condition instructions, (2) dynamically perturb the concrete program state and force the program execution to follow a specific path, and (3) manipulate and propagate symbolic values along different program execution paths and collect constraints for building error trace formulas. As inherited from jDart, AgxFaults uses the constraints library jConstraints [26] as an abstraction layer for constraint solvers. Since jConstraints does not support solving pMaxSMT/MaxSMT and sequential MaxSMT problems, we implemented Fu & Malik's core-guided MaxSAT algorithm [27] and the incremental core-guided MaxSMT solving algorithm [20] in the jConstraints library. We use the Z3 SMT solver (https://github.com/Z3Prover/z3) as our back-end constraint solver.
We implemented the program-formula approach (PF) following the full flow-sensitive formula encoding of Sniper [14] because it was experimentally shown to be more effective than Bug-Assist. To construct a program formula, we force the on-demand program explorer and formula builder to explore all paths in the program up to a certain bound and encode all of them into the formula, instead of operating on demand. We implemented the single-path flow-insensitive formula approach following the error trace formula encoding described in Reference [17], and the single-path flow-sensitive formula approach following the encoding described in Reference [18]. The source code of AgxFaults, which contains our implementations of all the above techniques, as well as our benchmark programs, is available online for open access at http://bit.ly/agxfaults.

Research Questions and Evaluation Metrics
We applied each of the four implemented techniques to several buggy programs to empirically investigate the following research questions. RQ1: How effective is AgxFaults in finding angelic fix candidates, compared to program-formula and single-path formula approaches?
A technique is more effective in finding angelic fix candidates if it can find more feasible fix candidates, and more precise if the ratio of its feasible MFCs to the total number of MFCs it reports is high. Thus, to answer RQ1, we use two metrics: (1) the average number of feasible MFCs the technique found per run (i.e., the number of feasible MFCs found), and (2) the ratio of feasible MFCs to the total number of MFCs the technique reported per run (precision).
RQ2: How effective is AgxFaults in localizing fault location, compared to program-formula and single-path formula approaches?
To evaluate the fault localization effectiveness of a technique, we use three metrics: (1) the number of runs in which the technique reported the correct fault locations, over its total number of runs (i.e., the successful fault localization rate), (2) the number of runs in which the technique reported a correct fix candidate, over its total number of runs (i.e., the successful fix localization rate), and (3) the percentage of program code lines that the developer needs to examine before identifying the first faulty statement (i.e., the EXAM score).
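The EXAM score in metric (3) can be sketched as follows (our formulation of the standard definition; tie-breaking among equally ranked lines is not modeled):

```python
# Sketch of the EXAM score: the percentage of program lines examined,
# in reported order, until the first actually faulty line is reached.

def exam_score(ranked_lines, faulty_lines, total_lines):
    """ranked_lines: report order; faulty_lines: set of true fault lines."""
    examined = 0
    for line in ranked_lines:
        examined += 1
        if line in faulty_lines:
            return 100.0 * examined / total_lines
    return 100.0                     # fault never reported: worst score

# e.g., fault on line 544 found as the 3rd of 100 lines -> 3.0
print(exam_score([548, 203, 544], {544}, 100))
```

A lower EXAM score is better, since it means less code to inspect before reaching the fault.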
RQ3: How efficient and scalable are AgxFaults, compared to program-formula and single-path formula approaches?
We use CPU time as the metric for efficiency and scalability. Specifically, for each technique, we measure (1) the CPU time spent by the solver to solve formulas (i.e., formula solving time) and (2) the total running time of the technique for each run (i.e., running time). RQ4: Can AgxFaults be applied to real bugs in large software projects?

Benchmarks
For our evaluation, we use several buggy programs selected from three different benchmarks. The first benchmark consists of a set of example programs provided by Bekkouche [28]. The size of these programs ranges from 17 to 130 lines of code, and each program contains one to three faults that were specifically injected to evaluate fault localization techniques. The second benchmark contains 41 faulty versions of a real traffic collision avoidance system (TCAS) from Siemens [29], which is widely used in software testing and fault localization research. The third benchmark consists of real-world buggy programs from Defects4J [30].
We selected the programs in the first and second benchmarks for two main reasons. First, these programs are small, so they allow us to find all minimal angelic fix candidates, which is generally impossible for large and more complex programs. By finding all minimal angelic fix candidates, we can precisely compare the efficiency of the techniques in terms of complexity reduction and their effectiveness in terms of success rates. Second, the TCAS programs are commonly used to evaluate state-of-the-art formula-based fault localization techniques, including BugAssist, Sniper, and LocFaults, so we can directly compare our results with those techniques. While the programs in the first and second benchmarks are small and contain only artificial faults, the third benchmark contains non-trivial open-source programs with real bugs.
We obtained Java versions of the TCAS programs from the SIR website. Each faulty version of the TCAS program has a size of 180 lines and contains one to three artificially-injected faults. These programs also come with a total of 1576 test inputs and a fault-free version. For each buggy version, we manually compared it with the fault-free version and considered the set of differing statements to be the actual faulty statements. To obtain failing test cases for each buggy version, we first ran all test cases on the fault-free version to obtain their expected outputs; then, for each buggy version, we ran all the test cases and matched the results against the expected outputs to identify the failing test cases.
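This protocol can be sketched as follows (the runner functions are hypothetical stand-ins for executing a TCAS version on a test input):

```python
# Sketch of failing-test identification: the fault-free version
# supplies the expected outputs, and a test input fails on a buggy
# version iff the two outputs differ.

def find_failing_tests(buggy, fault_free, test_inputs):
    expected = {t: fault_free(t) for t in test_inputs}
    return [t for t in test_inputs if buggy(t) != expected[t]]

# toy stand-ins for a fault-free and a buggy program version
fault_free = lambda x: x * 2
buggy      = lambda x: x * 2 if x < 3 else 0

print(find_failing_tests(buggy, fault_free, [1, 2, 3, 4]))  # [3, 4]
```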
The third benchmark we considered consists of several faulty programs from Defects4J, a benchmark containing real bugs from large and complex open-source projects. We randomly selected several buggy versions of the open-source projects in Defects4J, including JFreeChart, Commons-Codec, Commons-Compress, Commons-CSV, Commons-Lang, Commons-Math, and Mockito. In this experiment, we used the failing test cases already provided in Defects4J as input for the AgxFaults tool.

Study Protocol
We ran each of the four techniques implemented in our tool on a buggy program multiple times, each time with a different failing test case as input, producing a list of minimal angelic fix candidates (MFCs). We examined all MFCs produced for each run, in the generated order, and determined the validity and accuracy of the results. Specifically, a run was considered a successful fault localization if its output contained a correct fault location MFC, i.e., an MFC in which all suspicious statements are actually faulty statements. A run was considered a successful fix localization if its output contained a correct fix location MFC, i.e., an MFC that is both a feasible fix candidate and fixes at the correct fault location.
To evaluate the performance of the techniques, we recorded the average CPU time each technique spent processing each failing test case and the portion consumed by the solver, the number of MFCs each technique generated and how many of them are feasible fix candidates, the number of code lines included in the report, whether the generated report contained the actual fault, and how many lines of code a developer would have to examine to identify the actual fault location.
We did not set a timeout when running the techniques on the programs in the TCAS and Bekkouche benchmarks; thus, each tool finished after it had reported all the MFCs it could find. When the techniques were run on the programs in the Defects4J benchmark, we set a timeout of five minutes and a maximum loop unwinding depth of 100.

Result of RQ2: Effectiveness in Fault Localization
To evaluate fault localization effectiveness, we report and compare the number of successful fault localization runs, the number of successful fix localization runs, and the average EXAM score of each technique for each buggy program. Recall that a run of a technique is a successful fault localization if it found an MFC that fixes only faulty statements (i.e., all fix locations in the MFC are actually faulty statements), and a fault localization run is a successful fix localization if it outputs an MFC that is both a feasible fix candidate and a correct fault location. The EXAM score is the percentage of program code lines that the developer needs to examine before identifying the fault; it is computed as the ratio of the number of lines of code that the developer examines before reaching an actual faulty line to the total number of code lines in the program. Table 2 shows the experimental results for the TCAS programs in terms of fault localization effectiveness. The first section of the table shows the version name (Ver) and the number of fault localization runs (Ftc) of each technique for each buggy version. The columns in the first section, "#Succ. fault localization", of Table 2 show the number of runs in which each technique succeeded in reporting the correct fault location. We obtained the results of Bug-Assist (BA) and LocFaults (LF) from Reference [19,22]. In a total of 2156 runs of each technique, the number of runs that successfully output the correct fault locations is 2156 (100%) for the AgxFaults, PF, and FS techniques, while those of Bug-Assist (BA), LocFaults (LF), and FI are 2087 (96%), 1345 (62%), and 121 (5.6%) runs, respectively. Notably, FI reported the correct fault location only for version v36, in which the faulty statement is data-dependent on the program output.
The columns in the second section, "#Succ. fix localization", of Table 2 show the number of runs in which each technique output an MFC that is both a correct fault location and a feasible angelic fix candidate. As the results show, all 2156 of the 2156 runs of AgxFaults and PF are successful fix localization runs, while the FS and FI techniques succeeded in 2027 and 1345 runs, respectively.
The columns in the last section of the table show the EXAM score of each technique. On average, the EXAM score was 6.8% for Agx, 6.9% for PF, 11.5% for FS, 17.7% for FI, and 17.6% for execution slicing.

Result of RQ3: Efficiency and Scalability
To answer RQ3, we compare AgxFaults with the PF, FS, and FI techniques based on their computational cost. We use CPU time as the metric, measuring the CPU time the solver spends solving formulas and the total running time of each technique per run. Figures 8 and 9 show the formula solving time (the CPU time spent by the solver to solve formulas) and the total running time of the PF, Agx, FS, and FI techniques for the 41 buggy versions of the TCAS program. Both the formula solving time and total running time of Agx were significantly smaller than those of PF for most versions. On average, the Agx approach was 28% faster than PF, but three times slower than FS and 51 times slower than FI, at formula solving; it was 45% faster than PF, but 9.4 times slower than FS and 9.4 times slower than FI, in total running time. The computational complexity of the AgxFaults and PF approaches is proportional to the loop unwinding bound that limits the maximum number of iterations for each loop in the target program; indeed, the computational cost increased as the loop unwinding bound increased. Thus, we used the set of programs containing loops in Bekkouche's benchmark to further evaluate the scalability of the fault localization approaches with respect to the loop unwinding bound. These programs include variations of the SquareRoot, Sum, and BSearch programs. SquareRoot finds the integer part of the square root of an integer; Sum computes the sum of the natural numbers from one to a given input value; and BSearch implements the binary search algorithm over a sorted array of integers. We ran each fault localization technique multiple times for each program and each failing test case, varying the maximum number of loop unwindings for the program execution trace from 10 to 100.
Figure 10 shows the average formula solving time of Agx and PF approaches when applied to programs with increasing loop unwinding bounds. As shown in this graph, for a small loop unwinding bound, the formula solving time of PF and Agx are similar. However, when the number of loop unwinding bounds was increased, the time it took the PF approach to solve the problem increased exponentially, while that of the Agx approach increased at a significantly slower rate.

Result of RQ4: Real Software Bugs
This experiment evaluates the capability of AgxFaults on real bugs in large and complex projects. Table 3 details the projects and the characteristics of the bugs used in this study. Columns "Name" and "LOC" show the project name and the number of lines of Java code in each project. Columns "Bug ID" and "Description" show the unique identifier and a description of each bug. The columns in the "Patch size" section show the complexity of the patch written by the developer to fix the bug: columns "Add", "Del.", and "Edit" show the number of lines the developer added, deleted, and edited. Table 4 shows the results of our method for each bug in the benchmark. Column "#MFC" shows the number of angelic fix candidates that AgxFaults found for a failing test case; all of the generated MFCs are feasible. Column "#Susp. Lines" shows the number of distinct lines reported in the list of MFCs. Column "Found actual fault?" indicates whether the reported lines contain the actual faulty statements ("yes"/"no"). Column "Exam lines" shows the number of lines of code that the developer needs to examine to identify the first fault location. "Solver time" and "Run time" show the time spent in the SMT solver and the total running time of AgxFaults for each buggy program. Let us consider bug Chart 5 as an example. Figure 11 shows how the developer fixed the bug: the developer (1) changed the condition expression of the if statement at line 548 and (2) added additional code at line 544. AgxFaults found 6 MFCs for this bug, shown in Figure 12, and all of them are feasible (replacing the values of the suspicious expressions with the corresponding angelic values actually results in a successful execution). A total of 7 lines of code are reported across all MFCs.
The actual faulty line (i.e., the if statement at line 548) is reported in 4 MFCs: mfc3, mfc4, mfc5, and mfc6. All of these MFCs contain the buggy line together with one additional statement. This result indicates that modifying the buggy if-statement alone is not enough to make the failing test case pass; the developer should modify both the if-statement at line 548 and one additional statement, as reported in mfc3, mfc4, mfc5, and mfc6. For example, mfc3 shows that modifying the if statement at line 548, together with the assignment "return = 0" at line 203 of the file XYDataItem.java, can make the failing test case pass. The number of lines that the developer has to examine before identifying the faulty line is 3, as they need to examine two lines in mfc1 and mfc2 before checking mfc3. The total running time of AgxFaults is about 3 seconds, of which the SMT solver accounts for 0.47 s. Of the 22 bugs in this study, there are 9 bugs for which AgxFaults reported the actual faulty line as the first candidate. A developer needs to examine fewer than 5 lines to identify the actual faulty line in all cases except for bug Codec18. The total running time for each run is a few seconds, which is acceptable.

Comparison with Existing Techniques on Real Bugs
Because the program-formula approach crashed or timed out without generating any MFCs when run on the bugs in the real-bug benchmark, we can only compare the results of AgxFaults with those of the single-path formula approaches FS and FI. Table 5 compares the results generated by AgxFaults with those of the FS and FI approaches on the real-bug benchmark. Column "Trace size" shows the number of lines of code executed in the error trace of the bug. The "#MFC" columns show the number of minimal angelic fix candidates that AgxFaults, FS, and FI each produced for a failing test case. The "#Susp. Lines" columns show the total number of lines each technique reported as suspicious. The "#Exam lines" columns show the number of lines of code that the developer needs to examine to identify the first fault. An empty value in "#Exam lines" means that the developer would not find any faulty statement in the list of suspicious statements produced by the tool, i.e., the tool did not report any actual faulty statement of the buggy program.
As Table 5 shows, AgxFaults outperforms FS and FI in terms of fault localization effectiveness: of the 22 bugs in this study, AgxFaults successfully reported the actual faulty line for 19 bugs, while FS succeeded on 9 bugs and FI on 5 bugs.

Threats to Validity
The most important internal threat to validity in our evaluation is that we ourselves implemented the existing techniques we compare against; unfortunately, since we target Java programs, all the original tools for these techniques target C or C++ programs. Another internal threat is the possibility of errors in our implementation of the core-guided incremental sequential partial MaxSMT algorithms. To reduce these threats, we have made all the source code of our implementation available online in an open-source repository.
The main external threat to validity is that we performed our evaluation on two benchmarks of small programs and on a sample of bugs from real open-source projects. These do not necessarily represent all types of programs and bugs; thus, our results may not generalize. Another external threat is that the Z3 SMT solver and the jConstraints library, which we used in our implementation, may contain bugs.

Related Work
For decades, many automated fault localization techniques have been proposed. We refer the interested reader to the survey by W. Wong et al. [6] for a systematic literature review. In this section, we give a brief overview of the most popular fault localization techniques, such as spectrum-based, slicing-based, and mutation-based approaches, focusing especially on formula-based fault localization, which is most closely related to our work.

Spectrum-Based Fault Localization
Most existing automatic fault localization techniques are spectrum-based fault localization (SFL) [4,6]. SFL techniques profile the buggy program with a given test suite and count the number of passing and failing tests that cover each statement. Based on this coverage information, they compute for each statement a suspiciousness score that measures the likelihood of it being faulty and output to developers a list of statements ranked by this score. SFL techniques require only lightweight computation; thus, they can be applied to very large programs. However, they usually return a long list of program entities with no context information. Moreover, to rank the actual faulty statement at the top of the suspiciousness list, they require a comprehensive test suite that contains sufficiently many passing and failing executions. These limitations reduce the usefulness of their fault localization results. Our approach requires only a single failing test case, and it returns small sets of suspicious statements at which a suitable modification can make the test pass.
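For contrast with the formula-based approach of this paper, a typical SFL suspiciousness formula (here Ochiai, a standard choice in the SFL literature, not part of AgxFaults) can be sketched as:

```python
# Sketch of the Ochiai suspiciousness score used in spectrum-based
# fault localization: ef/nf are the failing tests that do/do not cover
# a statement, and ep the passing tests that cover it.
import math

def ochiai(ef, nf, ep):
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0

# statement covered by both failing tests and 1 of the passing tests
print(round(ochiai(ef=2, nf=0, ep=1), 3))   # 0.816
```

Scores like this only rank statements statistically; unlike an MFC, they carry no angelic value explaining how the statement would have to change.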

Program Slicing Based
Slicing-based techniques [31] use program dependence information to reduce the suspicious scope to the subset of statements that might affect the wrong values of variables at the failure site. Since all such statements, together with their dependencies, are taken into account, slicing-based techniques often return an imprecise list of suspicious statements. B. Hofer and F. Wotawa [32] combine dynamic slicing with constraint solving to produce a more precise list of suspicious statements than dynamic slicing alone.

Mutation-Based Fault Localization
Mutation-based fault localization (MBFL) [8] is a recent direction that utilizes mutation analysis for fault localization. These techniques first use a set of syntactic change operations (i.e., mutation operators) to mutate the program code, generating several variant programs called mutants. They then run these mutants with the test cases and measure how the test execution results change when a code element is mutated. Based on this information, MBFL techniques statistically infer the program elements that are most relevant to the fault. A limitation of MBFL techniques is the huge mutation execution cost [9], as they need to generate a large number of mutants and run each of them against many test cases.

Formula-Based Fault Localization
Bug-Assist [12,22], SNIPER [14,33], and F. Wotawa [13] construct a formula that semantically represents all possible executions of a buggy program (unwound to a given bound) and conjoin this formula with clauses encoding the input and expected output of a failing test case to form an unsatisfiable error trace formula. Bug-Assist [12,22] and SNIPER [14,33] treat this error trace formula as an instance of a partial MaxSAT problem, in which the clauses encoding the test input and expected output are marked as hard clauses and the clauses encoding program statements are marked as soft clauses. They use a pMaxSAT solver to obtain MCSs and report the program statements that correspond to clauses in the MCSs as possible faults. To reduce the formula solving time, S. Lamraoui et al. [15] determine correct basic blocks (CBs), which are basic blocks that do not participate in any failing execution, and set all clauses related to statements in these CBs as hard clauses. Instead of using a MaxSAT solver to obtain MCSs directly, F. Wotawa [13] derives the MCSs by computing irreducible infeasible subsets (minimal hitting sets) of the error trace formula. Our approach is similar to these MaxSAT-based approaches in finding minimal sets of program locations where an angelic fix may exist. However, it differs in several aspects. First, while these approaches are based on a static error trace formula, which may be overly complex or insufficient for reasoning, our approach is based on a formula that is constructed dynamically on demand to trade off efficiency against complexity. Second, instead of using a MaxSAT solver as a black box, we adapt the core-guided MaxSAT algorithm to manipulate and solve the formula incrementally.
E. Ermis [17], U. Christ [18], O. Chanseok [34], and M. Bekkouche [19] work on error trace formulas that represent a sequence of program statements whose execution produced an error, which we refer to as single-path formulas. E. Ermis [17] and Reference [34] leverage Craig interpolation to find error invariants for every point in the error trace, where an error invariant for a position in a trace is a condition under which the error will still occur if the program is continued from that position. Based on error invariants, they can semantically remove all irrelevant statements from the error trace, resulting in a shorter error trace in which bugs are easier to localize. Like our approach, both approaches can output a reduced error trace that contains only error-relevant statements (in our approach, the reduced error trace can be reconstructed by sorting all statements in the MCS in execution order); in addition, our approach also provides a suggestion about how to fix these bugs. Moreover, while our approach uses incremental SMT solving, which is commonly supported by recent SMT solvers, these error-invariant approaches require an interpolation solver, which is less widely available.
Bekkouche et al. [19] encode the assignment statements on a given error path into an error trace formula. They use a MaxSAT solver to compute the MCSs of this formula and report the corresponding statements as possible faults. They then attempt to divert at most k conditional branch decisions on the error path to find alternative corrected paths. For each corrected path found, the diverted conditional branches, together with the MCS of the trace formula constructed on the path up to the first diverted condition, are reported as possibly faulty. Similar to this approach, we also divert the counterexample path to find corrected executions. However, there are several differences between our approach and that of Bekkouche et al. [19]. First, our approach finds corrected paths by diverting not only branch decisions but also assignment statements. Second, instead of checking all possible diverted paths by exhaustively diverting bounded subsets of conditions on the counterexample path, we encode the possible effect of diverting operations into the trace formula to leverage the search capacity of the solver. As a result, by analyzing the MCSs obtained from the solver, far fewer diversion attempts are needed to derive corrected paths than in Bekkouche's approach.
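The exhaustive branch-diversion step can be sketched as follows. This is a minimal toy model, not Bekkouche et al.'s implementation: the buggy program, the single branch, and the enumeration bound k are all invented for illustration:

```python
from itertools import combinations

def run(x, flips=frozenset()):
    # Buggy program: should return abs(x), but the branch condition
    # is wrong (x > 1 instead of x >= 0). `flips` holds the indices
    # of branch decisions to divert during this execution.
    cond = x > 1              # branch decision 0 (buggy)
    if 0 in flips:
        cond = not cond       # divert this decision
    return x if cond else -x

def diverted_corrected_paths(x, expected, k=1):
    # Exhaustively divert at most k branch decisions on the failing
    # path and keep the diversions that produce the expected output.
    n_branches = 1            # number of branch decisions on the path
    fixes = []
    for size in range(1, k + 1):
        for flips in combinations(range(n_branches), size):
            if run(x, frozenset(flips)) == expected:
                fixes.append(flips)
    return fixes

# Failing test: run(1) == -1, but abs(1) == 1 is expected.
print(diverted_corrected_paths(x=1, expected=1))
```

Flipping branch decision 0 yields a corrected execution, so that conditional is reported as a fix candidate; in the worst case this enumeration must try all bounded subsets of branch decisions, which is the cost our formula-based encoding avoids.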
W. Jin and A. Orso [16] proposed two techniques, on-demand formula computation (OFC) and clause weighting (CW), to mitigate the computational expense and improve the accuracy of formula-based fault localization. Specifically, OFC (1) encodes only the statement instances in the original failing trace into the error trace formula and (2) computes all MCSs of the constructed formula. (3) If there is a conditional statement st such that (i) st is found in an MCS and (ii) a branch b of st is not yet encoded in the formula, then OFC expands the formula by encoding all statement instances in branch b and goes back to step (2); otherwise, the obtained MCSs are reported as the final output. OFC and our proposed ATF encoding are similar in that both encode only part of the program into the formula and expand it incrementally. There are two main differences between OFC and our method. First, the formula in the OFC approach is expanded to encode all branches of conditional statements that occur in an MCS, while the ATF formula is expanded to refine abstracted conditional branches that are included in an angelic execution of the angelic program. Second, our approach does not require computing all MCSs of the intermediate formula, as OFC does; instead, it computes one MCS at a time and stops computing MCSs when the formula needs further refinement.
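The expand-and-resolve loop of steps (1)-(3) can be sketched as a toy model. Everything below is hypothetical: statements are plain names, `compute_mcses` is a stand-in oracle for the real pMaxSAT-based MCS computation, and the branch tables are invented for illustration:

```python
# Hypothetical control structure: conditional "if1" has two branches,
# each containing one statement.
branches = {"if1": ["then1", "else1"]}   # conditional -> its branches
branch_body = {"then1": ["s2"], "else1": ["s3"]}

def compute_mcses(encoded):
    # Stand-in MCS oracle for this toy example: the conditional "if1"
    # is suspicious until its else-branch statement s3 is encoded,
    # after which s3 becomes the sole suspect.
    return [["s3"]] if "s3" in encoded else [["if1"]]

def ofc(failing_trace):
    encoded = set(failing_trace)            # (1) encode failing trace
    encoded_branches = set()
    while True:
        mcses = compute_mcses(encoded)      # (2) compute all MCSs
        expanded = False
        for mcs in mcses:                   # (3) expand on demand
            for st in mcs:
                for b in branches.get(st, []):
                    if b not in encoded_branches:
                        encoded_branches.add(b)
                        encoded.update(branch_body[b])
                        expanded = True
        if not expanded:
            return mcses                    # fixpoint: report MCSs

# Failing trace executed s1, took the then-branch (s2) of if1.
print(ofc(["s1", "if1", "s2"]))
```

On this toy input the first round blames the conditional, the expansion encodes both of its branches, and the second round pinpoints the statement in the previously unexecuted else-branch.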
In the Angelic Debugging approach [21], program expressions in a suspicious scope (provided in advance) are replaced with a nondeterministic expression (i.e., an angelic choice that can return an arbitrary value). Symbolic execution is then used to find a successful execution of the transformed program with the input fixed to a given failing test input. If such a successful execution exists, the replaced suspicious expression is considered a fix candidate, and the concrete values of the nondeterministic expressions are reported as angelic values, i.e., as a fix suggestion. One limitation of this method is that it handles each expression separately; thus, symbolic execution must be run many times, once per expression, and the SMT solver must be called many times to solve different formulas. Another limitation is that it does not guarantee minimal results: it can output a successful angelic execution that replaces all statements in the suspicious scope with angelic values.
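The core idea of replacing a suspicious expression by an angelic choice can be sketched with brute-force search in place of symbolic execution. The toy program, the buggy expression, and the finite search domain are all invented for illustration:

```python
# A minimal sketch of angelic debugging on a toy program. The buggy
# expression `x - 1` (it should be x + 1) is replaced by an angelic
# value; symbolic execution is approximated by searching a small
# finite domain for values that make the failing test pass.

def program(x, angelic=None):
    y = (x - 1) if angelic is None else angelic  # suspicious expression
    return y * 2

def angelic_values(x, expected, domain=range(-10, 11)):
    # Collect every angelic value that turns the failing test
    # into a passing one; non-empty means "fix candidate".
    return [a for a in domain if program(x, angelic=a) == expected]

# Failing test: program(3) returns 4, but 8 is expected.
print(angelic_values(x=3, expected=8))
```

The search finds the angelic value 4, so the suspicious expression is reported as a fix candidate with 4 as the suggested value (the intended fix `x + 1` indeed evaluates to 4 at x = 3).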

Automatic Program Repair
Automatic program repair (APR) [7,[35][36][37] is currently an active research topic in software engineering. APR techniques try to provide the developer with actual patches that make the buggy program pass a given test suite that it originally fails. They usually start by using fault localization or fix localization [37,38] to identify a subset of code elements at which a patch can be applied. The fix localization step is critically important to the effectiveness, as well as the reliability, of automatic program repair [39,40]. The fix localization components in semantics-based APR approaches (such as Angelix [37] and Nopol [36]) share the same objective as our approach: finding angelic execution paths that make the failing test case pass. Our method differs from these techniques in several respects. The angelic fix localization in Nopol finds angelic values only for conditional expressions, assumes a single modification, and does not use a solver. Our approach is similar to the angelic forest extractor in Angelix, which finds angelic values for both assignments and conditional expressions. However, our method produces angelic execution paths by modifying a minimal set of locations, while Angelix does not constrain the number of fix locations.

Conclusions
In this paper, we presented AgxFaults, a formula-based fault localization method that automatically finds minimal sets of program locations at which a bug fix might exist. We implemented AgxFaults as an extension of Java PathFinder for automatically localizing faults in Java programs. We used AgxFaults to localize faults in benchmark programs of various sizes and compared its performance to existing formula-based fault localization approaches. The experimental results demonstrated that our proposed method outperforms single-path-formula approaches in terms of effectiveness. AgxFaults was comparable to the program-formula approach in effectiveness but better in efficiency and scalability. We also demonstrated the capability of AgxFaults when applied to bugs from large real-world software projects.

Data Availability Statement: The implementation and benchmark data are publicly available at https://bit.ly/agxfaults.