Patch It If You Can: Increasing the Efficiency of Patch Generation Using Context

Abstract: Although program repair is a crucial aspect of software systems, it can be extremely challenging. Automated program repair (APR) techniques have been proposed to solve this problem. Among them, template-based APR shows good performance. One of the key properties of a template-based APR technique for practical use is its efficiency. However, because existing techniques mainly focus on improving performance, they do not sufficiently consider efficiency. In this study, we propose EffiGenC, which efficiently explores the patch ingredient search space to improve the overall efficiency of template-based APR. EffiGenC defines the context using the concept of extended reaching definition from compiler theory. EffiGenC constructs the search space by collecting the ingredients required for patching from the context. We evaluated EffiGenC on the Defects4j benchmark. EffiGenC decreases the number of candidate patches by 27% to 86% compared to existing techniques. EffiGenC also correctly/plausibly fixes 47/72 bugs. In future work, we will use context to address the search space problem that exists in multiline bugs.


Introduction
Automated program repair (APR) can reduce debugging costs by automatically fixing buggy code [1,2]. Among the APR techniques, template-based APR is one of those showing good performance [3,4,5]. It generates templates from the commit history. FixMiner [6] collects patch history from open-source repositories. It uses a rich edit script to capture the structure of the AST and then uses it to generate patch patterns. TBar [3] verifies templates from existing template-based APR techniques. It then checks the patches generated using such templates.
For patch generation, template-based APR approaches additionally leverage various context information about the buggy code. ConFix [7] uses the AST nodes near the modification point as context to efficiently explore the patch history and changes. CapGen [8] uses genealogy, variable, and dependency similarities between suspicious code and candidate patches as context. Furthermore, it utilizes patch prioritization to increase performance.
The main metrics of existing template-based APR methods focus on performance evaluation [9]. Liu et al. [9] showed that the performance of APR techniques has steadily improved. However, efficiency, which is a key property for the practical use of APR techniques, has not improved.
To improve efficiency, APR requires an effective search strategy that completes the search within a reasonable amount of time. Figure 1 shows the developer patches for the Lang-24 and Closure-125 bugs from the Defects4j benchmark. Both patches can be generated using the Mutate Conditional Expression template proposed by TBar, one of the latest template-based APR techniques. However, TBar generates the developer's patch only for Lang-24, not for Closure-125. The major difference between the two patches is that Lang-24 adds only one variable, hasDecPoint, whereas Closure-125 adds a method invocation, fnType.hasInstanceType(). Compared to Lang-24, Closure-125 has an exponentially larger search space because it requires an additional search of the class and method. If TBar could search the ingredient search space effectively, it would produce a patch equivalent to the developer's patch for the Closure-125 bug within the given amount of time.
In this paper, to improve the efficiency of template-based APR, we propose EffiGenC (Increasing the Efficiency of Patch Generation using Context), which efficiently explores the search space of ingredients using context. EffiGenC considers the statements related to the target statement as context. For this, we extend the concept of reaching definition from compiler theory. A reaching definition for a given statement is an earlier assignment whose target variable can reach that statement without an intervening reassignment. EffiGenC obtains the statements and methods that form the context of the target statement through reaching definitions. It explores the context and collects the patch ingredients needed to generate a patch. Our experimental comparison with five state-of-the-art template-based APR systems demonstrates that, overall, EffiGenC can reduce the number of candidate patches by up to 86%. Even when we extend the search space from file to project, the number of candidate patches increases by only 29%, despite the exponential increase in ingredients.
The contributions of this study are as follows.
• A new context concept through extended reaching definition.
• An APR technique to efficiently explore the patch ingredient search space.
• An evaluation of APR performance and efficiency on a real Java dataset.
The rest of this paper is organized as follows. Section 2 summarizes the terms needed to understand the proposed approach. Section 3 presents the detailed process of the proposed technique. Sections 4 and 5 present our experimental setup and results. Section 6 discusses the limitations of our approach. After surveying the related studies in Section 7, we provide some concluding remarks in Section 8.

Terminology
Sensical patch versus nonsensical patch. A sensical patch allows the buggy program to compile successfully. A nonsensical patch does not allow the buggy program to compile [9].
Plausible patch versus in-plausible patch. A plausible patch allows a buggy program to compile successfully and pass all test cases in the available test suite. An in-plausible patch still allows the buggy program to compile successfully but fails certain test cases in the available test suite [9].
Correct patch versus incorrect patch. A correct patch is semantically equivalent to the developer-provided patch, based on a manual examination. An incorrect patch is any patch that does not meet this criterion [10].
Patch ingredient. APR uses identifiers or operators to turn a template into a concrete patch. Examples include variables and method names.

Approach
Figure 2 shows the overall process of EffiGenC. The first step is fault localization, in which EffiGenC calculates a list of suspicious statements for the buggy project. The next step is to select a fix template. Next, EffiGenC constructs the context of the suspicious statement using the buggy project. The fourth step is patch generation, which explores the context to obtain the patch ingredients and turns the template into a concrete patch. The last step is validation, which runs the prepared test suite to obtain a valid patch.

Fault Localization
In the fault localization step, EffiGenC derives a ranked list of suspicious statements using the test cases of the buggy project. Among the different fault localization techniques, EffiGenC uses the spectrum-based fault localization technique Ochiai [11], which earlier APR studies [3,7,9,12] also used to calculate suspicious statements.
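As a point of reference, the standard Ochiai score of a statement is its number of covering failed tests divided by the square root of (total failed tests × number of covering tests). The following is a minimal sketch of that formula only; the function names and the coverage dictionary layout are illustrative assumptions and do not reflect the GZoltar API or the EffiGenC implementation.

```python
import math


def ochiai(failed_cover, passed_cover, total_failed):
    """Standard Ochiai suspiciousness of one statement.

    failed_cover: failing tests that execute the statement
    passed_cover: passing tests that execute the statement
    total_failed: all failing tests in the suite
    """
    denom = math.sqrt(total_failed * (failed_cover + passed_cover))
    return failed_cover / denom if denom else 0.0


def rank_statements(coverage, total_failed):
    """Return statement ids ordered from most to least suspicious.

    coverage maps a (hypothetical) statement id to the pair
    (failed_cover, passed_cover).
    """
    scores = {s: ochiai(ef, ep, total_failed) for s, (ef, ep) in coverage.items()}
    return sorted(scores, key=lambda s: scores[s], reverse=True)


# Illustrative coverage data, not taken from any Defects4j bug.
coverage = {"L10": (2, 0), "L20": (1, 3), "L30": (0, 4)}
ranking = rank_statements(coverage, total_failed=2)
```

A statement executed only by failing tests receives the maximum score of 1, and a statement never executed by a failing test receives 0.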

Select Fix Template
In this step, EffiGenC selects the fix template for patch generation. EffiGenC uses 15 templates introduced in the existing studies on template-based APR [3]. EffiGenC selects a template by exploring the AST of the suspicious statement. EffiGenC identifies the node type of each AST node. It selects an available template based on whether the node type matches that of the template. In addition, it selects templates for all nodes belonging to the AST of the suspicious statement. During patch generation, EffiGenC generates candidate patches by applying templates starting from the root node.

Context Construction
EffiGenC constructs the context for the suspicious statement from the buggy project. Among the existing APR techniques, one technique [13] uses reaching definitions to identify the parts that must be fixed together. Reaching definition is a data-flow analysis that statically determines which definitions may reach a given point in the source code. EffiGenC extends this concept to collect ingredients related to suspicious statements. Algorithm 1 shows the context construction process. EffiGenC extracts the identifiers, which serve as context elements, by exploring the AST of the suspicious statement (Line 1). It checks whether each context element appears in the AST of the statements in the buggy file to which the suspicious statement belongs (Lines 4-7). If it appears, the corresponding statement is added to the list (Lines 7-10). Based on the resulting statement list, EffiGenC collects the name and parameter information of the method to which each statement belongs (Lines 13-19). If the list already contains the method information, it is not added again, to avoid duplicates. Finally, EffiGenC returns the statement list and method list as the context.
Algorithm 1. Context construction (fragment): ElementList = extractIdentifier(S); for Element e ∈ ContextElementList do
Figure 3a shows the developer patch of Closure-10, and Figure 3b is an example of constructing the context for line 1417, where the patch is applied. In this example, the context elements are allResultsMatch, n, and MAY_BE_STRING_PREDICATE. Based on these, EffiGenC computes the related statements in the file, and the example presents four statements. EffiGenC also computes the related methods. Statement #4 is an assignment of a global variable, so the example shows only three methods. EffiGenC generates a patch using a total of seven lists, including the statements and methods, as context.
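The context construction described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' Algorithm 1: identifiers are extracted with a regular expression instead of an AST traversal, the suspicious statement itself is kept in the context, and the Statement class, the keyword list, and the example statements are hypothetical (they are not the actual Closure-10 source).

```python
import re
from dataclasses import dataclass
from typing import Optional

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# A small, incomplete keyword list, used only to keep the sketch short.
JAVA_KEYWORDS = {
    "return", "static", "final", "new", "int", "boolean", "void", "if",
    "else", "for", "while", "null", "true", "false", "this", "public",
    "private", "protected", "class",
}


@dataclass(frozen=True)
class Statement:
    line: int
    text: str
    method: Optional[str]  # enclosing method signature; None for field-level code


def extract_identifiers(text):
    """Approximate the AST-based identifier extraction with tokenization."""
    return {tok for tok in IDENTIFIER.findall(text) if tok not in JAVA_KEYWORDS}


def build_context(suspicious, file_statements):
    """Return (related statements, related methods) for a suspicious statement."""
    elements = extract_identifiers(suspicious.text)
    related = [s for s in file_statements if elements & extract_identifiers(s.text)]
    methods = []
    for stmt in related:
        # Field-level statements have no enclosing method; skip duplicates.
        if stmt.method is not None and stmt.method not in methods:
            methods.append(stmt.method)
    return related, methods


# Hypothetical file contents, loosely modeled on the Closure-10 example.
file_statements = [
    Statement(10, "static final Predicate<Node> MAY_BE_STRING_PREDICATE = new MayBeStringResultPredicate();", None),
    Statement(20, "Node parent = n.getParent();", "isExpressionNode(Node n)"),
    Statement(30, "int count = 0;", "countItems()"),
    Statement(40, "return allResultsMatch(n, MAY_BE_STRING_PREDICATE);", "mayBeString(Node n)"),
]
related, methods = build_context(file_statements[3], file_statements)
```

In this toy input, the statement on line 30 shares no identifier with the suspicious statement and is therefore left out of the context, while the field-level assignment on line 10 contributes a statement but no method.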

Patch Generation
In the patch generation step, EffiGenC generates a patch using a template and the context for a suspicious statement. Algorithm 2 shows the patch generation process. EffiGenC first checks whether the template requires ingredients (Line 1). Among the template types, MoveStatement and MutateDataType do not require ingredients. Therefore, for such templates, candidate patches can be generated from the suspicious statement and the template alone (Line 13). If ingredients are required, EffiGenC initializes the ingredient sets of variables, methods, and expressions (Lines 2-4). It then extracts variables, methods, and expressions from the statements belonging to the context to build the sets V, M, and E, respectively (Lines 5-8). To do so, EffiGenC traverses the AST of each statement, checks each node type, and adds the node to the corresponding set. The ingredients thus consist of three sets (variable, method, expression); finally, EffiGenC inserts the ingredients into the template in the concretization process to produce concrete patches (Lines 10-11).
Figure 3b shows how to extract variables, methods, and expressions. There are a total of eight lists including statements and methods. For example, in statement #2, n and parent are variables, and getParent is a method. Finally, n.getParent, which is a MethodInvocation, is extracted as an expression. In the concretization process, EffiGenC generates a patch using the whole expression rather than splitting it (e.g., update the expression). Combining the ingredient lists of each statement results in a set like the one at the bottom of the example. We can observe that anyResultsMatch, which is necessary for generating the correct patch, is included in the method list.
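The concretization step can be illustrated with the following minimal sketch. It assumes that the per-statement ingredients have already been extracted (the AST traversal is not modeled), and the Ingredients class, the `<HOLE>` placeholder syntax, and the template string are hypothetical devices for illustration rather than the TBar or EffiGenC template format.

```python
from dataclasses import dataclass, field

HOLE = "<HOLE>"  # hypothetical marker for the position an ingredient fills


@dataclass
class Ingredients:
    variables: set = field(default_factory=set)    # set V
    methods: set = field(default_factory=set)      # set M
    expressions: set = field(default_factory=set)  # set E


def merge(per_statement):
    """Union the ingredient sets collected from each context statement."""
    merged = Ingredients()
    for ing in per_statement:
        merged.variables |= ing.variables
        merged.methods |= ing.methods
        merged.expressions |= ing.expressions
    return merged


def concretize(template, kind, ingredients):
    """Fill the template hole with every ingredient of the requested kind."""
    if kind not in ("variables", "methods", "expressions"):
        raise ValueError("unknown ingredient kind: " + kind)
    if HOLE not in template:
        # Templates without a hole need no ingredient.
        return [template]
    return [template.replace(HOLE, item) for item in sorted(getattr(ingredients, kind))]


# Ingredients of statement #2 as described in the text, plus an assumed
# second statement that contributes anyResultsMatch.
pool = merge([
    Ingredients({"n", "parent"}, {"getParent"}, {"n.getParent"}),
    Ingredients({"n"}, {"anyResultsMatch"}, set()),
])
candidates = concretize("return <HOLE>(n, MAY_BE_STRING_PREDICATE);", "methods", pool)
```

The number of candidates equals the size of the selected ingredient set, which is why a smaller, context-derived pool directly reduces the number of candidate patches to validate.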

Validation
After patch generation, EffiGenC validates the candidate patch by running the test suite. If the candidate patch passes all of the prepared test cases, EffiGenC treats the candidate patch as a valid patch and terminates.
If the candidate patch cannot pass the test cases, EffiGenC discards the patch and validates the next candidate patch. If all candidate patches fail to pass the test suite, EffiGenC generates patches from the next template. If there is no other template to apply, EffiGenC moves on to the next-ranked suspicious statement. EffiGenC terminates when a valid patch is generated, the program execution time reaches the specified timeout, or the number of generated candidate patches reaches the specified maximum.
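The control flow of this generate-and-validate loop can be sketched as below. The function and parameter names are hypothetical, patch generation and test execution are passed in as callables, and the default limits mirror the experimental settings reported later in the paper (20,000 candidate patches and a three-hour timeout).

```python
import time


def repair(suspicious_statements, templates, generate, passes_all_tests,
           max_candidates=20000, timeout_s=3 * 60 * 60):
    """Return (valid_patch_or_None, number_of_candidates_validated).

    suspicious_statements: statements in descending order of suspiciousness
    generate(stmt, template): iterable of candidate patches
    passes_all_tests(patch): True if the patch passes the whole test suite
    """
    start = time.monotonic()
    tried = 0
    for stmt in suspicious_statements:      # next-ranked statement
        for template in templates:          # next template
            for patch in generate(stmt, template):
                out_of_budget = tried >= max_candidates
                out_of_time = time.monotonic() - start >= timeout_s
                if out_of_budget or out_of_time:
                    return None, tried
                tried += 1
                if passes_all_tests(patch):
                    return patch, tried     # stop at the first valid patch
    return None, tried


# Toy stand-ins for patch generation and test execution.
def toy_generate(stmt, template):
    return [f"{stmt}-{template}-{i}" for i in range(2)]


def toy_oracle(patch):
    return patch == "s2-t1-0"


patch, tried = repair(["s1", "s2"], ["t1", "t2"], toy_generate, toy_oracle)
```

Because the loop stops at the first valid patch, the counter returned here corresponds to the number of candidates validated up to and including that patch.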

Research Question
The following research questions are investigated:
We propose a hit ratio metric to evaluate whether our proposed context is of high quality. The hit ratio is a metric that checks whether the ingredient pool contains the ingredients required for the correct patch. Figure 4a shows the developer patch of the Chart-20 bug. To generate the same patch as the developer patch, the outlinePaint and outlineStroke variables are required. Figure 4b presents an example of extracting ingredients from the suspicious statement, the context, and the file. For the suspicious statement, the hit ratio is zero because it contains none of the ingredients required for the correct patch. The context and the file contain one and two of the required ingredients, respectively, and thus their hit ratios are 0.5 and 1.
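The Chart-20 example suggests that the hit ratio is the fraction of required ingredients found in a pool. The sketch below encodes that reading, which is inferred from the worked example rather than copied from the formula in Figure 4b; the pool contents other than outlinePaint and outlineStroke are made up, as is the choice of which required variable the context pool contains.

```python
def hit_ratio(pool, required):
    """Fraction of the required ingredients that appear in the ingredient pool."""
    required = set(required)
    if not required:
        raise ValueError("at least one required ingredient is expected")
    return len(required & set(pool)) / len(required)


required = {"outlinePaint", "outlineStroke"}

# Illustrative pools for the three groups.
sus_pool = {"paint", "stroke", "alpha"}
context_pool = sus_pool | {"outlinePaint", "value"}
file_pool = context_pool | {"outlineStroke", "listenerList"}

ratios = [hit_ratio(p, required) for p in (sus_pool, context_pool, file_pool)]
```

Under this definition, a hit ratio of 1 corresponds to what the evaluation later calls a perfect case.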

Evaluation Dataset
For the evaluation, we used Defects4j 2.0.0 [14]. Defects4j is a framework that collects real bugs from Java projects and is used for evaluation in many existing studies [3,7,8]. For a fair comparison with previous studies, we experimented on 6 projects and 395 bugs from Defects4j 2.0.0. Table 1 shows a list of the projects and the number of bugs per project. Column #Bugs shows the number of buggy versions in the project. Columns #Tests and LOC give the number of JUnit tests and lines of code in the latest version of each project.

Implementation
For this experiment, we implemented EffiGenC on top of TBar. EffiGenC leverages the GZoltar [15] framework to automate the execution of the test cases for each buggy program. We use the Ochiai metric to compute the suspiciousness scores of the statements for fault localization. We set the maximum number of candidate patches to 20,000 and the timeout to three hours. We ran the experiment on Ubuntu 20.04 with an Intel Core i5-10600 CPU @ 3.30 GHz and 32 GB of RAM.
To verify the quality of our proposed context, we evaluated the context for the 125 Defects4j bugs [9] for which existing APR techniques could generate a patch. We construct the context based on the statement to which the developer patch is applied. In the same way as the patch ingredient extraction of EffiGenC, we construct ingredient pools for the suspicious statement and the file.
Number of Patch Ingredients. Table 2 shows the average number of ingredients in each group. Group sus. had an average of 3.1 patch ingredients, group context had an average of 38.7, and group file had an average of 176.1. Taking group file as 100%, group sus. was 1.8% and group context was only 22%. Figure 5 shows the distribution of patch ingredients in each group for the 125 bugs as a box plot. We found that group file had the most patch ingredients, and group context had a lower distribution overall than group file.
Hit Ratio. Table 3 presents the average hit ratio for each group. The average hit ratio was 25.2% for group sus., 61.3% for group context, and 73.2% for group file. Some patch ingredients could not be found even in the entire file, because some patches need identifiers belonging to other packages or classes, or need new variables. When we calculate the hit ratio relative to the ingredients that group file was able to find, group sus. reaches 34.4% and group context reaches 83.7%.
Ratio of Perfect Case. To generate a correct patch, APR requires all of the ingredients for patch generation. We additionally calculated the frequency of perfect cases, that is, cases in which a group had all the ingredients for the correct patch. Table 4 shows the number of perfect cases for each group and the ratio of each group to group file. There were 20, 62, and 77 perfect cases for group sus., group context, and group file, respectively; as a percentage of group file, group sus. was 26% and group context was 80.5%.
Finding 1. Through the context, even a small ingredient pool can be sufficient to include the correct ingredients. Moreover, the context covers 80.5% of the perfect cases of the file. It can therefore be effective in reducing the patch ingredient search space.
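The relative figures reported for RQ1 all follow from dividing each group's value by the value of group file. The short check below reproduces that arithmetic from the averages and counts quoted in the text; the helper name percent_of is an illustrative choice.

```python
def percent_of(value, baseline):
    """Express value as a percentage of baseline, rounded to one decimal."""
    return round(100.0 * value / baseline, 1)


# Average number of ingredients (Table 2), relative to group file.
ingredient_share = (percent_of(3.1, 176.1), percent_of(38.7, 176.1))

# Average hit ratio (Table 3), relative to group file.
relative_hit_ratio = (percent_of(25.2, 73.2), percent_of(61.3, 73.2))

# Number of perfect cases (Table 4), relative to group file.
perfect_case_share = (percent_of(20, 77), percent_of(62, 77))
```

Read together, the context pool holds roughly a fifth of the file's ingredients while retaining about four fifths of its hits and perfect cases.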

RQ2: Efficiency of EffiGenC
Following the previous study [9], we compared the efficiency of publicly available template-based APR techniques [3,6,12,16,17]. We use the number of patch candidates (NPC) as the efficiency metric, which the existing study presented for comparing APR efficiency [9]. We calculate the NPC score as the sum of the number of nonsensical patches, in-plausible patches, and valid patches. For the results of the existing techniques, we refer to the existing study [9].
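As a small illustration of how the NPC score decomposes, the sketch below classifies each validated candidate using the terminology of Section 2 and sums the three categories. The outcome tuples are invented for illustration, and a "valid" patch is taken to mean one that compiles and passes the whole test suite.

```python
from collections import Counter


def classify(compiles, passes_all_tests):
    """Map a validation outcome to the patch categories used in the paper."""
    if not compiles:
        return "nonsensical"
    return "valid" if passes_all_tests else "in-plausible"


def npc_score(outcomes):
    """Return (NPC score, per-category counts) for a sequence of outcomes."""
    counts = Counter(classify(c, p) for c, p in outcomes)
    total = counts["nonsensical"] + counts["in-plausible"] + counts["valid"]
    return total, counts


# Invented outcomes: (compiles, passes_all_tests) per candidate patch.
outcomes = [(False, False), (True, False), (True, False), (True, True)]
score, counts = npc_score(outcomes)
```

Splitting the score this way makes it possible to compare tools on nonsensical and in-plausible patches separately as well as on the total.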
Figure 6a shows the NPC score comparison between EffiGenC and the template-based APR techniques as a box plot. In this experiment, we counted the number of candidate patches generated until a valid patch was found. The number of candidate patches on the x-axis is shown on a log scale. EffiGenC generated fewer candidate patches than all of the template-based APR techniques. Comparing the average values, EffiGenC reduced the NPC score by 27% to 86% relative to the existing techniques. In addition, we can observe that EffiGenC is effective in most cases because the overall distribution decreases, not just the average value.
Figure 6b shows the result of comparing the number of nonsensical patches. EffiGenC generates the lowest number of nonsensical patches except for SimFix. Comparing the average values, it reduces the nonsensical patches by 53% to 87%. Figure 6c shows the number of in-plausible patches. EffiGenC generates fewer in-plausible patches than kPAR, SimFix, and TBar. However, it still produces more in-plausible patches than AVATAR and FixMiner. Note that EffiGenC is implemented on top of TBar, but it can be extended to any other template-based APR technique. Therefore, if we extend EffiGenC to FixMiner and AVATAR, we can check the context-based search space reduction for those techniques as well. Although SimFix does not generate nonsensical patches, it is less efficient than EffiGenC because all of the patches it generates are in-plausible.
Finding 2. EffiGenC can generate valid patches with only a small number of candidate patches through the proposed context. Its average NPC score and its numbers of nonsensical and in-plausible patches are smaller than those of most template-based APR techniques. Therefore, EffiGenC can increase the efficiency of patch generation through the context.

RQ3: EffiGenC Space Reduction
To check the search efficiency of EffiGenC, we compare the NPC score and the number of correct patches according to the search space. The vanilla version of EffiGenC selects the related statements and related methods from the file that contains the suspicious statement. We expand this scope to the package and the project.
Table 5 shows the NPC score and the number of correct patches according to the search space. The rows present the NPC score results, and the last row presents the performance results. We manually examine the patches generated by EffiGenC and consider a patch correct if it is semantically the same as the developer patch. When we set the search space to File, EffiGenC generates an average of 37.1 candidate patches. Moreover, it can generate the correct patch for 47 bugs. When we expand the search space to the package and project, the numbers of nonsensical and in-plausible candidate patches increase. In addition, EffiGenC can generate more correct patches than existing techniques.
Figure 7 shows the total number of ingredients that can be extracted from the context according to the search space. We counted the ingredients for the bugs for which EffiGenC can generate valid patches. We calculated the total number of ingredients as the sum of the sizes of the variable, method, and expression sets. The x-axis represents the number of ingredients on a log scale. When only the file was targeted, the number of ingredients was the smallest, followed by the package and the project. The average number of ingredients was 60.2 for the file, 1210.1 for the package, and 2189.7 for the project. We confirmed that the vanilla version is the best configuration when considering the search space, efficiency, and performance together. Also, when the search space is expanded to the package and project, EffiGenC still generates a small number of candidate patches through efficient exploration, even though the number of ingredients increases.
Finding 3. Although EffiGenC expands the scope of collecting patch ingredients to the package and project, we can observe that it explores efficiently through the context relative to the growing search space. Moreover, we suggest the file as the best search space when both computation cost and performance are considered.

RQ4: Comparison with the State-of-the-Art
In this section, we investigate the overall performance of our default EffiGenC. We manually validate the correct patches, as in RQ2. For the results of the existing techniques, we refer to previous studies [3,9].
Table 6 shows the number of correct and plausible patches that each technique generated for the Defects4j projects. In each table entry, the first number denotes the number of correct patches generated by the technique, and the second number denotes the number of plausible patches. EffiGenC generated 47 correct patches, the largest number among the compared techniques. Figure 1b shows a case in which the existing APR techniques cannot generate the correct patch, but EffiGenC succeeds. For this bug, fnType is the context element, and EffiGenC computes the related statements based on it. Of the many ingredients in the file, only the necessary ingredients, including fnType.hasInstanceType(), were extracted, so EffiGenC can generate the correct patch.
Finding 4. EffiGenC was able to efficiently explore the patch ingredient search space and generate correct patches for bugs that the existing template-based APR techniques failed to fix.

Threats to Validity
Benchmark overfitting patch. The validity can be threatened by the benchmark used in the evaluation. Although Defects4j is a high-quality bug framework for Java projects, there is a threat that the patches generated by each APR technique merely overfit the benchmark bugs [18], and there is a risk because the framework does not cover all bug types. However, many APR studies have evaluated the performance of patch generation using benchmarks [14,19,20].
Additional computing cost. EffiGenC can efficiently generate patches by reducing the patch ingredient search space, as we show through the NPC score. The computational cost of constructing the context and collecting ingredients from it does not appear in the NPC score. However, when we ran TBar and EffiGenC in the same environment, generating the correct patch took an average of 661 s for TBar and 580 s for EffiGenC. Therefore, we can observe that the context construction and patch generation process of EffiGenC is sufficiently efficient.
Scalability. For the experiment, we implemented EffiGenC on top of TBar. Therefore, it may appear that the patch ingredient search space construction method of EffiGenC is limited to TBar. However, the context construction of EffiGenC can be applied to any technique that uses suspicious statements, independently of TBar. In addition, for any template-based APR technique, the concretization process that inserts ingredients to turn a template into a candidate patch is a common step. Therefore, the patch ingredient extraction process of EffiGenC can also be generalized, and EffiGenC can generate patches efficiently regardless of the underlying technique.

Related Work
Research related to APR has been actively conducted [1]. APR is largely divided into search-based APR and semantic-driven APR. Search-based APR generates a candidate patch by defining and exploring a space in which candidate patches exist. GenProg [21] generates a candidate patch by manipulating the existing buggy source code using genetic programming. ARJA [10] generates candidate patches for Java programs using multi-objective genetic programming. Unlike genetic programming, which uses stochastic elements, EffiGenC generates candidate patches based on templates collected from previous patch history.
Semantic-driven APR is a technique for generating correct patches using semantic information such as symbolic execution or satisfiability modulo theories. SemFix [22] generates a correct patch using symbolic execution, constraint solving, and program synthesis. Angelix [23] generates a patch by introducing the concepts of an angelic path and an angelic forest. Furthermore, Angelix alleviates the scalability problem of semantic-driven APR.
There are studies that use the patch history to increase the number of correct patches. PAR [24] generates correct patches for target programs that existing techniques failed to fix by creating templates from patterns found through manual analysis of human-written patches. Prophet [25] builds a machine-learning model that learns the characteristics of correct patches from human-written patches in open-source software repositories. The model is used to prioritize candidate patches and raise the rank of the correct patch. In contrast, EffiGenC creates patches by exploring the search space more efficiently through the context, rather than using only the patch history.
As research on search-based APR remains active, empirical analyses of the search space and algorithms have been conducted. Wen et al. [26] analyzed the search space explored by existing APR techniques and revealed that the quality of the search space significantly influences the performance of search-based APR. In addition, the quality of a patch depends on the test cases. A technique for sampling only good test cases is needed to generate the correct patch with high performance and high efficiency. Long et al. [25] analyzed the density of plausible and correct patches in the space explored by APR approaches and showed that there are plausible patches other than the correct patch.
Owing to the performance and efficiency problems of search-based APR, studies have continued to use context to explore the search space efficiently. SimFix [22] extracts high-level abstract changes from past patch histories. Based on these, the correct patch is generated by applying a patch to a suspicious statement. In addition, CapGen [8] proposed a patch-prioritization technique to generate more correct patches with efficient patch validation. To prioritize patches, it calculates genealogy, variable, and dependency context scores between the suspicious statement and the past patch history and ranks the patches by these scores. ConFix [7] considers the context by extracting the parent and sibling nodes from the previous patch history. When a suspicious statement is identified, ConFix extracts its context and applies only the changes with the same context in the database, to generate the correct patch more efficiently. Like these techniques, EffiGenC relies on context; however, we redefine the context using an extended reaching definition. In addition, EffiGenC effectively reduces the ingredient search space required for patch generation.

Conclusions
The existing template-based APR techniques did not sufficiently consider the search space for patch ingredients. We presented the concept of a context based on the extended reaching definition to capture patch information for the target statement. We proposed EffiGenC, which generates patches based on the proposed context. The proposed context contained enough patch ingredients to generate the correct patch. Experiments with Defects4j showed that EffiGenC produced fewer candidate patches. We can see that EffiGenC can efficiently explore a large patch ingredient search space. For future work, we plan to use context to solve the search space problem that exists in multiline bugs and to study techniques for generating patches for more complex bugs. We currently focus only on identifiers as ingredients. We will conduct research that can reduce the search space for identifiers and change actions. In addition, there is an issue of performance deterioration due to out-of-vocabulary identifiers in deep learning-based patch generation. We will try to solve this issue by collecting identifiers related to bugs from other projects through the proposed context.

Figure 3. Concept of context and searching patch ingredient. (a) Developer patch of Closure-10; (b) Example of constructing context and extracting patch ingredient.


Figure 4. Example of calculating hit ratio. (a) Developer patch of Chart-20; (b) Formula of hit ratio and example.

Figure 6. Result of NPC score. (a) Distribution of number of candidate patches; (b) distribution of number of nonsensical patches; (c) distribution of number of in-plausible patches.

Table 2. Number of elements in ingredient pool.

Table 3. Result of hit ratio.

Table 4. Number of perfect cases.

Table 5. Efficiency and performance changes according to the ingredient pool.

Table 6. Number of Defects4j bugs that are correctly/plausibly fixed by APR tools. "C, CL, L, M, Moc, T" represent Chart, Closure, Lang, Math, Mockito, and Time.