A Change Recommendation Approach Using Change Patterns of a Corresponding Test File

Change recommendation improves the development speed and quality of software projects. Through change recommendation, software project developers can find the relevant source files that they must change for their modification tasks. In an existing change-recommendation approach based on the change history of source files, the reliability of the recommended change patterns for a source file is determined by the change history of that source file. If a source file has insufficient change history to identify its change patterns, or has frequently been changed together with unrelated source files, the existing approach cannot identify meaningful change patterns for it. In this paper, we propose a novel change-recommendation approach that resolves this limitation. The basic idea of the proposed approach is to consider the change history of the test file corresponding to a given source file. First, the proposed approach identifies the test file corresponding to a given source file by using a source-test traceability linking method based on the popular naming convention rule. Then, the change patterns of the source and test files are identified according to their change histories. Finally, a set of change recommendations is constructed using the identified change patterns. In an experiment involving six open-source projects, the accuracy of the proposed approach is evaluated. The results show that the proposed approach improves accuracy over the existing approach by 21 to 62 percentage points, depending on the project.


Introduction
Software systems constantly evolve to improve their quality and extend their lifetime [1][2][3]. As a software system evolves, its source files are inevitably changed [4][5][6][7][8]. New source files are added, and the code of existing source files is modified. In the modification tasks of software project developers, a change to one source file may affect other source files because of the dependency relationships among them. If such change impacts are not taken into account immediately, software project developers may face unexpected errors in the near future [9,10]. Hence, software project developers should catch all the change impacts related to their modification tasks to avoid unexpected errors and reduce the maintenance costs of a software system. However, it is typically difficult to manually identify all the change impacts related to particular changes of source files in the development of a large software system.
Change recommendation can reduce the effort of identifying change impacts [11][12][13][14][15]. Through change recommendation, software project developers can immediately identify the relevant source files that they must change. Change-recommendation approaches are based on either static analysis or change-history analysis. The basic idea of change-history analysis for change recommendation originates from the concept of association rule discovery [16][17][18]. Association rule discovery is a data mining method and an unsupervised machine learning algorithm. Its objective is to extract association patterns between items in a large dataset.
In previous studies [11,12], a change-recommendation approach based on association rule discovery was proposed. Given a source file as a query for change recommendation, the approach analyzes the change history of the source file to derive its change patterns. In this approach, a change pattern indicates a change association between two source files, and the reliability of a change pattern is determined by the change coupling between the source files. The change coupling is typically computed from the co-change frequency of the source files. For example, two source files have high change coupling if they have frequently been changed together. In contrast, two source files have low change coupling if they have rarely been changed together. Consequently, the approach requires a sufficient and clean change history for each source file to identify meaningful change patterns. This constraint is a major cause of the limited applicability of the change-recommendation approach [19][20][21]. For example, if the change history of a source file is too short to identify its change patterns, meaningless change patterns may be identified. In addition, if a source file has been accidentally changed together with functionally unrelated source files, the reliability of the change patterns extracted from the change history cannot be guaranteed [22,23].
In this study, the co-evolution relationship between source and test files is taken into account to resolve the limitation of the existing change-recommendation approach. Generally, in software system development, source files and their corresponding test files evolve together [24][25][26][27]. When a new source file is added or an existing source file is modified, the test files related to the added or modified source files are modified to validate the added and modified code [28]. A test file often has to test several related source files. Hence, we believe that some of the closely related change patterns of a source file can also be identified in the change history of the corresponding test file. The existing change-recommendation approach only considers the change histories of the given source files, not the change histories of the test files corresponding to them.
Based on the aforementioned idea, a novel change-recommendation approach is proposed. To make a change recommendation, the proposed approach considers not only the change history of a given source file, but also the change history of the test file corresponding to the given source file. Generally, in software development, source files are paired with their corresponding test files according to a specific naming convention rule [29]. The naming convention rule can often be used to trace pairs of source and test files that are explicitly related [30]. Given a source file as a query for change recommendation, the proposed approach first identifies the test file corresponding to the given source file by using a source-test traceability linking method based on the naming convention rule. Then, the change histories related to the source and test files are extracted from a source-code repository. Finally, the change patterns of the source and test files are identified from the extracted change histories, and a set of change recommendations is constructed using the change patterns. In our experiment involving six open-source projects, we evaluated the recommendation accuracy of the proposed approach. The experimental results show that the proposed approach is significantly more accurate than the existing change-recommendation approach in the open-source projects. Across the projects, the proposed approach improved the accuracy by 21 to 62 percentage points compared with the existing approach. Therefore, we believe that the proposed change-recommendation approach is useful for real-world software project development.
The remainder of the paper is organized as follows. Related works are introduced in Section 2. The proposed change-recommendation approach is described in Section 3. The experimental results are reported and discussed in Section 4. The study is concluded in Section 5.

Association Rule Discovery and Change Recommendation
Association rule discovery is a data-mining method for identifying meaningful association patterns from a large-scale dataset [16,17,31]. An association pattern indicates a specific association between items [32]. A representative example of an association pattern is a purchasing pattern on an online shopping site, such as Amazon.com. For example, a common purchasing pattern between televisions and Blu-ray players can be inferred from the fact that televisions and Blu-ray players are frequently purchased together by customers on Amazon.com. The common purchasing pattern indicates that, in general, customers who purchase a television also want to purchase a Blu-ray player. Amazon.com can use this pattern to recommend a Blu-ray player to customers who add a television to their shopping cart.
Formally, an association pattern in association rule discovery is defined as follows. Given a set of items $I = \{I_1, I_2, \ldots, I_n\}$ and a set of database transactions $T = \{t_1, t_2, \ldots, t_m\}$, where $n$ and $m$ are the total numbers of items and transactions, respectively, each transaction is $t_i = \{I_{i,1}, I_{i,2}, \ldots, I_{i,l}\}$, where $l$ is the total number of items involved in $t_i$ and $I_{i,j} \in I$. An association pattern between two itemsets is written as $\{A \rightarrow B\}$, where $A$ and $B$ are subsets of $I$ and $A \cap B = \emptyset$.
In the aforementioned example for Amazon.com, a television and a Blu-ray player are included in the set of items $I$. The purchasing pattern between televisions and Blu-ray players is written as the association pattern {television → Blu-ray player}. The left-side and right-side itemsets in an association pattern are called the antecedent and the consequent, respectively.
An association pattern is evaluated according to support and confidence. Support is an indication of how frequently an itemset occurs in the given database transactions. The support value of an itemset $A$ is computed from the number of transactions involving $A$ in the transaction database, as follows:

$$support(A) = |\{t \in T \mid A \subseteq t\}|$$

Confidence is an indication of the reliability of an identified association pattern. The confidence of an association pattern is computed by dividing the support value of the union of the antecedent and consequent by the support value of the antecedent. It is defined as follows:

$$confidence(A \rightarrow B) = \frac{support(A \cup B)}{support(A)}$$

The reliability of an association pattern is judged according to its confidence value. A higher confidence value indicates higher reliability of an association pattern; in contrast, a lower confidence value indicates lower reliability.
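As an illustration only, the following minimal Python sketch computes support and confidence for the television and Blu-ray player example. It assumes that support is the raw count of transactions containing an itemset; the function names `support` and `confidence` and the sample transactions are hypothetical and are not taken from the cited works.

```python
def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset` (assumed: raw count)."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= set(t))


def confidence(antecedent, consequent, transactions):
    """support(A ∪ B) / support(A); returns 0.0 when the antecedent never occurs."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / base


# Hypothetical purchase transactions.
transactions = [
    {"television", "blu-ray player"},
    {"television", "blu-ray player", "hdmi cable"},
    {"television"},
    {"blu-ray player"},
]

tv_support = support({"television"}, transactions)
tv_to_bluray = confidence({"television"}, {"blu-ray player"}, transactions)
print(tv_support, round(tv_to_bluray, 2))
```

In this toy dataset, three transactions contain a television and two of them also contain a Blu-ray player, so the pattern {television → Blu-ray player} has a confidence of 2/3.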
The concept of association rule discovery can be applied to identify change patterns between source files from the source-code repository of a software system. A source-code repository such as Git records the change history of all the source files of a software system. Ying et al. [11] and Zimmermann et al. [12] presented a change-recommendation approach based on change patterns that can be identified by mining software repositories. Given a set of source files as a query for change recommendation, the approach identifies the change patterns of the given source files according to their change history. The identified change patterns whose confidence value exceeds a threshold are returned as the set of change recommendations. The approach is limited in identifying meaningful change patterns for source files that have a very short change history or have frequently been co-changed with functionally unrelated source files [22,23]. In an experiment involving six large software systems, only 25% of the change patterns identified by the approach were found to be meaningful [19].
This study set out to resolve the limitation of the existing change-recommendation approach. We found a solution in the co-evolution relationship between source and test files.

Coevolution of Source and Test Files
Software testing is essential to develop high-quality software systems [24]. Unit tests and integration tests are used to obtain feedback and identify potential bugs immediately. In addition, written test code helps software project developers understand a software system [26]. Because of these advantages, software testing is widely used in recent software project development.
A source file and its corresponding test file co-evolve [32]. When a new source file is added or an existing source file is modified, the corresponding test files are created or modified. Testing accounts for 30%-50% of the effort in software project development. Previous studies [24,27,28] investigated the change impacts between source and test files. Zaidman et al. developed a visualization tool for displaying the change relations among source and test files and then used the tool to analyze the evolution history of source and test files in two open-source projects [24,27]. They showed that the evolution of the source and test files may differ according to the testing strategy employed in the software project. Marsavina et al. investigated the fine-grained code evolution of source and test files in five open-source projects using ChangeDistiller [33]. Their experiments showed that a change of a source file is often followed by changes of its corresponding test files. In addition, they observed that closely related source files often evolve with the same test file.
This study is inspired by these previous works. We believe that it is reasonable to consider the change history of the corresponding test file when identifying the change patterns of a source file. In our study, we focus on extracting file-level change patterns, which can be obtained from white-box testing changes, rather than module-level change patterns, which can be obtained from black-box testing changes. We did not consider black-box testing changes in our approach for the following two reasons:

• Typically, file-level change patterns are considered more appropriate for assisting software project developers' tasks than module-level change patterns, because a file is the basic unit of work for developers in software project development.
• In a software project that has very few modules, file-level change patterns are more applicable than module-level change patterns.

Our Approach
In this section, we describe the overall steps of the proposed change-recommendation approach. The workflow of the proposed approach is briefly presented in Section 3.1, and each step is detailed in the following subsections.

Overview of Proposed Change-Recommendation Method
The basic idea of the proposed change-recommendation approach is to consider not only the change patterns of a given query source file, but also the change patterns of the test file corresponding to the given source file. Figure 1 shows the overall workflow of the proposed approach. The proposed approach consists of four steps: identifying a source-test pair, extracting commit histories, identifying change patterns, and composing change recommendations. Given a source file as a query for a change recommendation, the proposed approach first identifies the test file corresponding to the source file using a heuristic method based on the naming convention rule between source and test files, and then extracts the change histories of the source and test files by tracing the entire commit history of a Git repository. Then, the change patterns of the source and test files are identified according to the extracted change histories. Finally, a set of change recommendations is constructed using the identified change patterns.

Identifying Corresponding Test File
Software project developers typically follow a naming convention rule to identify specific test files. Under the naming convention rule, a test file is named by combining the name of its corresponding source file and the string literal "Test". For example, the test file corresponding to a source file "Foo.java" is named "FooTest.java". The naming convention rule makes it easy to link a source file and its corresponding test file. Several studies [29,30] presented a source-test traceability linking method based on the naming convention rule and validated its accuracy.

The proposed change-recommendation approach employs the source-test traceability linking method to identify the test file corresponding to a given query source file. The linking method considers the names and paths of source and test files to identify a corresponding pair of source and test files. For example, given a source file "org/java/main/model/data/foo.java" as input, the test file "org/java/test/model/data/fooTest.java" is identified as the corresponding test file.
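The linking rule can be sketched in a few lines of Python. This is only a sketch under stated assumptions, not the implementation used in the study: it assumes that the source and test trees differ only in a "main" versus "test" directory name and that the test file name is the source file name followed by "Test". The function names `candidate_test_path` and `find_test_file` are hypothetical.

```python
from pathlib import PurePosixPath


def candidate_test_path(source_path, main_dir="main", test_dir="test", suffix="Test"):
    """Derive the expected test-file path from a source-file path.

    Assumption: every directory named `main_dir` is mirrored by `test_dir`.
    """
    path = PurePosixPath(source_path)
    parts = [test_dir if part == main_dir else part for part in path.parent.parts]
    test_name = f"{path.stem}{suffix}{path.suffix}"
    return str(PurePosixPath(*parts) / test_name)


def find_test_file(source_path, repository_paths):
    """Return the linked test file if it exists among the repository paths, else None."""
    candidate = candidate_test_path(source_path)
    return candidate if candidate in repository_paths else None


repo = {
    "org/java/main/model/data/foo.java",
    "org/java/test/model/data/fooTest.java",
}
print(find_test_file("org/java/main/model/data/foo.java", repo))
```

A source file whose derived test path is absent from the repository simply has no linked test file, in which case only the change history of the source file is available.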

Extracting Commit Histories
Once the corresponding test file is identified, the change histories of the given query source file and the corresponding test file are extracted from the commit history of the source-code repository of the software system, such as a Git repository. Git is a distributed version control system and is commonly used to manage changes of files in a software system.
In a Git repository, the commit history is represented by commit metadata that contain the change information of several files. The metadata of a commit consist of a SHA-1 (Secure Hash Algorithm) hash value, the commit date, the commit author, a commit message, and diff information. The SHA-1 hash value is an identifier for distinguishing between commits. The diff information represents several pieces of change information for the changed files, such as the change type, previous path, new path, and textual content of the changed code. Figure 2 shows an example of commit metadata represented in JSON (JavaScript Object Notation) format.
For this study, we implemented a Git commit history extractor using the JGit API (Application Programming Interface) [34] to extract specific commit histories from a Git repository. JGit is a Java library that implements the Git commands. Given a set of files as input, the extractor extracts all the commits related to changes of the given files by traversing the Git repository from the most recent commit back to the first commit. For each of the given files, a commit is extracted if the diff information of its commit metadata contains the entire path of the file. The proposed change-recommendation approach uses the extractor to extract all the commits related to changes of the query source file and the corresponding test file.
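The path-matching rule can be illustrated independently of JGit. The following Python sketch filters an in-memory list of commit metadata records; it is not the extractor implemented for the study. The dictionary keys (`sha`, `diffs`, `old_path`, `new_path`) and the function name `extract_related_commits` are hypothetical stand-ins for the metadata fields described above.

```python
def extract_related_commits(commits, paths):
    """Keep the commits whose diff information mentions any of the given full paths.

    `commits` is assumed to be ordered from the most recent commit to the first one.
    """
    wanted = set(paths)
    related = []
    for commit in commits:
        touched = set()
        for diff in commit["diffs"]:
            # A renamed file appears under both its previous path and its new path.
            for key in ("old_path", "new_path"):
                if diff.get(key):
                    touched.add(diff[key])
        if touched & wanted:
            related.append(commit)
    return related


# Hypothetical commit metadata, newest first.
history = [
    {"sha": "c3", "diffs": [{"old_path": "src/main/Foo.java", "new_path": "src/main/Foo.java"}]},
    {"sha": "c2", "diffs": [{"old_path": "src/main/Bar.java", "new_path": "src/main/Bar.java"}]},
    {"sha": "c1", "diffs": [{"old_path": None, "new_path": "src/test/FooTest.java"},
                            {"old_path": None, "new_path": "src/main/Bar.java"}]},
]
related = extract_related_commits(history, ["src/main/Foo.java", "src/test/FooTest.java"])
print([c["sha"] for c in related])
```

Matching on the full path rather than the file name avoids confusing files that share a name in different packages.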

Identifying Change Patterns
The basic concept of association rule discovery can be applied to change recommendation [11]. An association pattern of an itemset is interpreted as another itemset that has frequently occurred with it in the database transactions. Similarly, in the context of change recommendation, a change pattern of a source file is interpreted as another source file that has frequently been co-changed with it in the change history of the source file.
The existing change-recommendation method was developed based on the basic concept of association rule discovery. It analyzes the entire commit history of a Git repository to make a change recommendation for the given source files, which may be impractical in software development because identifying many unrelated change patterns of the given source files is time-consuming. To avoid this shortcoming, we consider only the commits extracted in the previous step when identifying the change patterns of the query source file and the corresponding test file.
To make the identification of the change patterns of the query source file and the corresponding test file easier to follow, we first introduce a few basic definitions. Although a commit contains changes of various types of files, such as source, test, text, and binary files, we focus on the changes of source files in this paper. Thus, a commit is defined as the set of source files that were changed in the commit, as follows:

$$c = \{f_1, f_2, \ldots, f_n\}$$

where $f_i$ is a source file changed in a commit $c$ and $n$ is the total number of changed source files in $c$. For example, if source files $f_1$, $f_2$, and $f_3$ were changed in a commit $c$, the commit $c$ is represented as the set of source files $\{f_1, f_2, f_3\}$.
Let $C$ be the set of commits extracted in the previous step. We can classify the commits in $C$ into two sets: the commits for the query source file and the commits for the corresponding test file. That is, the commits $c \in C$ that contain the query source file are classified into the set of commits for the query source file, and the commits that contain the corresponding test file are classified into the set of commits for the corresponding test file. The two sets are defined as follows:

$$C_s = \{c \in C \mid s \in c\}, \qquad C_t = \{c \in C \mid t \in c\}$$

where $s$ and $t$ refer to the query source file and the corresponding test file, respectively. Based on the above definitions, the source files co-changed with the query source file and with the corresponding test file are defined as the change patterns of the query source file and the corresponding test file, as follows:

$$CP_s = \Big(\bigcup_{c_a \in C_s} c_a\Big) \setminus \{s\}, \qquad CP_t = \Big(\bigcup_{c_b \in C_t} c_b\Big) \setminus \{s\}$$

where $c_a$ and $c_b$ are commits in $C_s$ and $C_t$, respectively. $CP_s$ and $CP_t$ are considered to be the source files that are co-changeable with the query source file and the corresponding test file, respectively. Figure 3 shows the overall process of identifying $CP_s$ and $CP_t$.
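The definitions above can be traced on a toy commit history with the following Python sketch. It is a sketch under assumptions rather than the implementation of the study: commits are modeled as sets of changed file paths, a file counts as a source file if it ends with ".java" but not with "Test.java", and the query source file itself is left out of its own change patterns. The names `is_source`, `commits_containing`, and `change_patterns` are hypothetical.

```python
def is_source(path, test_suffix="Test.java"):
    """Assumed rule: Java files that do not follow the test naming convention."""
    return path.endswith(".java") and not path.endswith(test_suffix)


def commits_containing(commits, path):
    """C_s or C_t: the commits in which `path` was changed."""
    return [c for c in commits if path in c]


def change_patterns(commits, query_source):
    """CP_s or CP_t: the source files changed in `commits`, excluding the query file."""
    co_changed = set()
    for commit in commits:
        co_changed.update(p for p in commit if is_source(p))
    co_changed.discard(query_source)
    return co_changed


# Hypothetical commit history; each commit is the set of files it changed.
commits = [
    {"Foo.java", "Bar.java"},
    {"Foo.java", "FooTest.java", "Baz.java"},
    {"FooTest.java", "Qux.java", "README.txt"},
    {"Other.java"},
]
s, t = "Foo.java", "FooTest.java"
C_s = commits_containing(commits, s)
C_t = commits_containing(commits, t)
CP_s = change_patterns(C_s, s)
CP_t = change_patterns(C_t, s)
print(sorted(CP_s), sorted(CP_t))
```

Here "Qux.java" never appears in a commit with "Foo.java", so it is reachable only through the change history of the test file.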
The co-change frequency of the given source file $s$ and another source file $f_i$ is the number of commits in $C_s$ that involve the source file $f_i$. It is defined as follows:

$$\text{co-change}(s, f_i) = |\{c \in C_s \mid f_i \in c\}| \quad (6)$$

For example, if a source file $f_1$ is contained in three commits $c_1$, $c_2$, and $c_3 \in C_s$, $\text{co-change}(s, f_1)$ is $|\{c_1, c_2, c_3\}| = 3$. Using Equation (6), the confidence of the change pattern $\{s \rightarrow f_i\}$ is computed by dividing the co-change frequency of $s$ and $f_i$ by the number of commits in the entire commit history of the query source file $s$, as follows:

$$CP_{s \rightarrow f_i} = \frac{\text{co-change}(s, f_i)}{|C_s|} \quad (7)$$

For example, if $C_s$ has five commits $\{c_1, c_2, c_3, c_4, c_5\}$, the confidence of the change pattern $CP_{s \rightarrow f_1}$ is $3/5 = 0.6$. Using Equation (7), the confidences of all the change patterns in $CP_s$ are computed, and a set of change recommendations is constructed by selecting the top $k$ change patterns from $CP_s$ in descending order of confidence. Then, all the change patterns in $CP_t$ are added to the change-recommendation set. Finally, the change-recommendation set for the given source file $s$ is determined as follows:

$$ChgRec_s = CP_{s,k} \cup CP_t$$

Here, $CP_{s,k}$ comprises the $k$ change patterns selected from $CP_s$. The value of $k$ can be chosen arbitrarily. If $k$ is high, meaningless change patterns may be recommended; on the other hand, if $k$ is extremely low, meaningful change patterns may be missed. Therefore, the value of $k$ should be chosen between 10 and 30. For example, suppose that the change patterns of the given query source file and the corresponding test file are $CP_s = \{f_{1,0.8}, f_{2,0.8}, f_{3,0.6}, f_{4,0.5}, f_{5,0.2}\}$, where $f_{1,0.8}$ abbreviates $CP_{s \rightarrow f_1} = 0.8$, and $CP_t = \{f_6, f_7\}$, respectively, and that $k$ is set to three. Then $CP_{s,k}$ is $\{f_1, f_2, f_3\}$ and $ChgRec_s$ is $\{f_1, f_2, f_3, f_6, f_7\}$.
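The ranking and merging step can be sketched as follows. This is a minimal Python illustration under assumptions, not the implementation of the study: commits are modeled as sets of file names, ties in confidence are broken alphabetically, and the function name `recommend` is hypothetical.

```python
from collections import Counter


def recommend(source_commits, test_patterns, query, k):
    """Return (recommendations, confidences) for the query source file.

    Confidence of s -> f is the number of commits of s that also change f,
    divided by the number of commits of s. The top-k files by confidence are
    followed by the files contributed by the test file's change patterns.
    """
    counts = Counter(f for commit in source_commits for f in set(commit) if f != query)
    total = len(source_commits)
    confidences = {f: n / total for f, n in counts.items()}
    ranked = sorted(confidences, key=lambda f: (-confidences[f], f))
    top_k = ranked[:k]
    extras = sorted(set(test_patterns) - set(top_k) - {query})
    return top_k + extras, confidences


# Five hypothetical commits of the query source file "s".
C_s = [{"s", "f1", "f2"}, {"s", "f1"}, {"s", "f1", "f3"}, {"s", "f2"}, {"s"}]
CP_t = {"f1", "f6", "f7"}  # hypothetical files co-changed with the test file
recommendations, confidences = recommend(C_s, CP_t, "s", k=2)
print(recommendations, confidences["f1"])
```

With $k = 2$, the files "f1" and "f2" are selected by confidence, and "f6" and "f7" are added from the test file's change patterns, so a short history of the source file does not leave the recommendation set empty.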
$ChgRec_s$ involves not only the change patterns of the given query source file but also the change patterns of its corresponding test file. Even when the proposed change-recommendation approach cannot identify any change patterns for $CP_{s,k}$ owing to the short change history of the given source file, it can still recommend alternative change patterns from $CP_t$.

Experiment
In this section, we report the results of an experiment performed to evaluate the performance of the proposed change-recommendation approach. The objective of the experiment is to investigate whether the proposed approach performs better than the existing change-recommendation approach. The data used for the experiment and the experiment settings are described in Sections 4.1 and 4.2, respectively. The metric for evaluating the performance of the change-recommendation approaches is introduced in Section 4.3. The experimental results are presented in Section 4.4.

Experimental Data
This experiment requires software projects that employ at least one test framework in their development and allow public access to their Git repository. We chose the following projects for the experiment: commons-lang [35], commons-math [36], JGit [34], Maven [37], Flink [38], and Wicket [39]. Commons-lang and commons-math are utility libraries for Java application development; JGit is a Java library implementing the commands of Git; Maven is a tool for software project management and integration; Flink is an open-source framework for stream processing; and Wicket is an open-source, component-based web application framework. These projects are developed using the JUnit test framework [40] and allow access to their Git repositories. Hence, we selected these open-source projects for the experiment.
Symmetry 2018, 10, 534
To collect the experimental dataset from these open-source projects, we first cloned their Git repositories from GitHub [41]. We then extracted the commits that contain changes of source files or test files from the cloned repositories. Next, we determined the pairs of source and test files that can be used as queries for the change recommendations in the experiment. First, we identified the corresponding source and test files using the source-test linking method described in Section 3.2. We then excluded the pairs of source and test files that have insufficient commit histories to identify their change patterns: pairs with fewer than five commits were excluded from the identified source-test pairs. Table 1 shows the experimental dataset collected from the open-source projects. For each project, the #.Commits column gives the number of commits submitted by developers within the commit period. The #.Changed Source Files and #.Changed Test Files columns give the numbers of source and test files changed in those commits, respectively. The #.Pairs of Source and Test column gives the number of paired source and test files among the changed source and test files. The #.Related Commits column gives the number of commits in which the source files or the test files were changed. The dataset in Table 1 varies across the projects because they differ in development period, implemented functionality, and adopted testing strategy.

Experimental Setting
In the experiment, we compare the proposed approach with the existing method developed by Ying et al. [11]. The existing method is based on association rule mining. Given a query source file and the number of change patterns to be recommended ($k$) as input, the existing method first extracts the commits of the given query source file from a source-code repository. Change patterns are then formed by applying an association rule mining algorithm. Finally, $k$ change patterns are recommended according to the confidence values of the change patterns.
To make a change recommendation, the proposed and existing change-recommendation approaches require several arguments: a query source file, training commits, and the number of recommended patterns, $k$, to select from the set of change patterns of the query source file. In this experiment, we used the source files involved in the identified source-test pairs as the query source files and used all of the commits related to the source-test pairs in each project for training and evaluation. The proposed approach usually employs more commits for training than the existing approach. For a source-test pair and an evaluation commit, the existing approach trains on the commits of the source file that precede the evaluation commit, while the proposed approach trains on the commits of both the source file and the test file that precede the evaluation commit. For example, given a source-test pair $(s, t)$, their commit history $\{c_{s,1}, c_{t,1}, c_{s,2}, c_{t,2}, c_{s,3}\}$, and an evaluation commit $c_{s,3}$, the existing approach and the proposed approach choose the commits $\{c_{s,1}, c_{s,2}\}$ and $\{c_{s,1}, c_{t,1}, c_{s,2}, c_{t,2}\}$ for training, respectively. We set the value of $k$ to 10 according to previous work on recommendation systems [42]. Thus, for each change recommendation, the existing approach makes only 10 change recommendations from the change patterns of a query source file, while the proposed approach makes 10 change recommendations plus additional change recommendations from the change patterns of the query source file and the corresponding test file.
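The selection of training commits can be illustrated with a short Python sketch. It is an assumption-laden illustration, not the experimental harness of the study: the history is modeled as a chronologically ordered list of tuples (commit id, touches source, touches test), the evaluation commit is taken to be the last source commit so that all earlier commits precede it, and the function name `training_commits` is hypothetical.

```python
def training_commits(history, evaluation_id, include_test):
    """Return the ids of the commits that precede the evaluation commit.

    `history` is ordered from oldest to newest. With include_test=False only
    commits of the source file are used (existing approach); with
    include_test=True commits of the test file are used as well (proposed approach).
    """
    ids = [commit_id for commit_id, _, _ in history]
    if evaluation_id not in ids:
        raise ValueError(f"unknown evaluation commit: {evaluation_id}")
    training = []
    for commit_id, touches_source, touches_test in history:
        if commit_id == evaluation_id:
            break
        if touches_source or (include_test and touches_test):
            training.append(commit_id)
    return training


# Hypothetical history of a source-test pair (s, t), oldest first.
history = [
    ("c_s1", True, False),
    ("c_t1", False, True),
    ("c_s2", True, False),
    ("c_t2", False, True),
    ("c_s3", True, False),
]
existing = training_commits(history, "c_s3", include_test=False)
proposed = training_commits(history, "c_s3", include_test=True)
print(existing, proposed)
```

Restricting training to commits that precede the evaluation commit keeps the evaluation from using information that would not have been available at recommendation time.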

Evaluation Metric
To evaluate the performance of the proposed change-recommendation approach, we used an accuracy measurement method. Accuracy measurement is widely used to evaluate various information retrieval methods and recommendation systems [20,42-46]. In this study, the accuracy of a change recommendation is computed as follows:

$$Accuracy(CR_{s,k}) = \frac{|CR_{s,k} \cap AC_s|}{|CR_{s,k}|}$$

where $CR_{s,k}$ is the set of source files recommended for a given query source file $s$ and $AC_s$ is the set of source files that were actually co-changed with $s$. The accuracy ranges from 0 to 1. If none of the recommended source files is included in the set of actually changed source files, the accuracy is 0. In contrast, if all the recommended source files are included in the set of actually changed source files, the accuracy is 1. For example, given a set of recommended source files $CR_{s,5} = \{f_a, f_c, f_e, f_f, f_g\}$ and a set of actually changed source files $AC_s = \{f_a, f_b, f_c, f_d, f_e\}$, three of the five recommended files were actually changed, so the accuracy of the change recommendation is 60%.
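The worked example can be checked with a few lines of Python. The sketch assumes that an empty recommendation set scores 0, a case the text does not specify; the function name `accuracy` is hypothetical.

```python
def accuracy(recommended, actually_changed):
    """Share of recommended files that were actually co-changed with the query file."""
    recommended = set(recommended)
    if not recommended:
        return 0.0  # assumption: an empty recommendation set scores 0
    return len(recommended & set(actually_changed)) / len(recommended)


CR = {"f_a", "f_c", "f_e", "f_f", "f_g"}
AC = {"f_a", "f_b", "f_c", "f_d", "f_e"}
score = accuracy(CR, AC)
print(score)
```

Because the denominator is the number of recommended files, the metric behaves like precision: adding files from the test file's change patterns raises the score only if those files are actually changed.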

Result
Table 2 shows the average accuracy of the proposed and existing approaches for the experiment projects. For all the projects, the proposed approach obtained a significantly higher average accuracy than the existing approach. For the projects, in the order listed in Table 2, the proposed approach obtained average accuracies of 82%, 70%, 58%, 48%, 56%, and 50%, while the existing approach obtained average accuracies of 20%, 13%, 11%, 27%, 16%, and 16%. Comparing the two approaches project by project, the proposed approach improved the average accuracy by 62, 56, 47, 21, 40, and 34 percentage points. On average, the accuracy of the proposed approach was 43 percentage points higher than that of the existing approach.
To find out how the proposed approach obtains the improved results, we investigated the change-recommendation results of the proposed and existing approaches. Through the investigation, we observed that even for query source files that have relatively short commit histories, the proposed approach can identify correct change patterns from the commit histories of the corresponding test files, while the existing approach cannot identify any change patterns. We also summarized the root causes of incorrect change pattern identification. Figure 4 shows the four categories of root causes: Development environment, Project, Testing, and Commit activity. 'Short development period', categorized under 'Development environment', is a major cause of a commit history that is too short to identify change patterns. 'Absence of guide' and 'Absence of manual', categorized under 'Project', may cause developers to make mistakes when committing. Such mistakes may be major causes of 'Missed commit', 'Overlapped commit', and 'Delayed commit'. 'Absence of testing strategy' and 'Absence of testing framework' may also affect the quality of the commit history.
Furthermore, we performed a paired t-test to evaluate the difference between the results. The null hypothesis and its alternative hypothesis for the statistical analysis are as follows.
$H_{\mathrm{null}}$: There are no statistically significant differences between the proposed and existing approaches.
$H_{\mathrm{alternative}}$: There are statistically significant differences between the proposed and existing approaches.
We performed a paired t-test for all of the paired recommendation results by using the Student's t-test function in R. Table 3 shows the p-values obtained in the statistical analysis. For all of the experiment projects, the p-values are less than 0.01, so the null hypothesis is rejected at the 1% significance level. Thus, there are statistically significant differences in accuracy between the proposed and existing approaches, and the average accuracy obtained by the proposed approach is significantly higher than that obtained by the existing approach.
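The test statistic behind such a paired comparison can be sketched with the Python standard library. This is only an illustration and not the R procedure used in the study: the function name `paired_t_statistic` is hypothetical, the accuracy values are made-up sample data, and the p-value itself would still have to be obtained from a t-distribution table or a statistics library.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t_statistic(a: Sequence[float],
                       b: Sequence[float]) -> Tuple[float, int]:
    """Return the paired t statistic and its degrees of freedom."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all paired differences are identical")
    n = len(diffs)
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


# Hypothetical accuracies of the two approaches on the same five queries.
proposed_acc = [0.8, 0.6, 0.7, 0.9, 0.5]
existing_acc = [0.2, 0.3, 0.1, 0.4, 0.2]
t_stat, dof = paired_t_statistic(proposed_acc, existing_acc)
```

For 4 degrees of freedom, the two-sided critical value at the 1% level is about 4.604, so a statistic above that value would lead to rejecting the null hypothesis for this made-up sample.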

Discussion and Implications
The experimental results in Section 4.4 demonstrate that the proposed approach performs better than the existing method. In this section, we discuss why the proposed approach achieves this performance. The key difference between the proposed approach and the existing method lies in Equation (8). The proposed approach includes the change patterns of both the query source file and the corresponding test file in the set of change recommendations, while the existing method includes only the change patterns of the query source file. The benefit of this difference is that, when a query source file has a short change history, the change history of the corresponding test file can be used to discover appropriate change patterns for the query source file. We believe that this advantage of the proposed approach compensates for the limitation of the existing method.
This study is inspired by the nature of the evolution of test files revealed in previous studies [24,26–28,31,47]. In general, source and test files co-evolve in software project development. A modification of a source file affects several related test files, and a test file is needed to verify several related source files. Thus, the change history of a test file reflects the changes of source files that are explicitly related to each other. Hence, when identifying the change patterns of a source file, it is reasonable to consider the change history of the test file explicitly related to that source file. The experimental results in Section 4.4 support the view that the change history of test files should be considered when identifying the change patterns of the corresponding source files. We believe that this study can advance further studies on change recommendation and that the proposed approach can contribute significantly to real-world software project development, especially for young software projects in which most source files have a short commit history.

Limitation
The internal validity of this study is related to the identification of source-test pairs. The proposed approach requires the change history of a test file that is related to a given source file. If an incorrect test file is identified, the change patterns obtained from the identified test file may be unreliable. Source-test traceability linking is an important research field, and an approach that can perfectly identify traceability links between source and test files has not yet been reported. In this study, we used a source-test traceability linking approach based on the naming convention rule between source and test files to identify the test files corresponding to given source files. Then, we manually verified whether the identified pairs were correct in the experiment. In most software projects, developers typically write test files by following the naming convention rule [29,30]. Therefore, we believe that using the source-test traceability linking approach, combined with manual verification, minimizes this bias.
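Linking by naming convention can be sketched as follows. The concrete rule used here, a `Test` or `Tests` prefix or suffix on the source file name, is an assumption chosen for illustration; the exact rule used in the study is not restated in this section. The function name `find_test_file` and the file paths are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Iterable, Optional, Tuple


def find_test_file(source_path: str, candidate_paths: Iterable[str],
                   affixes: Tuple[str, ...] = ("Test", "Tests")) -> Optional[str]:
    """Return the first candidate whose file name matches the assumed rule."""
    source = PurePosixPath(source_path)
    stem, suffix = source.stem, source.suffix
    # Assumed convention: <Name>Test.<ext>, <Name>Tests.<ext>,
    # Test<Name>.<ext>, or Tests<Name>.<ext>.
    expected = {f"{stem}{affix}{suffix}" for affix in affixes}
    expected |= {f"{affix}{stem}{suffix}" for affix in affixes}
    for candidate in candidate_paths:
        if PurePosixPath(candidate).name in expected:
            return candidate
    return None


files = [
    "src/main/java/app/Parser.java",
    "src/test/java/app/ParserTest.java",
    "src/test/java/app/LexerTest.java",
]
match = find_test_file(files[0], files)
```

A name-based rule of this kind can miss test files that do not follow the convention or link a wrong file with a coincidentally matching name, which is why the identified pairs still need manual verification.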
The external validity of this study is related to the generalization of the experimental results. As the experimental results were obtained from six open-source projects, they cannot be generalized to all software projects. Therefore, additional experiments on various software projects are required to reduce the bias. However, the experiment projects have different scales and involve different domains. Thus, we believe that the bias is reduced.

Conclusion
In this study, we proposed a novel change-recommendation approach that considers not only the change history of a given source file but also the change history of the test file corresponding to the given source file. Given a source file as a query, the proposed approach identifies the corresponding test file by using a source-test linking approach based on the naming convention rule between source and test files and extracts the commit histories of the given source file and the identified test file from a Git repository. Then, the change patterns of the source and test files are identified according to the extracted commit histories. Finally, a set of change recommendations is constructed using the identified change patterns of the source and test files. The proposed change-recommendation approach was evaluated on six open-source projects. For these projects, the proposed change-recommendation approach obtained significantly better accuracy than the existing change-recommendation approach.
In future work, we plan to conduct additional experiments with various software projects to reduce the bias. In addition, we will study how to apply our approach to functional change recommendation. We believe that the proposed change-recommendation approach can be extended to make functional change recommendations. For instance, if the changes of source files are classified into functional changes of a software system, functional change recommendations can be made by considering the functional change history and the corresponding testing history of the software system.

Figure 1. Workflows of the proposed change-recommendation approach.

Figure 3. Process of identifying change patterns of a pair of source and test files.

3.5. Constructing Change-Recommendation Set
A change-recommendation set for the given source file is constructed according to the $CP_s$ and $CP_t$ identified in the previous step. The proposed change-recommendation approach first selects k change patterns from $CP_s$ in the order of their confidence values and includes the selected change patterns in a change-recommendation set. It then adds all the change patterns in $CP_t$ to the change-recommendation set. Similar to association patterns, the confidence of a change pattern is determined by the co-change frequency of the source files involved in the change pattern. For a change pattern $\{s \rightarrow f_i\}$ in $CR_s$, the co-change frequency of the given source file $s$ and the other source file $f_i$ is the number of commit histories of $s$ that involve the source file $f_i$. It is defined as follows:

$$\text{co-change}(s, f_i) = |\{c \mid c \in C_s,\ f_i \in c\}|$$

where $C_s$ is used here for the set of commit histories of $s$.
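The construction of the change-recommendation set can be sketched as follows. The sketch makes several simplifying assumptions that are not taken from the paper: a change pattern is reduced to a co-changed file, its confidence is the raw co-change count, ties are broken by file name, and the query source file and its test file are excluded from the recommendations. The function names and file names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Set


def co_change_counts(target: str, commits: Sequence[Set[str]]) -> Counter:
    """Count, per file, the commits that change it together with target."""
    counts: Counter = Counter()
    for files in commits:
        if target in files:
            for name in files:
                if name != target:
                    counts[name] += 1
    return counts


def recommend(source: str, test: str, commits: Sequence[Set[str]],
              k: int = 10) -> List[str]:
    """Top-k co-changed files of the source, then all files co-changed with the test."""
    skip = {source, test}  # assumption: never recommend the pair itself
    cp_s = co_change_counts(source, commits)
    cp_t = co_change_counts(test, commits)
    # Rank by descending co-change count; file name breaks ties (assumption).
    ranked = sorted(cp_s.items(), key=lambda item: (-item[1], item[0]))
    result = [name for name, _ in ranked if name not in skip][:k]
    for name in sorted(cp_t):
        if name not in skip and name not in result:
            result.append(name)
    return result


# Each commit is represented by the set of files it changes.
commits = [
    {"A.java", "B.java"},
    {"A.java", "B.java", "C.java"},
    {"ATest.java", "D.java"},
    {"A.java", "ATest.java", "C.java"},
]
recommended = recommend("A.java", "ATest.java", commits, k=1)
```

In this example, `D.java` never appears in a commit with `A.java`, so it is reachable only through the history of the test file, which illustrates how the test file's change patterns extend the recommendation set.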

Figure 4. Fishbone diagram for root causes of incorrect change pattern identification.

Table 2. Results for the accuracy and improvements of the proposed approach.

Table 3. Results of the statistical t-test.