4.1. Influence of Code Smells on Software Bugs
In this subsection, we use evidence collected from the selected studies to answer Research Question (RQ1): To what extent do code smells influence the occurrence of software bugs?
Table 3 presents how each of the selected studies contributed to answering RQ1. In this case, the following 16 studies provide evidence of the influence of code smells on the occurrence of software bugs: S1–S9, S11–S14, and S16–S18. On the other hand, two studies—S10 and S15—state that, based on the collected evidence, no cause-and-effect relationship between code smells and the occurrence of software bugs should be claimed. In other words, the majority of the studies concluded that classes affected by specific code smells tend to be more change- and fault-prone. In the following paragraphs, we discuss these influences.
Figure 3 classifies the code smells according to their influence on software bugs. The most influential code smells are those positively associated with error proneness and recognized as possible causative agents of a significant number of software bugs in the projects analyzed. These studies correspond to those whose answer was Yes in Table 3. Therefore, studies S10 [10] and S15 [1] were not considered sources for this group of the figure. Note that the No information node shown in Figure 3 is not a type of code smell. This node merely indicates that the studies in question did not disclose which code smells had a greater or lesser influence on the bugs of the projects analyzed.
From the set of problems mentioned in S10 [10], roughly 30% are related to files containing code smells. Within the limits established by the context of this study, it is clear that the proportion of problems linked to code smells was not as large as might be expected. Similarly, the findings in S15 [1] do not support the claim that clones should be approached as a code smell. The authors argued that most bugs have very little to do with clones and that cloned code actually contains less buggy code, i.e., code implicated in bug fixes, than the rest of the code.
The S3 study [8] argued that the Middle Man smell is related to fewer faults in ArgoUML and that Data Clumps is related to fewer faults in Eclipse and Apache Commons. The authors also mentioned that Data Clumps, Speculative Generality, and Middle Man are associated with a relatively low number of faults in the analyzed software projects, and that these three smells present a weak relationship with the number of faults in some circumstances and in some systems.
In S4 [9], the authors analyzed the code smells Data Class, God Class, God Method, Refused Bequest, and Shotgun Surgery. They concluded that God Class was a significant contributor, positively associated with error proneness in all releases. Shotgun Surgery and God Method were significant contributors in Eclipse 2.1, also positively associated with error proneness. In contrast, other code smells seemed to have less influence on bug occurrence. In S11 [13], the code smells Tradition Breaker, Data Clumps, and Data Class had the lowest proportions of bugs in their classes, with percentages smaller than or equal to 5%.
In S1 [5], the authors argued that class fault-proneness is significantly higher in classes in which anti-patterns and clones co-occur, in comparison with other classes in the analyzed software projects. S2 [7] also claimed that classes with code smells tend to be more change- and fault-prone than other classes, and that this is even more noticeable when classes are affected by multiple smells [7]. The authors reported high fault-proneness for the code smell Message Chains. Of the 395 releases analyzed in 30 projects, Message Chains affected 13%, and in the most affected release (a release of HSQLDB), only four out of the 427 classes (0.9%) were instances of this smell. Therefore, the authors concluded that, although Message Chains is potentially harmful, its diffusion is rather limited. S3 emphasized that Message Chains increased the flaws in two software projects (including Eclipse). However, the authors note that, when the smell was detected in larger files, the number of detected flaws relative to file size was actually smaller.
S3 [8] reported the influence of five code smells—Data Clumps, Message Chains, Middle Man, Speculative Generality, and Switch Statements—on the fault-prone behavior of source code. However, the effect of these smells on faults seems to be small [8]. The authors argued that, in general, the smells analyzed have a relatively small influence (always below 10%) on bugs, except for Message Chains, and suggested that Switch Statements had no effect in any of the three projects analyzed. The authors argued that Data Clumps reduced the failures in Apache but increased the number of failures in ArgoUML. Middle Man reduced flaws only in ArgoUML, and Speculative Generality reduced flaws only in Eclipse. In that study, the collected evidence suggested that smells have different effects on different systems and that arbitrary refactoring is not guaranteed to reduce fault-proneness significantly; in some cases, it may even increase fault-proneness [8].
S4 [9] reported evidence of the influence on bugs of the smells Shotgun Surgery, God Class, and God Method. The authors analyzed the post-release system evolution process to reach this conclusion [9]. Results confirmed that some smells were positively associated with the class error probability in three error-severity levels (High, Medium, and Low) usually applied to classify issues. This finding suggests that code smells could be used as a systematic method to identify problematic classes in this specific context [9]. The study reported that the code smells Shotgun Surgery, God Class, and God Method were positively associated (ratio greater than one) with class error probability across and within the three error-severity levels (High, Medium, and Low) created by the authors to group the original Bugzilla severity levels (such as Minor). In particular, Shotgun Surgery was associated with all severity levels of errors in all analyzed releases of the Eclipse software project. On the other hand, the study did not find a relevant relationship between the smells Data Class and Refused Bequest and the occurrence of bugs.
Although not explicitly investigating the occurrence of bugs as a consequence of code smells, S5 [37] analyzed the influence of a set of code smells on class change-proneness. The study took into account a plethora of previous research findings to argue that change-proneness increases the probability of the advent of bugs in software projects [48]. Considering this scenario, we included study S5 [37] in this SLR. Twenty-nine code smells were analyzed in this study throughout nine releases of the Azureus software project and thirteen releases of the Eclipse software project. The authors analyzed these releases to understand to what extent these code smells influenced class change-proneness. The conclusion was that, in almost all releases of Azureus and Eclipse, classes with code smells were more change-prone than other classes, and specific smells were more correlated with change-proneness than other smells [37]. In the case of Azureus, the smell NotAbstract had a significant impact on change-proneness in more than 75% of the releases, whereas AbstractClass and LargeClass proved to be relevant agents of change-proneness in more than 50% of the analyzed releases. In project Eclipse, the HasChildren, MessageChainsClass, and NotComplex smells had a significant effect on change-proneness for 75% of the releases or more.
The authors of S7 [11] provided evidence that instances of God Class and Brain Class suffered more frequent changes and contained more defects than classes not affected by those smells. Considering that both God Class and Brain Class tend to increase in size, there is also a tendency for more defects to occur in these classes. However, the authors also argued that, when the measured effects were normalized with respect to size, God Class and Brain Class were less subject to change and had fewer defects than other classes.
For S9, survival analysis indicated that the resilience of files against the occurrence of faults increases with time if the files do not have smells. For S11 [13], the results of the empirical study also indicated that classes affected by code smells are more likely to manifest bugs. The study analyzed the influence of the smells Schizophrenic Class and Tradition Breaker on the occurrence of bugs. Schizophrenic Class was associated with a significant proportion of bugs, whereas Tradition Breaker seemed to have a low influence. The authors recommended investigating these smells in other systems, with an emphasis on Schizophrenic Class. Still according to S11 [13], empirical evidence from project Apache Ant indicated that classes affected by code smells are up to three times more likely to exhibit bugs than other classes. In the case of the Apache Xerces software project, affected classes proved to be up to two times more likely to exhibit bugs.
S13 reported that one of the most predominant kinds of performance-related change is the fixing of bugs caused by code smells manifested in the code. The study provided examples of new releases of projects that fix the inefficient usage of regular expressions, recurrent computations of constant data, and usage of deprecated decryption algorithms.
S14 comprised an analysis of 34 software projects and reported a significant positive correlation between the number of bugs and the number of anti-patterns. It also reported a negative correlation between the number of anti-patterns and maintainability. This further supports the intuitive thinking that establishes a relation between anti-patterns, bugs, and (lack of) quality.
S18 [12] investigated the association of code smells with merge conflicts, i.e., the impact on the bug-proneness of the merged results. The authors of S18 argued that program elements involved in merge conflicts contain, on average, three times more code smells than program elements that are not involved in a merge conflict [12]. In S18, 12 out of the 16 smells that co-occurred with conflicts were significantly associated with merge conflicts. Of those, God Class, Message Chains, Internal Duplication, Distorted Hierarchy, and Refused Parent Bequest stood out. The only two (significant) smells associated with semantic conflicts in the S18 study were Blob Operation and Internal Duplication, which proved to be, respectively, 1.77 and 1.55 times more likely to be present in a semantic conflict than in non-semantic conflicts.
Figure 3 shows that God Class stood out from the other smells in its influential role in bug occurrence in the projects analyzed in studies S4, S7, S11, and S18. In S4, God Class is positively correlated with code fault-proneness in three releases of project Eclipse. Likewise, Message Chains also stood out as an influential factor, as reported in studies S2, S3, S5, S11, and S18. On the other hand, no relevant tendency toward change- and fault-proneness was reported as a consequence of the smells Data Class, Refused Bequest, Tradition Breaker, Data Clumps, Middle Man, and Switch Statements. As can be seen in the same figure, a node labeled No information is connected to Less Influential Code Smells. In this case, studies S1, S2, S5, S8, S9, S13, and S14 did not provide information on which smells were less influential in their respective studies.
S11 reported God Class as the smell with the greatest number of related bugs, with a percentage of 20%, followed by Feature Envy with a percentage close to 15%, Schizophrenic Class with 9%, and Message Chains with 7% in the analyzed projects. Still in S11, Tradition Breaker, Data Clumps, and Data Class were associated with percentages equal to or lower than 5%. Feature Envy stood apart in S8. Message Chains also featured prominently in S2, S3, S5, and S18. In S3, Message Chains was associated with a higher occurrence of faults in projects Eclipse and ArgoUML, though no influence was detected in project Apache Commons. Middle Man was related to fewer faults in ArgoUML, while Data Clumps was related to fewer faults in Eclipse and Apache Commons. Also according to S3, Switch Statements showed no effect on faults in any of the three systems. In S5, the smell NotAbstract was the sole smell to show a significant impact on change-proneness in more than 75% of the releases of project Azureus. AbstractClass and LargeClass proved influential to a significant degree in more than 50% of the releases (five out of nine) of project Azureus, according to S5. In Eclipse, the smells HasChildren, MessageChainsClass, and NotComplex had a significant influence on change-proneness in 75% of the releases or more. In S9, the risk rate varied among the five projects analyzed: the smells Chained Methods, This Assign, and Variable Re-assign had the highest hazard ratios in project Express; the smells Nested Callbacks, Assignment in Conditional Statements, and Variable Re-assign had the highest hazard rates in project Grunt; and Deeply Nested Code was the most hazardous smell in terms of bug influence in project Bower. Also in terms of bug influence, Assignment in Conditional Statements had the highest hazard ratio in project Less.js, and Variable Re-assign had the highest hazard ratio in project Request.
Four studies focused on one specific smell: S6, S12, and S16 on Comments, and S17 on Code Clone. Their authors concluded that these smells influence bug occurrence in the projects analyzed, and therefore they were considered more influential. S1, S13, and S14 did not disclose which smells exerted the greatest influence on bug occurrence, while S1, S2, S5, S8, S9, S13, and S14 did not provide information on which code smells were less influential.
S6, S12, and S16 analyzed the Code Comments smell. The S12 results suggest that there is a tendency for the presence of inner comments to relate to fault-proneness in methods; however, more inner comments do not seem to result in higher fault-proneness. For S6 and S16, comments contribute to improving comprehensibility, but the motivation for comments can often be to compensate for the lack of comprehensibility of complicated and difficult-to-understand code. Consequently, some well-written comments may be indicative of low-quality code. For S6, methods with more comments than the quantity estimated on the basis of their size and complexity are about 1.6–2.8 times more likely than average to be faulty. S16 likewise suggested that the risk of being faulty in well-commented modules is about 2 to 8 times greater than in non-commented modules.
Unlike S1–S9, S11–S14, and S16–S18, the S10 study concluded that code smells do not increase the proportion of bugs in software projects. For S10, a total of 137 different problems were identified, of which only 64 (47%) were associated with source code, with code smells representing only 30% of that 47% of problems. The other 73 problems (53%) were related to other factors, such as lack of adequate technical infrastructure, developer coding habits, and dependence on external services such as Web services. S10 concluded that, in general, code smells are only partial indicators of maintenance difficulties, because the study results showed a relatively low coverage of smells when observing project maintenance. S10 also commented that analyzing code smells individually can provide a wrong picture, due to potential interaction effects among code smells and between smells and other factors, such as those mentioned above. The S10 study suggests that, to evaluate code more widely and safely, different analysis techniques should be combined with code smell detection.
The results of S15 are in line with those of S10: (1) most bugs had little to do with clones; (2) cloned code contained less buggy code (i.e., code implicated in bug fixes) than the rest of the system; (3) larger clone groups did not have more bugs than smaller clone groups; in fact, making more copies of code did not introduce more defects, and larger clone groups had a lower bug density per line than smaller clone groups; (4) clones scattered across files or directories may not induce more defects; and (5) bugs with high clone content may require less effort to fix (as measured by the number of lines changed to fix a bug). Most of the bugs (more than 80%) involved no cloned code, and around 90% of the bugs yielded a clone ratio lower than the project average. In other words, S10 and S15 diverge from the other selected studies in that they did not find evidence of a clear relationship between code smells and software bugs.
4.2. Tools, Resources, and Techniques to Identify the Influence of Specific Code Smells on Bugs
In this section, we use evidence collected from the selected studies to answer research question RQ2: Which tools, resources, and techniques were used to find evidence of the influence of code smells on the occurrence of software bugs?
Figure 4 depicts the tools, resources, and techniques reported in the selected studies to investigate the influence of specific smells on software bugs. S1 evaluated fault-proneness by analyzing the change and fault reports included in the software repositories. S1 used the version-control systems CVS (also used by S5 and S8) and Subversion (also used in studies S7, S10, S11, and S17). S1 executed a Perl script that implemented the heuristics proposed in [54] to analyze the co-occurrence of commit messages with maintenance activities and thus detect bug-fixing commits. Those heuristics searched for commit messages that contained words such as bug and that were recorded during the maintenance phase of the studied release of a system. The bug identifier found in the commit log message was then compared with the project's list of known bugs to determine the list of files (and classes) that were changed to fix a bug. Finally, the script checked the excerpt of changes performed on this list of files using the CCFinder and DECOR (Defect Detection for CORrection) tools to identify the files in which a code smell and a fault occurred in the same file location.
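The commit-message heuristic described above (keyword matching followed by a comparison against the project's list of known bugs) can be sketched in a few lines. This is only an illustration of the idea, not S1's actual Perl implementation; the regular expression, commit data, and bug IDs below are hypothetical.

```python
import re

# Hypothetical commit log: (message, changed files); illustrative data only.
COMMITS = [
    ("Fixed bug #142: null check in parser", ["src/Parser.java"]),
    ("Add feature: export to CSV", ["src/Export.java"]),
    ("bug 77 - off-by-one in clone detector", ["src/Detector.java"]),
]

# IDs mined from the project's issue tracker (illustrative).
KNOWN_BUG_IDS = {142, 77, 901}

# Messages mentioning "bug"/"fix"/"issue" near a number are bug-fix candidates.
BUG_REF = re.compile(r"\b(?:bug|fix(?:ed|es)?|issue)\b\D{0,5}(\d+)", re.IGNORECASE)

def bug_fix_files(commits, known_ids):
    """Return files changed by commits whose message references a known bug ID."""
    files = set()
    for message, changed in commits:
        for match in BUG_REF.finditer(message):
            if int(match.group(1)) in known_ids:
                files.update(changed)
    return files
```

Intersecting the resulting file set with the set of smelly files then yields the files in which a smell and a fault co-occur.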
S5 also used DECOR to detect smells and tested whether the proportion of classes exhibiting at least one change varied significantly between classes with smells and other classes. For that purpose, the authors used Fisher's exact test [55], which checks whether a proportion varies between two samples, and also computed the odds ratio (OR) [55], which indicates the likelihood of an event occurring. To compare the number of smells in change-prone classes with the number of smells in non-change-prone classes, a (non-parametric) Mann–Whitney test was used, and to relate change-proneness to the presence of particular kinds of smells, a logistic regression model [56] was used, similarly to Vokac's study [57].
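Fisher's exact test and the odds ratio operate on a 2×2 contingency table (classes with/without smells versus changed/unchanged). The sketch below computes both from first principles with the standard library; it is a minimal illustration of the statistics S5 applied, not the study's actual tooling, and the table values are invented.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Example reading: a = smelly & changed, b = smelly & unchanged,
    c = clean & changed, d = clean & unchanged.
    Returns (odds_ratio, p_value).
    """
    n = a + b + c + d
    row1, col1 = a + b, a + c
    denom = comb(n, row1)

    def prob(x):
        # Hypergeometric probability of a table with top-left cell x
        # and the same margins as the observed table.
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    p_obs = prob(a)
    # Two-sided p-value: sum over all tables at most as probable as observed.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, p
```

For instance, with 8 of 10 smelly classes changed versus 1 of 10 clean classes, `fisher_exact_2x2(8, 2, 1, 9)` yields an odds ratio of 36 with a p-value below 0.01, i.e., a significantly higher change proportion among smelly classes.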
In S2, S3, S4, S7, S8, S11, and S15, the Bugzilla repositories were used to query the bugs of the projects under analysis, while S11 and S15 (and also S2 and S7, for some projects) used the Jira tracking system. S15 used Git. S2, S3, S4, S9, S14, and S16 do not report the version control system used. S2, S3, and S7 opted to develop their own detection tool, and only S7 identifies the tool, naming it EvolutionAnalyzer. S2 and S3 opted for developing a tool because none of the existing tools had ever been applied to detect all the studied code smells. For S2, the tool's detection rules are generally more restrictive to ensure a good compromise between recall and precision, with the consequence that some smell instances may have been missed. To validate this claim, the authors evaluated the behavior of existing tools such as DECOR and HIST. S15 used the Deckard clone detector; the file name, line number, information about which clone a line belongs to, and sibling clones were extracted from the Deckard output. S8 parsed the source code with the tool inFusion and produced a FAMIX-compliant model of the system (FAMIX is a language-independent, object-oriented meta-model of software systems presented in [59]). Having the FAMIX-compliant model as input, the authors used detection strategies to spot the design flaws. This operation was performed within the Moose reengineering framework [60], and the result was a list of method-level design flaws within each class.
To identify a relationship between bugs and source code, S2, S3, and S8 resorted to mining with regular expressions applied to bug-fixing commits: commit messages written by developers often include references to problem reports, containing issue IDs, in the versioning system change log (e.g., "fixed issue #ID", "issue ID", "bug", etc.), which can then be linked back to the issue identifiers of the issue tracking system [61]. In S8, some false positives were produced when the approach was limited to simply looking at the number. In their algorithm, each time a candidate reference to a bug report is found, a check is made to confirm that a bug with such an ID indeed exists. A check is also made that the date on which the bug was reported is prior to the timestamp of the commit comment in which the reference was found (i.e., it checks that the bug is fixed after it is reported).
S4 and S10 used Borland Together, a plug-in tool for Eclipse, to identify the classes that had bad smells. S10 also used InCode. To analyze the change reports from the Eclipse change log, S4 used CVSchangelog, a tool available at Sourceforge. S4 used two types of dependent variable in their experiment: a binary variable indicating whether a class is erroneous or not, and a categorical variable indicating the error-severity level. They used Multivariate Logistic Regression (MLR) to study the association between bad smells and class error-proneness, and Multinomial Multivariate Logistic Regression (MMLR) to study the association between bad smells and the various error-severity levels.
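The core idea behind S4's binary-outcome model can be illustrated with a deliberately minimal one-predictor logistic regression fitted by plain gradient descent on synthetic data; this is a sketch of the technique, not the authors' MLR/MMLR setup, and the class counts below are invented. The exponential of the fitted coefficient estimates the odds ratio of being erroneous for smelly versus clean classes.

```python
import math

def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """One-predictor logistic regression via batch gradient descent.

    xs: 1 if the class exhibits the smell, 0 otherwise; ys: 1 if erroneous.
    Returns (intercept, coefficient).
    """
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += p - y
            g1 += (p - y) * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

# Illustrative data: 10 smelly classes (8 erroneous), 10 clean (2 erroneous).
xs = [1] * 10 + [0] * 10
ys = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 8
b0, b1 = fit_logistic(xs, ys)
```

With these counts, the estimated odds ratio exp(b1) converges to (8/2)/(2/8) = 16, i.e., smelly classes have sixteen times the odds of being erroneous in this toy data set.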
S10 used Trac, a system similar to Bugzilla. Observation notes and interview transcripts were used to identify and register the problems and, where applicable, the Java files associated with the problems. The record of maintenance problems was examined and categorized into non-source-code-related and source-code-related. Smells were detected via Borland Together. Files considered problematic but that did not contain any detectable smell were manually reviewed to see whether they exhibited any characteristics or issues that could explain why they were associated with problems. This step is similar to code reviews for software inspection, where peers or experts review code for constructs that are known to lead to problems. In this case, the task is easier because it is already known that there is a problem associated with the file; it is just a matter of looking for evidence that can explain the problem. In addition, to determine whether multiple files contributed to maintenance problems, the notion of coupling was used as part of the analysis. Tools such as InCode were used to identify such couplings.
S6, S12, and S16 comprised empirical analyses of the relationship between comments and fault-proneness in programs. S12 focused on comments describing sequences of executions, i.e., function/method documentation comments and inner comments. S6 studied Lines of Comments (LCM) written inside a method's body, and S16 studied lines of comments written in modules. S12 and S6 analyzed projects maintained with Git, which provides various powerful functions for analyzing repositories. For example, the "git log" command can easily and quickly extract commit logs containing specific keywords (e.g., "bug"), and Git can easily clone a repository onto a local hard disk, so that data collection can be performed quickly and at low cost. S16 did not disclose the version control system used. The failures of the projects analyzed in S12 were kept in Git; S6 did not disclose the failure portal. S16 obtained its fault data from the PROMISE data repository, also used by S14.
To analyze relationships between comments and fault-proneness, S6, S12, and S16 collected several metrics. For each method, S12 calculated Lines of Inner comments (LOI) and Lines of Documentation comments (LOD) to obtain the number of comments in a method. The authors then collected the change history of each source file from Git and extracted the change history of each method in the file. They obtained the latest version of the source files and built the complete list of methods, except for abstract methods, using JavaMethodExtractor to extract the method data. For each method, they collected its change history and examined whether the method was changed by checking all commits corresponding to the source file in which the target method is declared. A method was deemed faulty if one of its changes was a bug fix.
For data collection, S6 used tools including JavaMethodExtractor and CyclomaticNumberCounter, obtained the latest version of the source files from the repository, and made a list of all methods in the files, except for abstract methods. The "initial" versions of the methods were also obtained by tracing their change history, and the LCM, LOC, and CC values were taken from those initial versions. For each method, the authors checked whether the method was faulty by examining all changes in which the method was involved. S6 and S12 classified a change as a bug fix when the corresponding commit's log included one of the bug-fix-related words "bug", "fix", or "defect". S16, in turn, used the Lines of Comments (LCM) metric to compute the number of comments written in a module and the FR metric [63] to estimate the fault-proneness of modules.
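A simplified counter for comment lines in the spirit of the LOI/LCM metrics might look like this. It is a rough sketch for Java-style comments, not any of the studies' actual extractors, and it deliberately ignores comment markers appearing inside string literals.

```python
def comment_lines(java_body):
    """Count comment lines in a Java method body: a line counts if it
    contains a // comment or lies within a /* ... */ block.
    Note: comment markers inside string literals are not handled."""
    count, in_block = 0, False
    for line in java_body.splitlines():
        stripped = line.strip()
        if in_block:
            count += 1
            if "*/" in stripped:
                in_block = False
        elif "//" in stripped:
            count += 1
        elif "/*" in stripped:
            count += 1
            if "*/" not in stripped:
                in_block = True
    return count

SAMPLE = """int f(int x) {
    // guard against negatives
    if (x < 0) return 0;
    /* long explanation
       spanning lines */
    return x * 2;
}"""
```

For the sample body above, the function reports three comment lines: one `//` line and the two lines of the block comment.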
S9 integrated its analysis steps into a framework available on GitHub. All five studied systems are hosted on GitHub and use it as their issue tracker. The framework performs a Git clone to get a copy of a system's repository and then generates the list of all the commits used to perform an analysis at the commit level. S9 used the GitHub APIs to obtain the list of all resolved issues in the systems. The authors leveraged the SZZ algorithm to detect changes that introduced faults. Fault-fixing commits were identified using the heuristic proposed by Fischer et al. [61], which comprises the use of regular expressions to detect bug IDs in the studied commit messages. Next, they extracted the modified files of each fault-fixing commit through Git commands. Given each file (F) in a commit (C), they extracted C's parent commit (C0). Then, they used a Git command to extract F's deleted lines and applied a further Git command to identify the commits that introduced those deleted lines, noted as the "candidate faulty changes". Finally, they filtered out the commits that were submitted after the corresponding dates of bug creation. To automatically detect code smells in the source code, the Abstract Syntax Tree (AST) was first extracted from the code using ESLint, a popular open-source lint tool for JavaScript, as the core of the framework. The authors developed their own plugins and modified ESLint's built-in plugins to traverse the AST generated by ESLint and to extract and store information related to the set of code smells. For each kind of smell, a given metric was used; to identify code smells from the metric values provided by the framework, they defined threshold values, using Boxplot statistics, above which files should be considered as having the code smell. For smelly and non-smelly files, they performed survival analysis to compare the time until a fault occurred in files containing code smells and in files without code smells. Survival analysis was used to model the time until the occurrence of a well-defined event [64].
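The final SZZ filtering step described above (keep only the commits that last touched the deleted lines and that predate the bug report) reduces to a small function; the commit IDs, dates, and the line-to-commit mapping below are hypothetical stand-ins for what a Git annotate/blame step would report.

```python
from datetime import datetime

def candidate_faulty_commits(deleted_line_authors, bug_created):
    """Given, for each line deleted by a fault-fixing commit, the commit
    that last touched it, keep only commits made before the bug was
    reported (the SZZ filtering step).

    deleted_line_authors: dict mapping line number -> (commit_id, commit_date).
    """
    return {
        cid
        for cid, date in deleted_line_authors.values()
        if date < bug_created
    }

# Illustrative blame output for the lines removed by a fix commit.
blame = {
    10: ("c1", datetime(2015, 1, 5)),
    11: ("c1", datetime(2015, 1, 5)),
    42: ("c7", datetime(2015, 6, 1)),  # after the bug report: not a candidate
}
faulty = candidate_faulty_commits(blame, bug_created=datetime(2015, 3, 1))
```

Here only commit `c1` survives the filter, since `c7` modified its line after the bug had already been reported.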
The goal of S13 was to investigate performance-related commits in Android apps, with the purpose of understanding their nature and their relationship with project characteristics, such as domain and size. The study targeted 2443 open-source apps taken from the Google Play store, considering their releases hosted on GitHub. S13 extracted (i) pCommits, the number of performance-related commits in the GitHub repository of the app, as compared to the overall number of commits, and (ii) the app category on Google Play. The commits were mined using a script that only considers the folder containing the source code and resources of the mobile app, excluding back end, documentation, tests, and mockups. The mining script identifies a commit as performance-related if it matches at least one keyword, such as wait or leak. These keywords were identified by considering, analyzing, and combining the mining strategies of previous empirical studies on software performance in the literature. The category variable was extracted by mining the Google Play store web page of each app. The authors then identified the concerns by applying the open card sorting technique [65] to categorize performance-related commits into relevant groups. They performed card sorting in two phases: in the first phase, they tagged each commit with its representative keywords (e.g., read from file system, swipe lag), and in the second phase, they grouped the commits into meaningful groups with informative titles (e.g., UI issues, file system issues).
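S13's mining script essentially reduces to keyword matching over commit messages plus a ratio. The sketch below assumes a keyword list containing wait and leak (the only two keywords named here) plus placeholder keywords, applied to invented messages; it is an illustration, not the study's script.

```python
import re

# "wait" and "leak" come from the study; "slow" and "performance" are
# placeholder keywords added for illustration.
PERF_KEYWORDS = re.compile(r"\b(wait|leak|slow|performance)\b", re.IGNORECASE)

def p_commits_ratio(commit_messages):
    """Fraction of commits whose message matches a performance keyword."""
    perf = sum(1 for m in commit_messages if PERF_KEYWORDS.search(m))
    return perf / len(commit_messages) if commit_messages else 0.0

msgs = [
    "Fix memory leak in image cache",
    "Update launcher icon",
    "Reduce wait time on startup",
    "Refactor settings screen",
]
```

For these four messages, two match (`leak`, `wait`), giving a pCommits ratio of 0.5.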
To ensure a quality assessment, S14 chose the ColumbusQM probabilistic quality model [66], which ultimately produces one number per system describing how good that system is. The anti-pattern-related information came from the authors' own structural-analysis-based extractor tool, and source code metrics were computed using the Columbus CodeAnalyzer reverse engineering tool [67]. S14 compiled the types of data described above for a total of 228 open-source Java systems, 34 of which had corresponding class-level bug numbers from the open-access PROMISE database. The metric values were extracted by Columbus. First, the code was converted to the LIM model (Language Independent Model), a part of the Columbus framework. From these data, LIM2Metrics was used to compute various code metrics. The authors performed correlation analysis on the collected data. Since they did not expect the relationship between the inspected values to be linear—only monotone—they used Spearman correlation, which is, in fact, a traditional Pearson correlation computed on the ranks of the values. The extent of this matching movement is somewhat masked by the ranking—which can be viewed as a kind of data loss—but this is not too important, as the authors were more interested in the existence of this relation rather than its type.
S11 obtained the SVN change log and Bugzilla bug reports of the selected projects. The source code for each version of the software was also extracted and processed by InCode for bad smell detection (Intooitus, the company that provided InCode, seems to have closed, and the tool website is no longer available). To help with information processing, a tool named MapBsB was developed, whose name is an acronym for Bad Smells Mapping for Bugs; it generates a set of scripts to support the mapping of bad smells and bugs to classes. These scripts were processed by the Relink tool (available at https://code.google.com/archive/p/bugcenter/wikis/ReLink.wiki), which, in turn, generated a file with bugs and their associated revisions (commits). Finally, the bad smells, bug-revisions, and change log files were processed so that software versions could be analyzed, bugs could be linked to system classes and versions, and information about bad smells and bugs could be cross-related.
S17 extracted the SVN commit messages by applying the SVN log command to identify bug-fix commits of the candidate systems, applying the heuristics proposed by Mockus and Votta [68] on the commit messages to automatically identify bug-fix commits. The authors analyzed and identified all the cloned methods related to bug fixes and analyzed their stability, considering the fine-grained change types associated with each of the bug-related clone fragments, to measure the extent of the relationship between stability and bug-proneness. Pretty-printing of the source files was then carried out with ArtisticStyle to eliminate formatting differences, and the file modification history was extracted using the SVN diff command to list added, modified, and deleted files in successive revisions. To analyze the changes to cloned methods throughout all the revisions, they extracted method information from the successive revisions of the source code. They stored the method information (file path, package name, class name, signature, start line, end line) in a database for mapping changes to the corresponding methods.
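Mapping changed lines back to stored method records, as S17's database supports, is essentially an interval lookup over the recorded start/end lines; the method table below is hypothetical.

```python
# Hypothetical method table in the shape S17 stored:
# (file path, class name, signature, start line, end line).
METHODS = [
    ("src/Util.java", "Util", "parse(String)", 10, 40),
    ("src/Util.java", "Util", "format(Date)", 42, 60),
]

def methods_touched(file_path, changed_lines, methods):
    """Map changed line numbers (e.g., from an SVN diff) to the methods
    whose recorded line range contains at least one changed line."""
    hits = []
    for f, cls, sig, start, end in methods:
        if f == file_path and any(start <= ln <= end for ln in changed_lines):
            hits.append(f"{cls}.{sig}")
    return hits
```

A change touching lines 15 and 55 of `src/Util.java` maps to both methods, while a change on line 61 maps to none, since it falls outside every recorded range.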
S18 selected active open-source projects hosted on GitHub that used the Maven build system and were developed in Java. The authors chose InFusion to identify code smells. Since Git does not record information about merge conflicts, they recreated each merge in the corpus in order to determine whether a conflict had occurred, using Git's default algorithm, the recursive merge strategy, as this is the one most likely to be used by the average Git project. They used GumTree for their analysis, as it allowed them to track elements at the AST level; this way, they tracked just the elements they were interested in (statements) and ignored other changes that do not actually change the code. The assessment of the impact of code smells was based on the number of bug fixes that occurred on code lines associated with a given code smell and a merge conflict. The authors used Generalized Linear Regression, where the dependent variable (the count of bug fixes occurring on smelly and conflicting lines) follows a Poisson distribution; therefore, they used a Poisson regression model with a log link function. They counted the number of references to and from other files to the files involved in a conflict. They also collected other factors for each commit, such as the difference between the two merged branches in terms of LoC and AST difference and the number of methods and classes affected. After collecting these metrics, they checked for multicollinearity using the Variance Inflation Factor (VIF) of each predictor in their model. VIF describes the level of multicollinearity (correlation between predictors); a VIF score between 1 and 5 indicates moderate correlation with other factors, so they retained predictors with a VIF score below a threshold of 5.
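For the special case of two predictors, the VIF check that S18 performed reduces to 1/(1 − r²), where r is the Pearson correlation between the two predictors; the sketch below assumes that reduced form rather than the full multi-predictor auxiliary regressions, and the predictor values are illustrative.

```python
def vif_two_predictors(x1, x2):
    """VIF of one predictor given a single other predictor: 1 / (1 - r^2),
    where r is the Pearson correlation between the two predictors."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1 = sum((a - m1) ** 2 for a in x1) ** 0.5
    s2 = sum((b - m2) ** 2 for b in x2) ** 0.5
    r = cov / (s1 * s2)
    return 1.0 / (1.0 - r * r)
```

Weakly correlated predictors give a VIF near 1 and would be retained under S18's threshold of 5, whereas nearly collinear predictors inflate the VIF far beyond 5 and would be dropped.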