Abstract
Software defect prediction (SDP) techniques have been widely used to predict bugs in software over the past 20 years. Before software testing (ST) is conducted, the result of SDP assists in resource allocation for ST. However, SDP usually works on fine-level tasks (or white-box testing) instead of coarse-level tasks (or black-box testing). Before ST, or without historical execution information, it is difficult to allocate resources properly. Therefore, an SDP-based approach, named DPAHM, is proposed to assist in arranging resources for coarse-level tasks. The method combines the analytic hierarchy process (AHP) with a variant of the incidence matrix. We apply the proposed DPAHM to a proprietary software project, named MC, for which we construct a top-down structure with three layers. The performance measure of each layer is calculated based on the SDP result, so the resource allocation strategy for coarse-level tasks is obtained from the prediction result. The experiment indicates that our proposed method is effective for resource allocation of coarse-level tasks before executing ST.
1. Introduction
Software testing (ST) is a necessary and vital activity in software quality assurance [,,]. ST helps find defects (i.e., bugs) to improve software reliability and quality [,,]. Distributing test resources evenly means that defective and non-defective software entities (such as classes, files, functions, and subsystems) are treated equally, which leads to a waste of test resources (e.g., time, budget, software testers, etc.) or even failure to meet the test objectives, especially when resources are limited [,]. Therefore, before performing ST, it is quite important to allocate the resources appropriately.
Software defect prediction (SDP), which uses historical defect data consisting of static code metrics (i.e., features) and historical defect information (i.e., labels) to predict the defect situations of entities via machine learning or deep learning techniques, is a good way to alleviate the issue mentioned above [,,,,,,,,,,]. If local historical defect data are lacking, defect data from other projects can be collected for SDP, which is named cross-project defect prediction (CPDP) [,,,,,]. However, the granularity of the prediction results depends on the historical defect data and the prediction models. That is, if the defect information is class-level and binary (i.e., each entity is labeled defective or non-defective), only machine learning algorithms that implement binary classification can be used to predict whether the class under test is defective or not.
For ST, test methods can generally be divided into white-box testing (WBT), which focuses on source code (such as code review and module or unit testing), and black-box testing (BBT), which is concerned with the inputs and outputs of the system without its internal structure (such as system testing or functional testing) []. We refer to WBT and BBT as methods for fine- and coarse-level tasks of ST, respectively. Moreover, researchers and industrial practitioners have proposed several approaches to design test cases and improve test quality for these two kinds of tasks [,,]. For instance, Mostafa et al. proposed a coverage-based test case prioritization (TCP) method that uses the code coverage data of code units and fault-proneness estimations to prioritize test cases []. Chen et al. combined an adaptive random sequence approach with clustering, applying black-box information to perform TCP []. These authors aimed to refine test cases by using execution information to assist in resource allocation.
As summarized above, SDP can be widely used without execution or code coverage information for fine-level tasks, but not for coarse-level test tasks. However, how should managers or project leaders arrange resources when they face coarse-level test tasks? Can defect prediction results provide meaningful information for coarse-level test tasks?
Based on the above motivation and need, in this study, we treat resource allocation for coarse-level tasks of ST as a single-objective decision problem. We propose a DP-based association hierarchy method (DPAHM) that uses the incidence matrix and the analytic hierarchy process (AHP). We construct a hierarchical framework consisting of three layers via AHP, analyze the relationships between inter-level (i.e., coarse- and fine-level) and intra-level (i.e., coarse- or fine-level) elements via the incidence matrix, and then calculate the SDP result of the top layer based on the SDP result of the bottom layer, both of which derive from the AHP structure. The SDP result of the top layer yields the strategy for coarse-level test tasks. Our contributions are as follows:
- We apply SDP and make use of the prediction information for coarse-level tasks of ST.
- We combine AHP with the incidence matrix to construct a top-down association hierarchy framework.
- We propose a defect prediction method based on AHP and the incidence matrix, and use an example to show how to apply DPAHM.
Moreover, our study aims at answering the following research questions (RQs):
- RQ1: How is defect prediction used for coarse-level test tasks?
- RQ2: Can DPAHM accommodate different defect prediction task types?
- RQ3: How do different defect prediction learners affect DPAHM?
The remainder of the study is organized, as follows: Section 2 introduces the background and reviews the related work. Section 3 describes our research method: DPAHM. Section 4 uses a project example to verify the effectiveness of our proposed method. Section 5 presents and discusses the experiment results. Section 6 points out the potential threats to validity of our study. Finally, Section 7 concludes this study and states the future work directions.
2. Review of Related Work
In this section, state-of-the-art research results on ST, SDP, AHP, and the incidence matrix are summarized as follows:
2.1. Software Testing
ST is an effective process in the development cycle to reduce defects in software, and it can be divided in different ways. According to testing types, software testing includes unit testing, system testing, graphical user interface (GUI) testing, performance testing, user acceptance testing, and other types []. From the perspective of development processes, ST comprises unit testing (a.k.a. module testing), integration testing, function testing, system testing, acceptance testing, and installation testing []. According to testing techniques, it can be classified into functional testing, structural testing, and code reading []. In addition, WBT and BBT are two common methods of ST [,]. The ST process accounts for around 45% of the software development cost []. Therefore, many practitioners and researchers have reported approaches to make resource allocation more reasonable.
Several researchers analyzed software to assist in ST [,]. For example, Murrill proposed a model, called dynamic error flow analysis (DEFA), about the semantic behavior of a program in fine-level tasks and produced a testing strategy from DEFA analysis []. Yumoto et al. proposed a test analysis method for BBT that uses a test category and organizes the classification based on knowledge of faults and of the application under test [].
Several authors proposed TCP methods using white- or black-box information [,]. For example, Mostafa et al. improved the coverage-based TCP method by considering the fault-proneness distribution over code units for fine-level tasks []. Chi et al. extended the original additional greedy coverage algorithm for TCP (called Additional Greedy method Call sequence) by using dynamic relation-based coverage as the measurement []. Parejo et al. considered three prioritization objectives for TCP in highly configurable systems []. Chen et al. leveraged black-box information to propose a TCP approach based on adaptive random sequences and clustering [].
Several researchers reported test case selection (TCS) or test case minimization (TCM) approaches, such as [,,]. Banias took low memory consumption into consideration and proposed a dynamic programming approach for TCS that is suited to medium and large projects []. Arrieta et al. proposed a cost-effective approach, which defines six effectiveness measures and one cost measure relying on black-box data []. Zhang et al. applied uncertainty theory and multi-objective search to test case generation (TCG) and TCM [].
However, when execution information or coverage data do not exist, these methods cannot work well for resource allocation.
2.2. Software Defect Prediction
SDP has been one of the hottest topics in software engineering over the past two decades because of its predictive power. Test leaders or managers want to know the defect situation of a software under test (SUT) via SDP techniques before executing ST. Many researchers have studied SDP issues to improve prediction performance [,,,,,,,,,,]. According to the prediction task, SDP can be divided into binary classification tasks (such as [,,,]), multi-classification tasks (such as [,]), ranking tasks (such as [,,]), and numeric tasks (such as [,,]).
Regarding the issues in SDP, some research teams focus on handling defect data, such as feature selection [,], instance selection [,], and class imbalance processing [,]. Some groups proposed novel algorithms [,,], such as TCA+ [], CCA [], and TNB []. Some articles reported different machine learning algorithms and prediction models for binary classification tasks [,,]. Several researchers used historical numeric defect data to predict the number of defects in every entity under test [,,]. Additionally, papers [,] paid attention to ranking software entities to give testers guidance for resource allocation.
However, these techniques aim at predicting defects at a fine level (such as class, function, or file) using static code information without execution data. For coarse-level prediction (such as the system level), SDP appears to be less effective.
2.3. Analytic Hierarchy Process
The analytic hierarchy process (AHP) is a classic hierarchical decision analysis method proposed by Thomas L. Saaty to help decision-makers quantify qualitative problems []. AHP consists of three steps in total. In the first step, a hierarchical structure that contains three different layers (i.e., goal, criteria, and alternatives) is built. In the second step, the relationships between inter-layers (i.e., between the criteria- and goal-layers and between the alternatives- and criteria-layers) are expressed by judgment matrices and the relative weights are determined. In the third step, the total ranking is derived and the corresponding consistency is checked; then, the final decision among the alternatives is made (see literature [] for a more detailed description of AHP). AHP is widely used in different areas, such as water quality management [], safety measures [], technology maturity evaluation [], and space exploration [].
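As background, the following Python sketch illustrates the classic weight derivation of AHP for a single judgment matrix, including the consistency check; the 3×3 judgment matrix is a made-up example rather than data from this study.

```python
import numpy as np

# Saaty's random consistency index (RI) for matrix sizes 1-9.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def ahp_weights(judgment):
    """Derive AHP priority weights from a pairwise judgment matrix
    and report the consistency ratio (CR)."""
    judgment = np.asarray(judgment, dtype=float)
    n = judgment.shape[0]
    eigvals, eigvecs = np.linalg.eig(judgment)
    k = np.argmax(eigvals.real)              # principal eigenvalue
    weights = np.abs(eigvecs[:, k].real)
    weights /= weights.sum()                 # normalized priority weights
    ci = (eigvals[k].real - n) / (n - 1)     # consistency index
    cr = ci / RANDOM_INDEX[n] if RANDOM_INDEX[n] > 0 else 0.0
    return weights, cr

# Made-up 3x3 judgment matrix comparing three alternatives.
J = [[1, 3, 5],
     [1/3, 1, 2],
     [1/5, 1/2, 1]]
weights, cr = ahp_weights(J)
print(weights, cr)  # CR < 0.1 is conventionally regarded as acceptable
```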
However, the weights in the AHP method are subjective (see, e.g., [,]). Therefore, several researchers have extended AHP to make decisions more objective. In software engineering, researchers have improved AHP for different objectives. For example, Thapar et al. integrated fuzzy theory with AHP, called FAHP, to measure the reusability of software components []. However, a premise of using AHP is that the elements of each layer are independent, which is not always satisfied. Thus, in addition to addressing the subjectivity of AHP, how to clarify the relationships between the elements in each layer is another problem.
2.4. Incidence Matrix
Before summarizing the incidence matrix, we first recall the definitions of a graph, an undirected or directed graph, and the incidence matrix, as shown in Definitions 1–3:
Definition 1 (Graph).
A graph is composed of a non-empty finite set of vertices and a set of edges between vertices, denoted as $G = (V, E)$, where
- 1. G represents a graph.
- 2. V is the set of vertices in graph G (i.e., the vertex set), $V = \{v_1, v_2, \ldots, v_n\}$.
- 3. E is the set of edges in graph G (i.e., the edge set), $E = \{e_1, e_2, \ldots, e_m\}$.
Definition 2 (Undirected/Directed graph).
If the graph G defined in Definition 1 has no directed edges, which means that the edge between any two vertices $v_i$ and $v_j$ has no direction, the graph G is called an undirected graph. Otherwise, it is called a directed graph, whose edges are called directed edges or arcs.
- 1. An undirected edge is expressed as $(v_i, v_j)$ or $(v_j, v_i)$.
- 2. A directed edge is represented by $\langle v_i, v_j \rangle$, where $v_i$ and $v_j$ are called the arc tail and head, respectively.
Definition 3 (Incidence Matrix).
Let $G = (V, E)$ be an arbitrary undirected graph, and let $h_{ij}$ denote the number of times the vertex $v_i$ is associated with the edge $e_j$. The possible values are 0, 1, 2, …, and the resulting matrix is the incidence matrix of graph G.
That is, $H = (h_{ij})_{n \times m}$ is the incidence matrix, with $h_{ij}$ being the element in the ith row and jth column of matrix H.
Similarly, for a directed graph $D = (V, E)$, the element $h^{D}_{ij}$ of the corresponding incidence matrix $H_D$ takes the value 1 if $v_i$ is the tail of $e_j$, $-1$ if $v_i$ is the head of $e_j$, and 0 otherwise, with $h^{D}_{ij}$ being the element in the ith row and jth column of matrix $H_D$.
The incidence matrix is a graph representation method that captures the relationships between vertices and edges. Many researchers have used the incidence matrix to solve different problems, such as the traffic assignment problem [], the evaluation of urban development projects [], stability analysis of multi-agent systems [], and tracing power flow [].
In ST, coarse-level tasks need to call many functions or other fine-level units to implement the requirements. Therefore, we only take the positive direction of the directed graph into account, because changes to high-level functions have little effect on low-level functions. For example, a high-level function T calls a low-level function F. When we change the parameters in F, the output of F usually changes, which leads to a change in T. However, if we change parameters in T that are not related to F, the output of F will not be affected by T. Thus, in this paper, we define a positive incidence matrix (PIM), denoted $H^{+}$. The description of $H^{+}$ is as follows:
Definition 4 (Positive Incidence Matrix).
Let $D = (V, E)$ be an arbitrary directed graph, and let $h^{+}_{ij}$ denote the number of times the vertex $v_i$ is associated with the tail of the edge $e_j$. The possible values are 0, 1, 2, …, and the resulting matrix is the positive incidence matrix of graph $D$.
That is, $H^{+} = (h^{+}_{ij})_{n \times m}$ is the positive incidence matrix, with $h^{+}_{ij}$ being the element in the ith row and jth column of matrix $H^{+}$.
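To make Definition 4 concrete, the following Python sketch builds the positive incidence matrix of a small directed graph from its edge list; the vertex and edge names are illustrative only.

```python
import numpy as np

def positive_incidence_matrix(vertices, edges):
    """Positive incidence matrix H+ of a directed graph:
    H+[i, j] counts how often vertex i is the tail of edge j (Definition 4)."""
    index = {v: i for i, v in enumerate(vertices)}
    h_plus = np.zeros((len(vertices), len(edges)), dtype=int)
    for j, (tail, _head) in enumerate(edges):
        h_plus[index[tail], j] += 1   # only the tail side is recorded
    return h_plus

# Illustrative directed graph; an edge (a, b) means "a calls b".
vertices = ["F1", "F2", "A1", "A2"]
edges = [("F1", "A1"), ("F1", "A2"), ("F2", "A2")]
print(positive_incidence_matrix(vertices, edges))
```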
3. The Association Hierarchy Defect Prediction Method
3.1. Motivation of the Proposed Method
For coarse-level tasks of ST, such as system-level testing, if there is not enough execution information about the fine-level tasks, it is difficult for test leaders or project managers to arrange test resources, and it is even more difficult before ST is executed. However, for fine-level tasks, SDP provides a way to predict the defect situation without coverage data or other execution information and can effectively assist in resource allocation. Besides, effect analysis is a key step for testing: the call relationships between coarse- and fine-level tasks can be obtained by such analysis.
Therefore, inspired by AHP and the incidence matrix, we extend the hierarchy framework. A top-down association hierarchy structure covering all of the test tasks is built. Moreover, before executing ST, the defect situation of the coarse-level tasks is derived from the SDP results of the fine-level tasks.
3.2. Four Phases of the Proposed Method
The proposed method DPAHM consists of four phases: the framework construction phase for the whole set of test tasks, the SDP model construction phase for the bottom layer, the positive incidence matrix production phase for each part of the framework, and the resource allocation strategy output phase. A framework of the entire method is illustrated in Figure 1. It should be noted that the code, the software requirement specifications, and the test case specifications are needed. Moreover, the proposed method is applied after code completion and before ST.

Figure 1.
A Framework of the Proposed Method.
Each stage is summarized, as follows.
In the framework construction phase, a top-down association hierarchy structure with three element layers (the Goal-, Criteria-, and Alternatives-layers) is constructed. The hierarchy framework is divided according to the vertical call mapping, which is obtained from the software requirement specifications. For the Goal-layer, the resource allocation goal is determined by the test manager or project leader according to the requirement specification. The Criteria-layer contains l elements, numbered from $C_1$ to $C_l$; these elements are coarse-level or mixed coarse- and fine-level tasks and are obtained from the requirement specification and/or the architecture design specification. In the Alternatives-layer, there are q elements belonging to fine-level tasks, represented by $A_1$ to $A_q$; the alternatives are derived from specifications such as the module design specification. In addition, the horizontal relationships between elements are connected according to the horizontal call mapping, which is obtained from specifications for coarse-level tasks and from the code for fine-level tasks. That is, according to the design specifications for fine- and coarse-level tasks, both the horizontal and vertical call relationships can be obtained, and then the association hierarchy structure can be gained.
In the SDP model construction phase, the SDP model of the bottom layer (i.e., the Alternatives-layer) is built, using historical defect data from other projects and a machine learning learner to train a CPDP model.
In the positive incidence matrix production phase, PIMs are produced from the bottom up. That is, the vertical PIM of the inter-layer relationships and the horizontal PIM of the intra-layer relationships are produced by regarding the relationships as a directed graph.
In the resource allocation strategy output phase, the prediction result of the Criteria-layer (i.e., the coarse level) is calculated from the incidence matrices and the fine-level SDP result. Test case specifications are generated according to the code and/or the software requirement specifications. Therefore, the resource allocation strategy for coarse-level tasks is output once the most defect-prone task is identified from the prediction result.
Based on these four phases, the recommended prediction order is as follows (in this paper, we focus on resource allocation for coarse-level tasks; thus, it is feasible for all of the prediction results to be obtained after completing the test case specifications for system testing):
- the prediction result of the Alternatives-layer, which should be obtained before unit testing;
- the prediction result of the Criteria-layer, which should be obtained before system testing (or before integration testing if necessary);
- the resource allocation strategy for coarse-level tasks after completing test case specifications for system testing.
3.3. Implementation Steps of the Proposed Method
According to the framework and the four phases mentioned in Section 3.2, seven steps in total are needed to complete our approach. Details of each step are described as follows. In addition, the relationship between phases and steps is shown in Table 1.

Table 1.
The Relationship Between Phases and Steps of Our Proposed Approach.
- Step 1: determine the goal, i.e., making resource allocation for coarse-level tasks, and construct an association hierarchy structure of the ST tasks according to the calling relationships between the outer and inner layers;
- Step 1.1: apply the first step of AHP to build the hierarchy framework of the ST tasks, including three layers (Goal-, Criteria-, and Alternatives-layer) from top to bottom (note: the Criteria-layer can be divided into one or more sublayers);
- Step 1.2: analyze the relationships within the Criteria- and Alternatives-layers, and find the association structure;
- Step 2: construct an SDP model (in this paper, we construct a CPDP model for ranking, as paper [] did) and predict the defect situation of the Alternatives-layer;
- Step 2.1: use historical defect training data and a defect learner to train an SDP model $M$;
- Step 2.2: apply $M$ to the target project data of the Alternatives-layer (i.e., the fine level) to obtain the prediction probability vector $P_A$;
- Step 3: represent the inter- and intra-layer structures above as directed graphs $G_v$ and $G_h$, respectively;
- Step 4: obtain the incidence matrices of the inter-layers from $G_v$ and of the intra-layers from $G_h$;
- Step 4.1: produce the vertical PIM $H^{+}_{v}$ of the inter-layers (i.e., between the Criteria-layer and the Alternatives-layer) from graph $G_v$ according to Definition 4;
- Step 4.2: obtain the horizontal PIM $H^{+}_{h}$ of the Criteria-layer from graph $G_h$ according to Definition 4;
- Step 5: calculate the prediction probability vector $P_C$ of the Criteria-layer from $H^{+}_{v}$, $H^{+}_{h}$, and $P_A$ (a numerical sketch of Steps 5 and 6 is given after this list);
- Step 6: normalize the prediction probability vector $P_C$ using min–max normalization, i.e., $\hat{p}_{i} = (p_{i} - \min_{j} p_{j}) / (\max_{j} p_{j} - \min_{j} p_{j})$;
- Step 7: arrange the test resources according to the prediction result.
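To make Steps 5 and 6 concrete, the following Python sketch shows one simple way to combine the fine-level prediction vector with the vertical and horizontal PIMs and then apply the min–max normalization of Step 6. The additive combination rule (direct calls plus one round of horizontal propagation) and all numbers are illustrative assumptions, not a transcription of the exact formulas used in this paper.

```python
import numpy as np

def coarse_level_scores(p_fine, h_vertical, h_horizontal):
    """Illustrative version of Steps 5-6 (a simplification):
    p_fine       -- defect probabilities of the q Alternatives-layer elements
    h_vertical   -- l x q PIM of Criteria -> Alternatives calls
    h_horizontal -- l x l PIM of Criteria -> Criteria calls
    A criterion inherits the probabilities of the fine-level elements it calls,
    plus (one round of) what the criteria it calls inherit themselves."""
    direct = h_vertical @ p_fine              # direct calls into the fine level
    scores = direct + h_horizontal @ direct   # one round of horizontal propagation
    lo, hi = scores.min(), scores.max()       # Step 6: min-max normalization
    return (scores - lo) / (hi - lo) if hi > lo else np.zeros_like(scores)

# Made-up example: 3 criteria elements and 4 fine-level elements.
p_fine = np.array([0.2, 0.7, 0.1, 0.5])
h_vertical = np.array([[1, 0, 1, 0],
                       [0, 1, 0, 0],
                       [0, 0, 0, 1]])
h_horizontal = np.array([[0, 1, 0],
                         [0, 0, 0],
                         [0, 1, 0]])
print(coarse_level_scores(p_fine, h_vertical, h_horizontal))  # approximately [0.6, 0.0, 1.0]
```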
4. Case Study
4.1. Research Questions
To verify the effectiveness of our proposed method, three RQs are summarized as follows.
RQ1: How is SDP used for coarse-level tasks of software testing?
To the best of our knowledge, SDP is widely used for fine-level tasks (i.e., WBT) but rarely for coarse-level tasks. Therefore, a basic defect prediction learner is selected for SDP to carry out the processes of DPAHM. Answering this question involves a complete implementation of the method and shows how to use SDP techniques for coarse-level tasks of ST.
RQ2: Can DPAHM accommodate different defect prediction task types?
The proposed method DPAHM is based on SDP. SDP tasks can be divided into classification, numeric, ranking, and other types, as introduced in Section 2.2. This question explores the prediction result types that DPAHM can provide for coarse-level tasks.
RQ3: How do different defect prediction learners affect DPAHM?
Much research has indicated that different defect prediction learners produce different results for the same datasets [,]. This question discusses the effect of different learners on DPAHM.
4.2. Experimental Subjects
To answer the three RQs in Section 4.1, we apply our approach to a proprietary software project under test. The software, named MC, is a safety-related electronic system for railway signals implemented in the C programming language. MC includes three coarse-level tasks and seventeen fine-level tasks.
For the fine-level tasks, there are thirteen files and four functions. The files are the basic modules, which contain a total of 5334 lines of code (LOC). We collected 48 static metrics with LDRA TESTBED, an embedded test tool provided by LDRA (https://ldra.com/). Basic information about the ID, file name, and some static metrics of MC is shown in Table 2. The table lists LOC, McCabe metrics (Cyclomatic Complexity and Essential Cyclomatic Complexity), and Halstead's metrics (Total Operators ($N_1$), Total Operands ($N_2$), Unique Operators ($n_1$), Unique Operands ($n_2$), Length (N), and Volume (V)). The meanings of the McCabe and Halstead metrics can be found in papers [,], respectively. These basic files serve four parts (Function 1, Function 2, Function 3, and Function 4) to complete the related functions. Table 3 lists the basic descriptive information.
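For reference, the Halstead Length and Volume in Table 2 follow the standard definitions based on these operator and operand counts, i.e., $N = N_1 + N_2$ and $V = N \cdot \log_2(n_1 + n_2)$.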

Table 2.
The Basic Information about Thirteen Files from Fine-level Tasks of MC.

Table 3.
The Basic Information about Four Functions from Fine-level Tasks of MC.
There are three tasks at the coarse level: the Stress Calculation Task, the Frame Number Acquisition Task, and the Frame Number Acquisition Task. These tasks collect voltage signals and mass weights for railway safety. Moreover, these tasks depend on the fine-level tasks. The basic information (such as ID and description) is presented in Table 4.

Table 4.
The Basic Information about Coarse-level Tasks of MC.
4.3. Apply Our Method to MC
4.3.1. The First Phase
According to the test requirements, MC needs to be checked by system testing, integration testing, and unit testing. For coarse-level tasks, the call relationships are derived from development specifications (or documents) and from the software development engineers. Our goal is to perform system testing on the three tasks, which are implemented by calling four integration functions. For fine-level tasks, the relationships are obtained from the code and the specifications. The functions, which will be tested in the integration testing process, call thirteen files (units), which will be checked by unit testing.
After this analysis, we complete the first phase of our method. The structure of MC is divided into the Goal-layer, Criteria-layer, and Alternatives-layer. Our target is to obtain the resource allocation strategy of ST for MC. Moreover, there are two sublayers in the Criteria-layer. The whole structure is constructed as indicated in Figure 2. As the figure depicts, the inter-layers and intra-layers are interdependent. For example, Task 1 calls Function 2, and Function 2 needs A3, which calls A4 and A5 to work.

Figure 2.
The Association Hierarchy Structure of Software MC (Note: “A → B” represents that B calls A to implement the task).
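To illustrate how the structure in Figure 2 feeds the later phases, the call relations explicitly mentioned above can be written as an edge list of the directed graph; only the relations named in the text are listed here, while the complete structure is given in Figure 2.

```python
# Partial call relations of MC explicitly named in the text (caller, callee);
# the complete structure is shown in Figure 2.
edges = [
    ("Task 1", "Function 2"),   # Task 1 calls Function 2
    ("Function 2", "A3"),       # Function 2 needs A3
    ("A3", "A4"),               # A3 calls A4
    ("A3", "A5"),               # A3 calls A5
]
```

Edge lists like this, one per inter- or intra-layer relationship, are the input to the positive incidence matrix construction sketched in Section 2.4.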
4.3.2. The Second Phase
After obtaining the framework of MC, we carry out the second phase to predict the defect situation of the Alternatives-layer. In this paper, we use the same 10 historical defect datasets from other projects as in paper []. The basic information about these data is listed in Table 5. For the target project MC, we use LDRA TESTBED, as mentioned in Section 4.2, to obtain the metrics. Moreover, the linear regression model (LRM), which is applied in paper [], is also used as the basic learner for predicting the result.

Table 5.
The Basic Information about Historical Cross-project Defect Data.
Finally, we obtain the prediction result for the Alternatives-layer, which is listed in Table 6. The probability vector of this layer is given as Formula (9).

Table 6.
The Defect Probability of the Alternatives-layer by the Linear Regression Model (LRM) for MC.
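As a sketch of this phase, the following Python code outlines how such a CPDP model could be trained and applied; the file names, column names, and the use of scikit-learn's LinearRegression as the LRM are assumptions for illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Placeholder file and column names; the real inputs are the 10 historical
# project datasets (Table 5) and the LDRA TESTBED metrics of MC (Table 2).
train = pd.read_csv("historical_defect_data.csv")
target = pd.read_csv("mc_fine_level_metrics.csv")

feature_cols = [c for c in train.columns if c != "defects"]
lrm = LinearRegression().fit(train[feature_cols], train["defects"])

# Predicted defect-proneness of each Alternatives-layer element (cf. Table 6).
target["predicted"] = lrm.predict(target[feature_cols])
print(target[["file_name", "predicted"]].sort_values("predicted", ascending=False))
```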
4.3.3. The Third Phase
The graphs of Criteria-sublayer 1 (i.e., the Function-layer) and the Alternatives-layer from the structure of Figure 2 are shown in Figures 3 and 4. The graphs of Criteria-sublayer 2 (i.e., the Task-layer) and the Function-layer from the structure of Figure 2 are illustrated in Figures 5 and 6.

Figure 3.
The Vertical Graph of the Function- and Alternatives-layers from the Association Hierarchy Structure of MC (Note: the arrow from A to B represents that A calls B directly; the dashed arrow from A to B means that A calls B indirectly).

Figure 4.
The Horizontal Graph of the Function-layer from the Association Hierarchy Structure of Software MC (Note: the arrow from A to B represents that A calls B directly; the dashed arrow from A to B means that A calls B indirectly).

Figure 5.
The Vertical Graph of the Task- and Function-layers from the Association Hierarchy Structure of MC (Note: the arrow from A to B represents that A calls B directly).

Figure 6.
The Horizontal Graph of the Task-layer from the Association Hierarchy Structure of MC.
4.3.4. The Fourth Phase
In the last phase, we derive the defect probability of each element in the Criteria-layer. For the Function-layer, the defect probability vector is given as Formula (14). Besides, the probability vector of the Task-layer is displayed as Formula (15).
We normalize the Task-layer probability vector by Formula (8), i.e., min–max normalization. Subsequently, we obtain the normalized defect probability values of the three tasks (i.e., 0.146, 0, and 1).
5. Results Analysis and Discussion
RQ1: How is SDP used for coarse-level tasks of software testing?
For RQ1, the normalized result indicates that Task 2 is the least defect-prone and Task 3 is the most defect-prone. If a test leader or manager does not know the prediction result in advance, he or she will divide the resources evenly and have the testers test each task randomly. However, when the defect-proneness of each task is predicted, the test leader or manager should arrange more resources for Task 3 and fewer resources for Task 2. Test resources include the number of test cases, testers, test environments (such as test tools, test devices, and support staff), time, budget, etc.
For instance, suppose the test manager has four test environments and four testers of equivalent professional level to execute ST for the three tasks. According to the coarse-level prediction result, the manager can arrange two testers to check Task 3, one tester to check Task 2, and the remaining tester to check Task 1. Besides, for Tasks 3 and 2, the testers can use two days, whereas for Task 1 one day is enough.
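One simple way to turn the normalized prediction into an allocation like the one above is to give every task one tester and distribute the remaining testers in proportion to the predicted defect-proneness; the rule below is only an illustrative heuristic and is not part of DPAHM itself.

```python
def allocate_testers(norm_probs, total_testers):
    """Give every task one tester, then distribute the remaining testers
    in proportion to the normalized defect predictions (simple heuristic)."""
    remaining = total_testers - len(norm_probs)
    total = sum(norm_probs) or 1.0
    # Note: plain rounding may need a final adjustment if it over- or under-allocates.
    extra = [round(remaining * p / total) for p in norm_probs]
    return [1 + e for e in extra]

# Normalized predictions for Tasks 1-3 from Section 4.3.4.
print(allocate_testers([0.146, 0.0, 1.0], total_testers=4))  # -> [1, 1, 2]
```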
RQ2: Can DPAHM accommodate different defect prediction task types?
From the processes of DPAHM in Section 3 and the implementation steps for an example software project in Section 4.3, we can see that the SDP result for the Alternatives-layer is the input for the coarse-level prediction. Thus, the form of the final coarse-level result depends entirely on the form of this input. That is, the coarse-level prediction result takes the form of binary, numeric, or ranking labels according to the SDP model, which relies on the prediction learner. For example, if a machine learning technique for regression tasks is chosen as the basic learner, the SDP result for the Alternatives-layer will be numeric, and the coarse-level result will be the number of defects in each element of the Criteria-layer.
Therefore, for RQ2, the answer is “YES”.
RQ3: How do different defect prediction learners affect DPAHM?
For RQ3, we can analyze from two different perspectives:
One perspective is to discuss the effect of the prediction task type. From RQ2, we know that DPAHM can handle different prediction tasks, so we analyze how the different prediction task types affect DPAHM. For classification models of SDP, the result for coarse-level tasks is binary: each task is predicted as defective or non-defective. The project manager cannot obtain extra information about the tasks, so the resource allocation strategy is coarse, but still better than even resource allocation. For regression models of SDP, the result is numeric or a ranking. For ranking results, the project manager can arrange the resources according to the order. For numeric results, the manager can not only arrange the resources according to the numeric values, but also design different test cases to find the predicted bugs. Therefore, we advise practitioners to use numeric or ranking prediction tasks, because their results provide more detailed information for resource allocation than classification tasks.
The other perspective is the use of different learners for the same prediction task. Different learners may produce different prediction results [,]. Taking ranking prediction tasks as an example, Xiao et al. analyzed the effect of different learners (i.e., LRM, random forest, and gradient boosting regression tree) on FIP []; we use the same prediction approach as our basic SDP model and therefore do not evaluate other learners in our experiment. Our final prediction result relies on the result of the Alternatives-layer, so practitioners need to select a proper learner according to the result they want to obtain.
6. Threats to Validity
Potential threats to the validity of our research method are shown as follows.
Threats to internal validity: we use the SDP technique as the basic method for coarse-level tasks. The datasets used in the experiment are cleaned, which means that 18 common metrics are used and instances similar to software MC are selected from the cross-projects. Besides, although all of the authors have checked the experiments four times, there may still be some errors. Moreover, to the best of our knowledge, this is the first time SDP has been applied to coarse-level tasks; therefore, we did not execute control experiments. However, we explain the advantages of our proposed approach compared with the uncertainty of evenly allocating resources.
Threats to external validity: we have verified the effectiveness of DPAHM by applying it to a specific software project. Moreover, we only use LRM for defect prediction in the Alternatives-layer, because our goal is to provide a method for allocating test resources for coarse-level tasks rather than to find the best model for these tasks. To generalize our proposed approach, we analyze how DPAHM works with different types of SDP methods (i.e., classification, ranking, and numeric types). In the future, more software for coarse-level tasks should be considered to reduce the threats to external validity.
Threats to construct validity: DPAHM relies on AHP and the PIM; therefore, the framework and the relationships are important for the final result. We carefully checked and drew the structure based on the specifications and the code. In addition, our proposed DPAHM is based on SDP. According to previous SDP studies [,,], different performance measures are used. In this paper, we follow paper [] and use PoD to assess the performance of SDP in the Alternatives-layer.
7. Conclusions and Future Work
To alleviate the difficulty of resource allocation without execution information for coarse-level tasks, an approach called DPAHM is proposed in this paper. The method regards the resource allocation problem for ST as a decision-making problem and combines AHP with a variant of the incidence matrix to predict the defect situation of coarse-level tasks based on SDP techniques. Thus, the corresponding resource allocation strategy is derived.
The approach is divided into four phases: the association hierarchy framework construction phase, the software defect prediction model establishment phase, the production phase of the positive incidence matrices in the vertical and horizontal directions, and the resource allocation strategy output phase. We apply the proposed method to a real software project, and the result indicates that our method can provide ideas about resource allocation strategies for coarse-level testing tasks, such as system-level testing.
In this study, we aim at resource allocation for coarse-level tasks of ST. Accordingly, we only rely on SDP to predict the defect situation of coarse-level tasks, and DPAHM provides guidance for allocating resources. In the future, we will collect more resource allocation information (such as the number of test cases executed by each person every day, the proportion of system testing time, and the budget for each person) to optimize the allocation strategy. Moreover, we assume that the call relationships between the fine and coarse levels are known or can be obtained by analysis. However, for complex software, it is difficult to analyze the association or hierarchy structure. Therefore, providing resource allocation strategies for the coarse-level tasks of complex software is another future direction.
Author Contributions
Conceptualization, C.C. and P.X.; methodology, C.C.; software, P.X.; validation, C.C., P.X. and S.W.; investigation, C.C.; resources, P.X.; data curation, P.X.; writing–original draft preparation, C.C.; writing–review and editing, C.C., B.L. and S.W.; supervision, B.L.; project administration, S.W.; funding acquisition, B.L. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by a grant from the Science & Technology on Reliability & Environmental Engineering Laboratory of China (Grant No. 614200404031117). Besides, the research is also supported by Foundation No. 61400020404.
Acknowledgments
The authors would like to thank the providers of the NASA and Softlab data sets. Besides, the authors are very thankful to the reviewers for their time and effort in providing valuable advice and suggestions for the paper.
Conflicts of Interest
The authors declare no conflict of interest.
Abbreviations
The following abbreviations are used in this paper:
AHP | analytic hierarchy process |
BBT | black-box testing |
CPDP | cross-project defect prediction |
DPAHM | DP-based association hierarchy method |
PIM | positive incidence matrix |
SDP | software defect prediction |
ST | software testing |
SUT | software under test |
TCG | test case generation |
TCP | test case prioritization |
TCS | test case selection |
TCM | test case minimization |
WBT | white-box testing |
References
- Boehm, B.W.; Papaccio, P.N. Understanding and controlling software costs. IEEE Trans. Softw. Eng. 1988, 14, 1462–1477. [Google Scholar] [CrossRef]
- Porter, A.A.; Selby, R.W. Empirically guided software development using metric-based classification trees. IEEE Softw. 1990, 7, 46–54. [Google Scholar] [CrossRef]
- Garousi, V.; Zhi, J. A survey of software testing practices in Canada. J. Syst. Softw. 2013, 86, 1354–1376. [Google Scholar] [CrossRef]
- Yucalar, F.; Ozcift, A.; Borandag, E.; Kilinc, D. Multiple-classifiers in software quality engineering: Combining predictors to improve software fault prediction ability. Eng. Sci. Technol. Int. J. 2019, in press. [Google Scholar] [CrossRef]
- Huo, X.; Li, M. On cost-effective software defect prediction: Classification or ranking? Neurocomputing 2019, 363, 339–350. [Google Scholar] [CrossRef]
- Malhotra, R.; Kamal, S. An empirical study to investigate oversampling methods for improving software defect prediction using imbalanced data. Neurocomputing 2019, 343, 120–140. [Google Scholar] [CrossRef]
- Chen, J.; Hu, K.; Yang, Y.; Liu, Y.; Xuan, Q. Collective transfer learning for defect prediction. Neurocomputing 2019, in press. [Google Scholar] [CrossRef]
- Fenton, N.E.; Neil, M. A Critique of Software Defect Prediction Models. IEEE Trans. Softw. Eng. 2002, 25, 675–689. [Google Scholar] [CrossRef]
- Menzies, T.; Milton, Z.; Turhan, B.; Cukic, B.; Jiang, Y.; Bener, A. Defect prediction from static code features: Current results, limitations, new approaches. Autom. Softw. Eng. 2010, 17, 375–407. [Google Scholar] [CrossRef]
- Nagappan, N.; Ball, T. Use of relative code churn measures to predict system defect density. In Proceedings of the 27th International Conference on Software Engineering (ICSE), Saint Louis, MO, USA, 15–21 May 2005; pp. 284–292. [Google Scholar]
- Menzies, T.; Greenwald, J.; Frank, A. Data mining static code attributes to learn defect predictors. IEEE Trans. Softw. Eng. 2007, 33, 2–13. [Google Scholar] [CrossRef]
- Lessmann, S.; Baesens, B.; Mues, C.; Pietsch, S. Benchmarking classification models for software defect prediction: A proposed framework and novel findings. IEEE Trans. Softw. Eng. 2008, 34, 485–496. [Google Scholar] [CrossRef]
- Shepperd, M.; Song, Q.; Sun, Z.; Mair, C. Data Quality: Some Comments on the NASA Software Defect Datasets. IEEE Trans. Softw. Eng. 2013, 39, 1208–1215. [Google Scholar] [CrossRef]
- Cui, C.; Liu, B.; Li, G. A novel feature selection method for software fault prediction model. In Proceedings of the 2019 Annual Reliability and Maintainability Symposium (RAMS), Orlando, FL, USA, 28–31 January 2019. [Google Scholar]
- Pan, C.; Lu, M.; Xu, B.; Gao, H. An Improved CNN Model for Within-Project Software Defect Prediction. Appl. Sci. 2019, 9, 2138. [Google Scholar] [CrossRef]
- Balogun, A.O.; Basri, S.; Abdulkadir, S.J.; Hashim, A.S. Performance Analysis of Feature Selection Methods in Software Defect Prediction: A Search Method Approach. Appl. Sci. 2019, 9, 2764. [Google Scholar] [CrossRef]
- Alsawalqah, H.; Hijazi, N.; Eshtay, M.; Faris, H.; Al Radaideh, A.; Aljarah, I.; Alshamaileh, Y. Software Defect Prediction Using Heterogeneous Ensemble Classification Based on Segmented Patterns. Appl. Sci. 2020, 10, 1745. [Google Scholar] [CrossRef]
- Ren, J.; Liu, F. A Novel Approach for Software Defect prediction Based on the Power Law Function. Appl. Sci. 2020, 10, 1892. [Google Scholar] [CrossRef]
- Zimmermann, T.; Nagappan, N.; Gall, H.; Giger, E.; Murphy, B. Cross-project defect prediction: A large scale experiment on data vs. domain vs. process. In Proceedings of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering (ESEC/FSE), Amsterdam, The Netherlands, 24–28 August 2009; pp. 91–100. [Google Scholar] [CrossRef]
- Rahman, F.; Posnett, D.; Devanbu, P. Recalling the “imprecision” of cross-project defect prediction. In Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering, Cary, NC, USA, 11–16 November 2012. [Google Scholar] [CrossRef]
- Canfora, G.; De Lucia, A.; Di Penta, M.; Oliveto, R.; Panichella, A.; Panichella, S. Multi-objective cross-project defect prediction. In Proceedings of the 2013 IEEE Sixth International Conference on Software Testing, Verification and Validation, Luxembourg, 18–22 March 2013; pp. 252–261. [Google Scholar] [CrossRef]
- Qing, H.; Biwen, L.; Beijun, S.; Xia, Y. Cross-project software defect prediction using feature-based transfer learning. In Proceedings of the 7th Asia-Pacific Symposium on Internetware, Wuhan, China, 6 November 2015; pp. 74–82. [Google Scholar] [CrossRef]
- Qiu, S.; Xu, H.; Deng, J.; Jiang, S.; Lu, L. Transfer Convolutional Neural Network for Cross-Project Defect Prediction. Appl. Sci. 2019, 9, 2660. [Google Scholar] [CrossRef]
- Jiang, K.; Zhang, Y.; Wu, H.; Wang, A.; Iwahori, Y. Heterogeneous Defect Prediction Based on Transfer Learning to Handle Extreme Imbalance. Appl. Sci. 2020, 10, 396. [Google Scholar] [CrossRef]
- Myers, G.J. The Art of Software Testing, 2nd ed.; Wiley: Hoboken, NJ, USA, 2004. [Google Scholar]
- Mandieh, M.; Mirian-Hosseinabadi, S.H.; Etemadi, K.; Nosrati, A.; Jalali, S. Incorporating fault-proneness estimations into coverage-based test case prioritization methods. Inf. Softw. Technol. 2020, 121, 106269. [Google Scholar] [CrossRef]
- Chen, J.; Zhu, L.; Chen, T.Y.; Towey, D.; Kuo, F.C.; Huang, R.; Guo, Y. Test case prioritization for object-oriented software: An adaptive random sequence approach based on clustering. J. Syst. Softw. 2018, 135, 107–125. [Google Scholar] [CrossRef]
- Basili, V.R.; Selby, R.W. Comparing the Effectiveness of Software Testing Strategies. IEEE Trans. Softw. Eng. 1988, 13, 1278–1296. [Google Scholar] [CrossRef]
- Yumoto, T.; Matsuodani, T.; Tsuda, K. A Test Analysis Method for Black Box Testing Using AUT and Fault Knowledge. Procedia Comput. Sci. 2013, 22, 551–560. [Google Scholar] [CrossRef]
- Murrill, B.W. An empirical, path-oriented approach to software analysis and testing. J. Syst. Softw. 2008, 81, 249–261. [Google Scholar] [CrossRef]
- Chi, J.; Qu, Y.; Zheng, Q.; Yang, Z.; Jin, W.; Cui, D.; Liu, T. Relation-based test case prioritization for regression testing. J. Syst. Softw. 2020, 163, 110539. [Google Scholar] [CrossRef]
- Parejo, J.A.; Sánchez, A.B.; Segura, S.; Ruiz-Cortés, A.; Lopez-Herrejon, R.E.; Egyed, A. Multi-objective test case prioritization in highly configurable systems: A case study. J. Syst. Softw. 2016, 122, 287–310. [Google Scholar] [CrossRef]
- Banias, O. Test case selection-prioritization approach based on memoization dynamic programming algorithm. Inf. Softw. Technol. 2019, 115, 119–130. [Google Scholar] [CrossRef]
- Arrieta, A.; Wang, S.; Markiegi, U.; Arruabarrena, A.; Etxeberria, L.; Sagardui, G. Pareto efficient multi-objective black-box test case selection for simulation-based testing. Inf. Softw. Technol. 2019, 114, 137–154. [Google Scholar] [CrossRef]
- Zhang, M.; Ali, S.; Yue, T. Uncertainty-wise test case generation and minimization for Cyber-Physical Systems. J. Syst. Softw. 2019, 153, 1–21. [Google Scholar] [CrossRef]
- Pandey, S.K.; Mishra, R.B.; Tripathi, A.K. BPDET: An effective software bug prediction model using deep representation and ensemble learning techniques. Expert Syst. Appl. 2020, 144, 113085. [Google Scholar] [CrossRef]
- Majd, A.; Vahidi-Asl, M.; Khalilian, A.; Poorsarvi-Tehrani, P.; Haghighi, H. SLDeep: Statement-level software defect prediction using deep-learning model on static code features. Expert Syst. Appl. 2020, 147, 113156. [Google Scholar] [CrossRef]
- Xiao, P.; Liu, B.; Wang, S. Feedback-based integrated prediction: Defect prediction based on feedback from software testing process. J. Syst. Softw. 2018, 143, 159–171. [Google Scholar] [CrossRef]
- Shao, Y.; Liu, B.; Wang, S.; Li, G. Software defect prediction based on correlation weighted class association rule mining. Knowl.-Based Syst. 2020, 196, 105742. [Google Scholar] [CrossRef]
- Ryu, D.; Baik, J. Effective multi-objective naive Bayes learning for cross-project defect prediction. Appl. Soft Comput. 2016, 49, 1062–1077. [Google Scholar] [CrossRef]
- Hong, E. Software fault-proneness prediction using module severity metrics. Int. J. Appl. Eng. Res. 2017, 12, 2038–2043. [Google Scholar]
- Jindal, R.; Malhotra, R.; Jain, A. Prediction of defect severity by mining software project reports. Int. J. Syst. Assur. Eng. Manag. 2017, 8, 334–351. [Google Scholar] [CrossRef]
- Yang, X.; Tang, K.; Yao, X. A Learning-to-Rank Approach to Software Defect Prediction. IEEE Trans. Reliab. 2015, 64, 234–246. [Google Scholar] [CrossRef]
- Ostrand, T.J.; Weyuker, E.J.; Bell, R.M. Predicting the location and number of faults in large software systems. IEEE Trans. Softw. Eng. 2005, 31, 340–355. [Google Scholar] [CrossRef]
- Bell, R.M.; Ostrand, T.J.; Weyuker, E.J. Looking for bugs in all the right places. In Proceedings of the ACM/SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2006, Portland, ME, USA, 17–20 July 2006. [Google Scholar]
- Yadav, H.B.; Yadav, D.K. A fuzzy logic based approach for phase-wise software defects prediction using software metrics. Inf. Softw. Technol. 2015, 63, 44–57. [Google Scholar] [CrossRef]
- Hosseini, S.; Turhan, B.; Mäntylä, M. A benchmark study on the effectiveness of search-based data selection and feature selection for cross project defect prediction. Inf. Softw. Technol. 2018, 95, 296–312. [Google Scholar] [CrossRef]
- Turhan, B.; Menzies, T.; Bener, A.; Di Stefano, J. On the relative value of cross-company and within-company data for defect prediction. Empir. Softw. Eng. 2009, 14, 540–578. [Google Scholar] [CrossRef]
- Li, Z.; Jing, X.Y.; Zhu, X.; Zhang, H. Heterogeneous Defect Prediction Through Multiple Kernel Learning and Ensemble Learning. In Proceedings of the IEEE International Conference on Software Maintenance & Evolution, Shanghai, China, 17–22 September 2017. [Google Scholar]
- Ma, Y.; Luo, G.; Zeng, X.; Chen, A. Transfer learning for cross-company software defect prediction. Inf. Softw. Technol. 2012, 54, 248–256. [Google Scholar] [CrossRef]
- Nam, J.; Pan, S.; Kim, S. Transfer defect learning. In Proceedings of the 2013 35th International Conference on Software Engineering (ICSE), San Francisco, CA, USA, 18–26 May 2013; pp. 382–391. [Google Scholar] [CrossRef]
- Jing, X.; Wu, F.; Dong, X.; Qi, F.; Xu, B. Heterogeneous cross-company defect prediction by unified metric representation and CCA-based transfer learning. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), Bergamo, Italy, 30 August–4 September 2015; pp. 496–507. [Google Scholar] [CrossRef]
- Saaty, T.L. Analytic Hierarchy Process; John Wiley and Sons, Ltd.: Hoboken, NJ, USA, 2005. [Google Scholar]
- Vázquez-Burgos, J.L.; Carbajal-Hernández, J.J.; Sánchez-Fernández, L.P.; Moreno-Armendáriz, M.A.; Tello-Ballinas, J.A.; Hernández-Bautista, I. An Analytical Hierarchy Process to manage water quality in white fish (Chirostoma estor estor) intensive culture. Comput. Electron. Agric. 2019, 167, 105071. [Google Scholar] [CrossRef]
- Abrahamsen, E.B.; Milazzo, M.F.; Selvik, J.T.; Asche, F.; Abrahamsen, H.B. Prioritising investments in safety measures in the chemical industry by using the Analytic Hierarchy Process. Reliab. Eng. Syst. Saf. 2020, 198, 106811. [Google Scholar] [CrossRef]
- Huang, J.; Cui, C.; Gao, C.; Lv, X. Technology maturity evaluation for DC-DC converter based on AHP and KPA. In Proceedings of the 2016 Prognostics and System Health Management Conference (PHM-Chengdu), Chengdu, China, 19–21 October 2016. [Google Scholar] [CrossRef]
- Higgins, M.; Benaroya, H. Utilizing the Analytical Hierarchy Process to determine the optimal lunar habitat configuration. Acta Astronaut. 2020, 173, 145–154. [Google Scholar] [CrossRef]
- Whitaker, R. Criticisms of the Analytic Hierarchy Process: Why they often make no sense. Math. Comput. Model. 2007, 46, 948–961. [Google Scholar] [CrossRef]
- Simrandeep Singh Thapar, H.S. Quantifying reusability of software components using hybrid fuzzy analytical hierarchy process (FAHP)-Metrics approach. Appl. Soft Comput. 2020, 88, 105997. [Google Scholar] [CrossRef]
- Wang, H.F.; Liao, H.L. User equilibrium in traffic assignment problem with fuzzy N–A incidence matrix. Fuzzy Sets Syst. 1999, 107, 245–253. [Google Scholar] [CrossRef]
- Morisugi, H.; Ohno, E. Proposal of a benefit incidence matrix for urban development projects. Reg. Sci. Urban Econ. 1995, 25, 461–481. [Google Scholar] [CrossRef]
- Dimarogonas, D.V.; Johansson, K.H. Stability analysis for multi-agent systems using the incidence matrix: Quantized communication and formation control. Automatica 2010, 46, 695–700. [Google Scholar] [CrossRef]
- Xie, K.; Zhou, J.; Li, W. Analytical model and algorithm for tracing active power flow based on extended incidence matrix. Electr. Power Syst. Res. 2009, 79, 399–405. [Google Scholar] [CrossRef]
- Mccabe, T. A Complexity Measure. IEEE Trans. Softw. Eng. 1976, 4, 308–320. [Google Scholar] [CrossRef]
- Halstead, M.H. Elements of Software Science; Operating and Programming Systems Series; Elsevier: Amsterdam, The Netherlands, 1978. [Google Scholar]
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).