Applied Sciences
  • Article
  • Open Access

4 August 2020

Can Defect Prediction Be Useful for Coarse-Level Tasks of Software Testing?

1 School of Reliability and Systems Engineering, Beihang University, Beijing 100191, China
2 The Key Laboratory on Reliability and Environmental Engineering Technology, Beihang University, Beijing 100191, China
3 School of Software, Nanchang Hangkong University, Nanchang 330063, China
* Author to whom correspondence should be addressed.
This article belongs to the Section Computing and Artificial Intelligence

Abstract

Software defect prediction (SDP) techniques have been widely used to predict bugs in software over the past 20 years. Before software testing (ST) is conducted, the result of SDP assists in allocating resources for ST. However, SDP usually serves fine-level tasks (or white-box testing) rather than coarse-level tasks (or black-box testing). Before ST, or without historical execution information, it is difficult to allocate resources properly. Therefore, an SDP-based approach, named DPAHM, is proposed to assist in arranging resources for coarse-level tasks. The method combines the analytic hierarchy process (AHP) with a variant of the incidence matrix. We apply the proposed DPAHM to a proprietary software, named MC, for which we construct an up-to-down structure consisting of three layers. The performance measure of each layer is calculated based on the SDP result, so the resource allocation strategy for coarse-level tasks is obtained from the prediction result. The experiment indicates that our proposed method is effective for resource allocation of coarse-level tasks before executing ST.

1. Introduction

Software testing (ST) is a necessary and vital activity in software quality assurance [,,]. ST assists in finding defects (aka bugs) to improve software reliability and quality [,,]. Distributing test resources evenly treats defective and non-defective software entities (such as classes, files, functions, and subsystems) equally, which leads to a waste of test resources (e.g., time, budget, software testers, etc.) or even to missing the test objectives, especially when resources are limited [,]. Therefore, before performing ST, it is quite important to allocate the resources appropriately.
Software defect prediction (SDP), which uses historical defect data including static code metrics (aka features) and historical defect information (aka labels) to predict the defect situation of entities via machine learning or deep learning techniques, is a good way to alleviate the issue mentioned above [,,,,,,,,,,]. If local historical defect data are lacking, cross-project defect data can be collected for SDP, which is named cross-project defect prediction (CPDP) [,,,,,]. However, the granularity of the prediction results relies on the historical defect data and the prediction models. That is, if the defect information is class-level and binary (i.e., an entity is defective or non-defective), only machine learning algorithms that implement binary classification can be used to predict whether the class under test is defective or not.
For ST, test methods can generally be divided into white-box testing (WBT), which focuses on source code (such as code review and module or unit testing), and black-box testing (BBT), which is related to the inputs and outputs of the system without its internal structure (such as system testing or functional testing) []. We refer to WBT and BBT as methods for fine- and coarse-level tasks of ST, respectively. Moreover, researchers and industrial practitioners have proposed several approaches to design test cases and improve test quality for the two kinds of tasks [,,]. For instance, Mostafa et al. proposed a coverage-based test case prioritization (TCP) method that uses the code coverage data of code units and fault-proneness estimations to prioritize test cases []. Chen et al. combined an adaptive random sequence approach with clustering, applying black-box information for TCP []. These authors refine test cases by using execution information to assist in resource allocation.
From the summary above, SDP can be widely used without execution or code coverage information for fine-level tasks, but not for coarse-level test tasks. However, how should managers or project leaders arrange resources when they face coarse-level test tasks? Can defect prediction results provide meaningful information for coarse-level test tasks?
Based on the above motivation and need, in this study, we consider resource allocation for coarse-level tasks of ST as a single-objective decision problem. We propose a DP-based association hierarchy method (DPAHM) using the incidence matrix and the analytic hierarchy process (AHP). We construct a hierarchy framework that consists of three layers via AHP, analyze the relationships between the inter-level (i.e., coarse- and fine-level) and intra-level (i.e., coarse- or fine-level) elements via the incidence matrix, and then calculate the SDP result of the top layer, which also derives from AHP, based on the SDP result of the bottom layer. The SDP result of the top layer is the strategy for coarse-level test tasks. Our contributions are as follows:
  • We apply SDP and make use of the prediction information for coarse-level tasks of ST.
  • We combine AHP with the incidence matrix to construct an up-to-down association hierarchy framework.
  • We propose a defect prediction method based on AHP and the incidence matrix, and use an example to show how to apply DPAHM.
Moreover, our study aims at answering the following research questions (RQs):
  • RQ1: How is defect prediction used for coarse-level test tasks?
  • RQ2: Can DPAHM complement different defect prediction task types?
  • RQ3: How do different defect prediction learners affect DPAHM?
The remainder of the study is organized, as follows: Section 2 introduces the background and reviews the related work. Section 3 describes our research method: DPAHM. Section 4 uses a project example to verify the effectiveness of our proposed method. Section 5 presents and discusses the experiment results. Section 6 points out the potential threats to validity of our study. Finally, Section 7 concludes this study and states the future work directions.

3. The Association Hierarchy Defect Prediction Method

3.1. Motivation of the Proposed Method

For coarse-level tasks of ST, such as system-level testing, if there is not enough execution information about the fine-level tasks, it is difficult for test leaders or project managers to arrange test resources. Before executing ST, it is even more difficult to arrange the resources properly. However, for fine-level tasks, SDP provides a way to predict the defect situation without coverage data or other execution information, which effectively assists in resource allocation. Besides, effect analysis is a key step for testing, and the call relationship between coarse- and fine-level tasks can be obtained by such analysis.
Therefore, inspired by AHP and the incidence matrix, we extend the hierarchy framework. An up-to-down association hierarchy structure covering the whole set of tasks is built. Moreover, before executing ST, the defect situation of the coarse-level tasks is derived based on the SDP of the fine-level tasks.

3.2. Four Phases of the Proposed Method

The proposed method DPAHM consists of four phases: the framework construction phase for the whole set of test tasks, the SDP model construction phase for the bottom layer, the positive incidence matrix (PIM) production phase for each part of the framework, and the resource allocation strategy output phase. A brief framework of the entire method is illustrated in Figure 1. It should be noted that code, software requirement specifications, and test case specifications are needed. Moreover, the proposed method is applied after code completion and before ST.
Figure 1. A Framework of The Proposed Method.
Each stage is summarized, as follows.
In the framework construction phase, an up-to-down association hierarchy structure with three element layers (Goal-, Criteria-, and Alternatives-layers) is constructed. The hierarchy framework is divided according to the vertical call mapping, which is obtained from the software requirement specifications. For the Goal-layer, the resource allocation goal is determined by the test manager or project leader according to the requirement specification. The Criteria-layer contains l elements numbered from Criterion_1 (C_1) to Criterion_l (C_l). These elements are coarse-level or mixed coarse- and fine-level tasks, and they are obtained from the requirement specification and/or the architecture design specification. In the Alternatives-layer, there are q elements belonging to fine-level tasks, represented by Alternative_1 (A_1) to Alternative_q (A_q). The Alternatives are derived from specifications, such as the module design specification. In addition, the horizontal relationships between elements are connected according to the horizontal call mapping, which is obtained from specifications for coarse-level tasks and from code for fine-level tasks. That is, according to the design specifications for fine- and coarse-level tasks, both the horizontal and vertical call relationships can be obtained, and then the association hierarchy structure can be gained.
In the SDP model construction phase, the SDP model of the bottom layer (i.e., the Alternatives-layer) is built, using historical defect data from other projects and a machine learning learner, Learner, to train a CPDP model.
In the positive incidence matrix production phase, PIMs are produced from the bottom up. That is, the vertical PIM of the inter-layer and the horizontal PIM of the intra-layer are produced by regarding the relationships as directed graphs.
In the resource allocation strategy output phase, the prediction result of the Criteria-layer (i.e., coarse level) is calculated from the incidence matrices and the fine-level SDP result. Test case specifications are generated according to the code and/or the software requirement specifications. Therefore, the resource allocation strategy for coarse-level tasks is output once the most defect-prone task is identified according to the prediction result.
Based on these four phases, the prediction order we advise is listed as follows (In the paper, we focus on resource allocation for coarse-level tasks. Thus, it is feasible that all of the prediction results are gained after completing test case specifications for system testing.):
  • the prediction result of the Alternatives-layer, which should be gained before unit testing;
  • the prediction result of the Criteria-layer, which should be obtained before system testing (or before integration testing if necessary);
  • the resource allocation strategy for coarse-level tasks after completing test case specifications for system testing.

3.3. Implementation Steps of the Proposed Method

According to the framework and four phases mentioned in Section 3.2, there are seven steps in total to complete our approach. Details of each step in the framework are illustrated, as follows. In addition, the relationship between phases and steps is represented in Table 1.
Table 1. The Relationship Between Phases and Steps of Our Proposed Approach.
  • Step 1: determine the goal, i.e., making resource allocation for coarse-level tasks, and construct an association hierarchy structure of the ST tasks according to the calling relationship between the outer layer and the inner layer;
    Step 1.1: apply the first step of AHP to build the hierarchy framework of the ST tasks, including three layers (Goal-, Criteria-, and Alternatives-layer) from up to down (Note: the Criteria-layer can be divided into one or more sublayers.);
    Step 1.2: analyze the relationships within the Criteria- or Alternatives-layer, and find the association structure;
  • Step 2: construct an SDP model (in this paper, we construct a CPDP model for ranking, as in paper [].) and predict the defect situation for the Alternatives-layer;
    Step 2.1: use historical defect training data as Source and a defect learner Learner to train the SDP model;
    Step 2.2: apply the trained Learner to Target, the target project data of the Alternatives-layer (i.e., fine level), to obtain the prediction probability vector p_alter. The formula is as follows:
    p_{alter} = Learner(Source, Target) = [p_1, p_2, \ldots, p_i, \ldots, p_q]^T
    where p_i denotes the prediction probability of the ith element of the Alternatives-layer and q is the number of elements in this layer;
  • Step 3: represent the inter- and intra-layer structures above as directed graphs: G_CA = (V_1, E_1) and G_CC = (V_2, E_2);
  • Step 4: gain the incidence matrix H_ca of the inter-layers from G_CA and H_cc of the intra-layer from G_CC.
    Step 4.1: produce the vertical PIM H_ca of the inter-layers (i.e., Criteria-layer and Alternatives-layer) from graph G_CA, such that
    h_{ca}^{ij} = \begin{cases} 1, & \text{if the ith element of the Criteria-layer calls the jth element of the Alternatives-layer;} \\ 0, & \text{otherwise,} \end{cases}
    where h_{ca}^{ij} is the element in the ith row and jth column of matrix H_ca. Besides, the size of H_ca is l × q, where l and q are the numbers of elements of the Criteria- and Alternatives-layer, respectively;
    Step 4.2: obtain the horizontal PIM H_cc of the Criteria-layer from graph G_CC. The formula is as follows:
    h_{cc}^{ij} = \begin{cases} 1, & \text{if the ith element of the Criteria-layer calls the jth element, or } i = j; \\ 0, & \text{otherwise,} \end{cases}
    where h_{cc}^{ij} is the element in the ith row and jth column of matrix H_cc. Moreover, the size of H_cc is l × l.
  • Step 5: calculate the prediction probability vector p_cri of the Criteria-layer by the following formula:
    p_{cri} = H_{l \times l} \times H_{l \times q} \times p_{alter} = [p_1, p_2, \ldots, p_j, \ldots, p_l]^T
    where p_j denotes the prediction probability of the jth element of the Criteria-layer and l is the number of elements in this layer;
  • Step 6: normalize the prediction probability vector p_cri as follows:
    p_{cri} = \frac{p_{cri} - \min(p_{cri})}{\max(p_{cri}) - \min(p_{cri})}    (8)
    where \max(p_{cri}) and \min(p_{cri}) are the maximum and minimum values in p_cri, respectively.
  • Step 7: arrange the test resources according to the prediction result. A minimal end-to-end sketch of Steps 2-6 is given below.
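To make the steps concrete, the following is a minimal Python sketch of Steps 2-6 under illustrative assumptions: the metric values, layer sizes, and call relations are hypothetical placeholders, and a scikit-learn linear regression stands in for Learner; any SDP model exposing fit/predict could be substituted.

```python
# A minimal sketch of Steps 2-6 of DPAHM. All data, layer sizes, and call
# relations below are hypothetical placeholders; the real inputs come from the
# project's metrics and specifications, and any SDP model can replace LinearRegression.
import numpy as np
from sklearn.linear_model import LinearRegression

# Step 2: train an SDP model on source-project data (CPDP style) and predict
# the defect-proneness vector p_alter for the fine-level (Alternatives) elements.
rng = np.random.default_rng(0)
X_source = rng.random((100, 5))      # hypothetical static code metrics
y_source = rng.random(100)           # hypothetical defect-proneness labels
X_target = rng.random((4, 5))        # metrics of four fine-level elements

learner = LinearRegression().fit(X_source, y_source)
p_alter = learner.predict(X_target)

# Steps 3-4: positive incidence matrices derived from the call relationships.
# H_ca[i, j] = 1 if Criteria element i calls Alternatives element j.
H_ca = np.array([[1, 1, 0, 0],
                 [0, 0, 1, 1]])
# H_cc[i, j] = 1 if Criteria element i calls Criteria element j, or i == j.
H_cc = np.array([[1, 0],
                 [1, 1]])

# Step 5: propagate the fine-level predictions up to the Criteria-layer.
p_cri = H_cc @ H_ca @ p_alter

# Step 6: min-max normalization of the Criteria-layer vector.
p_cri_norm = (p_cri - p_cri.min()) / (p_cri.max() - p_cri.min())

# Step 7 then assigns more resources to elements with larger normalized values.
print(np.argsort(-p_cri_norm))       # Criteria elements, most defect-prone first
```

The output ordering simply ranks the Criteria-layer elements from most to least defect-prone, which is the information Step 7 uses to arrange resources.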

4. Case Study

4.1. Research Questions

To verify the effectiveness of our proposed method, three RQs are summarized as follows.
RQ1: How is SDP used for coarse-level tasks of software testing?
To the best of our knowledge, SDP is widely used for fine-level tasks (i.e., WBT) but is rarely used for coarse-level tasks. Therefore, a basic defect prediction learner, the linear regression model (LRM), is selected as the Learner for SDP to implement the processes of DPAHM. This question calls for a complete implementation of the method to show how SDP techniques can be used for coarse-level tasks of ST.
RQ2: Can DPAHM complement different defect prediction task types?
The proposed method DPAHM is based on SDP. Different SDP tasks can be divided into classification, numeric, ranking, and other types, as Section 2.2 introduced. This question explores the prediction granularity levels of coarse-level tasks.
RQ3: How do different defect prediction learners affect DPAHM?
Much research has indicated that different defect prediction learners produce different results on the same datasets [,]. This question discusses the effect of different learners on DPAHM.

4.2. Experimental Subjects

To answer the three RQs in Section 4.1, we apply our approach to a proprietary software under test. The software, named MC, is a safety-related electronic system for railway signals and is implemented in the C programming language. MC includes three coarse-level tasks and seventeen fine-level tasks.
For the fine-level tasks, there are thirteen files and four functions. The files are the basic modules and contain a total of 5334 lines of code (LOC). We collected 48 static metrics with LDRA TESTBED, an embedded test tool provided by LDRA (https://ldra.com/). Basic information about the ID, file name, and some static metrics of MC is given in Table 2. The table shows LOC, McCabe metrics (Cyclomatic Complexity (v(g)) and Essential Cyclomatic Complexity (ev(g))), and Halstead metrics (Total Operators (N1), Total Operands (N2), Unique Operators (η1), Unique Operands (η2), Length (N), and Volume (V)). The meanings of the McCabe and Halstead metrics can be found in papers [,], respectively. The basic files serve four parts (Function 1, Function 2, Function 3, and Function 4) to complete the related functions. Table 3 lists the basic descriptive information.
Table 2. The Basic Information about Thirteen Files from Fine-level Tasks of MC.
Table 3. The Basic Information about Four Functions from Fine-level Tasks of MC.
There are three tasks at the coarse level: the Stress Calculation Task, the Frame Number Acquisition Task, and the Frame Number Acquisition Task. These tasks collect voltage signals and mass weights for railway safety. Moreover, these tasks depend on the fine-level tasks. Their basic information (such as ID and Description) is presented in Table 4.
Table 4. The Basic Information about Coarse-level Tasks of MC.

4.3. Apply Our Method to MC

4.3.1. The First Phase

According to the test requirements, MC needs to be checked by system testing, integration testing, and unit testing. For the coarse-level tasks, the call relationship is derived from the development specifications (or documents) and the software development engineers. Our goal is to perform system testing of the three tasks, which are implemented by calling four integration functions. For the fine-level tasks, the relationship is gained from the code and specifications. The functions, which will be tested in the integration testing process, call thirteen files (units), which will be checked by unit testing.
After the analysis, we complete the first phase of our method. The structure of MC is divided into the Goal-layer, Criteria-layer, and Alternatives-layer. Our target is to obtain the resource allocation strategy of ST for MC. Moreover, there are two sublayers in the Criteria-layer. The whole structure is constructed as Figure 2 indicates. As the figure depicts, the inter-layers and intra-layers are interdependent. For example, Task 1 calls Function 2, and Function 2 needs A3, which calls A4 and A5 to work.
Figure 2. The Association Hierarchy Structure of Software MC (Note: “A → B” represents that B calls A to implement the task).

4.3.2. The Second Phase

After obtaining the framework of MC, we carry out the second phase to predict the defect situation for the Alternatives-layer. In this paper, we use the same 10 historical defect datasets from other projects as paper []. The basic information about these data is listed in Table 5. For the target project MC, we use LDRA TESTBED, as mentioned in Section 4.2, to gain the metrics. Moreover, the linear regression model (LRM), which is applied in paper [], is also used as the basic learner for predicting the result.
Table 5. The Basic Information about Historical Cross-project Defect Data.
Finally, we obtain the prediction result for the Alternatives-layer, which is listed in Table 6. The probability vector of the layer is given in Formula (9).
P_{alter} = [0.045, 0.027, 0.040, 0.027, 0.053, 0.022, 0.018, 0.020, 0.012, 0.012, 0.162, 0.064, 0.057]^T    (9)
Table 6. The Defect Probability of the Alternatives-layer by the Linear Regression Model (LRM) for MC.

4.3.3. The Third Phase

The graphs of Criteria-sublayer 1 (i.e., the Function-layer) and the Alternatives-layer from the structure of Figure 2 are shown in Figure 3 and Figure 4. The graphs of Criteria-sublayer 2 (i.e., the Task-layer) and the Function-layer from the structure of Figure 2 are illustrated in Figure 5 and Figure 6.
Figure 3. The Vertical Graph of the Function- and Alternatives-layer from The Association Hierarchy Structure of MC (Note: the arrow from A to B represents that A calls B directly; the dashed arrow from A to B means A calls B indirectly).
Figure 4. The Horizontal Graph of the Function-layer from The Association Hierarchy Structure of Software MC (Note: the arrow from A to B represents that A calls B directly; the dashed arrow from A to B means A calls B indirectly).
Figure 5. The Vertical Graph of the Task- and Function-layer from The Association Hierarchy Structure of MC (Note: the arrow from A to B represents that A calls B directly).
Figure 6. The Horizontal Graph of the Task-layer from The Association Hierarchy Structure of MC.
As Figure 3 and Figure 4 illustrate, the vertical PIM H_fa between the Function-layer and the Alternatives-layer and the horizontal PIM H_ff of the Function-layer are obtained. They are denoted as Formula (10) and Formula (11), respectively.
H_{fa} = \begin{bmatrix}
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1
\end{bmatrix}    (10)
H_{ff} = \begin{bmatrix}
1 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}    (11)
As Figure 5 shows, the vertical PIM H_tf between the Task- and Function-layer is obtained as Formula (12). Because the elements of the Task-layer are independent, the PIM H_tt is obtained as Formula (13).
H_{tf} = \begin{bmatrix}
1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1
\end{bmatrix}    (12)
H_{tt} = \begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix}    (13)

4.3.4. The Fourth Phase

In the last phase, we derive the defect probability of each element in the Criteria-layer. For the Function-layer, the defect probability vector P_fun is given in Formula (14). Besides, the probability vector P_task of the Task-layer is given in Formula (15).
P_{fun} = H_{ff} \times H_{fa} \times P_{alter} = [0.072, 0.192, 0.156, 0.283]^T    (14)
P_{task} = H_{tt} \times H_{tf} \times P_{fun} = [0.264, 0.228, 0.475]^T    (15)
We normalize P_task by Formula (8). Subsequently, we obtain the normalized defect probability values of the three tasks (i.e., 0.146, 0, and 1).
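As a check on the arithmetic above, the following short Python sketch rebuilds the PIMs from the call relations implied by Formulas (10)-(13) and propagates P_alter upward; it reproduces P_fun, P_task, and the normalized values. The helper functions and edge dictionaries are only one illustrative encoding of those relations, not part of the original method description.

```python
# Rebuilding the MC PIMs from the call relations implied by Formulas (10)-(13)
# and propagating P_alter upward; reproduces P_fun, P_task, and the normalized values.
import numpy as np

# Vertical calls: Function -> Alternatives (rows of H_fa, 0-based indices)
fa_calls = {0: [0, 1], 1: [2, 3, 4], 2: [5, 6, 7, 8, 9], 3: [10, 11, 12]}
# Horizontal calls within the Function-layer (off-diagonal entries of H_ff)
ff_calls = {1: [0], 2: [0]}
# Vertical calls: Task -> Function (rows of H_tf); Task-layer elements are independent.
tf_calls = {0: [0, 1], 1: [0, 2], 2: [1, 3]}

def vertical_pim(calls, rows, cols):
    H = np.zeros((rows, cols), dtype=int)
    for i, js in calls.items():
        H[i, js] = 1
    return H

def horizontal_pim(calls, n):
    # Intra-layer PIM: call relations plus the mandatory diagonal (i == j).
    return vertical_pim(calls, n, n) + np.eye(n, dtype=int)

H_fa, H_ff = vertical_pim(fa_calls, 4, 13), horizontal_pim(ff_calls, 4)
H_tf, H_tt = vertical_pim(tf_calls, 3, 4), np.eye(3)

P_alter = np.array([0.045, 0.027, 0.040, 0.027, 0.053, 0.022, 0.018,
                    0.020, 0.012, 0.012, 0.162, 0.064, 0.057])
P_fun = H_ff @ H_fa @ P_alter     # -> [0.072, 0.192, 0.156, 0.283], Formula (14)
P_task = H_tt @ H_tf @ P_fun      # -> [0.264, 0.228, 0.475], Formula (15)
P_norm = (P_task - P_task.min()) / (P_task.max() - P_task.min())  # -> [0.146, 0.0, 1.0]
```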

5. Results Analysis and Discussion

RQ1: How is SDP used for coarse-level tasks of software testing?
For RQ1, the result of P_task indicates that Task 2 is the least defect-prone and Task 3 is the most defect-prone. If a test leader or manager does not know the prediction result in advance, he or she will divide the resources evenly and let the testers test each task randomly. However, when the defect-proneness of each task is predicted, the test leader or manager should arrange more resources for Task 3 and fewer resources for Task 2. Here, test resources include the number of test cases, testers, test environments (such as test tools, test devices, and support staff), time, budget, etc.
For instance, suppose the test manager has four test environments and four testers of an equivalent professional level to execute ST for the three tasks. According to the coarse-level prediction result, the manager can arrange two testers to check Task 3, one tester to check Task 2, and the remaining tester to check Task 1. Besides, for Task 3 and Task 2, the testers can take two days, whereas one day is enough for Task 1.
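One simple way to derive such a split, offered only as an illustration and not as a rule prescribed by DPAHM, is to allocate testers roughly in proportion to P_task from Section 4.3.4 while keeping at least one tester per task:

```python
# Illustrative proportional split of testers using P_task from Formula (15);
# this is only one possible allocation heuristic, not part of DPAHM itself.
import numpy as np

P_task = np.array([0.264, 0.228, 0.475])
testers = 4
share = P_task / P_task.sum()                                 # ≈ [0.27, 0.24, 0.49]
allocation = np.maximum(1, np.round(share * testers)).astype(int)
print(dict(zip(["Task 1", "Task 2", "Task 3"], allocation)))  # {'Task 1': 1, 'Task 2': 1, 'Task 3': 2}
```

This matches the allocation described above: two testers for Task 3 and one each for Tasks 1 and 2.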
RQ2: Can DPAHM complement different defect prediction task types?
From the processes of DPAHM in Section 3 and the implementation steps on the example software MC in Section 4.3, we can see that the SDP result for the Alternatives-layer is the input value for coarse-level prediction. Thus, the form of the final coarse-level result depends entirely on the form of this input. That is, the prediction granularity of the SDP result is binary, numeric, or ranking labels according to the SDP model, which relies on the prediction learner. For example, if a machine learning technique for regression tasks is chosen as the basic learner, the SDP result at the Alternatives-level will be numeric, and the coarse-level result will be the number of defects in each element of the Criteria-layer.
Therefore, for RQ2, the answer is “YES”.
RQ3: How do different defect prediction learners affect DPAHM?
For RQ3, we can analyze from two different perspectives:
One perspective is to discuss the effect of the prediction task types. From RQ2, we know that DPAHM can handle different prediction tasks, so we analyze how the different prediction task types affect DPAHM. For classification models of SDP, the result for coarse-level tasks is binary: each task is predicted as defective or non-defective, and the project manager cannot obtain extra information about the tasks. Therefore, the resource allocation strategy is coarse, but still better than allocating resources evenly. For regression models of SDP, the result is numeric or a ranking. With a ranking result, the project manager can manage the resources according to the order. With a numeric result, the manager can not only arrange the resources according to the numeric values but also design different test cases to find these bugs. Therefore, we advise practitioners to use numeric or ranking prediction tasks, because the results provide more detailed information for resource allocation than classification prediction tasks.
The other perspective is the effect of different learners on the same prediction task. Different learners may produce different prediction results [,]. Taking ranking prediction tasks as an example, Xiao et al. analyzed the effect of different learners (i.e., LRM, random forest, and gradient boosting regression tree) on FIP []; we use the same prediction approach as the basic SDP model and therefore do not evaluate other learners in our experiment. Our final prediction result relies on the result of the Alternatives-layer. Therefore, practitioners need to select a proper learner according to the result they want to obtain.
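Because DPAHM only consumes the Alternatives-layer vector, swapping the learner is a local change. The following sketch, which uses scikit-learn models and synthetic data purely as stand-ins rather than the paper's prescribed learners, shows how a regression learner (for numeric or ranking results) or a classification learner (using the defect-class probability) could feed the same propagation step:

```python
# A sketch of swapping the Alternatives-layer learner (synthetic data; the
# model classes below are stand-ins, not the paper's prescribed learners).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X_source, X_target = rng.random((100, 5)), rng.random((13, 5))
y_numeric = rng.random(100)                 # numeric/ranking labels (e.g., defect density)
y_binary = (y_numeric > 0.5).astype(int)    # binary labels (defective / non-defective)

# Numeric or ranking task: the propagated coarse-level result stays numeric.
p_alter_numeric = RandomForestRegressor(random_state=1).fit(X_source, y_numeric).predict(X_target)

# Classification task: feed the defect-class probability into the propagation
# so the coarse-level result remains comparable across tasks.
p_alter_binary = LogisticRegression(max_iter=1000).fit(X_source, y_binary).predict_proba(X_target)[:, 1]
```

Either vector can then be propagated through the PIMs exactly as in Section 4.3.4.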

6. Threats to Validity

Potential threats to the validity of our research method are shown as follows.
Threats to internal validity: we use the SDP technique as the basic method for coarse-level tasks. The datasets used in the experiment are cleaned, which means that 18 common metrics are used and instances similar to the software MC are selected from the cross-projects. Besides, although all of the authors have checked the experiments four times, there may still be some errors. Moreover, to the best of our knowledge, this is the first time SDP has been applied to coarse-level tasks; therefore, we did not execute control experiments. However, we explain the advantage of our proposed approach compared with the uncertainty of evenly allocating resources.
Threats to external validity: we have verified the effectiveness of DPAHM by applying it to a specific software project. Moreover, we only use LRM to make defect predictions for the Alternatives-layer, because our goal is to provide a method for allocating test resources for coarse-level tasks instead of finding the best model for the tasks. In order to generalize our proposed approach, we analyze how the different types of SDP methods (i.e., classification, ranking, and numeric types) work with DPAHM. In the future, more software projects with coarse-level tasks should be considered to reduce threats to external validity.
Threats to construct validity: DPAHM relates to AHP and PIM. Therefore, the framework and relationships are important for the final result. We carefully check and draw the structure via the specifications and codes. In addition, our proposed DPAHM is based on SDP. According to previous SDP studies [,,], different performance measures are used. In the paper, we just follow paper [] and also use PoD to assess the performance of SDP in the Alternatives-layer.

7. Conclusions and Future Work

To alleviate the difficulty of allocating resources without execution information for coarse-level tasks, an approach called DPAHM is proposed in this paper. The method regards the resource allocation problem of ST as a decision-making problem and combines AHP with a variant of the incidence matrix to predict the defect situation of coarse-level tasks based on SDP techniques. Thus, the corresponding resource allocation strategy is derived.
The approach is divided into four phases: the association hierarchy framework construction phase, the software defect prediction model establishment phase, the positive incidence matrix production phase (in the vertical and horizontal directions), and the resource allocation strategy output phase. We apply the proposed method to a real software project, MC, and the result indicates that our method can provide ideas about resource allocation strategies for coarse-level testing tasks, such as system-level testing.
In this study, we aim at resource allocation for coarse-level tasks of ST. Accordingly, we only depend on SDP to predict the defect situation of coarse-level tasks. DPAHM provides guidance for allocating resources. In the future, we will collect more resource allocation information (such as the number of test cases executed by each person every day, the proportion of system testing time, and the budget for each person) to optimize the allocation strategy. Moreover, we assume that the call relationship between fine- and coarse-level tasks is known or can be obtained by analysis. However, for complex software, it is difficult to analyze the association or hierarchy structure. Therefore, providing resource allocation strategies for the coarse-level tasks of complex software is another future direction.

Author Contributions

Conceptualization, C.C. and P.X.; methodology, C.C.; software, P.X.; validation, C.C., P.X. and S.W.; investigation, C.C.; resources, P.X.; data curation, P.X.; writing–original draft preparation, C.C.; writing–review and editing, C.C., B.L. and S.W.; supervision, B.L.; project administration, S.W.; funding acquisition, B.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by a grant from the Science & Technology on Reliability & Environmental Engineering Laboratory of China (Grant No. 614200404031117). Besides, the research is also supported by Foundation No. 61400020404.

Acknowledgments

The authors would like to thank the providers of the NASA and Softlab data sets. Besides, the authors are very thankful to the reviewers for their time and effort in providing valuable advice and suggestions for the paper.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this paper:
AHP	analytic hierarchy process
BBT	black-box testing
CPDP	cross-project defect prediction
DPAHM	DP-based association hierarchy method
PIM	positive incidence matrix
SDP	software defect prediction
ST	software testing
SUT	software under test
TCG	test case generation
TCP	test case prioritization
TCS	test case selection
TCM	test case minimization
WBT	white-box testing

References

  1. Boehm, B.W.; Papaccio, P.N. Understanding and controlling software costs. IEEE Trans. Softw. Eng. 1988, 14, 1462–1477. [Google Scholar] [CrossRef]
  2. Porter, A.A.; Selby, R.W. Empirically guided software development using metric-based classification trees. IEEE Softw. 1990, 7, 46–54. [Google Scholar] [CrossRef]
  3. Garousi, V.; Zhi, J. A survey of software testing practices in Canada. J. Syst. Softw. 2013, 86, 1354–1376. [Google Scholar] [CrossRef]
  4. Yucalar, F.; Ozcift, A.; Borandag, E.; Kilinc, D. Multiple-classifiers in software quality engineering: Combining predictors to improve software fault prediction ability. Eng. Sci. Technol. Int. J. 2019, in press. [Google Scholar] [CrossRef]
  5. Huo, X.; Li, M. On cost-effective software defect prediction: Classification or ranking? Neurocomputing 2019, 363, 339–350. [Google Scholar] [CrossRef]
  6. Malhotra, R.; Kamal, S. An empirical study to investigate oversampling methods for improving software defect prediction using imbalanced data. Neurocomputing 2019, 343, 120–140. [Google Scholar] [CrossRef]
  7. Chen, J.; Hu, K.; Yang, Y.; Liu, Y.; Xuan, Q. Collective transfer learning for defect prediction. Neurocomputing 2019, in press. [Google Scholar] [CrossRef]
  8. Fenton, N.E.; Neil, M. A Critique of Software Defect Prediction Models. IEEE Trans. Softw. Eng. 2002, 25, 675–689. [Google Scholar] [CrossRef]
  9. Menzies, T.; Milton, Z.; Turhan, B.; Cukic, B.; Jiang, Y.; Bener, A. Defect prediction from static code features: Current results, limitations, new approaches. Autom. Softw. Eng. 2010, 17, 375–407. [Google Scholar] [CrossRef]
  10. Nagappan, N.; Ball, T. Use of relative code churn measures to predict system defect density. In Proceedings of the 27th International Conference on Software Engineering (ICSE), Saint Louis, MO, USA, 15–21 May 2005; pp. 284–292. [Google Scholar]
  11. Menzies, T.; Greenwald, J.; Frank, A. Data mining static code attributes to learn defect predictors. IEEE Trans. Softw. Eng. 2007, 33, 2–13. [Google Scholar] [CrossRef]
  12. Lessmann, S.; Baesens, B.; Mues, C.; Pietsch, S. Benchmarking classification models for software defect prediction: A proposed framework and novel findings. IEEE Trans. Softw. Eng. 2008, 34, 485–496. [Google Scholar] [CrossRef]
  13. Shepperd, M.; Song, Q.; Sun, Z.; Mair, C. Data Quality: Some Comments on the NASA Software Defect Datasets. IEEE Trans. Softw. Eng. 2013, 39, 1208–1215. [Google Scholar] [CrossRef]
  14. Cui, C.; Liu, B.; Li, G. A novel feature selection method for software fault prediction model. In Proceedings of the 2019 Annual Reliability and Maintainability Symposium (RAMS), Orlando, FL, USA, 28–31 January 2019. [Google Scholar]
  15. Pan, C.; Lu, M.; Xu, B.; Gao, H. An Improved CNN Model for Within-Project Software Defect Prediction. Appl. Sci. 2019, 9, 2138. [Google Scholar] [CrossRef]
  16. Balogun, A.O.; Basri, S.; Abdulkadir, S.J.; Hashim, A.S. Performance Analysis of Feature Selection Methods in Software Defect Prediction: A Search Method Approach. Appl. Sci. 2019, 9, 2764. [Google Scholar] [CrossRef]
  17. Alsawalqah, H.; Hijazi, N.; Eshtay, M.; Faris, H.; Al Radaideh, A.; Aljarah, I.; Alshamaileh, Y. Software Defect Prediction Using Heterogeneous Ensemble Classification Based on Segmented Patterns. Appl. Sci. 2020, 10, 1745. [Google Scholar] [CrossRef]
  18. Ren, J.; Liu, F. A Novel Approach for Software Defect prediction Based on the Power Law Function. Appl. Sci. 2020, 10, 1892. [Google Scholar] [CrossRef]
  19. Zimmermann, T.; Nagappan, N.; Gall, H.; Giger, E.; Murphy, B. Cross-project defect prediction: A large scale experiment on data vs. domain vs. process. In Proceedings of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering (ESEC/FSE), Amsterdam, The Netherlands, 24–28 August 2009; pp. 91–100. [Google Scholar] [CrossRef]
  20. Rahman, F.; Posnett, D.; Devanbu, P. Recalling the “imprecision” of cross-project defect prediction. In Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering, Cary, NC, USA, 11–16 November 2012. [Google Scholar] [CrossRef]
  21. Canfora, G.; De Lucia, A.; Di Penta, M.; Oliveto, R.; Panichella, A.; Panichella, S. Multi-objective cross-project defect prediction. In Proceedings of the 2013 IEEE Sixth International Conference on Software Testing, Verification and Validation, Luxembourg, 18–22 March 2013; pp. 252–261. [Google Scholar] [CrossRef]
  22. Qing, H.; Biwen, L.; Beijun, S.; Xia, Y. Cross-project software defect prediction using feature-based transfer learning. In Proceedings of the 7th Asia-Pacific Symposium on Internetware, Wuhan, China, 6 November 2015; pp. 74–82. [Google Scholar] [CrossRef]
  23. Qiu, S.; Xu, H.; Deng, J.; Jiang, S.; Lu, L. Transfer Convolutional Neural Network for Cross-Project Defect Prediction. Appl. Sci. 2019, 9, 2660. [Google Scholar] [CrossRef]
  24. Jiang, K.; Zhang, Y.; Wu, H.; Wang, A.; Iwahori, Y. Heterogeneous Defect Prediction Based on Transfer Learning to Handle Extreme Imbalance. Appl. Sci. 2020, 10, 396. [Google Scholar] [CrossRef]
  25. Myers, G.J. The Art of Software Testing, 2nd ed.; Wiley: Hoboken, NJ, USA, 2004. [Google Scholar]
  26. Mandieh, M.; Mirian-Hosseinabadi, S.H.; Etemadi, K.; Nosrati, A.; Jalali, S. Incorporating fault-proneness estimations into coverage-based test case prioritization methods. Inf. Softw. Technol. 2020, 121, 106269. [Google Scholar] [CrossRef]
  27. Chen, J.; Zhu, L.; Chen, T.Y.; Towey, D.; Kuo, F.C.; Huang, R.; Guo, Y. Test case prioritization for object-oriented software: An adaptive random sequence approach based on clustering. J. Syst. Softw. 2018, 135, 107–125. [Google Scholar] [CrossRef]
  28. Basili, V.R.; Selby, R.W. Comparing the Effectiveness of Software Testing Strategies. IEEE Trans. Softw. Eng. 1988, 13, 1278–1296. [Google Scholar] [CrossRef]
  29. Yumoto, T.; Matsuodani, T.; Tsuda, K. A Test Analysis Method for Black Box Testing Using AUT and Fault Knowledge. Procedia Comput. Sci. 2013, 22, 551–560. [Google Scholar] [CrossRef]
  30. Murrill, B.W. An empirical, path-oriented approach to software analysis and testing. J. Syst. Softw. 2008, 81, 249–261. [Google Scholar] [CrossRef]
  31. Chi, J.; Qu, Y.; Zheng, Q.; Yang, Z.; Jin, W.; Cui, D.; Liu, T. Relation-based test case prioritization for regression testing. J. Syst. Softw. 2020, 163, 110539. [Google Scholar] [CrossRef]
  32. Parejo, J.A.; Sánchez, A.B.; Segura, S.; Ruiz-Cortés, A.; Lopez-Herrejon, R.E.; Egyed, A. Multi-objective test case prioritization in highly configurable systems: A case study. J. Syst. Softw. 2016, 122, 287–310. [Google Scholar] [CrossRef]
  33. Banias, O. Test case selection-prioritization approach based on memoization dynamic programming algorithm. Inf. Softw. Technol. 2019, 115, 119–130. [Google Scholar] [CrossRef]
  34. Arrieta, A.; Wang, S.; Markiegi, U.; Arruabarrena, A.; Etxeberria, L.; Sagardui, G. Pareto efficient multi-objective black-box test case selection for simulation-based testing. Inf. Softw. Technol. 2019, 114, 137–154. [Google Scholar] [CrossRef]
  35. Zhang, M.; Ali, S.; Yue, T. Uncertainty-wise test case generation and minimization for Cyber-Physical Systems. J. Syst. Softw. 2019, 153, 1–21. [Google Scholar] [CrossRef]
  36. Pandey, S.K.; Mishra, R.B.; Tripathi, A.K. BPDET: An effective software bug prediction model using deep representation and ensemble learning techniques. Expert Syst. Appl. 2020, 144, 113085. [Google Scholar] [CrossRef]
  37. Majd, A.; Vahidi-Asl, M.; Khalilian, A.; Poorsarvi-Tehrani, P.; Haghighi, H. SLDeep: Statement-level software defect prediction using deep-learning model on static code features. Expert Syst. Appl. 2020, 147, 113156. [Google Scholar] [CrossRef]
  38. Xiao, P.; Liu, B.; Wang, S. Feedback-based integrated prediction: Defect prediction based on feedback from software testing process. J. Syst. Softw. 2018, 143, 159–171. [Google Scholar] [CrossRef]
  39. Shao, Y.; Liu, B.; Wang, S.; Li, G. Software defect prediction based on correlation weighted class association rule mining. Knowl.-Based Syst. 2020, 196, 105742. [Google Scholar] [CrossRef]
  40. Ryu, D.; Baik, J. Effective multi-objective naive Bayes learning for cross-project defect prediction. Appl. Soft Comput. 2016, 49, 1062–1077. [Google Scholar] [CrossRef]
  41. Hong, E. Software fault-proneness prediction using module severity metrics. Int. J. Appl. Eng. Res. 2017, 12, 2038–2043. [Google Scholar]
  42. Jindal, R.; Malhotra, R.; Jain, A. Prediction of defect severity by mining software project reports. Int. J. Syst. Assur. Eng. Manag. 2017, 8, 334–351. [Google Scholar] [CrossRef]
  43. Yang, X.; Tang, K.; Yao, X. A Learning-to-Rank Approach to Software Defect Prediction. IEEE Trans. Reliab. 2015, 64, 234–246. [Google Scholar] [CrossRef]
  44. Ostrand, T.J.; Weyuker, E.J.; Bell, R.M. Predicting the location and number of faults in large software systems. IEEE Trans. Softw. Eng. 2005, 31, 340–355. [Google Scholar] [CrossRef]
  45. Bell, R.M.; Ostrand, T.J.; Weyuker, E.J. Looking for bugs in all the right places. In Proceedings of the ACM/SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2006, Portland, ME, USA, 17–20 July 2006. [Google Scholar]
  46. Yadav, H.B.; Yadav, D.K. A fuzzy logic based approach for phase-wise software defects prediction using software metrics. Inf. Softw. Technol. 2015, 63, 44–57. [Google Scholar] [CrossRef]
  47. Hosseini, S.; Turhan, B.; Mäntylä, M. A benchmark study on the effectiveness of search-based data selection and feature selection for cross project defect prediction. Inf. Softw. Technol. 2018, 95, 296–312. [Google Scholar] [CrossRef]
  48. Turhan, B.; Menzies, T.; Bener, A.; Di Stefano, J. On the relative value of cross-company and within-company data for defect prediction. Empir. Softw. Eng. 2009, 14, 540–578. [Google Scholar] [CrossRef]
  49. Li, Z.; Jing, X.Y.; Zhu, X.; Zhang, H. Heterogeneous Defect Prediction Through Multiple Kernel Learning and Ensemble Learning. In Proceedings of the IEEE International Conference on Software Maintenance & Evolution, Shanghai, China, 17–22 September 2017. [Google Scholar]
  50. Ma, Y.; Luo, G.; Zeng, X.; Chen, A. Transfer learning for cross-company software defect prediction. Inf. Softw. Technol. 2012, 54, 248–256. [Google Scholar] [CrossRef]
  51. Nam, J.; Pan, S.; Kim, S. Transfer defect learning. In Proceedings of the 2013 35th International Conference on Software Engineering (ICSE), San Francisco, CA, USA, 18–26 May 2013; pp. 382–391. [Google Scholar] [CrossRef]
  52. Jing, X.; Wu, F.; Dong, X.; Qi, F.; Xu, B. Heterogeneous cross-company defect prediction by unified metric representation and CCA-based transfer learning. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), Bergamo, Italy, 30 August–4 September 2015; pp. 496–507. [Google Scholar] [CrossRef]
  53. Saaty, T.L. Analytic Hierarchy Process; John Wiley and Sons, Ltd.: Hoboken, NJ, USA, 2005. [Google Scholar]
  54. Vázquez-Burgos, J.L.; Carbajal-Hernández, J.J.; Sánchez-Fernández, L.P.; Moreno-Armendáriz, M.A.; Tello-Ballinas, J.A.; Hernández-Bautista, I. An Analytical Hierarchy Process to manage water quality in white fish (Chirostoma estor estor) intensive culture. Comput. Electron. Agric. 2019, 167, 105071. [Google Scholar] [CrossRef]
  55. Abrahamsen, E.B.; Milazzo, M.F.; Selvik, J.T.; Asche, F.; Abrahamsen, H.B. Prioritising investments in safety measures in the chemical industry by using the Analytic Hierarchy Process. Reliab. Eng. Syst. Saf. 2020, 198, 106811. [Google Scholar] [CrossRef]
  56. Huang, J.; Cui, C.; Gao, C.; Lv, X. Technology maturity evaluation for DC-DC converter based on AHP and KPA. In Proceedings of the 2016 Prognostics and System Health Management Conference (PHM-Chengdu), Chengdu, China, 19–21 October 2016. [Google Scholar] [CrossRef]
  57. Higgins, M.; Benaroya, H. Utilizing the Analytical Hierarchy Process to determine the optimal lunar habitat configuration. Acta Astronaut. 2020, 173, 145–154. [Google Scholar] [CrossRef]
  58. Whitaker, R. Criticisms of the Analytic Hierarchy Process: Why they often make no sense. Math. Comput. Model. 2007, 46, 948–961. [Google Scholar] [CrossRef]
  59. Simrandeep Singh Thapar, H.S. Quantifying reusability of software components using hybrid fuzzy analytical hierarchy process (FAHP)-Metrics approach. Appl. Soft Comput. 2020, 88, 105997. [Google Scholar] [CrossRef]
  60. Wang, H.F.; Liao, H.L. User equilibrium in traffic assignment problem with fuzzy N–A incidence matrix. Fuzzy Sets Syst. 1999, 107, 245–253. [Google Scholar] [CrossRef]
  61. Morisugi, H.; Ohno, E. Proposal of a benefit incidence matrix for urban development projects. Reg. Sci. Urban Econ. 1995, 25, 461–481. [Google Scholar] [CrossRef]
  62. Dimarogonas, D.V.; Johansson, K.H. Stability analysis for multi-agent systems using the incidence matrix: Quantized communication and formation control. Automatica 2010, 46, 695–700. [Google Scholar] [CrossRef]
  63. Xie, K.; Zhou, J.; Li, W. Analytical model and algorithm for tracing active power flow based on extended incidence matrix. Electr. Power Syst. Res. 2009, 79, 399–405. [Google Scholar] [CrossRef]
  64. Mccabe, T. A Complexity Measure. IEEE Trans. Softw. Eng. 1976, 4, 308–320. [Google Scholar] [CrossRef]
  65. Halstead, M.H. Elements of Software Science; Operating and Programming Systems Series; Elsevier: Amsterdam, The Netherlands, 1978. [Google Scholar]
