Mutated Speciﬁcation-Based Test Data Generation with a Genetic Algorithm †

: Speciﬁcation-based testing methods generate test data without the knowledge of the structure of the program. However, the quality of these test data are not well ensured to detect bugs when non-functional changes are introduced to the program. To generate test data effectively, we propose a new method that combines formal speciﬁcations with the genetic algorithm (GA). In this method, formal speciﬁcations are reformed by GA in order to be used to generate input values that can kill as many mutants of the target program as possible. Two classic examples are presented to demonstrate how the method works. The result shows that the proposed method can help effectively generate test cases to kill the program mutants, which contributes to the further maintenance of software.


Introduction
Regression testing is an important technique to ensure that previously tested software still performs in the same way after it is changed or integrated with other software [1][2][3]. In general, changes to software are mainly concerned with the efficiency enhancement, robustness improvement, and configuration changes, but these changes should not result in big alternation of the functionality defined in the specification of the software. Therefore, specification-based testing (SBT) methods can be effectively used in regression testing.
SBT is characterized by test data being generated from the specification without concerning the structure of the corresponding program and test results being analyzed based on the specification [4][5][6][7][8]. Formal specifications may allow the test data generation and test result analysis to be done rigorously, systematically, and even automatically in many cases [9][10][11][12]. In our work, we mainly use formal specifications in pre-and postconditions, such as Vienna Development Method (VDM) [13], a formal method that has been developed over past years [14,15], and Structured-Object-Oriented Formal Language (SOFL) [16], which has the potential of practical use in industry and serves as a solid foundation to develop a method of functional scenario-based test data generation [17].
However, in spite of considerable progresses having been made, it is still not easy for SBT to generate various test data only from specifications to detect different bugs that are contained in the program. This is because different features and effects of the program output cannot be controlled and triggered by only input data suites that satisfy some parts of the specification (some constraints over only input variables). Consequently, many faulty program paths would not be detected and thus the bug detection would be likely to fail in some cases. For the existing SBT, one of the major deficiencies is that the test data generation only considers the constraints over input variables in the formal specification, without making use of the constraints over outputs before the execution of a program.
To overcome the shortcomings of the existing SBT methods, we propose a new method for test data generation in this paper. The proposed method introduces dummy variables into some specified constraints in the specification, and makes use of the constraints over both input and output variables to guide the test data generation, in contrast to the conventional SBT methods that concerns only constraints over input variables. This method features the combination of three techniques, SBT, mutation testing, and GA. It is to obtain the enhanced (mutated) formal specifications by using a GA so that input data generated from these specifications are more likely to kill different kinds of mutants of the target program under test. The expected effect of the test data generated in this way is to detect various bugs probably occurring in the program that is being developed or improved.
The work in this paper is an extension for our previous work in [18]. Comparing with the previous work, we use more different kinds of mutants (16 types against previously 10) for each program in our case study to gain better mutated functional scenarios that are capable of detecting more bugs. In addition, we give additional experimental results of the proposed approach that uses no dummy variables, and point out the importance of introducing dummy variables. In addition, we carefully conduct more experiments with the dummy variables that are introduced to reform different parts of original specification, the equality and inequality relations. Based on that, we enriched the analysis for the effect of different ways of introducing dummy variables on bug detection.
The remaining part of the paper is organized as follows. Section 2 briefly introduces the existing work related to our approach. Section 3 illustrates how to transform formal specifications to chromosomes as well as the corresponding genetic manipulations in GA. Section 4 describes the main procedures of obtaining desirable reformed specifications for test data generation by integrating GA. Section 5 presents two classic cases to demonstrate the feasibility and efficiency of the approach. Finally, Section 6 concludes the paper and points out future research directions.

Related Work
In this section, we introduce several advanced techniques that relate to our methodology for test data generation.
Data flow analysis, a technique for computing the def-use associations for the control flow graph (CFG) of a program, are often used to develop different strategies for test data generation over a long history [19][20][21]. Many research works have proposed promising methods for automatic test data generation that integrates the data flow analysis with the heuristics, such as GAs [22][23][24], particle swarm optimization [25], and ant colony optimization [26]. Different from these techniques, our approach conducts the testing under the circumstance where the source code of thirty-party library under test cannot be accessed. Thus, these techniques rely too much on the knowledge and analysis of internal design (or code structure), but our approach focuses on generating test data without analyzing the source code when applying the GA to the formal specification. In addition, with respect to the usage of the GA, our approach uses the GA to search for the optimal mutated specifications that are later used for input data generation, while the techniques mentioned above use the GA to directly search for good input data.
The techniques of mutation-based test data generation [27,28] are used to select a set of "good" test data by executing designed incorrect versions of an original program with a great number of test data from the domain. Test data are selected if it can cause unintended behaviors for a certain number of incorrect versions. These techniques mainly concentrate on designing appropriate mutation operators to introduce small modifications for different kinds of programming languages such as Java [29], C# [30], and C++ [31]. The incorrect versions, also called program mutants, are created by inserting the mutant operators into the original program. Compared with these techniques, our method selects a set of "good" mutated specifications as a seed for further test data generation by using not only program mutants but also the mutated specifications with the GA.
The SBT techniques, some of them integrated with heuristic search strategies, have been well developed to cope with different kinds of specifications, such as SOFL [9,16], Alloy [32,33], protocol specifications [34,35], and Object Constraint Language (OCL) specification [36,37]. Among these specifications, we take an interest in the formal specification of pre-post style like SOFL and Alloy. On the one hand, the SBT techniques for both SOFL and Alloy generate test data only from the pre-condition and use the post-condition as an oracle to check if the outcome is correct. On the other hand, the SBT with SOFL still needs to be improved since a data suite generated only from the original SOFL specification is not sufficient enough to trigger different kinds of bad behaviors of programs. On the contrary, our approach uses both the pre-and post-conditions to generate input data, as well as selects the optimal mutated specifications to enhance the bug detection.

GA with Mutated Specification
We first briefly introduce the basics of GA and then discuss how to obtain reformed specifications.

Description for GA
GA is a heuristic search method inspired from evolutionary biology and was first proposed by John Holland [38]. In general, a GA is involved in an iterative process with three steps: (i) create an initial population of solutions (called individuals) represented by a pre-defined chromosome that are typically encoded the solutions to a problem; (ii) in the existing population, select a group of individuals by a specified selection strategy based on a fitness function, and generate the next population from applying two key genetic operators, crossover and mutation to those selected individuals; (iii) repeat step (ii) until the remaining individuals in the generation are good enough according to both the fitness function and the stop criteria.
Since GA works well in finding optimal solutions for nonlinear problems and the specifications of pre-post style could be easily transformed to chromosomes with few efforts, we employ GA to find the best mutated specifications in this paper. Later, we will first describe how to transform the original formal specifications into a chromosome, and then carefully describe the evolution in step (ii) in detail for our specific goal: to obtain all the mutated functional scenarios from the specification, each a constraint over only input variables.

Mutation Testing
Mutation testing, also called program mutation [39], is used to design test cases and evaluates the quality of existing testing techniques. In mutation testing, some small modifications are injected into the original program. Each mutated version is called program mutant and test data are regarded as the good one if it kills the program mutants, that is, it makes the behavior of program mutants different from that of the original program.
In our approach, both programs and the specifications are mutated. The program mutants are used to evaluate the quality of mutated specifications. We search for good mutated specifications that can be used to effectively generate test data for bug detection.

Mutated Specifications
We use SOFL as the formal notation for specifications in this paper. There are two reasons: one is that SOFL, as a formal notation, is more comprehensible than other formal notations since it uses the comprehensible condition data flow diagrams for system structure as well as pre-and post-conditions for defining individual operations in the system. Another reason is that SOFL is familiar to us and its use in industry has been increasing [40].
In SOFL, the defining condition describes the constraints over input and output variables after a method in the program performs. Generally, the defining condition is not used for directly generating input data in most of existing techniques because the values of outputs in defining condition are unknown to us before the execution of the program. We consider the defining conditions as an important factor for test data generation from which the test data are sensitive to bad behaviors of the program.
Since defining conditions describe how output variables relate to input variables, they are often used to check whether an execution of the program is correct or not, rather than being used to directly generate input values. For a program, it is usually difficult to directly generate input values that satisfy a defining condition without knowing the corresponding output values. For instance, suppose input variable x and output variable y satisfy the defining condition (x * y > x + y), we cannot generate input x from (x * y > x + y) due to the unknown output y. Thus, usually (x * y > x + y) is not used to help generate the input values but can be used to check the result of executing the program with input x.
Nevertheless, by assigning appropriate values to the output variables in the defining conditions, we can get some useful mutated specifications that can be used to directly generate input values. For the defining condition (x * y > x + y) mentioned above, input data generated from (x * 2 > x + 2) (when y = 2) may be more likely to trigger bugs than from (0 > x) (when y = 0). Currently, it cannot be determined without further checking. However, (x * 2 > x + 2) is definitely better than (x * 1 > x + 1) (when y = 1) because the latter is always false and cannot be used to generate test data.
Our work mainly concentrates on developing a way to find appropriate output values for the specification. These output values are then used to build the mutated specifications that are the constraints over only input variables. Then, the mutated specifications can be directly used to generate input values in regression testing. To achieve that, we employ GA to seek such appropriate values of outputs from the defining condition.
Moreover, to obtain mutated specifications that are more powerful in bug detection, some extension is considered for reforming defining conditions before applying GA. In this extension, we make a slight change in defining conditions so as to induce the generated test data that satisfy those reformed ones to trigger as many bad behaviors of the program as possible.
In our method, mutated specifications are made from after applying GA to the original specification. More precisely, the mutated specifications can be obtained by following two rules:

1.
Reforming the original specification by introducing dummy variables into defining conditions so that test data that satisfy those reformed ones can trigger bad behaviors of the program; 2.
Finding appropriate concrete values through GA and assigning them to the output and dummy variables that occur in the reformed specification.
Our goal is to obtain a new version of the specification from which the test data suite can be generated to trigger as many bugs as possible in the program. Next, we will define the chromosome forms for the reformed specification, as well as describe the crossover operator and mutation operator. Then, we apply the GA for gaining the suitable mutated specifications that can do well in bug detection.
We define the form of chromosomes for a condition data flow diagram (CDFD) that is part of the SOFL language.
A condition data flow diagram (CDFD) is a directed graph that specifies how processes work together to provide functional behaviors [41]. Every process has its own pre-and post-conditions. For instance, Figure 1 displays a small CDFD that consists of two processes A and B where process A first consumes two input variables x and y and produces output z, and then process B consumes z and produces w.
However, the intermediate variables between two processes like variable z can not always be replaced in real CDFDs. Therefore, our discussion on test data generation from specifications focuses on a single process.

Chromosome Formation
In this approach, the specification is converted into an equivalent expression called functional scenario form (FSF).

Definition 1. A FSF of process is the disjunction of functional scenarios
which is the conjunction of the pre-condition S pre and the guard-condition G i ; and D i is a predicate called a defining condition.
The pre-condition S pre of process S is a constraint on the input, and it contains only input variables. A guard condition G i is part of the post-condition but contains no output variables. A defining condition D i is also part of the post-condition but contains at least one output variable. The functional scenario T i ∧ D i describes a single specific functional behavior: when test condition T i is true, the output of the operation is defined using defining condition D i . In this paper, we assume that any FSF ∨ n i=1 (T i ∧ D i ) of process S is complete, which means that any input satisfying S pre must make ∨ n i=1 T i true. Each functional scenario defines an independent function of the operation: when the test condition holds on the input variables, the output variables will be defined by the defining condition. Currently, test data generation from a functional scenario only takes the test condition into account meanwhile leaving the defining condition untouched [9,42,43]. Now, we explain how to make a slight extension to change the form of defining conditions so as to allow bad behaviors to occur. To obtain a more flexible and useful reformed specification, we introduce dummy variables, d i (i = 1, · · · , c), to the relationship of inputs and outputs from the defining condition. Then, we build an output vector from both the dummy variables and output variables.

Definition 2. An output vector o is a vector constructed by output variables and dummy variables:
n) are output variables, and d i (i = 1, · · · , c) are dummy integer variables.
For a relation ( f (inputs, outputs) 0) (where is a operator such as =, >, < . . . ) in the defining condition D i , by introducing dummy variables d 1 and d 2 , we construct an inequality d 1 <= f (inputs, outputs) <= d 2 and replace the relation f with this new inequality in D i . Then, the output vector is formed as o = (o 1, · · · , o n , d 1, d 2 ). In our work, we mainly make such change to only equality relations.
We change an equality relation to such an inequality relation because an equality relation is quite a strict condition that would drastically narrow down the exploration of input values for a single functional scenario when output values are determined by GA. Therefore, dummy variables need to be introduced for equality. For the inequality relation in the specification, dummy variables are not introduced to them because, compared with equality relation, inequality relation is not a too strict condition for the generation of input values. Thus, these kind of relations are used to preserve some original features of specifications. In addition, the experimental results in Section 5 also indicate that additional dummy variables for inequality cannot help considerably improve the quality of the mutated specifications.
where some dummy variables are introduced to D i . An individual (a mutated specification) is a constraint over symbolic inputs, established from the chromosome [T i ∧ D i ] o by assigning concrete values to the output vector o = (o 1, · · · , o n , d 1, · · · , d c ). A population is a group of such individuals. For convenience, the output vector o is also called d-chromo, and each element of o is called a genetic.
From this definition, a d-chromo o with concrete values determines an individual formula [T i ∧ D i ] o that is a constraint on symbolic inputs. Such an individual is a reformed specification that can be used to generate test data for the program. In order to obtain good individuals to generate test data that are useful for bug detection, we apply the genetic manipulation to a group of individuals [T i ∧ D i ] o and find the appropriate d-chromo o . Each individual will be scored by evaluating the quality of the test data that are generated from it.

Genetic Manipulations and Selection
The genetic manipulation refers to the change of genetic structure in biology, but, in the GA, it indicates that a "child" solution is produced from a pair of "parent" solutions by using genetic operators like crossover and mutation.
In the existing population, a pair of individuals (solutions) are selected as parents to perform the crossover operator to produce their offspring. More specifically, as illustrated in  To perform the mutation operator, each genetic of an individual is mutated with possibility q (0 < q < 1), as displayed in Figure 3. More clearly, for one individual [T i ∧ D i ] o with its d-chromo o = (o 1, · · · , o n , d 1, · · · , d c ), each genetic of it has the possibility q to be mutated: where is a different scalar of small value. , The factor Σ m j=1 I(k j > 0) in Kill_rate is intended to encourage each mutated functional scenario to generate a test data suite that can kill as many different kinds of program mutants as possible. The factor Sum(N_kill i,o ) as a part of the numerator in Grade would inspire every mutated functional scenario to generate a test data suite where most test data are effective enough to kill as many program mutants as possible.
For a given chromosome [T i ∧ D i ] o , its individual with appropriate d-chromo o i,best is regarded as the best if and only if this individual possesses the highest value of Grade in the whole population. GA is to find such best individuals for these chromosomes [T i ∧ D i ] o (i = 1, · · · , N).
After all the individuals from the current population are evaluated, GA would select most of the best ones to form a new population for the next generation. This process is called selection. In our approach, we evaluate all the individuals and sort them by descending, then select individuals in the top 50 percent of the current population to breed the next generation.
As we can see, GA is used to find the best individuals separately for each chromosomes [T i ∧ D i ] o (i = 1, · · · , N). In order to evaluate all the best individuals represented by different chromosomes, the final formula of evaluation is made as follows: We use the final formula to find all the mutated functional scenarios that together hit the highest final grade (i.e., do the best in bug detection), each mutated one with well-tuned values for dummy variables and output variables. Additionally, this final grade is also used for comparison between our approach and other techniques. In the case study, our method is compared with the conventional specification-based method with respect to test data generation for bug detection.

Algorithm Summary
Our approach that incorporates GA accomplishes the goal of obtaining the mutated specifications by taking three key steps:

1.
Inject faults into the original program to obtain a set of program mutants; 2.
Use reformed specification [T i ∧ D i ] o as seed chromosomes. Each chromosome corresponds to a group of individuals that are generated by assigning concrete values to the output vector in the chromosome; 3.
Apply GA to each chromosome and select the best individuals (the best mutated specifications). According to the original specifications, determine whether or not a test case from a mutated specification (a constraint over inputs) kills the program mutants. Figure 4 displays the whole evolution process of GA. In the first round of evolution, a group of individuals are initialized and evaluated. Then, the best individuals in the top k (k = 50% in this paper) of the group are selected to perform both crossover and mutation operators to produce their offspring for the next round. In the next round, all of the individuals are evaluated and the top k of them again prepare to breed a new generation by performing genetic manipulations. The population iteratively evolves in this process until there has been no improvement in the population or it reaches the predefined maximum number of generations.
In the mutation testing, we use Z3 [44], a widely used satisfiability modulo theories (SMT) solver, as our constraint solver to generate the data suite for each individual formula (i.e., each mutated functional scenario). The generation for a data suite takes three steps: (1) use Z3 to generate a test data that satisfies the logical formula, (2) exclude all the test data obtained previously from the logical formula; (3) go to step (1) to obtain another piece of test data unless enough test data are obtained. Each individual formula is evaluated by the fitness function that measures the quality of the test data suite. We give the pseudo-code of the algorithm in Appendix A.

Case Study
In this section, we apply GA to two classic examples to demonstrate the effectiveness of the proposed method. The original specifications are used as test oracles for determining whether the outputs are correct or not during the evaluation of individuals.
We compare our method with the conventional method, called original specificationbased method, which directly generates input data from the original specifications by using Z3. In the original specification-based method, neither dummy variables nor defining condition are used to generate input values, since the defining condition contains output variables with unknown values. The input data are directly generated from only test conditions (pre-condition and guard-condition over only input variables) by using Z3.
Sixteen program mutants are prepared for each program in the way that the injected faults will not cause execution crash and infinite loops since we only focus on the functional bugs in this paper. Both methods generate a test suite of the same size 20 every time to execute these program mutants in the evaluation process.

Case Study 1: Mod
In this program, process Mod is to find the quotient q and remainder r from dividing y by x. For Mod, we give its formal specification in SOFL and the implementation in Python.
The formal specification of Mod is: process Mod (y: int, x: int) r: int, q: int

Its implementation in Python is:
def Abs(x): if x>=0: return x else: return -x def mod(y, x): r = y; q = 0; if y!=0: if x*y > 0: while Abs(x) <= Abs(r): r = r -x q = q + 1 else: while x*r < 0: r = r + x q = q -1 return r, q In this specification, Abs is a function for calculating the absolute value of its input. To shorten the explanation of each step, assume Abs is an inline executable predicate. Both −7 mod 5 = 3 and −7 mod 5 = −2 satisfy the classic definition y = q * x + r ∧ Abs(r) < Abs(x). To avoid the ambiguity, the specification of Mod puts an additional condition xr ≥ 0 in order to get only one result of −7 mod 5 = 3.
In the specification, the pre-condition, guard-conditions, and defining conditions are listed as: We can obtain the functional scenarios T i ∧ D i := S pre ∧ G i ∧ D i as follows: For T 3 ∧ D 3 , the input x and y are not related to the output q and r, so we do not need to apply GA to it. Since there is an equality y = q * x + r in which inputs and outputs are related, we introduce two dummy variables d 1 and d 2 . The chromosomes of Mod are displayed in Table 1.

Chromosome D-Chromo Dummy Vars
Apply Algorithm A1 to these chromosomes. The results are displayed in Table 2. Table 2. Results for process Mod after applying GA.

The Best Individual of Chromosome Grade
To illustrate the effectiveness of data generation from the mutated specifications, Table 3 displays the results of the conventional method that generates the test data directly from the original specifications. For the original specification, we generate test data only from the test condition T i consisting of both pre-condition S pre and guard-condition G i meanwhile ignoring the defining condition D i because the defining condition D i involves unknown output variables that can not directly help to generate test data. Table 3. Results for process Mod with original specifications.

Original Specification Grade
For the proposed method, the final Kill_rate is 100%, the same as the conventional method. It means that every program mutant has been killed by at least one piece of test data. The corresponding final Grade is 0.59, larger than the Grade of 0.37 with the original specification-based method, indicating that the test data generated from are of high quality that are more likely to kill all the program mutants.
The result suggests that it is plausible to use these best individuals of chromosomes to make four mutated specifications for test case generation in the further maintenance of the original program.
Comparing the reformed specifications with the original ones in Figure 5, we can find the Grade of reformed ones that are always larger than that of original ones. It means that the data suite generated from the mutated specifications is more likely to pinpoint bugs than that of original ones, although both of them share the same Kill_rate of 100%.

The Effect with Dummy Variables
We conduct additional experiments to figure out how dummy variables introduced into the different parts of defining conditions would affect the quality of the obtained mutated specifications. We make three versions of modifications to our approach as follows:
Version V2: Introducing dummy variables into both equality and inequality relation; 3.
Version V3: Putting no dummy variables in defining conditions.
For convenience, the approach with no modification, that is, with dummy variables for only equality relation, is called Version V0.
The previous experimental result for V0 and the original, as well as the results from after applying variations of the approach V1,V2, and V3 to process Mod, are together displayed in Figure 6.
According to Figure 6, three approaches with dummy variables V0, V1, and V2 gain higher Grades than the approach without dummy variables V3, and even V3 seems to behave better than the conventional method. There are no significant differences between the evaluation of V0 and V2. However, V2 would occupy more computation resources than V0 due to the consideration of more dummy variables. It seems that V1 gains a little better final Grade than V0, though its Grade for each single mutated functional scenario is not good enough. In addition, by using an approach without dummy variables V1 and V3, every obtained single mutated functional scenario demonstrates the strong capability to kill some specific program mutants while leaving other program mutants not killed, though the combination of all the functional scenarios in V1 can reach 100% total Kill_rate while, for V3, the total Kill_rate unfortunately remains in 87.5%. This result demonstrates the importance of introducing dummy variables into equality relation in order to accomplish both good single and total Grades and Kill_rates.
In summary, it is necessary to introduce dummy variables into equality relations, and the additional dummy variables for inequality relation cannot significantly improve the proposed approach.

Case Study 2: Gcd
Process gcd is to compute the greatest common divisor of two inputs by using Stein's algorithm.
The formal specifications of gcd is: process gcd (x: int, y: int) r: int The implementation of process gcd in Python is: def gcd(x, y): if x < y: x, y = y, x if (0 == y): return x if x % 2 == 0 and y % 2 == 0: return 2 * gcd(x//2, y//2) if x % 2 == 0: return gcd(x // 2, y) if y % 2 == 0: return gcd(x, y // 2) return gcd((x -y) // 2, y) Process gcd is a recursive process and its post-condition contains itself, so it is difficult to generate data from this kind of post-condition. We transform the original post-condition to the following ones: Table 4 shows the chromosomes of process gcd.
Apply the algorithm to all of the chromosomes; in the meantime, make use of the original post-condition to determine whether the outputs of codes are correct or not. The results are displayed in Table 5.

Chromosome D-Chromo Dummy Vars
Conversely, the result for applying the method that generates test data directly from the original specification is displayed in Table 6. Table 6. Results for process gcd with original specifications.

Original Specification
Grade Comparing the reformed specifications with the original ones in Figure 7, we can find that the first two reformed ones [T 1 ∧ D 1 ] o and [T 2 ∧ D 2 ] o have very high values of Grade, 0.72 and 0.58, respectively, higher than 0.29 and 0.49 with the original specifications. In addition, the Kill_rate of a sole [T i ∧ D i ] o (i = 1, 2) is 94%, indicating that the test data generated from the first two reformed specifications are likely to pinpoint most bugs probably occurring in the program. Only a few program mutants (6% of total), with some faults that directly relates to the last two functional scenarios T 3 ∧ D 3 and T 4 ∧ D 4 (where x = 0 or y = 0), cannot be killed by the test data generated from either the first two reformed specifications or the first two original specifications. Due to the very simple forms and the limited functionality of the last two functional scenarios, there is no improvement of test data generation using our method against the original ones. The results from both classic examples demonstrate that the input data generated from the mutated specifications are more likely to kill the mutants of programs than that from the original specifications.

The Effect without Dummy Variables
Like what we have done for process Mod, we conduct additional experiments with the approach without using dummy variable V3 since gcd only has equality relations in the defining conditions. The results are shown in Figure 8. The experimental results are similar to that in Mod. V0 performs better than V3. V3 still encounters the problem that every single mutated functional scenario is not able to kill all the program mutants. It shows that the test data generated from those strict equality relations are less likely to trigger some bad behaviors of program.

Complexity of Our Approach
We present an abstract analysis of the complexity for our approach. Generally, a GA complexity is on the order of O(g * n * m) without the effect of the fitness function, where g is the number of generations, n is the population size, and m is the number of functional scenarios. Since our approach uses a fitness function involved in the mutation testing, we should take both the program execution time and the data suite generation time into consideration.
As the speed of the constraint solver to solve an individual formula (to generate a test suite) depends on the complexity of the functional scenarios (logical formulas) whose complexity cannot be easily determined, we associate the cost of using the constraint solver for a singular individual with the number of input variables in, the number of output variables out, and the number of dummy variables d. In addition, the number of dummy variables relies on the number of equality relation in each functional scenario, which varies in different kinds of programs. We simply assume that each functional scenario has at least one equality relation. Thus, the complexity of using the constraint solver for each individual is O(in + out + 2 * d * m). Moreover, the complexity of all the executions for program mutants is approximately O(mu * sui) with mu the number of program mutants and sui the size of test data suite. Finally, considering the complexity of the GA together with the mutation testing, the complexity for our approach is O(g * n * ((in + out + 2 * d * m) * (mu * sui)) * m).

Conclusions
We propose a new method for effective test data generation based on mutated pre-post style formal specifications. The method is characterized by the integration of the functional scenario-based testing, a genetic algorithm and the mutation testing. In the approach, by assigning appropriate values to the unknown output and dummy variables to the variations for the original specifications, we can obtain useful mutated specifications that are sensitive to small syntactic structural changes of program codes.
We have also carried out two classic cases to evaluate the performance of our method. The results of case studies demonstrate that, for a complicated functional scenario, the proposed approach is capable of effectively generating useful test data to kill as many program mutants as possible, which outperforms the conventional data generation method.
In spite of the advantages of our method as mentioned above, there are also some limitations and disadvantages in the application of our method. First, the proposed method can only work on arithmetical relationships between inputs and outputs in which outputs affect the generation of inputs. Second, as the GA usually iterates many times and executes all the program mutants for every iteration, the cost would not be low. However, if we have enough computing resources for applying our method, it might be worth taking time to obtain good reformed specifications for the further maintenance of software.
In order to cope well with complex real programs, some additional extensions can be made in our approach. Firstly, by using the character encoding standard like US-ASCII [45], we can convert a String to a byte array so that the relationship that contains string variables can also be manipulated by our method. Moreover, since many research works exist concerning about the techniques of encoding complex data [46][47][48] that may occur in specifications like images and videos, it is possible to transform these specifications into appropriate arithmetical relationships so that our approach can be used in such cases. Secondly, although there exist specifications where the input and output variables are not specified by some explicit arithmetical equality relation, our method would still be applicable. Because instead of directly using these specifications, we can design some mutated arithmetical relationships (in form of inequality) of input and output that can not only approximate to the real properties of program but also leave open the possibility of occurrence of unexpected behaviors. Thirdly, when testing a big complex system, we can decompose it into a set of subroutines and focus on testing small procedures one by one using our approach. Thus, there is no need to repeatedly executing the whole system with our algorithm.
In future work, we will focus on enhancing the capability of this method to deal with more kinds of relationships between inputs and outputs where the values of outputs may not directly determine the inputs. We will conduct more experiments to ensure that our method can be well used in different kinds of programs.
Funding: This work was funded by JSPS KAKENHI Grant No. 26240008.

Conflicts of Interest:
The authors declare no conflict of interest.