Traceability analyses between features and assets in software product lines

Abstract: In a Software Product Line (SPL), the central notion of implementability provides the requisite connection between specifications and their implementations, leading to the definition of products. While it appears to be a simple extension of the traceability relation between components and features, it involves several subtle issues that were overlooked in the existing literature. In this paper, we introduce a precise and formal definition of implementability over a fairly expressive traceability relation. The consequent definition of products in the given SPL naturally entails a set of useful analysis problems that are either refinements of known problems or are completely novel. We also propose a new approach to solve these analysis problems by encoding them as Quantified Boolean Formulae (QBF) and solving them through Quantified Satisfiability (QSAT) solvers. QBF can represent more complex analysis operations, which cannot be represented by using propositional formulae. The methodology scales much better than the SAT-based solutions hinted at in the literature and was demonstrated through a tool called SPLAnE (SPL Analysis Engine) on a large set of


Introduction
Software Product Line Engineering (SPLE) is a software development paradigm supporting the joint design of closely related software products in an efficient and cost-effective manner. The starting point of an SPL is the scope, which defines all the possible features of the products in the SPL. The scope is said to define the problem space of the SPL, describing the expectations and objectives of the product line. The description is typically organized as a feature model [1] that expresses the variability of the SPL in terms of relations or constraints (excludes and requires dependencies) between the features and defines all the possible products in the product line.
An important step in SPLE is the development of core assets, a collection of reusable artifacts. The core assets contain the components; we use the term component for any artifact that contributes to product development, such as code, design, documents, test plans, or hardware. A component is thus an abstraction of any asset used in products. The core assets define the solution space of the SPL and are developed to meet the expectations outlined in the problem space [2]. They are developed for systematic reuse across the different products in the SPL [3,4]. The variability across the components in the core assets is represented by a component model. The components in a component model may also have excludes and requires dependency constraints, similar to feature models.
Given the problem and solution spaces for an SPL, as defined by the scope and the core assets, the next important step is traceability, which involves relating the elements (features, core assets) at these two levels [2].
The focus of this work is the formal modeling and analysis of traceability in an SPL. Many relationships are possible; one of the most useful and natural is the implementability relation, which associates each feature in the scope with a set of core assets that are required for implementing the feature(s) [5]. Besides implementability, many other notions can be defined, thanks to the integration of the variabilities of the problem space and the solution space in the proposed framework.
For example, one could be interested in checking whether every product in the problem space has a correspondence in the solution space, i.e., whether every product represented in the feature model can be implemented using the existing assets under the implementability relation. Another example is checking whether an asset of an SPL needs to be maintained not only because it is involved in some implementations, but because it is the only option.
Let us consider an example from the cloud computing domain. A company offers a service to rent computers on a cloud with different possible software configurations using Linux-based distributions. In the back-end, instead of providing physical machines, the company provides virtual machines with some software packages installed on them. Thus, the configuration of machines can be generated on demand according to the needs of the users. In order to speed up the creation of a new machine, there are pre-configured machines ready to launch.
In this example, the possible configurations offered to the users define the problem space. The set of available Linux packages implementing the features forms the core assets. The pre-configured machines can be seen as another set of assets (limited but available immediately).
The following are some examples of relevant analyses that could arise in this example:
• Check if at least one of the pre-configured machines covers the needs of a new user configuration.
• Check if at least one of the pre-configured machines realizes (exactly) the needs of a new user configuration.
• Check if there are dead packages, i.e., packages that cannot be in any of the virtual machines.
In the literature, formal modeling and analysis of variability at the feature model level has been studied extensively, and several efficient tools have been built to carry out the analyses [6,7]. The main idea behind all these works is that variability analysis can be reduced to constraints and variables modeling the feature-level variability [2,8–15]. While there are several recent works on traceability, most of them have confined themselves to an informal treatment [16–19]. Some works have chosen a formal approach for representing traceability and configuration of features [20].
In the past, most of the work [6,7] has encoded variability analysis operations in propositional formulae [21]. There are various SAT solvers, such as SAT4j [22], MiniSAT [23] or PicoSAT [24], which can be used to check the satisfiability of a propositional formula. We propose a novel approach for modeling traceability and other notions relating features and core assets using Quantified Boolean Formulae (QBF) [25]. QBF is a generalization of SAT Boolean formulae in which the variables may be universally and existentially quantified. A QSAT solver is used to check the satisfiability of a QBF. In this work, we make use of the well-known QSAT solvers CirQit [26] and RAReQS-NN [27].
An early version of this work was published in [28]. The proposed method has been implemented in a tool that is integrated with the FaMa framework [29]. This tool, called SPLAnE [30], can model feature models, core assets (component models), and traceability relations. SPLAnE is a feasible solution for the automated analysis of feature models together with asset relations. We believe that this article opens the opportunity for new forms of analysis involving variability models, assets and traceability relations. The following summarizes the contributions of this paper:
• A simple and abstract set-theoretic formal semantics of SPLs with variability and traceability constraints is proposed.
• A number of new analysis problems, useful for relating the features and core assets in an SPL, are described.
• Quantified Boolean Formulae (QBFs) are proposed as a natural and efficient way of modeling these problems. Evidence of the scalability of QSAT for the analysis problems in large SPLs (compared to SAT) is also provided.
• We present a tool named SPLAnE that enables SPL developers to perform existing operations from the literature over feature diagrams [6] as well as many new operations proposed in this paper. It also supports analysis operations on a component model and on an SPL model. We used the FaMa framework to develop SPLAnE, which makes it flexible to extend with new analyses for specific needs.
• We evaluated our approach on a number of models: (i) real and large Debian models, (ii) randomly generated SPL models ranging from ten to twenty thousand features with different levels of cross-tree constraints, and (iii) SPLOT repository models. The experimental results also compare two QSAT solvers (CirQit and RAReQS-NN) and three SAT solvers (SAT4j, PicoSAT and MiniSAT).
• An example from the Cloud computing domain is presented to motivate the practical usefulness of the proposed approach.
Paper organization. The remainder of the paper is organized as follows: Section 2 shows a motivating scenario for using SPLAnE; Section 3 presents the tool SPLAnE, which is implemented based on the proposed approach; Section 4 describes different analysis operations to extract information by using the SPLAnE tool; Section 5 analyzes empirical results from experiments that evaluate the scalability of SPLAnE; Section 6 compares our approach with related work; and finally, Section 8 presents concluding remarks.

Motivating example
In this paper, we present a cloud computing service as a product line. A feature model and a component model are used to manage the variability across the scope and the core assets, respectively.

Feature Models
Feature models have been used to describe the variant and common parts of the product line since Kang [31] defined them. The sets of possible valid combinations of features are represented by using different constraints among features. The feature model in Figure 1 represents the features provided by the cloud computing service. Two different kinds of relationships are used: i) hierarchical relationships, which describe the options for variation points within the product line; and ii) cross-tree constraints, which represent constraints among any features of the feature tree. Different notations have been proposed in the literature [6]; however, most of them share the following relationship flavors. Four different hierarchical relationships are defined:
• mandatory: this relationship refers to features that have to be in the product if their parent feature is in the product. Note that a root feature is always mandatory in feature models.
• optional: this relationship states that a child feature is an option if its parent feature is included in the product.
• alternative: it relates a parent feature and a set of child features.Concretely, it means that exactly one child feature has to be in the product if the parent feature is included.
• or: this relationship refers to the selection of at least one feature among a group of child features, having a similar meaning to the logical OR.
In addition, two kinds of cross-tree relationships are used:
• requires: this relationship implies that if the origin feature is in the product, then the destination feature should be included.
• excludes: this relationship between two features implies that the two features cannot both be present in the same product.
Cloud computing technology provides ready-to-use infrastructure for clients. The cloud system reduces the cost of maintaining the hardware and software, and also reduces the time to build the infrastructure on the client side. The client pays only for the hardware and software used, based on the duration. The feature model for cloud computing is shown in Figure 1. The root feature VirtualMachine is a mandatory feature by default. A mandatory relationship is present between the features VirtualMachine and UserInterface, so the feature UserInterface has to be present if the feature VirtualMachine is present in the product. An optional relationship is present between the features Language and VirtualMachine, so it is optional to have the feature Language in the products. The feature GUI has an alternative relationship with its child features {KDE, GNOME, XFCE}. Hence, if the feature GUI is selected in a product, then exactly one of its child features has to be present in that product. The feature Server has an or relationship with its child features {Tomcat, Glassfish, Klone}. Hence, if the feature Server is selected in a product, then at least one of its child features has to be present in that product.
The feature C++ requires the feature C to be present in a product. The presence of the feature Tomcat in a product does not allow the feature Klone, and vice versa. The client can request a system with a set of features, called a specification. The minimum set of features in a specification should contain the features {VirtualMachine, UserInterface, Console}, as they are mandatory features. We can term this the commonality across all the products of an SPL. The specification F={VirtualMachine, UserInterface, Console, GUI, KDE, Language, C} is valid for the creation of a virtual machine because it satisfies all the constraints in the feature model. The specification F={VirtualMachine, UserInterface, Console, GUI, KDE, Language, C++} is not valid because it contains the feature C++, which requires the feature C to be selected as well.
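The relationships described above can be checked mechanically. The following is a minimal sketch, not the encoding used by any particular tool: Figure 1 is not reproduced here, so the parent assigned to each feature (for example, GUI under UserInterface, or C and C++ forming an or-group under Language) is an assumption, and only the features named in the text are modeled.

```python
# Partial, ASSUMED reconstruction of the cloud feature model. The relationship
# kinds come from the text; the exact tree shape is a guess.
ROOT = "VirtualMachine"
MANDATORY = [("VirtualMachine", "UserInterface"), ("UserInterface", "Console")]
OPTIONAL = [("VirtualMachine", "Language"), ("UserInterface", "GUI"),
            ("VirtualMachine", "Server")]
ALTERNATIVE = {"GUI": {"KDE", "GNOME", "XFCE"}}
OR_GROUPS = {"Server": {"Tomcat", "Glassfish", "Klone"},
             "Language": {"C", "C++"}}  # assumed grouping
REQUIRES = [("C++", "C")]
EXCLUDES = [("Tomcat", "Klone")]


def parent_map():
    """Map every non-root feature to its parent feature."""
    parents = {child: parent for parent, child in MANDATORY + OPTIONAL}
    for groups in (ALTERNATIVE, OR_GROUPS):
        for parent, children in groups.items():
            for child in children:
                parents[child] = parent
    return parents


def is_valid_specification(spec):
    """Return True if the feature set satisfies every constraint above."""
    spec = set(spec)
    parents = parent_map()
    known = {ROOT} | set(parents)
    if ROOT not in spec or not spec <= known:
        return False
    # A selected child needs its parent.
    if any(parents[f] not in spec for f in spec if f != ROOT):
        return False
    # mandatory: parent selected implies child selected.
    if any(p in spec and c not in spec for p, c in MANDATORY):
        return False
    # alternative: exactly one child when the parent is selected.
    if any(p in spec and len(spec & kids) != 1 for p, kids in ALTERNATIVE.items()):
        return False
    # or: at least one child when the parent is selected.
    if any(p in spec and not (spec & kids) for p, kids in OR_GROUPS.items()):
        return False
    # requires / excludes cross-tree constraints.
    if any(a in spec and b not in spec for a, b in REQUIRES):
        return False
    return not any(a in spec and b in spec for a, b in EXCLUDES)
```

Under these assumptions, the first specification from the text is accepted and the second (C++ without C) is rejected.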

Component Model:
The same notations used for feature models can be used to represent the variability amongst the components present in the core assets of an SPL; we call the result a Component Model (CM). The variability amongst the components can also be represented by other models, such as the Orthogonal Variability Model (OVM), Varied Feature Diagrams (VFDs) and Free Feature Diagrams (FFDs) [20,32]. The component model in Figure 2 represents the resources available to create a virtual machine. The cloud computing technology creates a virtual machine that contains a set of components required to implement the features present in the client specification.
Such a set of components is called an implementation. The implementation C={LinuxCore, IUser, IConsole, Terminal, ILanguage, C-lang, c-lib} is valid because it satisfies all the constraints of the component model, so a virtual machine can be created with these components. The implementation C={LinuxCore, IUser, IConsole, ILanguage, C-lang, c-lib} is invalid, because the component Terminal or XTerminal (or both) is required to satisfy the component model constraints. Table 1 shows the traceability relation between the features and the components. The entry in the row of the feature Glassfish means that the component GlassfishApp implements the feature Glassfish.
Similarly, the feature Console can be implemented by the set of components {IConsole, XTerminal} or {IConsole, Terminal}. For each feature in the client specification, the traceability relation gives the required sets of components. In the feature model, the effective features are only the leaf features. A parent feature such as GUI can be implemented by the set of components {IGUI}. The feature GUI can be abstracted by eliminating all of its child features {KDE, GNOME, XFCE}; this allows the SPL to be analyzed at a higher level of abstraction. Section 3 uses the short names for features and components given in columns 3 and 4 of Table 1, respectively. Figure 3 shows the four preconfigured virtual machines. Each preconfigured machine is a set of components from the component model shown in Figure 2.
The extraction of information from such models is usually known as automated analysis in the area. To reason about those models, the relationships existing in the feature model are processed through a CSP, SAT or BDD solver, or a specific algorithm. An analysis operation is then used to extract specific information from the model. An SPL with twelve leaf features can result in a search space of $2^{12}$ possible products. Analysis of such a large search space is a non-trivial task. Some interesting analyses that could be performed in this scenario of a Virtual Machine Product Line (VMPL) are as follows:
1. Check if at least one of the pre-configured machines covers the needs of a new user configuration: In VMPL, there is always a need to check the existence of a virtual machine for a given user specification. For example, the specification F={VirtualMachine, UserInterface, Console, GUI, GNOME} should first be analyzed to check the existence of an implementation that implements F. The implementation C={LinuxCore, IUser, IConsole, Terminal, IGUI, GNOMEApp, IServer, TomcatApp} (equivalent to preconfigured virtual machine 2 in Figure 3) provides all the features in the specification F, which means that there exists a pre-configured machine that covers the user specification F.
2. Check if at least one of the pre-configured machines realizes (exactly) the needs of a new user configuration: Multiple implementations may cover a given user specification F. We can analyze the VMPL to find an implementation that realizes the user specification. For example, the implementation C={LinuxCore, IUser, IConsole, Terminal, IGUI, GNOMEApp} (equivalent to preconfigured virtual machine 3 in Figure 3) provides exactly the features in the specification F.
3. Check if there are dead packages: Actual VMPLs contain a huge number of components for Linux systems. The components that are not present in any of the products are termed dead elements of the product line. In the given VMPL, none of the components is dead.
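The first two analyses above can be illustrated with a short sketch. It assumes the reading that an implementation provides a feature when it contains at least one of the component sets traced to that feature. Only the Console, GUI and Glassfish rows of the traceability relation are stated in the text; the other rows below are hypothetical stand-ins for Table 1, chosen so that the example is self-contained.

```python
# Fragment of the traceability relation: feature -> alternative component sets.
TRACE = {
    "Console": [{"IConsole", "XTerminal"}, {"IConsole", "Terminal"}],
    "GUI": [{"IGUI"}],
    "Glassfish": [{"GlassfishApp"}],
    "VirtualMachine": [{"LinuxCore"}],  # assumed row
    "UserInterface": [{"IUser"}],       # assumed row
    "GNOME": [{"GNOMEApp"}],            # assumed row
    "Server": [{"IServer"}],            # assumed row
    "Tomcat": [{"TomcatApp"}],          # assumed row
}


def provided_by(implementation):
    """Features for which some traced component set is fully contained."""
    components = set(implementation)
    return {feature for feature, alternatives in TRACE.items()
            if any(alt <= components for alt in alternatives)}


def covers(implementation, specification):
    """The implementation provides at least the requested features."""
    return set(specification) <= provided_by(implementation)


def realizes(implementation, specification):
    """The implementation provides exactly the requested features."""
    return set(specification) == provided_by(implementation)


SPEC = {"VirtualMachine", "UserInterface", "Console", "GUI", "GNOME"}
MACHINE_2 = {"LinuxCore", "IUser", "IConsole", "Terminal", "IGUI",
             "GNOMEApp", "IServer", "TomcatApp"}
MACHINE_3 = {"LinuxCore", "IUser", "IConsole", "Terminal", "IGUI", "GNOMEApp"}
```

With these assumed rows, `MACHINE_2` covers `SPEC` without realizing it (it also provides Server and Tomcat), while `MACHINE_3` realizes `SPEC`, which matches the discussion of preconfigured machines 2 and 3.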

Specification and Implementation
The set of all features found in any of the products in a product line defines the scope of the product line. We denote the scope of a product line by $\mathcal{F}$. A scope $\mathcal{F}$ consists of a set of features, denoted by small letters $f_1, f_2, \ldots$. Specifications are subsets of features in the scope and are denoted by $F_1, F_2, \ldots$, with possible subscripts. On the other hand, the collection of components in the product line defines the core assets and is denoted by $\mathcal{C}$. Small letters $c_1, c_2, \ldots$ represent components.
Implementations (subsets of components) are denoted by capital letters $C_1, C_2, \ldots$, with possible subscripts. A Product Line (PL) specification is a set of specifications in an SPL, denoted as $\mathbb{F} \in \wp(\wp(\mathcal{F}) \setminus \{\emptyset\})$. Similarly, a Product Line (PL) implementation is denoted as $\mathbb{C} \in \wp(\wp(\mathcal{C}) \setminus \{\emptyset\})$. In VMPL, the scope, core assets, specifications and implementations are as follows:

Traceability:
We present a formalism for two variations of the traceability relation: i) 1:M mapping and ii) N:M mapping. In a traceability relation, a 1:M mapping is between a feature and a set of component sets, whereas an N:M mapping is between a feature set and a set of component sets.

Traceability with 1:M mapping:
A feature is implemented using a set of non-empty subsets of components in the core assets $\mathcal{C}$. This relationship is modeled by the partial function $\mathcal{T} : \mathcal{F} \to \wp(\wp(\mathcal{C}) \setminus \{\emptyset\})$. When $\mathcal{T}(f) = \{C_1, C_2, C_3\}$, we interpret it as the fact that the set of components $C_1$ (also, $C_2$ and $C_3$) can implement the feature $f$. When $\mathcal{T}(f)$ is not defined, it denotes that the feature $f$ does not have any components to implement it.

Traceability with N:M mapping:
A set of features can be implemented using a set of non-empty subsets of components in the core assets $\mathcal{C}$. This relationship is modeled by the partial function $\mathcal{T} : (\wp(\mathcal{F}) \setminus \{\emptyset\}) \to \wp(\wp(\mathcal{C}) \setminus \{\emptyset\})$. It may happen that two features $f_1$ and $f_2$ can be implemented by [...]
In order to extend the definition to specifications and implementations, we define a function $Provided\_by(C)$ which computes all the features that are implemented by $C$: [...]
With the basic definitions above, we can now define when an implementation exactly implements a specification.
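For orientation, one reading of these notions for the 1:M case can be written out as follows. This is a sketch inferred from the surrounding prose and from the later remark that the encoding of Realizes differs from that of Covers only by replacing an implication with an equivalence; the auxiliary predicate name $implements$ follows its later use in the text, and the exact formulation is an assumption.

```latex
\begin{align*}
implements(C, f) &\iff \mathcal{T}(f) \text{ is defined and } \exists\, C' \in \mathcal{T}(f) : C' \subseteq C \\
Provided\_by(C) &= \{\, f \in \mathcal{F} \mid implements(C, f) \,\} \\
Realizes(C, F) &\iff F = Provided\_by(C) \\
Covers(C, F) &\iff F \subseteq Provided\_by(C)
\end{align*}
```

Under this reading, $Realizes(C, F)$ always implies $Covers(C, F)$, and the two differ exactly when $C$ provides some feature outside $F$.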
The Realizes definition given above is rather strict. Thus, in the above example, the implementation $C_3$ realizes the specification $F_2$, but it does not realize $F_1$, even though it provides an implementation of all the features in $F_1$. In many real-life use cases, due to the constraints on the packaging of components, the exactness may be restrictive. We relax the definition of Realizes in the following. [...] Thus, in the VMPL example, we see that there are many potential products. Valid products are

Analysis Operations
Given an SPL $\Psi = \langle \mathbb{F}, \mathbb{C}, \mathcal{T} \rangle$, we define the following analysis problems. The problems center around the new definition of an SPL product.

SPL Model Verification:
Q. Is it a valid SPL model? Is it a void SPL model? Is the SPL model complete?
A given SPL model $\Psi = \langle \mathbb{F}, \mathbb{C}, \mathcal{T} \rangle$ is valid if there exists at least one specification and one implementation.
Let us assume a feature model with three features $f_1$, $f_2$ and $f_3$. The feature $f_1$ is the root, and the features $f_2$ and $f_3$ are the mandatory children of $f_1$. An excludes relation exists between $f_2$ and $f_3$. The

Complete and Sound SPL:
Q. Is the SPL model adequate for all the user specifications? Does every implementation have a corresponding specification? Which are the useful implementations? Is there at least one implementation which realizes a given user specification?
The completeness property of the SPL relates to the implementability of a specification. A specification $F$ is implementable if there is an implementation $C$ such that $Covers(C, F)$. Completeness determines whether the PL implementation (the set of implementation variants) is adequate to provide implementations for all the variant specifications in the PL specification. An SPL $\langle \mathbb{F}, \mathbb{C}, \mathcal{T} \rangle$ is complete if for every specification $F$ in the PL specification $\mathbb{F}$, there is an implementation $C$ in the PL implementation $\mathbb{C}$ such that $Covers(C, F)$. The soundness property relates to the usefulness of an implementation in an SPL. An implementation is said to be useful if it implements some specification in the scope. An SPL $\langle \mathbb{F}, \mathbb{C}, \mathcal{T} \rangle$ is sound if for every $C \in \mathbb{C}$, there is a specification $F \in \mathbb{F}$ such that $Covers(C, F)$. Completeness and soundness are crucial properties of any SPL. Is the VMPL able to provide a virtual machine for every valid requirement (specification) from users? If yes, then the VMPL is complete. If there is some specification that cannot be implemented by any implementation in the PL implementation, then the PL implementation is not adequate to satisfy all the user specifications. In VMPL, there may be requirements for which no virtual machine can be generated. In such a case, the feature model, the component model or the traceability relation should be analyzed to identify the actual problem. On the other hand, the PL implementation may provide a large set of implementations, whereas the PL specification may be answered by a subset of the PL implementation. In the case of VMPL, we may end up with a virtual machine that does not cover any of the user specifications. Such a machine should be removed from the list of pre-configured machines.
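When the PL specification and PL implementation are small enough to enumerate, the two properties can be checked directly from their definitions. The sketch below is a brute-force illustration over explicit sets, not the QBF-based method of the paper; the features `f1`, `f2`, the components `c1` to `c3` and the traceability relation are hypothetical.

```python
# Hypothetical 1:M traceability: feature -> alternative component sets.
TRACE = {"f1": [{"c1"}], "f2": [{"c2"}, {"c3"}]}


def provided_by(implementation):
    components = set(implementation)
    return {f for f, alternatives in TRACE.items()
            if any(alt <= components for alt in alternatives)}


def covers(implementation, specification):
    return set(specification) <= provided_by(implementation)


def is_complete(pl_spec, pl_impl):
    """Every specification is covered by at least one implementation."""
    return all(any(covers(c, f) for c in pl_impl) for f in pl_spec)


def is_sound(pl_spec, pl_impl):
    """Every implementation covers at least one specification."""
    return all(any(covers(c, f) for f in pl_spec) for c in pl_impl)


PL_SPEC = [{"f1"}, {"f1", "f2"}]
PL_IMPL = [{"c1"}, {"c1", "c2"}]
```

Here `PL_SPEC` and `PL_IMPL` form a complete and sound SPL. Adding the implementation `{"c3"}` breaks soundness, because it provides only `f2` and no specification consists of `f2` alone; removing `{"c1", "c2"}` breaks completeness.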

Product Optimization:
Q. Do the given specification and implementation form a product? Is there an implementation which provides all the features in a given user specification? Is there an implementation exactly meeting a given user specification? Is there only one implementation for a given specification?
Given a specification, we want to find all the variant implementations that cover the specification. This is given by the function $FindCovers(F) = \{C \mid Covers(C, F)\}$. At times, it is necessary for a premier set of features to be provided exactly for some product variants. For example, a client company with a critical usage of the product would want to limit the risk of feature interaction.
In this case, we want to find out whether there is an implementation that realizes the specification. A specification $F$ is existentially explicit if there exists an implementation $C$ such that $Realizes(C, F)$. Dually, it is universally explicit if for all implementations $C \in \mathbb{C}$, $Covers(C, F)$ implies $Realizes(C, F)$. Multiple implementations may implement a given specification. This may be a desirable criterion of the PL implementation from the perspective of optimization among various choices. Thus, the specifications which are implemented by only a single implementation are to be identified.
Is there a virtual machine which provides all the features required by the client specification? Covers is the more relaxed notion: the specification is implemented by the implementation, but the implementation may contain extra components that are not required to implement any of the features in the specification.
It may happen that the cloud has pre-configured virtual machines which provide all the features in a user specification, but which also contain extra components that are not required to support any of those features. This results in redundant components in a virtual machine. Is there a virtual machine which provides exactly the features in the client specification? The tighter version of covers is realizes, which does not allow any extra components beyond those required to implement the features present in the specification. Realizes is thus the optimized version of the covers operation. Finding the optimized virtual machine on the cloud that matches the exact user specification is achieved by realizes. Is there at least one virtual machine which provides exactly the features in the client specification? The existentially explicit operation guarantees the presence of at least one implementation that realizes a given specification.
This means that, in VMPL, for such a user specification there exists at least one virtual machine which realizes it, which guarantees the presence of at least one optimized configuration. Universally explicit is the stricter notion: every implementation that covers the given specification also realizes it. For universally explicit specifications, the cloud always produces an optimized virtual machine. Does a given user specification have only one virtual machine provided by the cloud? In VMPL, there may be a specification which is covered by only one virtual machine; such implementations are unique.
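The product-level operations of this subsection can be sketched by direct enumeration over a small PL implementation. The feature and component names and the traceability relation below are hypothetical, and uniqueness is interpreted as "covered by exactly one implementation", following the VMPL discussion above.

```python
# Hypothetical 1:M traceability: feature -> alternative component sets.
TRACE = {"f1": [{"c1"}], "f2": [{"c2"}, {"c3"}]}
PL_IMPL = [{"c1"}, {"c1", "c2"}, {"c1", "c3"}]


def provided_by(implementation):
    components = set(implementation)
    return {f for f, alternatives in TRACE.items()
            if any(alt <= components for alt in alternatives)}


def covers(implementation, specification):
    return set(specification) <= provided_by(implementation)


def realizes(implementation, specification):
    return set(specification) == provided_by(implementation)


def find_covers(specification, pl_impl):
    """All implementations that cover the specification."""
    return [c for c in pl_impl if covers(c, specification)]


def existentially_explicit(specification, pl_impl):
    """Some implementation realizes the specification."""
    return any(realizes(c, specification) for c in pl_impl)


def universally_explicit(specification, pl_impl):
    """Every covering implementation also realizes the specification."""
    return all(realizes(c, specification)
               for c in find_covers(specification, pl_impl))


def has_unique_implementation(specification, pl_impl):
    """Exactly one implementation covers the specification."""
    return len(find_covers(specification, pl_impl)) == 1
```

In this toy SPL, `{"f1"}` is existentially but not universally explicit, because `{"c1", "c2"}` covers it while also providing `f2`. Note that `universally_explicit` is vacuously true for a specification that no implementation covers, so it is usually checked together with implementability.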

SPL Optimization:
Q. Is an element present across all the products? Is an element used in at least one product? Is an element not in use? Which elements are redundant in a given product? Which extra features does a product provide apart from the given user specification?
Identification of common, live and dead elements in an SPL is one of the basic analysis operations in the SPL community. We redefine these concepts in terms of our notion of products: An element $e$ is common if for all $\langle F, C \rangle \in Prod(\Psi)$, $e \in F \cup C$. An element $e$ is live if there exists [...]
Can a virtual machine provide more features with the same set of components? When a specification is covered (but not realized) by an implementation, there may be extra features (other than those in the specification) provided by the implementation. These extra features are called extraneous features of the implementation. Since there can be multiple covering implementations for the same specification, we get different choices of implementation and extraneous feature pairs: [...] A user may demand virtual machines with some specification. The available pre-configured machines provide all the features in the user specification, and also provide a few more features, which are extraneous.
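These element-level analyses reduce to simple set operations once the products are enumerated. The sketch below assumes that a product is a pair (specification, implementation) in which the implementation covers the specification, that a live element occurs in at least one product, and that a dead element occurs in none; the traceability relation and names are hypothetical.

```python
# Hypothetical 1:M traceability: feature -> alternative component sets.
TRACE = {"f1": [{"c1"}], "f2": [{"c2"}, {"c3"}]}


def provided_by(implementation):
    components = set(implementation)
    return {f for f, alternatives in TRACE.items()
            if any(alt <= components for alt in alternatives)}


def elements(product):
    """Features and components occurring in a (specification, implementation) pair."""
    specification, implementation = product
    return set(specification) | set(implementation)


def common_elements(products):
    """Elements present in every product."""
    if not products:
        return set()
    return set.intersection(*(elements(p) for p in products))


def live_elements(products):
    """Elements present in at least one product."""
    return set().union(*(elements(p) for p in products))


def dead_elements(universe, products):
    """Elements of the scope or core assets that occur in no product."""
    return set(universe) - live_elements(products)


def extraneous_features(implementation, specification):
    """Features provided by the implementation beyond the specification."""
    return provided_by(implementation) - set(specification)


PRODUCTS = [({"f1"}, {"c1"}), ({"f1"}, {"c1", "c2"}), ({"f1", "f2"}, {"c1", "c2"})]
UNIVERSE = {"f1", "f2", "c1", "c2", "c3"}
```

For this toy product list, `f1` and `c1` are common, `c3` is dead, and the product `({"f1"}, {"c1", "c2"})` carries the extraneous feature `f2`.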

Generalization and Specialization in SPL:
Q. Does the union of two or more products result in a new product? What is the difference between two products?
In an SPL, sometimes there is a need to check the aggregation relationship between specifications, implementations or products. Is there a virtual machine which has the features provided by a given set of virtual machines? The union property on two specifications results in a new specification which has the features of both specifications. Let us say specification $F_1$ has features $\{f_1, f_2, f_3\}$ and specification $F_2$ has features $\{f_2, f_5, f_7\}$. The union property checks for a specification $F$ which has the features of specifications $F_1$ and $F_2$, so $F$ should have the features $\{f_1, f_2, f_3, f_5, f_7\}$. If there is an excludes relation between the features $f_3$ and $f_5$, then the union property returns FALSE. In VMPL, a user may demand a virtual machine which has the combined features of two or more machines. The union property is used to verify that the combination of two or more virtual machines is valid. Similar to specifications, this property can be applied to implementations or products.
In an SPL, there is often a need to distinguish between multiple specifications, implementations or products. Is there a virtual machine whose features are present in all virtual machines in a given set? The intersection property on multiple specifications checks the existence of a specification which is common to those specifications. Let us say specification [...], then the intersection property applied on specifications $F_1$ and $F_2$ will result in [...] The distinguishable features or variants between $F_1$ and $F_2$ are obtained as [...] A specification which is contained in all the specifications of an SPL is called the core specification. The intersection property applied on a given SPL model results in a core specification. Similar to specifications, this property can be applied to implementations or products.
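The union and intersection properties can be sketched over an explicitly enumerated PL specification. The set `PL_SPEC` below is hypothetical: it reuses $F_1$ and $F_2$ from the union example and contains no specification with both $f_3$ and $f_5$, mirroring the excludes relation. Computing the distinguishing features as a symmetric difference is an assumption, since the corresponding formula is not legible in the text.

```python
F1 = frozenset({"f1", "f2", "f3"})
F2 = frozenset({"f2", "f5", "f7"})
# Hypothetical PL specification; f3 and f5 never occur together.
PL_SPEC = {F1, F2, frozenset({"f2"}), frozenset({"f1", "f2"}), frozenset({"f2", "f3"})}


def union_property(specs, pl_spec):
    """True if the union of the given specifications is itself a valid specification."""
    return frozenset().union(*specs) in pl_spec


def intersection_property(specs, pl_spec):
    """True if the features shared by all given specifications form a valid specification."""
    return frozenset.intersection(*specs) in pl_spec


def distinguishing_features(spec_a, spec_b):
    """Features in exactly one of the two specifications (assumed definition)."""
    return spec_a ^ spec_b


def core_specification(pl_spec):
    """Features contained in every specification of the PL specification."""
    return frozenset.intersection(*pl_spec)
```

As in the text, `union_property([F1, F2], PL_SPEC)` is false because the combined feature set would contain both `f3` and `f5`, while the union of `{"f1", "f2"}` and `{"f2", "f3"}` is the valid specification `F1`.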
In the literature, different analysis problems in SPLs are usually encoded as satisfiability problems for propositional constraints [33], and solvers such as Yices [34] or Bddsolve [35] are used to solve them. As noted in [20], it is not possible to cast certain problems, such as completeness and soundness, as a single propositional constraint. However, we observe that these problems need quantification over the propositional variables encoding features and components. A more expressive logical formalism, Quantified Boolean Formulae (QBF), is necessary to encode such analysis problems. The Boolean satisfiability problem for a propositional formula is then naturally extended to the QBF satisfiability problem (QSAT).
In SPLs, the complexity of analysis operations such as valid model or void product model, which can be represented using propositional logic, belongs to $\Sigma_1^P = \exists^P(\mathrm{P})$, where $\mathrm{P}$ is the class of all feasibly decidable languages [36]. We found that a few analysis operations discussed in this paper, such as soundness and completeness, cannot be encoded using SAT formulae but are easily encoded using QBFs. The complexity of the soundness and completeness operations belongs to the class $\Pi_2^P = \forall^P \Sigma_1^P$. More complex analysis operations, such as universally explicit and unique implementation, belong to the class $\Sigma_3^P = \exists^P \Pi_2^P$ [36].
Similarly, QBF can easily be used to represent formulae belonging to more complex classes.
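To make the role of quantifier alternation concrete, the following sketch evaluates a closed prenex QBF by naive expansion of its quantifier prefix. It is an illustration only: it runs in exponential time and is not how solvers such as CirQit or RAReQS-NN work. The function name `eval_qbf` and the prefix representation are choices made for this example.

```python
def eval_qbf(prefix, matrix, assignment=None):
    """Evaluate a closed prenex QBF by expanding quantifiers left to right.

    prefix: list of (quantifier, variable) pairs, quantifier in {"forall", "exists"}
    matrix: function mapping a dict {variable: bool} to a bool
    """
    assignment = dict(assignment or {})
    if not prefix:
        return matrix(assignment)
    (quantifier, variable), rest = prefix[0], prefix[1:]
    branches = (eval_qbf(rest, matrix, {**assignment, variable: value})
                for value in (False, True))
    return all(branches) if quantifier == "forall" else any(branches)


def same_value(a):
    """Matrix of the example formulas: x <=> y."""
    return a["x"] == a["y"]


# forall y exists x (x <=> y): for each y, choose x equal to y.
FORALL_EXISTS = eval_qbf([("forall", "y"), ("exists", "x")], same_value)
# exists x forall y (x <=> y): no single x matches both values of y.
EXISTS_FORALL = eval_qbf([("exists", "x"), ("forall", "y")], same_value)
```

The two formulas share the same matrix and differ only in the order of quantifiers, yet the first is true and the second false. Completeness has the same "for all specifications, there exists an implementation" shape, which is why it cannot be expressed as a single satisfiability query over one propositional formula.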
Let $\mathcal{C} = \{c_1, \ldots, c_n\}$ be the core assets and let $\mathcal{F} = \{f_1, \ldots, f_m\}$ be the scope of the SPL. Each feature $f$ and each component $c$ is represented by a propositional variable, written $p_f$ and $p_c$ respectively. Let $CON_F$ and $CON_I$ denote the sets of constraints over the propositional variables capturing the PL specification and the PL implementation, respectively. Given the PL specification and PL implementation as sets, it is straightforward to obtain these constraints. When one uses richer notations, such as feature models, one can extract these constraints following [33]. For the traceability, the encoding $CON_T$ is as follows. Let $f$ be a feature and let $\mathcal{T}(f)$ [...] for the features $f_i$ for which $\mathcal{T}(f_i)$ is defined, and FALSE if $\mathcal{T}(\cdot)$ is not defined for any feature.
The implementation question of whether $implements(C, f)$ holds is now answered by asking whether the formula $\hat{C}$ encoding the set of components $C$, together with the traceability constraints $CON_T$, can derive the feature $f$. This is equivalent to asking whether $\hat{C} \wedge CON_T \wedge (\neg p_f)$ is UNSAT. Since it is evident that only $\mathcal{T}(f)$ is used for the implementation of $f$, this can further be optimized to [...] However, as we will see later, since $implements(\cdot, \cdot)$ is used as an auxiliary function in the other analyses, we want to encode it as a formula with free variables.
Thus, $form\_implements_f(x_1, \ldots, x_n)$ is a formula which takes $n$ Boolean values (0 or 1) as arguments, corresponding to the bitvector $\hat{C}$ of an implementation $C$, and evaluates to either TRUE or FALSE.
This forms the core of encoding for all the other analyses.Hence, the correctness of this construction is crucial.Lemma 1 states the correctness result.The proof is given in [37].
In order to extend the construction to encode Covers, we construct a formula f _covers (x 1 , . . ., x n , y 1 , . . . ,y m ) where, the first n Boolean values encode an implementation C and the subsequent m Boolean values encode a specification F. The formula evaluates to TRUE iff Covers(C, F) holds.
Similarly, we have encoded $Realizes$ as a Quantified Boolean Formula. Notice the replacement of "⇒" in $f\_covers(\ldots)$ by "⇔" in $f\_realizes(\ldots)$.
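The effect of replacing "⇒" by "⇔" can be illustrated on bitvectors. This is a sketch under assumptions: `x_implements[i]` stands for the truth value of $form\_implements_{f_i}$ on a fixed implementation, `y` is the bitvector of a specification, and the additional membership condition of $Covers$ ($Provided\_by(C) \in \mathcal{F}$) is deliberately left out. The function names are hypothetical.

```python
def implies(a, b):
    return (not a) or b


def covers_shape(x_implements, y):
    # y_i => (f_i is implemented): every requested feature is implemented.
    return all(implies(yi, xi) for xi, yi in zip(x_implements, y))


def realizes_shape(x_implements, y):
    # y_i <=> (f_i is implemented): exactly the requested features are implemented.
    return all(yi == xi for xi, yi in zip(x_implements, y))


implemented = [True, True, False]  # hypothetical: C implements f1 and f2, not f3
print(covers_shape(implemented, [True, False, False]))    # True
print(realizes_shape(implemented, [True, False, False]))  # False
print(realizes_shape(implemented, [True, True, False]))   # True
print(covers_shape(implemented, [False, False, True]))    # False
```

With "⇒" an implementation may provide more than the specification asks for; with "⇔" it must provide exactly the specified features.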
$\Rightarrow (p_{c_1})\}$. The formula $form\_implements_{f_1}$ holds true, so we check the formula $form\_implements_{f_i}$ for the remaining features in the specification $F_1$. Since it is true for all the features, $Covers(C_2, F_1)$ holds.
Since this is FALSE, we conclude correctly that $Covers(C_2, F_2)$ does not hold.
We encode the other analysis problems as QBF formulae, as shown in Table 2. Theorem 5 asserts the correctness of the encoding. In the theorem, for the constraint $CON_I$, $CON_I[q_{c_1}, \ldots, q_{c_n}]$ denotes the same constraint where each propositional variable $p_{c_i}$ has been replaced by a new propositional variable $q_{c_i}$.
Theorem 5. Given an SPL Ψ, each of the properties listed in Table 2 holds if and only if the corresponding formula evaluates to true.

Validation
In order to validate the approach presented in this paper, a tool, SPLAnE, has been developed for the automated analysis of SPL models. SPL models consist of feature models with traceability relationships to the component models (core assets). The Virtual Machine Product Line (VMPL) case study, based on cloud computing concepts, is presented and analyzed.

SPLAnE
SPLAnE (Software Product Line Analysis Engine) is designed and developed to analyze the traceability between features and implementation assets. A large set of tools now enables reasoning over feature models. However, none of them is capable of reasoning over the feature model and a set of implementations as described throughout this paper. For the sake of reusability, and because it has been proven to be easily extensible [38,39], we chose the FaMa framework [29] as the base for SPLAnE. The FaMa framework provides a basic architecture for building FM analysis tools, while defining interfaces and standard implementations for existing FM operations in the literature, such as Valid Model or Void Product Model.
On the one hand, SPLAnE benefits from being a FaMa extension in different ways. For example, SPLAnE can read a large set of different file formats used to describe feature models. It is also possible to apply some of the existing operations in the literature to the feature model prior to executing the reasoning over the component layer. On the other hand, FaMa was not designed for reasoning over more than one model. Therefore, several modifications were made to fill this gap.
Namely, we i) modified the FaMa architecture to enable this new extension point; ii) created a new reasoner for a new set of operations; iii) implemented the operations; and iv) defined two new file formats to store and input traceability relationships and component models in SPLAnE. The SPLAnE translator generates the QBFs in QPRO [40] or QCIR [41] format, which are standard input file formats in non-prenex, non-CNF form. SPLAnE then invokes the QSAT solver CirQit [26] or RaReQS [27] in the back-end to check the satisfiability of the generated QBFs in QPRO/QCIR format. The choice of tool is based upon performance: CirQit solved the largest number of problems in the non-prenex, non-CNF track of QBFEval '10 [42], and RaReQS [27] is a Recursive Abstraction Refinement QBF Solver. Table 2 shows the analysis operations provided by SPLAnE.
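To make the circuit-style input format mentioned above more concrete, the following sketch renders a tiny QBF as text. It assumes the QCIR-G14 dialect (a header line, quantifier blocks, an `output` statement and named `and`/`or` gates with `-` for negation); the files actually produced by SPLAnE are not shown in this paper, and the helper `to_qcir` is hypothetical. For brevity the sketch only emits a prenex quantifier prefix.

```python
def to_qcir(prefix, gates, output):
    """Render a prenex QBF as QCIR-G14 text.

    prefix: list of (quantifier, variable names), outermost block first
    gates:  list of (gate name, "and" | "or", literals); "-v" negates v
    output: name of the gate that represents the whole formula
    """
    lines = ["#QCIR-G14"]
    for quantifier, variables in prefix:
        lines.append(f"{quantifier}({', '.join(variables)})")
    lines.append(f"output({output})")
    for name, op, literals in gates:
        lines.append(f"{name} = {op}({', '.join(literals)})")
    return "\n".join(lines)


# Toy formula: forall y1 exists x1 . (y1 => x1), written as or(-y1, x1).
text = to_qcir(
    prefix=[("forall", ["y1"]), ("exists", ["x1"])],
    gates=[("g1", "or", ["-y1", "x1"])],
    output="g1",
)
print(text)
```

The gate-based form avoids a conversion to CNF, which is the property the non-prenex, non-CNF formats are chosen for.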
The design of SPLAnE makes it possible to use different QSAT solvers. Also, SPLAnE can now work hand in hand with other products based on FaMa, such as Betty [29], which enables the testing of feature models. SPLAnE is available for download, with detailed documentation, from the website [30].

Experimentation
In this section, we go through the different experiments executed to validate our approach.
The experiments were conducted with i) real Debian models, ii) randomly generated models and iii) SPLOT repository models. Each analysis operation was executed with two QSAT solvers (CirQit and RaReQS) and three SAT solvers (Sat4j, PicoSAT and MiniSAT). All experiments were run on a machine with a 3.2 GHz i7 processor and 16 GB RAM. The experimental results are plotted on a log scale. In the plots, solver names are abbreviated as {qcir: CirQit, qrare: RaReQS, psat: PicoSAT, msat: MiniSAT}. Table 3 shows the hypotheses and the variables used when conducting this experimentation.

Experiment 1: Validating SPLAnE with feature models from the SPLOT repository
To illustrate the SPL analysis method described in the paper, we considered case studies of various sizes. Concretely, the following SPLs were used: Entry Control Product Line (ECPL), Virtual Machine Product Line (VMPL), Mobile Phone Product Line (MPPL), Tablet Product Line (TPL) and Electronic Shopping Product Line (ESPL). The TPL, MPPL and ESPL models were taken from the SPLOT repository [43]. More details of the ECPL models can be found in [28]. Table 4 gives the number of features and components in each SPL model, and the execution time taken by various analysis operations with the SPLAnE reasoner CirQit.
The SPLOT repository is a common place where practitioners store feature models for the sake of reuse and communication. It contains small and medium-size feature models; most of them are conceptual and a few are realistic. We extracted 698 feature models from the SPLOT repository. These feature models were given as input to the extended Betty tool to generate corresponding SPL models. Component models, which represent the solution space of SPLs, are usually larger than feature models, and the generated component models reflect this. SPLAnE was executed with 69800 SPL models to verify the QSAT scalability when applying it to feature modeling. SPLAnE provides an option to select either of the two QSAT reasoners (CirQit and RaReQS). Figure 5 shows the box plot representing the QSAT behaviour with increasing cross-tree constraints for a few analysis operations on real models taken from the SPLOT repository. This experiment was executed with the QSAT reasoner CirQit. The results of experiment 5.2.1 show that, for the small and medium-size real models, no analysis operation takes much execution time, which motivated us to experiment with large models. The overall results of experiment 5.2.1 reject the null hypothesis $H_0$, resulting in the acceptance of the alternative hypothesis $H_1$.

Table 3. Hypotheses and models used as input in the experiments.

Experiment 1
Null hypothesis ($H_0$): SPLAnE does not scale when coping with the SPLOT model repository.
Alt. hypothesis ($H_1$): SPLAnE does scale when coping with the SPLOT model repository.
Models used as input: The feature models for TPL, MPPL and ESPL were taken from [43]. ECPL is taken from [28]. VMPL is presented in the current paper. SPLOT repository: the 69800 SPL models were generated from 698 SPLOT models.

Experiment 2
Null hypothesis ($H_0$): SPLAnE does not scale when coping with randomly generated SPL models.
Alt. hypothesis ($H_1$): SPLAnE does scale when coping with randomly generated SPL models.
Models used as input: 1000 randomly generated SPL models.

Experiment 3
Null hypothesis ($H_0$): The use of SPLAnE will not result in faster execution of operations than SAT-based techniques on real, very large SPL models.
Alt. hypothesis ($H_1$): The use of SPLAnE will result in faster execution of operations than SAT-based techniques on real, very large SPL models.
Models used as input: The Debian variability model extracted from [44], which can be found at [30].

Experiment 4
Null hypothesis ($H_0$): The use of SPLAnE will not result in faster execution of operations than SAT-based techniques on randomly generated SPL models.
Alt. hypothesis ($H_1$): The use of SPLAnE will result in faster execution of operations than SAT-based techniques on randomly generated SPL models.
Models used as input: Random models varying from ten features to twenty thousand features.

Experiment 5
Null hypothesis ($H_0$): The QSAT-based reasoning technique is not faster than the SAT-based technique for operations like completeness and soundness.
Alt. hypothesis ($H_1$): The QSAT-based reasoning technique is faster than the SAT-based technique for operations like completeness and soundness.
Models used as input: Random models varying from ten features to twenty thousand features, and SPLOT repository models.

our approach with all analysis operations. Figure 7 shows the box plot for randomly generated large SPL models. For clarity, we plotted only eight analysis operations for all models (except models with 50 features) and {10, 20, 30, 40, 50} cross-tree levels of constraints. Experiment 5.2.2
was executed with the SPLAnE reasoner RaReQS. The plot clearly shows that the QSAT approach scales even to an SPL model with 80000 variables and a maximum of 50% constraints. Figure 7 shows that the execution time of every analysis operation grows with the number of features. Figure 6 shows the graph plot for the same results, which helps to clearly distinguish the behavior of each analysis operation against the CTC levels. From the graphs we observed that the number of features in a model has more impact on the execution time than the level of CTC, which has very little impact. The soundness operation takes the most time, as it checks that for every implementation there exists a specification, and the number of components is three times the number of features. The completeness operation takes less time than soundness, but more time than all the remaining operations. In completeness we check that for every specification there exists an implementation; since there are fewer features than components, completeness requires less execution time than soundness. The results of experiment 5.2.2 show that SPLAnE can scale up to models with 80000 variables; this rules out the null hypothesis $H_0$ and leaves us to accept the alternative hypothesis $H_1$.

Experiment 3: Comparing SPLAnE and FaMa approach in front of real and large debian models
This experiment checks the behavior of the SPLAnE reasoners (CirQit and RaReQS) and the FaMa reasoners (Sat4j, PicoSAT and MiniSAT) on real and large Debian models with the analysis operations presented in the paper. We used the feature model extracted from Debian distributions [44]. This model encodes the variability present in the Ubuntu 10.04 distribution packaging system. We used four initial models containing the data from the repositories: main (7065 features), restricted (7098 features), multiverse (8122 features) and universe (26338 features). To generate the SPL model from this real feature model, we used the same models as component models and linked each feature to the component of the same name with a requires relationship, which doubles the number of variables within the model. For example, the universe Debian model has 26338 features, so its corresponding SPL model contains 52676 variables.

Figure 8 shows the performance of the SPLAnE reasoners (CirQit and RaReQS) and the FaMa reasoners (Sat4j, PicoSAT and MiniSAT) on the proposed analysis operations. We see that both approaches scale for all operations on the first three Debian models, except for completeness and soundness, where QSAT is clearly more efficient. For the completeness and soundness operations, the FaMa reasoners were not able to solve even a single instance of the Debian models. For the fourth model, i.e. the universe Debian model, the FaMa reasoners were not able to solve any of the analysis operations. The QSAT reasoners, in contrast, were able to solve completeness and soundness on the first three Debian models (main, restricted, multiverse), but were not able to solve the huge universe Debian model (26338 features) within the given timeout (2 hours). Overall, for the operations where both approaches scale, QSAT is faster than SAT. This experiment clearly supports the hypothesis $H_1$.

Experiment 4: Comparing SPLAnE and FaMa scalability in front of randomly generated large size models
In this experiment, we checked the behavior of the SPLAnE reasoners (CirQit and RaReQS) and the FaMa reasoners (Sat4j, PicoSAT and MiniSAT) on the randomly generated SPL models taken from experiment 5.2.2. Figure 9 shows the scalability of the SPLAnE reasoners and the FaMa reasoners on randomly generated models. The results are shown only for large models, from 1000 to 20000 features, with 50% cross-tree constraints. Here, a feature model with 10000 features means that its corresponding SPL model contains 40000 variables with 50% CTC. The results clearly show that the SAT reasoners are not able to solve any analysis operation on models with 10000 features or more. For the completeness and soundness operations, the SAT reasoners were not able to solve any SPL model beyond 1000 features. The QSAT reasoners were able to solve all analysis operations on the random SPL models. The results reject the hypothesis $H_0$, leaving us to accept the hypothesis $H_1$.
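The model sizes quoted in the experiments follow from simple arithmetic: one propositional variable per feature plus one per component. The sketch below uses the ratios stated in the text (three components per feature for the random models, one component per feature for the Debian models); the function name is hypothetical.

```python
def spl_model_variables(features, components_per_feature):
    # One propositional variable per feature plus one per component.
    return features * (1 + components_per_feature)


# Random models: the number of components is three times the number of features.
print(spl_model_variables(10000, 3))  # 40000
print(spl_model_variables(20000, 3))  # 80000

# Debian models: each feature is linked to exactly one component.
print(spl_model_variables(26338, 1))  # 52676
```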

Experiment 5: Comparing SPLAnE with FaMa based reasoning techniques
SPLAnE improves the performance on the set of models obtained from SPLOT and on the random SPL models. In this experiment, we compare the QSAT-based technique with SAT-based techniques over the analysis operations. From the SPLOT models used in experiment 5.2.1, we took those marked as realistic. FaMa supports analysis operations expressed using propositional formulae. We acknowledge that there are analysis operations, such as completeness and soundness, that cannot be expressed using propositional formulae. So, for comparing QSAT and SAT reasoning, such operations were written in the FaMa tool suite with loop statements (for or while) that traverse the whole set of solutions. The loop allows us to express such operations (completeness, soundness, etc.) equivalently to their QSAT formulae, but note that the complete operations cannot be expressed as a standalone propositional formula. We then executed the analysis operations with the SPLAnE reasoner CirQit and the FaMa reasoner Sat4j.
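The loop-based emulation described above can be sketched as follows. This is an illustration under assumptions, not the FaMa implementation: `exists_cover` stands in for one existential (SAT-style) query, the universal quantifier of completeness is unrolled into a host-language loop, and the toy SPL and all function names are hypothetical.

```python
def provided_by(C, T):
    # Features f for which some component set in T(f) is contained in C.
    return frozenset(f for f, alts in T.items() if any(a.issubset(C) for a in alts))


def covers(C, F, specs, T):
    provided = provided_by(C, T)
    return F.issubset(provided) and provided in specs


def exists_cover(F, specs, impls, T):
    # Stands in for ONE existential query: is there a C with Covers(C, F)?
    return any(covers(C, F, specs, T) for C in impls)


def completeness_by_loop(specs, impls, T):
    # The universal quantifier is unrolled: one existential query per specification.
    queries = 0
    for F in specs:
        queries += 1
        if not exists_cover(F, specs, impls, T):
            return False, queries
    return True, queries


# Hypothetical toy SPL.
T = {"f1": [frozenset({"c1"})], "f2": [frozenset({"c2"})]}
specs = {frozenset({"f1"}), frozenset({"f2"})}
impls = {frozenset({"c1"}), frozenset({"c2"})}

print(completeness_by_loop(specs, impls, T))  # (True, 2)
print(completeness_by_loop(specs, {frozenset({"c1"})}, T)[0])  # False
```

The number of existential queries grows with the number of specifications, which is the cumulative cost that a single quantified formula avoids.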
Figure 10 shows the results of QSAT-based versus SAT-based reasoning for a few analysis operations on real models taken from the SPLOT repository. QSAT outperforms the SAT-encoded formulae for every analysis operation. The executions of all models, as well as the scripts used to generate this data, can be found at www.cse.iitb.ac.in\~splane. The first noticeable result is that SPLAnE is faster in all executions of all operations when compared to the standard FaMa version (which uses Sat4j as a solver).
Moreover, we see improvements of more than 70% (note the log scale) for the soundness operation. Therefore, having found no evidence in support of the null hypothesis $H_0$ (for experiment 5.2.5), we accept the alternative hypothesis $H_1$, which states that SPLAnE is faster and more scalable than the previously standard SAT-based techniques.

Threats to Validity
Even though the experiments presented in this paper provide evidence that the proposed solution is valid, there are some conditions that may affect their validity. In this section, we discuss the different threats to validity that affect the evaluation.
External Validity. The inputs used for the experiments presented in this paper try to mimic realistic feature models. However, SPLOT models are not necessarily realistic. To mitigate this threat, we decided to use feature models based on the Debian repository. Also, the random feature models may not reflect the same structure as other realistic models. The major threats to the external validity are:
• Population validity: the models may not be realistic. To reduce this threat, we generated the models as in [46], as implemented in the Betty tool [45]. We also used models derived from the Debian repositories to provide more realistic topologies.
• Ecological validity: while external validity in general concerns the generalization of the results to other contexts (e.g., using other models), ecological validity concerns threats affecting the experiment materials and tools. To limit the influence of third-party threads running on the machines, SPLAnE analyses were executed 10 times and then averaged.
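The repeat-and-average protocol mentioned for ecological validity can be sketched as a small timing harness. This is an illustration only: `dummy_analysis` is a hypothetical stand-in for one SPLAnE analysis call, and the real experiments were not necessarily measured this way.

```python
import time


def average_runtime_ms(operation, runs=10):
    # Run the operation several times and average the wall-clock time.
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000.0)
    return sum(samples) / len(samples)


def dummy_analysis():
    # Hypothetical stand-in for one analysis operation.
    sum(i * i for i in range(10_000))


print(f"{average_runtime_ms(dummy_analysis):.3f} ms")
```

Averaging over repeated runs dampens the effect of unrelated background activity on a single measurement.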

Internal Validity
The CPU capabilities required when analyzing an SPL model depend on the number of features, the number of components and the percentage of cross-tree constraints. However, other variables, such as the topology, might affect the performance, so we generated 10 different topologies for each SPL model.

Construct Validity
The results look promising in terms of the time required to solve problems related to the feature model. However, we cannot guarantee their validity for models with more than 20000 features.

Related work
Automated Analysis of Feature Models.
The automated analysis of feature models has been around for more than 20 years [6]. Up to 30 different analysis operations have been presented. However, there is a lack of support for implementation assets and their relations with variability management. In this paper, we extend variability management analysis so that the automated analysis of feature models also covers the implementation of the different features. White et al. [47] presented an approach to automate configuration in SPLs by transforming feature models and configurations into constraint satisfaction problems (CSPs). The CSP is used to diagnose errors in the selected features and, in the case of an invalid configuration, to repair them. The authors also verified their approach on feature models in the range of 100 to 5000 features. Bagheri et al. show an approach to construct feature models and their constraints using propositional formulae [48]. They also explain the formalism for semi-automated feature model configuration. Soltani et al. give a configuration process, based on an artificial intelligence planning technique, to derive a product from a feature model automatically according to the stakeholders' requirements, where the stakeholders may have diverse business concerns and limited resources [49].

Traceability in SPLs.
While there is a fairly large body of work in the literature on different facets of SPL, in the following we mention only those which address traceability as a primary aspect.
Four important characteristics of a variability model, namely consistency, visualization, scalability and traceability, are defined in [9]. A variability management model that focuses on the traceability aspect of the notion of problem and solution spaces is presented in [2]. Anquetil et al. [8] formalize the traceability relations across the problem and solution spaces and also across domain and product engineering. In [12], the notion of product maps is defined; a product map is a matrix giving the relation between features and products. Consistency analysis of product maps is presented in [13]. Zhu et al. [15] define a traceability relation from requirements to features and also from features to architectures, with consistency analysis. A method to identify the traceability between the feature model and the architecture model is presented in [14]. Czarnecki's work [3], [10], [11] on giving semantics to features in feature models by mapping them to other models has been found useful at the requirements level. However, none of the works mentioned above presents a formal approach for analysis operations, nor do they address the role of traceability in the implementability aspect of SPLs.

Implementation derivation. Borba et al. [50] build on the idea of automatic generation of products from assets by relying on feature diagrams and configuration knowledge (CK) [3]. A CK relates features to assets, specifying that assets implement possible feature combinations. The work in [50] lays theoretical foundations for refining and evolving SPLs. The notion of traceability in [50] is general; however, unlike [50], the focus of our paper is on the implementability of SPLs.
Template-based traceability. In [10] and [11], the authors propose a template-based approach for mapping feature models to annotated models expressed in UML or a domain-specific modeling language. Based on a particular configuration of features, an instance of the template is created by evaluating presence conditions in the model. The work in [11] gives a verification procedure which establishes that no ill-formed template instances will be produced given a correct configuration of the feature model. The procedure takes a feature model and an annotated template, which is an instance of a class model (like UML), and a set of OCL rules. The rules are written with respect to the class model, and each OCL constraint is an invariant on some class $c$. The final verification is done by checking the validity of a propositional formula. Our notion of traceability is more general than instantiating a template based on the presence of a set of features; moreover, our analysis operations require an encoding into QSAT, and we have experimental evidence to suggest that the QSAT encoding performs well compared to SAT-based procedures (Figure 10).
Variability Management. The paper that is closest to our work is that by Metzger et al. [20], which deserves a detailed comparison. In that paper, PL variability refers to the variations in the features of the system and software variability refers to the variations among the software system artifacts. In our paper, we follow a different terminology to bring out the product line hierarchy clearly (shown in Figure 11): a scope consists of all the features, a variant specification (referred to as just "specification") is a subset of features, and a product line specification (PL specification) is a set of variant specifications. On the other hand, core assets comprise all the components, a variant implementation (referred to as just "implementation") is a subset of components, and a product line implementation (PL implementation) is a set of variant implementations. The traceability between PL variability and software variability is represented in Metzger et al. using X-links.
One type of X-link is of the form $f \Leftrightarrow V_1 \vee V_2 \vee \ldots \vee V_n$, which says that a feature $f$ is present iff at least one of the variations $V_i$ is present in the software variability. However, it cannot capture the fact that a feature may be implemented by different sets of software artifacts, which may require constraints of the form $f \Leftrightarrow (c_{11} \wedge c_{12} \wedge c_{13}) \vee (c_{21} \wedge c_{22} \ldots) \vee \ldots$. The definition of traceability in our paper captures the above-mentioned class of constraints and is used to define a reasonable notion of a relation between implementations and specifications.
Marcilio et al. presented experimental results showing that SAT-based analysis of variability models such as feature models is easy [51]. The authors used feature models with a maximum of 10000 features. To increase the hardness, the feature models were augmented with 10%, 20% and 30% of cross-tree constraints (CTC). They found that realistic feature models are not difficult for SAT solvers.
Steven et al. present a heuristic to generate a feature model from an existing system using reverse engineering [52]. They used three real systems: Linux, eCos and FreeBSD. Linux has a variability model represented in the Kconfig language and eCos has the Component Definition Language, whereas FreeBSD has a list of features. The approach was able to successfully generate feature models from the Linux, eCos and FreeBSD systems.
Mikolás et al. present a meta-model which is mechanically formalized from the feature models in the literature [53]. This meta-model is used for feature modeling and reasoning about it. The development of larger SPLs involves manipulating many parts of FMs. The paper [54] proposes a compositional approach to develop complex SPLs with the help of complementary operators like aggregate, merge and slice. Along with reasoning, the paper presents methods for the correction of anomalies, and for the update, extraction and reconciliation of FMs.
The SAT-based definition of products in Metzger et al. allows causally unrelated components and features as products of the SPL. At other times, it is too restrictive in that it does not allow additional components in an implementation which do not provide any feature, but are forced to be with other components because of, say, packaging restrictions. It seems necessary to strike the right balance between the strictness of X-links and the general propositional constraints for a reasonable definition of implementability. This is provided by the definition of the relation $Covers$ in our paper.
Metzger et al. propose a number of analysis problems; in the terminology of that paper, they are realizability, internal competition, usefulness, flexibility, and common and dead elements. We have redefined these in our paper from the perspective of the new implements relation. Moreover, we have described some new and useful SPL analysis problems (superfluous, redundancy, critical component, extraneous features). In Metzger et al., it was noted that the satisfiability-based formulation needed to enumerate and check all the implementations and specifications in order to solve certain analysis problems.
Hence, the cumulative complexity of satisfiability checking may be prohibitive for large SPLs. The QSAT-based formulation proposed in our paper obviates this problem and gives efficient solution methods scalable to large, real-life case studies. Figure 10 gives a comparison of the SAT and QSAT approaches for the analysis operations soundness and completeness. The execution times shown in the figure demonstrate the superiority of the QSAT approach over SAT-based approaches for some analysis problems. On a bigger case study (ESPL in Section 5), which had 290 features and an equal number of components, the SAT-based approach failed to solve any of the analysis problems.

Future Work
In future work we plan to focus on the following extensions of this paper.

More solvers
Currently, we have implemented the SPLAnE analysis operations using a reduced number of QSAT and SAT solvers. In the future we plan to add some SMT solvers to this list and proceed with a comparative study identifying the strengths and weaknesses of each approach.

Granularity
In this paper we have considered that the traceability relation exists at the level of features and components. However, a traceability relation can be extended to map a feature to a part of a component, or a component can be decomposed into sub-components to perform a granular or multi-level mapping.

Logic paradigms
We have focused on SAT solving techniques; however, there are other approaches, such as BDDs, that are appealing for the same usage. In the future, we plan to compare the QSAT approach presented in this paper with quantification over BDDs across all the analysis operations.

Experimentation
In this paper, we have evaluated our approach in a diverse set of scenarios; however, we focused on examples containing only 1:m relationships. In future work we plan to extend the experimentation to n:m relationships to see whether this has implications for the scalability of our solution.

Conclusion
In this paper, we stress the need to jointly analyze the specification and the implementation of SPLs. To that end, we started from a formal definition of the notion of traceability and a set-theoretic framework. We imported existing analyses and proposed new analyses such as superfluousness, explicitness, redundant, union, intersection, valid model, void product model, complete traceability, etc. The analysis problems have been translated into Quantified Boolean Formulae and solved efficiently using a QBF solver. The approach is supported by a software tool called SPLAnE, which is integrated with the existing FaMa framework. We conducted detailed experimentation with SPLAnE on i) large Debian models, ii) randomly generated models and iii) SPLOT models. We executed all analysis operations with 5 solvers, i.e., 2 QSAT solvers (CirQit and RaReQS) and 3 SAT solvers (Sat4j, PicoSAT and MiniSAT). Further, we tested SPLAnE for scalability and compared the QSAT approach with the SAT approach. For scalability, we took the extended Betty tool and generated a random set of SPL models ranging from 5 to 50 percent of cross-tree constraints, with 10 different topologies, and from 10 features to 20000 features. The scalability results show that SPLAnE was able to analyze such huge SPL models. The comparison between the SAT and QSAT results clearly shows that our approach improved the performance by 70% over the SAT-based approach for analysis operations like soundness and completeness.

Given $C \in \mathcal{C}$ and $F \in \mathcal{F}$, $Covers(C, F)$ holds if $F \subseteq Provided\_by(C)$ and $Provided\_by(C) \in \mathcal{F}$. The additional condition ($Provided\_by(C) \in \mathcal{F}$) is added to address a tricky issue introduced by the $Covers$ definition. Suppose $\mathcal{F}$ consists of only two specifications, $\{f_1\}$ and $\{f_2\}$, and that the two variants ($f_1$ and $f_2$) are mutually exclusive features. The implementation $C = \{c_1, c_2\}$ implements the feature $f_1$, assuming $T(f_1) = \{\{c_1\}\}$ and $T(f_2) = \{\{c_2\}\}$. Without the provision, we would have $Covers(C, \{f_1\})$. However, since $Provided\_by(C) = \{f_1, f_2\}$, $C$ actually implements both features together, thus violating the requirement of mutual exclusion. In the VMPL, the implementation $C_6$ covers the specifications $F_1$, $F_2$, $F_3$ and $F_4$. The set of products of the SPL is now defined as the specifications and the implementations covering them through the traceability relation.

Definition 4 (SPL Products). Given an SPL $\Psi = \langle \mathcal{F}, \mathcal{C}, T \rangle$, the products of the SPL are given by a function $Prod(\Psi)$, which generates the set of all specification-implementation pairs $\langle F, C \rangle$ where $Covers(C, F)$.
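The mutual-exclusion example and the resulting set of products can be reproduced with a few lines of set manipulation. This is a minimal sketch of the set-theoretic definitions, not the QBF encoding; the flag `with_provision` is a hypothetical switch added only to show what happens when the additional condition is dropped.

```python
def provided_by(C, T):
    # Features f for which some component set in T(f) is contained in C.
    return frozenset(f for f, alts in T.items() if any(a.issubset(C) for a in alts))


def covers(C, F, specs, T, with_provision=True):
    provided = provided_by(C, T)
    if not F.issubset(provided):
        return False
    # Additional condition: Provided_by(C) must itself be a specification.
    return provided in specs if with_provision else True


def products(specs, impls, T):
    # Prod: all specification-implementation pairs (F, C) with Covers(C, F).
    return {(F, C) for F in specs for C in impls if covers(C, F, specs, T)}


f1, f2 = frozenset({"f1"}), frozenset({"f2"})
T = {"f1": [frozenset({"c1"})], "f2": [frozenset({"c2"})]}
specs = {f1, f2}  # f1 and f2 are mutually exclusive
C = frozenset({"c1", "c2"})

print(sorted(provided_by(C, T)))                      # ['f1', 'f2']
print(covers(C, f1, specs, T, with_provision=False))  # True
print(covers(C, f1, specs, T))                        # False

impls = {frozenset({"c1"}), frozenset({"c2"}), C}
print(len(products(specs, impls, T)))                 # 2
```

Only the pairs $\langle \{f_1\}, \{c_1\} \rangle$ and $\langle \{f_2\}, \{c_2\} \rangle$ are products; the implementation $\{c_1, c_2\}$ is excluded because it provides both mutually exclusive features.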
In the encoding, a feature or component $x$ corresponds to a propositional variable $p_x$. Given an implementation $C$, $\hat{C}$ denotes the formula $\bigwedge_{c_i \in C} p_{c_i}$, and $\bar{C}$ denotes a bitvector where $\bar{C}[i] = 1$ (TRUE) if $c_i \in C$ and $0$ (FALSE) otherwise. Similarly, for a specification $F$, we have $\hat{F}$ and $\bar{F}$.

Figure 5. Impact on QSAT scalability on real SPLOT models with the increment in CTC levels.

Figure 6. QSAT scalability on large random SPL models with the increment in CTC levels. Axis label: time in milliseconds (log10 scale).

Figure 7. Boxplot for QSAT scalability on large random SPL models with the increment in CTC levels.

Figure 8. SPLAnE required time vs FaMa required time on real and large Debian-based feature models.
A variant implementation (referred to as just "implementation") is a subset of components, and a product line implementation (PL implementation) is a set of variant implementations. The PL variability of Metzger et al. is analogous to PL specifications, and software variability is analogous to PL implementations. In Metzger et al., PL variability is represented as OVM (Orthogonal Variability Model) and software variability is represented as FD (Feature Diagrams). In our paper, we give set-theoretic semantics to SPLs in lieu of visually appealing notations such as FD, VFD and OVM. The advantage is that in these semantics the core concepts, the analysis problems and the solution methods can be expressed in a clearer and more concise manner.
A feature that may be implemented by different sets of software artifacts requires constraints of the form $f \Leftrightarrow (c_{11} \wedge c_{12} \wedge c_{13}) \vee (c_{21} \wedge c_{22} \ldots) \vee \ldots$. The other type of traceability constraints suggested in Metzger et al. is simple propositional formulae. However, not all propositional constraints provide the intuitive and strong implementability relations between the implementations and specifications.

© 2016 by the authors. Submitted to Entropy for possible open access publication under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).

Table 1. Traceability Relation for Virtual Machine.
{ C_1 : {LinuxCore,

A feature $f$ is implemented by a set of components $C$, denoted $implements(C, f)$, if $C$ includes a non-empty subset of components $C'$ such that $C' \in T(f)$. It is obvious from the definition that if $T(f) = \emptyset$, then $f$ is not implemented by any set of components. In VMPL, $f_5$ is implemented by implementations $C_3$, $C_5$, $C_6$ and $C_7$, but not by implementations $C_1$, $C_2$ and $C_4$.
feature model cannot have any specification because of an excludes relation, and such an SPL model is not a valid model. An SPL model should be validated before any analysis operation is applied to it. Large and complex SPLs undergo continuous modification, so they have to be checked for validity after every modification. In the case of VMPL, the validity of the model should be tested after adding new features, components or cross-tree constraints. Questions such as "Are the virtual machine feature model and component model valid?" must be answered before further analysis of VMPL.

If every feature has a traceability relation with the components that implement it, the traceability relation is called complete. If there exists a feature that has no traceability relation with any component, the traceability relation is called incomplete. While an SPL is under development, some features may not yet have their corresponding components developed. The complete traceability relation operation helps us identify such features and proceed with the development of their components. The preliminary properties valid model and complete traceability relation should hold before any other property is analyzed.

Let us assume an SPL model ⟨ℱ, 𝒞, T⟩ that is a valid model, but in which none of the implementations Cᵢ covers any of the specifications Fⱼ. Such a model is called a void product model, i.e., the model is not able to yield a single product. In an SPL model, it may happen that the feature model is valid, the component model is valid and the traceability relation is complete, yet the SPL model cannot generate a single product. This is possible if no implementation covers any of the specifications. A question such as "Can the Virtual Machine Product Line generate at least one virtual machine?" must be answered before conducting further analysis of a product line.
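The three preliminary checks can be sketched over explicitly enumerated sets. The sketch below is illustrative only and rests on assumptions not stated in this passage: PL specifications and PL implementations are given as lists of `frozenset`s, a model counts as valid when both lists are non-empty, and Covers(C, F) is taken to mean F ⊆ Provided_by(C). All function names are hypothetical. The paper's own approach encodes these checks as QBF instead of enumerating sets.

```python
from typing import Dict, FrozenSet, Iterable, List, Set

Traceability = Dict[str, Set[FrozenSet[str]]]


def implements(C: FrozenSet[str], f: str, T: Traceability) -> bool:
    """Some non-empty alternative in T(f) is contained in C."""
    return any(bool(alt) and alt <= C for alt in T.get(f, set()))


def provided_by(C: FrozenSet[str], T: Traceability) -> FrozenSet[str]:
    """All features implemented by the component set C."""
    return frozenset(f for f in T if implements(C, f, T))


def covers(C: FrozenSet[str], F: FrozenSet[str], T: Traceability) -> bool:
    """Assumed reading of Covers(C, F): every feature of F is provided by C."""
    return F <= provided_by(C, T)


def is_valid_model(specs: List[FrozenSet[str]], impls: List[FrozenSet[str]]) -> bool:
    """Assumption: valid means at least one specification and one implementation."""
    return bool(specs) and bool(impls)


def untraced_features(features: Iterable[str], T: Traceability) -> Set[str]:
    """Features with no traceability; an empty result means T is complete."""
    return {f for f in features if not T.get(f)}


def is_void_product_model(specs, impls, T: Traceability) -> bool:
    """True when no implementation covers any specification."""
    return not any(covers(C, F, T) for C in impls for F in specs)
```

For instance, with one specification {f}, one implementation {a} and T(f) = {{b}}, the model is valid and the traceability is complete, yet `is_void_product_model` returns `True`.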
An element e is dead if for all ⟨F, C⟩ ∈ Prod(Ψ), e ∉ F ∪ C. Nowadays, with advances in technology, business requirements change so quickly that existing products in the market are replaced by more advanced products within a very short time span. As an SPL evolves, cross-tree constraints are added or removed, which changes the set of products. Due to such modifications, some features or components in the SPL may become live or dead. Is the component c present in at least one virtual machine provided by VMPL? Is the component c absent from all the virtual machines provided by VMPL?

The common property finds all the elements (features or components) that are common across all the products. This operation is required to create a common platform for an SPL. Is the component c present in all the virtual machines provided by VMPL?

There may be certain implementations that are useful, but whose removal from the PL implementation does not affect the implementable specifications. These implementations are called superfluous. Formally, an implementation C ∈ 𝒞 is superfluous if for all F ∈ ℱ such that Covers(C, F), there is a different implementation D ∈ 𝒞 such that Covers(D, F). Superfluousness is relative to a given PL implementation. If in an SPL Ψ, ℱ = {{f}}, 𝒞 = {{a}, {b}} and T(f) = {{a}, {b}}, then both the implementations {a} and {b} are superfluous w.r.t. Ψ, whereas if either {a} or {b} is removed from the PL implementation, the remaining implementation ({b} or {a}) is not superfluous anymore (w.r.t. the reduced SPL). The feature Java in VMPL can be implemented by the component OpenJDK or OracleJDK. Such traceability results in many superfluous implementations. Superfluousness for a specification guarantees the presence of alternate implementations.

Which components in the virtual machines can be removed without impacting the user specification? A component is redundant if it does not contribute to any feature in any implementation in the SPL. Formally, a component c is redundant if for every C ∈ 𝒞, we have Provided_by(C) = Provided_by(C \ {c}). An SPL can be optimized by removing the redundant components without affecting the set of products. Redundant elements may not be dead: due to packaging, redundant elements can be part of useful implementations of the SPL and hence be live.

Is the component c required for any of the features in a user specification? A component c is critical for a feature f in the SPL scope F when the component must be present in every implementation that implements the feature f: for all implementations C ∈ 𝒞, (c ∉ C ⟹ ¬implements(C, f)). This definition can be extended to specifications as well: a component c is critical for a specification F if for all implementations C ∈ 𝒞, (c ∉ C ⟹ ¬Covers(C, F)). A virtual machine may contain components that are not required for any of the features in a user specification, but that remain due to packaging. Such components are redundant but not critical.
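The definitions of dead, common, superfluous, redundant and critical elements can be checked on small examples with a brute-force Python sketch. It assumes that Covers(C, F) means F ⊆ Provided_by(C) and that Prod(Ψ) is the set of pairs ⟨F, C⟩ with Covers(C, F); the function names are hypothetical, and explicit enumeration is only practical for toy models.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Traceability = Dict[str, Set[FrozenSet[str]]]
Sets = List[FrozenSet[str]]


def implements(C: FrozenSet[str], f: str, T: Traceability) -> bool:
    return any(bool(alt) and alt <= C for alt in T.get(f, set()))


def provided_by(C: FrozenSet[str], T: Traceability) -> FrozenSet[str]:
    return frozenset(f for f in T if implements(C, f, T))


def covers(C: FrozenSet[str], F: FrozenSet[str], T: Traceability) -> bool:
    # Assumed reading of Covers(C, F): F is a subset of Provided_by(C).
    return F <= provided_by(C, T)


def products(specs: Sets, impls: Sets, T: Traceability) -> List[Tuple[FrozenSet[str], FrozenSet[str]]]:
    return [(F, C) for F in specs for C in impls if covers(C, F, T)]


def is_dead(e: str, specs: Sets, impls: Sets, T: Traceability) -> bool:
    """e appears in no product."""
    return all(e not in (F | C) for F, C in products(specs, impls, T))


def is_common(e: str, specs: Sets, impls: Sets, T: Traceability) -> bool:
    """e appears in every product."""
    return all(e in (F | C) for F, C in products(specs, impls, T))


def is_superfluous(C: FrozenSet[str], specs: Sets, impls: Sets, T: Traceability) -> bool:
    """Every specification covered by C is also covered by a different D."""
    return all(
        any(D != C and covers(D, F, T) for D in impls)
        for F in specs
        if covers(C, F, T)
    )


def is_redundant(c: str, impls: Sets, T: Traceability) -> bool:
    """Removing c never changes the features an implementation provides."""
    return all(provided_by(C, T) == provided_by(C - {c}, T) for C in impls)


def is_critical(c: str, f: str, impls: Sets, T: Traceability) -> bool:
    """Every implementation lacking c fails to implement f."""
    return all(c in C or not implements(C, f, T) for C in impls)
```

Running it on the example with ℱ = {{f}}, 𝒞 = {{a}, {b}} and T(f) = {{a}, {b}} reports both {a} and {b} as superfluous, and {a} stops being superfluous once {b} is removed from the list of implementations.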

Table 2. Properties and Formulae
Properties    Formula
Valid Model   ∃p_{f1} … p_{fm} ∃p_{c1} … p_{cn} [CON I

3 times more components than the number of features present in the corresponding feature model. SPLAnE generated a random traceability relation between the feature model and the component model to obtain a complete SPL model. Further, to increase the complexity of the experiments, each SPL model was generated using 10 different topologies and 10 different levels of cross-tree constraints, with percentages {5, 10, 15, 20, 25, 30, 35, 40, 45, 50}, resulting in a total of 100 SPL models per SPLOT model. Thus, from 698 SPLOT models, we obtained 69800 SPL models. The percentage of cross-tree constraints is defined as the number of constraints relative to the number of features. For example, specifying 50% for a model with 10 features yields 5 cross-tree constraints.
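The arithmetic behind the experimental setup can be reproduced with a short Python sketch. The constant and function names are hypothetical, and rounding down when the percentage does not divide evenly is an assumption, since the text only gives the 50% of 10 features example.

```python
# CTC percentages and topology count as stated in the text.
CTC_PERCENTAGES = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
TOPOLOGIES = 10


def ctc_count(num_features: int, percentage: int) -> int:
    """Number of cross-tree constraints for a CTC percentage (floor is assumed)."""
    return num_features * percentage // 100


def total_spl_models(num_splot_models: int) -> int:
    """Each SPLOT model yields one SPL model per (topology, CTC level) pair."""
    return num_splot_models * TOPOLOGIES * len(CTC_PERCENTAGES)


if __name__ == "__main__":
    print(ctc_count(10, 50))      # 5
    print(total_spl_models(698))  # 69800
```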

Table 3. Hypotheses and design of experiments.

Table 4. Time Complexity for Properties and Formulae with SPLAnE reasoner - CirQit [46]

In this experiment, we compared the scalability of SPLAnE and FaMa based analysis techniques over large randomly generated SPL models. The Betty toolsuite [45] is used to generate random feature models relying on the approach of Thüm et al. [46]. SPLAnE extended the Betty toolsuite to generate a set of random SPL models. Those models were generated for a number of features ranging from ten to twenty thousand. Concretely,