#### 4.2.1. RQ1: Does Q Satisfy Weyuker’s Nine Properties?

Weyuker’s nine properties are the most widely used criteria for evaluating the efficiency and robustness of a software complexity metric [14]. They form a theoretical framework for checking whether a metric qualifies as an effective metric. In this section, we validate Q against Weyuker’s nine properties, property by property. In the following paragraphs, M denotes any software complexity metric; in this work, M refers to Q.

**Property 1** (Non-coarseness). $(\exists P)(\exists E)(M(P)\ne M(E))$, where P and E are two different programs.

**Proof.** Two different Java software systems P and E usually have different feature sets and coupling sets (couplings between features). Furthermore, their class structures may also be different. Thus, we can assume that the FCNs built from the two software systems may be different, which results in different Q values for the two software systems. Therefore, Q does adhere to Property 1. □

**Property 2** (Granularity). Let c be a non-negative number; then, there are only finitely many programs P with $M(P)=c$.

**Proof.** The universe of discourse contains only a finite number of applications. Thus, only finitely many software systems have FCNs and class structures that satisfy $Q=c$. Therefore, Q does adhere to Property 2. □

**Property 3** (Non-uniqueness). There are two different programs P and E such that $M(P)=M(E)$.

**Proof.** A large number of software systems have been developed and deployed. It is reasonable to assume that there exist two software systems with the same FCN and class structure. Therefore, Q does adhere to Property 3. □

**Property 4** (Design Details are Important). $(\exists P)(\exists E)(P\equiv E\ \&\ M(P)\ne M(E))$.

**Proof.** There are many function-equivalent software systems with different FCNs and class structures, and their Q values can therefore differ. For example, Table 3 lists different versions of software with the same set of functionalities, and the two versions have different Q values. Therefore, Q does adhere to Property 4. □

**Property 5** (Monotonicity). $(\forall P)(\forall E)(M(P)\le M(P+E)\ \&\ M(E)\le M(P+E))$.

**Proof.** This property was originally proposed to check size-related metrics. Our Q metric is not a size-related metric. Therefore, Property 5 is not applicable to our Q metric. □

**Property 6** (Non-equivalence of Interaction). $(\exists P)(\exists E)(\exists R)(M(P)=M(E)\ \&\ M(P+R)=M(E+R))$.

**Proof.** P and E are two different software systems satisfying $M(P)=M(E)$. R is another software program that can be correctly combined with P and E. Though the combination of P and R may produce a different FCN from the combination of E and R, their Q values may be the same. For example, suppose that R is a very simple program with only one method defined in one class. The combination of P and R only adds one isolated node to the FCN of P, which will not affect the Q value of P. Similarly, the combination of E and R only adds one isolated node to the FCN of E, which will not affect the Q value of E. Thus, $M(P+R)=M(P)=M(E)=M(E+R)$. Therefore, Property 6 is satisfied by Q. □
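The isolated-node argument in the proof of Property 6 can be checked numerically. The sketch below is only an illustration: Equation (2) is not reproduced in this section, so the code assumes a standard Newman-style weighted modularity with classes as communities as a stand-in for Q. The function name `weighted_modularity` and the toy network are hypothetical.

```python
from collections import defaultdict


def weighted_modularity(edges, community):
    """Newman-style modularity of an undirected weighted graph.

    ASSUMPTION: this is a stand-in for Q, not the paper's Equation (2).
    edges: list of (u, v, w) with u != v, each undirected edge listed once.
    community: dict mapping every node to a community label (here, a class).
    """
    total_weight = sum(w for _, _, w in edges)
    if total_weight == 0:
        return 0.0  # convention: a network without couplings has value 0

    internal = defaultdict(float)  # edge weight inside each community
    strength = defaultdict(float)  # summed node strength of each community
    for u, v, w in edges:
        strength[community[u]] += w
        strength[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w

    return sum(
        internal[c] / total_weight - (strength[c] / (2 * total_weight)) ** 2
        for c in strength
    )


# Toy network: two classes A and B, two features each, one cross-class coupling.
edges = [("a1", "a2", 2.0), ("b1", "b2", 2.0), ("a2", "b1", 1.0)]
classes = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
q_before = weighted_modularity(edges, classes)

# Combine with R: one method in one class, i.e. one isolated node.
classes_with_r = dict(classes, r="R")
q_after = weighted_modularity(edges, classes_with_r)

print(q_before, q_after)  # the two values are identical
```

Under this assumed definition an isolated node has zero strength and contributes to no term of the sum, which is why the value is unchanged. Whether the same holds for the exact form of Equation (2) depends on how that equation treats nodes without edges.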

**Property 7** (Significance of Permutation). There exist two programs P and E, where E is formed by permuting the order of the statements of P, such that $M(P)\ne M(E)$.

**Proof.** This property was originally proposed for procedure-oriented metrics, and does not hold for OO metrics. Therefore, Property 7 is not satisfied by Q. □

**Property 8** (No Change on Renaming). If P is a renaming of E, then $M(P)=M(E)$.

**Proof.** As the FCN and class structure are independent of the name of the software, Q satisfies Property 8. □

**Property 9** (Interaction Increases Complexity). $(\exists P)(\exists E)(M(P)+M(E)<M(P+E))$.

**Proof.** Suppose P and E are two very simple software systems, each with only one method defined in one class; then, their Q values are both 0, i.e., $M(P)=M(E)=0$. It is reasonable to assume that combining P and E may result in a new FCN that has edges linking the two nodes together, making $Q>0$, i.e., $M(P+E)>0$. Therefore, Q satisfies Property 9. □

To sum up, our Q metric satisfies seven of Weyuker’s nine properties; the two exceptions are Properties 5 and 7. As mentioned above, Property 5 is not applicable to Q since it was proposed for size-related metrics, and Q is not a size-related metric. Property 7 is not applicable to Q since it was proposed for procedure-oriented metrics, and our metric is an OO metric. These exceptions have also been observed in other work [16, 34, 35, 36]. Therefore, our Q metric is a well-structured metric that can be used to compute software modularity as a whole.

#### 4.2.2. RQ2: What about the Q Values Obtained in Different Software Systems?

Different software systems usually have different Q values. In this section, we performed experiments to examine the Q values obtained in different software systems.

(1) Subject Systems

We randomly chose a set of twelve Java software systems (see Table 1) to show the Q values obtained in different software systems. These systems are open-source and can be downloaded from their websites.

Table 1 provides the basic information of the subject software systems, including their names, the domains that they belong to, the directory of the source code distribution that we analyzed, KLOC (thousand lines of code), and the URLs from which the corresponding software system can be downloaded. The subject software systems differ in size, with the smallest at 2.705 KLOC and the largest at 97.880 KLOC. Note that KLOC counts only the actual lines of code; comment lines and blank lines are excluded.
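The counting rule for KLOC (code lines only, no comment lines or blank lines) can be sketched as follows. This is a minimal illustration, not the tool used in this work: the function name `count_code_lines` and the sample source are hypothetical, and the sketch does not detect comment markers that appear inside string literals.

```python
def count_code_lines(source: str) -> int:
    """Count lines of Java source that contain code.

    Blank lines and comment-only lines are skipped. Simplification:
    comment markers inside string literals are not recognised.
    """
    count = 0
    in_block = False  # True while inside a /* ... */ comment
    for raw in source.splitlines():
        line = raw.strip()
        has_code = False
        while line:
            if in_block:
                end = line.find("*/")
                if end == -1:
                    line = ""  # the whole rest of the line is comment
                else:
                    in_block = False
                    line = line[end + 2:].strip()
            elif line.startswith("//"):
                line = ""  # line comment runs to the end of the line
            elif line.startswith("/*"):
                in_block = True
                line = line[2:]
            else:
                has_code = True
                # Jump to the next comment marker on this line, if any.
                starts = [i for i in (line.find("//"), line.find("/*")) if i != -1]
                line = line[min(starts):] if starts else ""
        if has_code:
            count += 1
    return count


sample = "\n".join([
    "// header comment",
    "package demo;",
    "",
    "/* block",
    "   comment */",
    "public class A {",
    "    int x = 1; // trailing comment",
    "    /* inline */ int y = 2;",
    "}",
])

loc = count_code_lines(sample)
print(loc, loc / 1000)  # code lines and the corresponding KLOC
```

A line that mixes code with a trailing or inline comment still counts as one code line, which matches the rule that only comment lines and blank lines are excluded.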

(2) Experiment Process and Results Analysis

Following the steps shown in Figure 1, we analyzed the source code, extracted the structural information, and built the FCNs for the twelve software systems. For illustration purposes, Figure 3 shows the FCNs for the subject software systems jmeter and jfreechart. Enlarging the figure reveals its details, such as the feature name that each node denotes, the edges that exist between pairs of features, and the weight on each edge.

Table 2 shows the $\left|N\right|$ (number of nodes), $\left|E\right|$ (number of edges), and Q of the FCN of the corresponding software system. Most of the subject software systems (8/12) have a relatively small Q value, with $0.2<Q<0.4$. Only four software systems have a Q value larger than 0.4 and smaller than 0.6.

#### 4.2.3. RQ3: Can Q Tell the Software Using Design Patterns from Two Function-Equivalent Software Systems?

Using design patterns in software development is regarded as an effective way to improve software quality [37]. However, design pattern implementations may suffer from some typical problems and heavily affect software modularity [38]. Thus, it is reasonable to assume that, given two function-equivalent software systems of which one uses design patterns and the other does not, Q can tell which one uses design patterns.

(1) Subject Systems

We chose five simple software systems (see Table 3), each of which has two function-equivalent versions. One version (“before” for short) does not apply any design pattern, and the other (“after” for short) applies one design pattern. Each software system is named after the design pattern that it uses.

Table 3 provides the basic information of the subject software systems, including their names, LOC (lines of code), and $\left|N\right|$ and $\left|E\right|$ of the FCN of the corresponding software system. Note that LOC counts only the actual lines of code; comment lines and blank lines are excluded.

(2) Experiment Process and Results Analysis

Following the steps shown in Figure 1, we analyzed the source code, extracted the structural information, and built the FCNs for the five software systems. For illustration purposes, Figure 4 shows the FCNs for the software “Builder” before and after applying the “Builder” design pattern. Enlarging the figures reveals their details, such as the feature name that each node denotes, the edges that exist between pairs of features, and the weight on each edge.

Table 3 also shows the Q values that we computed for the five systems. The Q value for the software using design patterns is smaller than that of the software that does not use design patterns. This confirms our assumption that design pattern implementations do affect software modularity.

#### 4.2.4. RQ4: Is Q Scalable to Large Software Systems?

In practice, a software metric will be applied to software systems of different sizes. Thus, we wish to know whether Q can be applied to larger software systems. To this end, we track the execution time of each main step to evaluate the scalability of Q. As mentioned in Section 3, our approach is mainly composed of three steps:

- (i) Extracting structural information from the source code of software systems.

- (ii) Building FCNs for software systems.

- (iii) Computing the software modularity according to Equation (2).

Table 4 shows the CPU time required to execute each step of our approach when applied to the subject software systems chosen in Section 4.2.2. We can observe that step (i) is the most time-consuming step of our approach, and the other two steps take less than one second. Although jfreechart and ant are large, with 11,946 and 11,858 features, respectively, the total CPU time used to compute Q is less than one minute. Thus, our approach scales to large software systems, which answers RQ4.
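Per-step CPU time of the kind reported in Table 4 can be collected with a small harness such as the one below. It is a sketch under stated assumptions, not the measurement code used in this work: the three step functions are hypothetical placeholders that return dummy values, and only `time.process_time()` from the Python standard library is relied on.

```python
import time


def timed_steps(steps, data):
    """Run named steps in order, passing each result to the next step.

    Returns the final result and a dict of CPU seconds per step.
    """
    cpu_seconds = {}
    for name, func in steps:
        start = time.process_time()  # CPU time of this process, not wall clock
        data = func(data)
        cpu_seconds[name] = time.process_time() - start
    return data, cpu_seconds


# Hypothetical placeholders for steps (i)-(iii); they do no real analysis.
def extract_structure(source_files):
    return [(name, "other_feature", 1.0) for name in source_files]


def build_fcn(couplings):
    return {"edges": couplings}


def compute_q(fcn):
    return 0.0  # dummy value, not a real Q


steps = [
    ("(i) extract", extract_structure),
    ("(ii) build FCN", build_fcn),
    ("(iii) compute Q", compute_q),
]
result, cpu_seconds = timed_steps(steps, ["A.java", "B.java"])

for name, seconds in cpu_seconds.items():
    print(f"{name}: {seconds:.6f} s CPU")
```

Measuring CPU time rather than wall-clock time keeps the figures independent of other load on the machine, which is the usual reason for reporting CPU time in scalability comparisons.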