Measuring Software Modularity Based on Software Networks

Modularity has been regarded as one of the most important properties of a successful software design. It has a significant impact on many external quality attributes such as reusability, maintainability, and understandability. Thus, metrics that measure software modularity can be very useful. Although several metrics have been proposed to characterize some modularity-related attributes, they fail to characterize software modularity as a whole. Complex network theory uses network models to abstract the internal structure of complex systems, providing a general way to analyze such systems as a whole. In this paper, we introduce complex network theory into software engineering and employ modularity, a metric widely used for community detection in complex network research, to measure software modularity as a whole. First, a specific piece of software is represented by a software network, the feature coupling network (FCN), where methods and attributes are nodes, couplings between methods and attributes are edges, and the weight on an edge denotes the coupling strength. Then, modularity is applied to the FCN to measure software modularity. We apply Weyuker's criteria, which are widely used in the field of software metrics, to validate modularity as a software metric theoretically, and we also perform an empirical evaluation on open-source Java software systems to show its effectiveness in measuring software modularity.


Introduction
"High cohesion and low coupling" is one of the most important principles in object-oriented (OO) design [1]. 'Cohesion' indicates the degree of coupling within a module, while 'coupling' indicates the degree of coupling between modules. When designing a piece of software, we usually strive for high cohesion (each module is cohesive) and low coupling (there are few couplings between modules), which promotes the formation of a modular structure in the software. Modularity has been regarded as one of the most important properties of a software design, and it has a significant impact on many external quality attributes such as reusability, maintainability, and understandability. Modularity usually refers to the notion of interdependence within modules and independence between modules [2].
We cannot control what we cannot measure [3]. Thus, to control software modularity, we need quantitative techniques to assess it. One effective technique is to define metrics that characterize modularity-related attributes such as module coupling, cohesion, and interface size [4].

• We characterize software modularity as a whole. Existing metrics usually characterize only some modularity-related attributes, either coupling or cohesion. They cannot take both coupling and cohesion into consideration to characterize software modularity. In this work, we use software networks to represent software, and apply the modularity metric from complex network research to characterize software modularity. Thus, we can characterize software modularity as a whole.

• The proposed modularity metric considers the coupling strength between software elements, which has been neglected by existing metrics. Our proposed software network, FCN, is a weighted software network. The weight on an edge denotes the coupling strength between the two software elements it links, and the calculation of modularity takes this weight into account. Thus, our metric reflects the actual structure of a specific piece of software more faithfully.

• The proposed modularity is validated theoretically using widely accepted evaluation criteria, and empirically using open-source Java software systems. The data set and software used to compute software modularity are available for download [15] (see Supplementary Materials).

Figure 1 shows an overview of our approach to measuring software modularity using complex network theory. Our approach is mainly composed of three steps. First, we extract software structural information from the source code of a software system by static analysis. Second, we propose the FCN to represent the extracted software structural information. Finally, we apply the modularity metric widely used in complex network theory to measure software modularity as a whole. We detail the three steps in the following subsections.

Software Information Extraction
In this work, we analyze software systems coded in Java for two reasons: our own analysis software, SNAP [12], can currently only analyze software systems coded in Java, and Java is one of the most successful and popular programming languages.
As mentioned above, we use a software network to represent software elements and their couplings at the method and attribute level. Thus, the software elements and their couplings in a specific Java software system must be extracted first. To this end, we perform a static analysis of the source code of the software and extract structural information at the feature level (unless stated otherwise, the term "feature" designates both methods and attributes from here on), i.e., we extract methods, attributes, and their couplings. Here, we consider two types of couplings between features, i.e., "method-call" couplings and "method-access-attribute" couplings. Note that computing the software modularity requires the class structure of the software. Thus, we also extract class information, i.e., all classes and the methods and attributes they contain. It should also be noted that we only consider the software elements that are actually defined in the software. Classes that are defined in imported libraries are ignored, since their source code is not always available in the source code distribution of the software [12].

Feature Coupling Network
The structural information obtained in Section 3.1 will be further represented by FCN, which is defined as follows.
Definition 1 (FCN). FCN is a weighted undirected graph representing the features (methods and attributes) of a specific software system and their couplings. Specifically, nodes in FCN denote the methods and attributes in the software, and edges in FCN denote the couplings between methods and attributes, i.e., "method-call" couplings and "method-access-attribute" couplings. The weight on an edge denotes the multiplicity of the coupling, e.g., a weight of 3 if method $m_1$ calls method $m_2$ three times. Note that each method or attribute is represented by only one node. Thus, FCN can be defined as

$$\mathrm{FCN} = (N, E, \psi), \tag{1}$$

where $N$ is the node set, $E$ is the edge set, and $\psi$ is a symmetric matrix storing the weight on the edge between each pair of nodes that are linked by an edge in FCN. Specifically, if method $i$ couples with method $j$, the entries $\psi_{ij}$ and $\psi_{ji}$ ($\psi_{ij} = \psi_{ji}$) of $\psi$ store the weight on the edge between method $i$ and method $j$. The edge weights give a more accurate representation of the software structure at the feature level and can be obtained by simply counting the occurrences of each coupling in the extracted structural information (see Section 3.1).

Figure 2 gives a simple example of building an FCN from Java source code. The label beside each node is the name of the method or attribute that the node denotes, and the label beside each edge is the weight on that edge. Since method "d()" accesses attribute "a" one time, calls method "b()" one time, and calls method "c()" two times, there are three edges, between "d()" and "a", "d()" and "b()", and "d()" and "c()", with weights 1, 1, and 2, respectively. Since method "b()" calls method "c()" one time, there is one edge between "b()" and "c()" with weight 1. Since method "f()" calls method "c()" one time, there is one edge between "f()" and "c()" with weight 1. Since method "e()" calls method "f()" one time, there is one edge between "e()" and "f()" with weight 1.
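The counting step in Definition 1 can be illustrated with a minimal Python sketch. The coupling occurrences below are transcribed from the Figure 2 walk-through; the function name `build_fcn` and the representation of ψ as a dictionary keyed by sorted node pairs are assumptions made for illustration, not the data structures used by SNAP.

```python
from collections import Counter

# One tuple per coupling occurrence (source feature, target feature),
# transcribed from the Figure 2 description in the text.
occurrences = [
    ("d()", "a"),                     # d() accesses attribute a once
    ("d()", "b()"),                   # d() calls b() once
    ("d()", "c()"), ("d()", "c()"),   # d() calls c() twice
    ("b()", "c()"),                   # b() calls c() once
    ("f()", "c()"),                   # f() calls c() once
    ("e()", "f()"),                   # e() calls f() once
]

def build_fcn(occurrences):
    """Return (nodes, weights) for an undirected weighted network.

    weights maps a sorted pair (i, j) to the number of coupling
    occurrences between i and j, so each edge is stored exactly once.
    """
    weights = Counter()
    for src, dst in occurrences:
        weights[tuple(sorted((src, dst)))] += 1
    nodes = {feature for pair in weights for feature in pair}
    return nodes, weights

nodes, weights = build_fcn(occurrences)
for (i, j), w in sorted(weights.items()):
    print(i, j, w)
```

Storing each pair once in sorted order is one simple way to keep the weights symmetric, as Definition 1 requires.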

Software Modularity
In software engineering, developers are encouraged to group related attributes and methods into modules, and to reduce the coupling between modules. The "module" in software is very similar to the concept of "community" in complex network research. In complex networks, communities are subsets of densely connected nodes such that there is a higher density of edges within communities than between communities. Community structure has become one of the most important network properties observed in many networked systems. Software systems also exhibit community structure when they are represented as software networks [11,30]. Generally, packages are the natural communities of classes and interfaces, and classes and interfaces are the natural communities of methods and attributes [30]. Thus, we can use a quality index that quantifies community structure in complex networks to measure software modularity.
In complex networks, many quality indexes have been proposed to evaluate community structure, such as MQ [31], EVM [32], and modularity (Q) [33]. Arguably, Q, proposed by Newman and Girvan, is the best-known and most widely used quality index. It measures the density of edges within communities compared with edges between communities. In this work, we also use Q to compute the software modularity. For a weighted undirected network, our modularity metric can be defined as

$$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j), \tag{2}$$

where $m$ is the sum of the weights on all the edges in the network, $A_{ij}$ is the weight on the edge between nodes $i$ and $j$, $k_i$ and $k_j$ are the sums of the weights on the edges attached to nodes $i$ and $j$, respectively, $c_i$ and $c_j$ are the communities that nodes $i$ and $j$ belong to, respectively, and $\delta$ is a simple delta function that takes the value 1 when $c_i$ equals $c_j$, and 0 otherwise. Equation (2) shows that the $\delta$ function ensures that a coupling between two nodes from different communities makes no contribution to Q. Within a community, a pair of nodes contributes positively to Q when the weight on the edge linking them exceeds $k_i k_j / 2m$, whereas a pair of nodes that are not linked ($A_{ij} = 0$) cannot contribute positively.
Generally, a higher modularity value denotes a more reasonable community structure, in which nodes are more densely coupled within communities than between communities. When computing Q of an FCN, we take the class or interface structure of the software as the natural community structure of methods and attributes, i.e., methods and attributes defined in the same class or interface belong to the same community. Then, by applying Q to the FCN, we obtain the software modularity, which measures the degree to which the software adheres to "high cohesion and low coupling".

Pseudo-Code of the Algorithm to Compute Q
Algorithm 1 shows the pseudo-code of the algorithm that we used to compute Q, where k is an array whose entry k[i] stores the sum of the weights on the edges attached to node i, and getClass(i) is a function that returns the class in which feature i is defined.
Algorithm 1 Pseudo-code of the algorithm to compute Q
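As an illustration only, the computation described above can be sketched in Python as a direct evaluation of the standard weighted Newman-Girvan modularity, with `k` and `get_class` playing the roles of k and getClass(i). This is not a transcription of Algorithm 1. The edge weights come from the Figure 2 example, while the class names "X" and "Y" and the assignment of features to them are hypothetical, since the text does not state them. The sketch also assumes there are no self-couplings.

```python
def modularity(weights, get_class):
    """Weighted modularity Q of a network whose communities are classes.

    weights   -- dict mapping a node pair (i, j), stored once per edge,
                 to the weight on that edge
    get_class -- function returning the class a feature is defined in
    """
    A, k = {}, {}
    for (i, j), w in weights.items():
        A[i, j] = A[j, i] = w          # symmetric weight lookup
        k[i] = k.get(i, 0) + w         # sum of weights attached to i
        k[j] = k.get(j, 0) + w
    m = sum(weights.values())          # total edge weight
    if m == 0:
        return 0.0                     # convention chosen for this sketch
    q = 0.0
    for i in k:
        for j in k:
            if get_class(i) == get_class(j):   # delta(c_i, c_j) = 1
                q += A.get((i, j), 0) - k[i] * k[j] / (2 * m)
    return q / (2 * m)

# Edge weights of the Figure 2 example.
fcn = {
    ("d()", "a"): 1, ("d()", "b()"): 1, ("d()", "c()"): 2,
    ("b()", "c()"): 1, ("f()", "c()"): 1, ("e()", "f()"): 1,
}
# Hypothetical class membership, assumed for illustration.
owner = {"a": "X", "b()": "X", "c()": "X", "d()": "X",
         "e()": "Y", "f()": "Y"}

q = modularity(fcn, owner.get)
print(round(q, 4))  # 0.1939
```

The double loop mirrors the sum over all node pairs and is quadratic in the number of features. Grouping the sums by class gives the same value in time linear in the number of edges, which would be the natural choice for large networks.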

Evaluations
In this section, we validate our software modularity metric theoretically using widely accepted criteria, and evaluate it empirically using a set of Java software systems. Our empirical experiments were carried out on a ThinkPad E420S machine running Windows 7, with a 2.30 GHz Intel Core i5-2410M CPU and 6 GB of RAM.
In the following sections, we list the research questions that we focus on (Section 4.1), and our answers to the research questions (Section 4.2).

Research Questions
In this work, our evaluations aim to address the following four research questions (RQs):
• RQ1: Does Q satisfy Weyuker's nine properties? Q is a metric for measuring software modularity and can also be classified as a complexity metric. Weyuker's nine properties are well-known and widely used criteria for validating software complexity metrics. We wish to know whether Q satisfies them.
• RQ2: What Q values are obtained in different software systems? Different software systems may have different Q values, so we examine the Q values obtained in a set of different software systems.
• RQ3: Can Q distinguish between two function-equivalent software systems, one of which uses design patterns? Using design patterns in software development is regarded as an effective way to improve software quality. However, design pattern implementations may suffer from some typical problems and heavily affect software modularity. An effective metric should be able to reflect such a degradation in software modularity. Thus, we wish to know whether Q can distinguish the version that uses design patterns in a pair of function-equivalent software systems (one uses design patterns, and the other does not).
• RQ4: Is Q scalable to large software systems? In practice, a software metric will be applied to software systems of different sizes. Thus, we wish to know whether Q can be applied to larger software systems.

Answers to Research Questions
In this section, we performed theoretical analysis and empirical experiments to answer the RQs raised in Section 4.1.
To sum up, our Q metric satisfies seven of Weyuker's nine properties; the two exceptions are Properties 5 and 7. As mentioned above, Property 5 is not applicable to Q since it was proposed for size-related metrics, and Q is not a size-related metric. Property 7 is not applicable to Q since it was proposed for procedure-oriented metrics, and our metric is an OO metric. These exceptions have also been observed in other work [16,34-36]. Therefore, Q is a well-structured metric that can be used to compute software modularity as a whole.

(1) Subject Systems
We randomly chose a set of twelve Java software systems (see Table 1) to show the Q values obtained in different software systems. These systems are open-source and can be downloaded from their websites. Table 1 provides the basic information of the subject software systems, including their names, the domains that they belong to, the directory of the source code distribution that we analyzed, KLOC (thousand lines of code), and the URLs from which the corresponding software systems can be downloaded. The subject software systems differ in size, ranging from 2.705 KLOC to 97.880 KLOC. Note that KLOC counts only actual lines of code; it does not include comment lines and blank lines.

(2) Experiment Process and Results Analysis
According to the steps shown in Figure 1, we analyze the source code, extract the structural information, and build the FCNs for the twelve software systems. For illustration purposes, we show the FCNs of the subject software systems jmeter and jfreechart in Figure 3. Enlarging the figure reveals details such as the feature name that each node denotes, the edges between pairs of features, and the weight on each edge. Table 2 shows the |N| (number of nodes), |E| (number of edges), and Q of the FCN of each software system. Most of the subject software systems (8/12) have a relatively small Q value, with 0.2 < Q < 0.4. Only four software systems have a Q value larger than 0.4 and smaller than 0.6.

Using design patterns in software development is regarded as an effective way to improve software quality [37]. However, design pattern implementations may suffer from some typical problems and heavily affect software modularity [38]. Thus, it is reasonable to assume that Q can be used to distinguish the version that uses design patterns in a pair of function-equivalent software systems (one uses design patterns, and the other does not).
(1) Subject Systems
We chose five simple software systems (see Table 3), each of which has two function-equivalent versions. One version ("before" for short) does not apply any design pattern, and the other ("after" for short) applies one design pattern. Each software system is named after the design pattern it uses. Table 3 provides the basic information of the subject software systems, including their names, LOC (lines of code), and the |N| and |E| of the FCN of the corresponding software system. Note that LOC counts only actual lines of code; it does not include comment lines and blank lines.
(2) Experiment Process and Results Analysis
According to the steps shown in Figure 1, we analyze the source code, extract the structural information, and build the FCNs for the five software systems. For illustration purposes, we show the FCNs of the software "Builder" before and after applying the "Builder" design pattern in Figure 4. Table 3 also shows the Q values that we computed for the five systems. The Q value of the software using design patterns is smaller than that of the software that does not use design patterns. This confirms our assumption that design pattern implementations do affect software modularity.

4.2.4. RQ4: Is Q Scalable to Large Software Systems?
In practice, a software metric will be applied to software systems of different sizes. Thus, we wish to know whether Q can be applied to larger software systems. To this end, we track the execution time of each main step to evaluate the scalability of Q. As mentioned in Section 3, our approach is mainly composed of three steps: (i) extracting structural information from the source code of software systems; (ii) building FCNs for software systems; and (iii) computing the software modularity according to Equation (2).
In Table 4, we show the CPU time required to execute each step of our approach when applied to the subject software systems chosen in Section 4.2.2. We can observe that step (i) is the most time-consuming step of our approach, and the other two steps take less than one second. Although jfreechart and ant are large, with 11,946 and 11,858 features, respectively, the total CPU time used to compute Q is less than one minute. Thus, our approach scales to large software systems, which answers RQ4.
Table 4. CPU time required for each step.
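Per-step CPU time of the kind reported in Table 4 could be collected along the following lines. This is a minimal Python sketch, not the instrumentation used in the experiments: `run_timed` and the three placeholder step functions are hypothetical stand-ins for steps (i) to (iii), and the input records are made up for illustration.

```python
import time

def run_timed(steps, source):
    """Run the steps in order, feeding each the previous result,
    and record the CPU time (in seconds) consumed by each step."""
    timings = {}
    data = source
    for name, step in steps:
        start = time.process_time()    # CPU time of the current process
        data = step(data)
        timings[name] = time.process_time() - start
    return data, timings

# Placeholder steps (hypothetical, for illustration only).
def extract(records):                  # (i) parse raw coupling records
    return [tuple(r.split("->")) for r in records]

def build(occurrences):                # (ii) count occurrences into weights
    weights = {}
    for src, dst in occurrences:
        key = tuple(sorted((src, dst)))
        weights[key] = weights.get(key, 0) + 1
    return weights

def measure(weights):                  # (iii) stand-in for computing Q
    return sum(weights.values())

result, timings = run_timed(
    [("extract", extract), ("build", build), ("measure", measure)],
    ["d()->c()", "d()->c()", "b()->c()"],
)
for name, seconds in timings.items():
    print(f"{name}: {seconds:.6f} s")
```

`time.process_time` reports CPU time rather than wall-clock time, so time spent waiting on other processes is excluded from the measurements.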

Conclusions
In this paper, we defined a novel metric, modularity (Q), to measure software modularity from the perspective of software as a whole. Our metric is based on a network representation (i.e., FCN) of the software structure at the method and attribute level, and applies a metric (i.e., Q) widely used in complex network theory to compute the value of software modularity. FCN is a weighted undirected software network, in which edge weights are assigned according to the coupling frequencies between methods and attributes.
Our metric is evaluated theoretically using widely accepted criteria, and empirically using open source software systems. The results show the effectiveness of Q as a metric to measure software modularity.
Supplementary Materials: The whole data sets generated and/or analyzed during the current study are available from the corresponding author on reasonable request. The sample data and our own developed software are available online at https://www.icloud.com/iclouddrive/0zRViWdWuiQkYTueyNbCIB49A#2018modularity.