4.4. Convergence
Convergence of network community detection algorithms is among the least studied areas of network science. The rate of convergence is nevertheless an important issue, and slow convergence is a major pitfall of most existing algorithms. Because our algorithm transforms the network into a metric space, it inherits the fast convergence of k-partitioning on that space once a good set of initial points is provided. Another pitfall shared by the majority of existing algorithms is the need to validate the objective function at every iteration during convergence. Our algorithm converges automatically to the optimal partition, thereby reducing the cost of validation during convergence.
Theorem 2.
During the course of the k center partitioning algorithm, the cost monotonically decreases.
Proof.
Let $z_1^{(t)},\dots,z_k^{(t)}$ and $C_1^{(t)},\dots,C_k^{(t)}$ denote the centers and clusters at the start of the $t$-th iteration of the k-partitioning algorithm. The first step of the iteration assigns each data point to its closest center; therefore, $\mathrm{cost}(C_1^{(t+1)},\dots,C_k^{(t+1)};\,z_1^{(t)},\dots,z_k^{(t)}) \le \mathrm{cost}(C_1^{(t)},\dots,C_k^{(t)};\,z_1^{(t)},\dots,z_k^{(t)})$.
In the second step, each cluster is re-centered at its mean; therefore, $\mathrm{cost}(C_1^{(t+1)},\dots,C_k^{(t+1)};\,z_1^{(t+1)},\dots,z_k^{(t+1)}) \le \mathrm{cost}(C_1^{(t+1)},\dots,C_k^{(t+1)};\,z_1^{(t)},\dots,z_k^{(t)})$.
☐
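The monotone decrease proved above can be illustrated with a minimal Lloyd-style sketch. The 1-D data, the squared-distance cost, and the helper names (`k_partition`, `cost`) are illustrative assumptions, not the paper's implementation:

```python
import random

def cost(points, centers, assignment):
    """Sum of squared distances from each point to its assigned center."""
    return sum((p - centers[a]) ** 2 for p, a in zip(points, assignment))

def k_partition(points, k, iterations=10, seed=0):
    """Lloyd-style k-partitioning on 1-D data; the recorded cost
    never increases across the two steps of each iteration."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    history = []
    for _ in range(iterations):
        # Step 1: assign each point to its closest center.
        assignment = [min(range(k), key=lambda j: (p - centers[j]) ** 2)
                      for p in points]
        history.append(cost(points, centers, assignment))
        # Step 2: re-center each non-empty cluster at its mean.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centers[j] = sum(members) / len(members)
        history.append(cost(points, centers, assignment))
    return centers, history
```

Checking that `history` is non-increasing reproduces the claim of Theorem 2 empirically for any initialization.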
Theorem 3.
If $T$ is the solution returned by farthest-first traversal and $T^{*}$ is the optimal solution, then $\mathrm{cost}(T) \le 2\,\mathrm{cost}(T^{*})$.
Proof.
The proof of the theorem can be found in [48].
☐
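Farthest-first traversal itself is short. The sketch below assumes a list of points and a caller-supplied metric `dist`; the function names are hypothetical:

```python
def farthest_first(points, k, dist):
    """Farthest-first traversal: start from an arbitrary point, then
    repeatedly add the point farthest from the centers chosen so far.
    The resulting k-center cost is at most twice the optimum."""
    centers = [points[0]]
    while len(centers) < k:
        farthest = max(points,
                       key=lambda p: min(dist(p, c) for c in centers))
        centers.append(farthest)
    return centers

def k_center_cost(points, centers, dist):
    """Largest distance from any point to its nearest center."""
    return max(min(dist(p, c) for c in centers) for p in points)
```

On the 1-D points `[0, 1, 10, 11]` with `k = 2` and absolute difference as the metric, the traversal picks `0` and then `11`, giving a k-center cost of 1.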
4.5. Data Complexity
The key characteristics of complex networks are a “high clustering coefficient” and a “small average path length”. The first property accounts for the community structure of the network, whereas the second accounts for the small-world phenomenon of real networks. Given a network, that is, a number of nodes and a number of edges, what are the bounds on the average distance and the clustering coefficient? The two properties of the optimal complex network (OCN) are (1) the minimum possible average distance and (2) the maximum possible clustering coefficient. There is usually a unique graph with the largest average clustering, which at the same time has the smallest possible average distance. In contrast, there are many graphs with the same minimum average distance, ignoring their average clustering. The objective of this work is to measure the community detectability of a complex network $G(N, m, L, C)$, where $N$ is the number of vertices, $m$ is the number of edges, $L$ is the average path length and $C$ is the average clustering coefficient.
Average path length: $L = \frac{1}{N(N-1)}\sum_{u \neq v} d(u,v)$, where $d(u,v)$ is the shortest-path distance between $u$ and $v$. The smallest possible average distance of a graph with $N$ vertices and $m$ edges is denoted $L_{\min}(N,m)$.
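The average path length over all ordered pairs of distinct vertices can be computed with one breadth-first search per vertex. The adjacency-dict representation below is an assumption for illustration, and the graph is assumed connected:

```python
from collections import deque

def average_path_length(adj):
    """Mean shortest-path distance over all ordered pairs of distinct
    vertices of a connected graph given as an adjacency dict."""
    n = len(adj)
    total = 0
    for source in adj:
        # BFS from `source` to get distances to every other vertex.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total / (n * (n - 1))
```

For the path graph on three vertices the value is $8/6 = 4/3$; for a triangle it is exactly 1.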
Clustering coefficient: If $d_u$ is the degree of a vertex $u$ and $m_u$ is the number of edges among its neighbors, its clustering coefficient is $c_u = \frac{2 m_u}{d_u (d_u - 1)}$.
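The local clustering coefficient follows directly from this definition; a sketch assuming an adjacency-dict representation (the convention $c_u = 0$ for degree below 2 is an assumption):

```python
def clustering_coefficient(adj, u):
    """c_u = 2*m_u / (d_u*(d_u - 1)), where m_u counts the edges
    among the neighbors of u; taken as 0 when the degree is below 2."""
    neighbors = adj[u]
    d = len(neighbors)
    if d < 2:
        return 0.0
    # Count edges between distinct pairs of neighbors of u.
    m_u = sum(1 for i in range(d) for j in range(i + 1, d)
              if neighbors[j] in adj[neighbors[i]])
    return 2.0 * m_u / (d * (d - 1))
```

Every vertex of a triangle has $c_u = 1$, while the center of a star has $c_u = 0$.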
In some graphs, community detection is easy, and most of the algorithms work very well (e.g., disjoint cliques). On the other hand, in some graphs, community detection is very difficult, and some algorithms rarely work well (e.g., circular graph).
Data complexity of community detection: Informally, given a graph with $N$ vertices and $m$ edges, the extent to which we can reveal its community structure is the data complexity for community detection of that graph. The data complexity for community detection (DCC) is denoted $\mathrm{DCC}(G)$; it is near one for a graph whose community structure is easy to detect and near zero for a graph with no community structure. DCC is calculated as the ratio between the edges common to $G$ and $G_{\mathrm{OCN}}$ and the number of edges $m$ of $G$, i.e., $\mathrm{DCC}(G) = |E(G) \cap E(G_{\mathrm{OCN}})| / m$, where $G_{\mathrm{OCN}}$ is a graph with the same average path length constructed by adding the minimum number of edges to an empty graph of $N$ nodes, followed by the addition of more edges to reach the total number $m$ while maximizing the clustering coefficient.
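Given the edge sets of $G$ and $G_{\mathrm{OCN}}$, the ratio itself is a one-liner; the construction of $G_{\mathrm{OCN}}$ (minimum average distance first, then maximum clustering) is not shown here, and the function name `dcc` is an assumption:

```python
def dcc(edges_g, edges_ocn):
    """Ratio of the edges shared by G and its OCN to |E(G)|.
    Edges are undirected, so each pair is stored as a frozenset
    to make orientation irrelevant."""
    e_g = {frozenset(e) for e in edges_g}
    e_ocn = {frozenset(e) for e in edges_ocn}
    return len(e_g & e_ocn) / len(e_g)
```

For example, if $G$ and $G_{\mathrm{OCN}}$ share three of $G$'s four edges, the DCC is 0.75.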
A higher value of DCC for a particular network signifies that a good community structure can be extracted from the network, whereas a lower value of DCC signifies that no algorithm will be very useful for capturing its community structure. Another advantage of DCC is that it can assess the quality of an algorithm: when DCC is high but the value of the evaluation measure is low, there is still room to improve the algorithm.