This section has two parts: the definition of community density $D$, and an argument that $D$ alleviates the modularity resolution limitation to a certain extent. We define community density $D$ as a measure of the quality of a community division mainly because of long-standing problems with modularity. Take recent methods as examples. Chang et al. [30] proposed a modularity suitable for overlapping community structure in bipartite networks, but it is still subject to the modularity resolution limitation. Li [37] proposed Bipartite Partition Density, which can effectively alleviate the modularity resolution limitation, but it considers the community as a whole and cannot be used for overlapping communities. Although these methods can detect community structure, they do not consider it from a micro perspective, such as the relationship between node pairs within a community. This paper therefore takes the micro perspective, considers the relationship between node pairs within a community, and puts forward an evaluation criterion that is suitable for overlapping community structure in bipartite networks and can effectively alleviate the modularity resolution limitation.
3.1. Key Definitions
Consider a bipartite network $G(U,V,E)$ that is unweighted and undirected, in which edges exist only between nodes of different types. $U=\{u_1,\dots,u_a\}$ is the set of nodes of one type, and $V=\{v_1,\dots,v_b\}$ is the set of nodes of the other type; $a$ represents the number of nodes of type $U$, and $b$ represents the number of nodes of type $V$. $E$ represents the set of edges connecting the two types of nodes, and $(u,v)$ represents a pair of nodes connected by an edge in $E$. An example of a bipartite network is shown in Figure 2. It has five $U$ nodes and six $V$ nodes, with a total of thirteen edges.
In a bipartite network $G$, each node $u$ in $U$ has an associated set of all node pairs that contain $u$, and the size of this set is the total number of node pairs containing $u$. Likewise, each node $v$ in $V$ has an associated set of all node pairs that contain $v$, and the size of this set is the total number of node pairs containing $v$. Figure 3 illustrates these sets for one node of each type. We express these as:
In a bipartite network $G$, every node pair $(u,v)$ has a set of neighbor node pairs. For a node pair selected from the network in Figure 2, Figure 4 shows its set of neighbor node pairs.
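The sets described above can be sketched in a few lines of Python. This is only an illustration: the toy edge list is hypothetical (it is not the network of Figure 2), the helper names `pairs_by_node` and `neighbor_pairs` are invented here, and the sketch assumes that the neighbor node pairs of $(u,v)$ are the other node pairs that share an endpoint with it.

```python
from collections import defaultdict

# Hypothetical toy bipartite network (NOT the network of Figure 2).
# Each edge is a node pair (u, v) with u of type U and v of type V.
E = {("u1", "v1"), ("u1", "v2"), ("u2", "v2"), ("u2", "v3"), ("u3", "v3")}


def pairs_by_node(edges):
    """Map every node to the set of node pairs (edges) that contain it."""
    pairs = defaultdict(set)
    for u, v in edges:
        pairs[u].add((u, v))
        pairs[v].add((u, v))
    return pairs


def neighbor_pairs(edges, pair):
    """Other node pairs sharing an endpoint with `pair` (assumed meaning of 'neighbor')."""
    u, v = pair
    pairs = pairs_by_node(edges)
    return (pairs[u] | pairs[v]) - {pair}


pairs = pairs_by_node(E)
print(sorted(pairs["u1"]))                      # node pairs containing u1
print(len(pairs["v2"]))                         # number of node pairs containing v2
print(sorted(neighbor_pairs(E, ("u1", "v2"))))  # neighbor node pairs of (u1, v2)
```

Storing each node's pairs in a set makes the size of the set, that is, the number of node pairs containing the node, available through `len`.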
In a bipartite network $G$, assume that $G$ is divided into $X$ communities, and consider the $t$-th community. For each node pair $(u,v)$, an indicator records whether $(u,v)$ is a node pair within the $t$-th community, and each node pair $(u,v)$ has a node pair density in the $t$-th community. In the density formula, for a node $m$, one quantity is the total number of $V$ nodes connected to $m$, and $k$ is the serial number of a $V$ node connected to $m$; $k = 1$ refers to the first $V$ node connected to $m$. For a node pair $(u,v)$, if it is an edge within the $t$-th community, then the indicator is 1. We express these as:
Suppose community detection is performed on the bipartite network shown in Figure 2 and produces the result shown in Figure 1, which contains two communities. For a node pair selected in community 1, the calculation of its node pair density is shown in Formula (9).
In the $t$-th community, the density $D_t$ of the community is obtained by summing the node pair densities of its node pairs and averaging them, as shown in Formula (10). The quantities in Formula (10) are the number of $U$ nodes in the $t$-th community, the number of $V$ nodes in the $t$-th community, the sum of the node pair densities of all node pairs in the $t$-th community, and the total number of node pairs in the $t$-th community.
For community 1, the calculation of its density $D_1$ is shown in Formula (11).
Likewise, we can obtain the density $D_2$ of community 2. Once the density $D_t$ of each community has been obtained, the community density $D$ of the whole bipartite network is the average density of all communities:

$$D = \frac{1}{X} \sum_{t=1}^{X} D_t$$

where $X$ represents the number of communities in the bipartite network $G$. For the community detection result shown in Figure 1, $D$ is the average of $D_1$ and $D_2$.
When community density $D$ is used as an evaluation criterion for community detection, a higher community density means that node pairs within communities are more closely connected and node pairs between communities are sparser. This trend is consistent with the community structure that community detection seeks. Therefore, community density can be used to judge the quality of community detection results.
3.2. Reasoning Community Density Alleviates the Modularity Resolution Limitation
The root cause of the modularity resolution limitation [18] is that modularity cannot effectively distinguish between different community division results. This problem has existed for a long time, and no method has yet solved it completely. The community density proposed in this paper can effectively alleviate the modularity resolution limitation in some cases. Consider the following example.
Consider a bipartite network $G(U,V,E)$. This bipartite network is a complete bipartite graph, in which there are
nodes of type
u, and
nodes of type
v,
,
,
. Then, this bipartite network has
vertices and
edges. This type of graph is called an SCBG (Special Complete Bipartite Graph).
Community detection on this graph can yield the following two results:
R1: The whole network forms a single community $G$.
R2: The network is divided evenly into two communities. One community owns half of the $U$ nodes and half of the $V$ nodes, and the other community owns the remaining half of the $U$ nodes and $V$ nodes. For each community, the edges inside the community account for half of the edges incident to its nodes, and the edges that leave the community account for the other half.
An example of the above situation is given in
Figure 5, where
and
.
Figure 5a is divided according to R1, and
Figure 5b is divided according to R2.
Barber [16] proposed the modularity $Q_B$ suitable for bipartite networks. Assuming that a bipartite network can be expressed as $G(U,V,E)$, the modularity $Q_B$ of the bipartite network is given by Formula (15):

$$Q_B = \frac{1}{M} \sum_{c \in C} \sum_{i=1}^{m} \sum_{j=1}^{n} \left( A_{ij} - \frac{k_i k_j}{M} \right) \alpha_{ic}\, \alpha_{jc} \tag{15}$$

where $M$ represents the number of edges in the bipartite network, $C$ represents the set of communities in the bipartite network, $c$ denotes the $c$-th community, and $m$ and $n$ represent the numbers of nodes of the two types in the bipartite network, respectively. $\alpha_{ic}$ represents the membership degree of node $i$ to community $c$: if node $i$ belongs to community $c$, then $\alpha_{ic}$ is 1, otherwise $\alpha_{ic}$ is 0. $k_i$ represents the degree of node $i$. $A_{ij}$ represents the adjacency matrix of the bipartite network: if there is an edge between $i$ and $j$, then $A_{ij} = 1$; otherwise $A_{ij} = 0$.
For the two community partition results, R1 and R2, the modularity is calculated separately. For the community structure in Figure 5a, the calculation of the modularity is shown in Formula (16). For the community structure in Figure 5b, the calculation of the modularity is shown in Formula (17).
As Formulas (16) and (17) show, the modularity of both community detection results under Barber's modularity is 0. R1 and R2 are different partitions, yet they receive the same modularity. In R1, the whole network forms a single community. Logically, the best partition is the one in which all nodes are placed in a single cluster, so that there are no cuts and no edges between clusters. Barber's modularity therefore cannot effectively judge the merits and demerits of these results.
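The tie can be checked numerically. The sketch below evaluates a Barber-style bipartite modularity for a hard partition on a small complete bipartite graph; the graph size (four nodes of each type), the node names, and the function `barber_modularity` are illustrative assumptions rather than the setting of Figure 5. In a complete bipartite graph every cross-type node pair is an edge and the product of the two node degrees equals the number of edges, so each term of the sum vanishes regardless of the partition.

```python
def barber_modularity(U, V, edges, community_of):
    """Barber-style bipartite modularity for a hard partition.

    community_of: dict mapping every node to its community label.
    """
    M = len(edges)
    degree = {node: 0 for node in list(U) + list(V)}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    edge_set = set(edges)

    q = 0.0
    for u in U:
        for v in V:
            # Only cross-type pairs in the same community contribute.
            if community_of[u] == community_of[v]:
                a_uv = 1.0 if (u, v) in edge_set else 0.0
                q += a_uv - degree[u] * degree[v] / M
    return q / M


# Hypothetical complete bipartite graph with 4 U nodes and 4 V nodes.
U = [f"u{i}" for i in range(4)]
V = [f"v{j}" for j in range(4)]
edges = [(u, v) for u in U for v in V]

# R1: a single community. R2: an even split into two communities.
r1 = {node: 0 for node in U + V}
r2 = {**{u: i % 2 for i, u in enumerate(U)},
      **{v: j % 2 for j, v in enumerate(V)}}

q_r1 = barber_modularity(U, V, edges, r1)
q_r2 = barber_modularity(U, V, edges, r2)
print(q_r1, q_r2)  # 0.0 0.0
```

Both partitions score exactly 0 on this graph, which illustrates why this modularity cannot rank R1 above R2.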
Let us now evaluate the two partition results with the community density proposed in this paper. $D_{R1}$ represents the community density of the partition R1, and $D_{R2}$ represents the community density of the partition R2.
For R1, take any node pair $(u,v)$ in community $G$: all node pairs containing $u$ or $v$ are within the community, so the node pair density of $(u,v)$ is calculated as shown in Formula (18).
For R2, take any node pair $(u,v)$ in the first community: half of the node pairs containing $u$ or $v$ are in the community, and the other half are outside the community. The situation in the second community is exactly the same as in the first.
After calculation, the community density of R1 is higher than that of R2, that is, $D_{R1} > D_{R2}$. Therefore, community density can be used to evaluate the merits and demerits of such results, and it effectively alleviates the resolution limit of modularity.
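The comparison can also be sketched numerically. The code below is a minimal illustration under a simplifying assumption, not the paper's exact formulas: the density of a node pair $(u,v)$ in a community is taken as the mean of two shares, the share of node pairs containing $u$ that lie inside the community and the share of node pairs containing $v$ that lie inside it. The density of a community is then the average over its internal node pairs, and the overall density is the average over communities. The graph size and the function name `community_density` are hypothetical.

```python
def community_density(edges, communities):
    """Average community density under the ASSUMED node pair density above.

    communities: list of node sets (a hard partition in this sketch).
    """
    # Index the node pairs (edges) that contain each node.
    incident = {}
    for u, v in edges:
        incident.setdefault(u, []).append((u, v))
        incident.setdefault(v, []).append((u, v))

    def inside(pair, community):
        return pair[0] in community and pair[1] in community

    def share_inside(node, community):
        pairs = incident[node]
        return sum(inside(p, community) for p in pairs) / len(pairs)

    densities = []
    for community in communities:
        inner = [p for p in edges if inside(p, community)]
        if inner:
            d_t = sum(
                (share_inside(u, community) + share_inside(v, community)) / 2
                for u, v in inner
            ) / len(inner)
        else:
            d_t = 0.0  # a community without internal node pairs gets density 0
        densities.append(d_t)
    return sum(densities) / len(densities)


# Hypothetical complete bipartite graph with 4 U nodes and 4 V nodes.
U = [f"u{i}" for i in range(4)]
V = [f"v{j}" for j in range(4)]
edges = [(u, v) for u in U for v in V]

r1 = [set(U) | set(V)]                                   # R1: one community
r2 = [set(U[:2]) | set(V[:2]), set(U[2:]) | set(V[2:])]  # R2: even split

d_r1 = community_density(edges, r1)
d_r2 = community_density(edges, r2)
print(d_r1, d_r2)  # 1.0 0.5
```

Under this assumed density, the single-community partition scores 1.0 and the even split scores 0.5, which agrees with the qualitative argument that R1 should be preferred.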