Coupled Node Similarity Learning for Community Detection in Attributed Networks

Attributed networks consist of not only a network structure but also node attributes. Most existing community detection algorithms focus only on network structure and ignore node attributes, which are also important. Although some algorithms that use both node attributes and network structure have been proposed in recent years, the complex hierarchical coupling relationships within and between attributes, nodes, and network structure have not been considered. Such hierarchical couplings are driving factors in community formation. This paper introduces a novel coupled node similarity (CNS) that learns attribute and structure couplings and computes the similarity within and between nodes with categorical attributes in a network. CNS learns and integrates the frequency-based intra-attribute coupled similarity within an attribute, the co-occurrence-based inter-attribute coupled similarity between attributes, and the coupled attribute-to-structure similarity based on the homophily property. CNS is then used to generate edge weights and transform a plain graph into a weighted graph. Clustering algorithms then detect community structures on the weighted graphs that are topologically well-connected and semantically coherent. Extensive experiments on several data sets verify the effectiveness of CNS-based community detection algorithms by comparing them with state-of-the-art node similarity measures, in terms of whether node attribute information and hierarchical interactions are involved, and under various levels of network structure complexity.


Introduction
Community detection is an important task in complex network analysis. So far, the definition of community is still ambiguous. In most state-of-the-art research, a community is a group of nodes that are densely connected relative to the rest of the network. Networks that consider both object interactions and attributes, i.e., attributed networks, can be represented by an attributed graph in which nodes represent the objects, edges represent the relationships between objects, and the feature vectors associated with nodes represent the attributes. The network topological structure reflects the interactions between nodes, and the node attribute information reflects the common characteristics among nodes. Both play important roles in the formation of the network community structure. However, most community detection algorithms use only the network topological structure. Community detection on attributed networks using both network topological structure and node attribute information is important yet challenging, and relies on appropriate similarity learning.
Community detection on attributed networks. Nowadays, many approaches have been proposed that incorporate node attributes and edges in the community detection process.
community George belongs to since he has the same relationship with Jones and Ying, who belong to two different communities. Second, if both graph structure and node attributes are considered, which results in the diagram in Figure 1b, we still cannot assign George to a proper community by using SMC to compute the similarity between two connected authors. Since SMC uses 0 and 1 to distinguish the similarity between categorical values, the similarity between authors who live in AU and the US is equal to that between authors who live in AU and CN. Therefore, the similarity between George and Jones is still the same as the similarity between George and Ying. However, by involving the co-authoring relationships in Figure 1c, we observe that the similarity between AU and CN should be greater because authors from these two countries collaborate more frequently. Therefore, George is more similar to Ying than to Jones, and we can correctly assign George to the right community.
(Figure 1 legend: one marker refers to simple attribute similarity between two nodes, and the other represents the complex coupling relationships between two nodes.)
The above three scenarios illustrate the importance of involving relevant information and relationships and learning their similarity in community detection. As shown in the limited work reported in the literature [23], engaging both structure and attribute similarities can generate more meaningful communities. However, existing methods do not consider the complex interactions within and between attributes, and between node attributes and structure. In most attributed networks, nodes prefer to connect to other nodes with similar attributes (i.e., homophily) [24]. The presence of homophily has been discovered in a vast array of network studies. More than 100 studies have observed homophily in some form or another, and they establish that similarity breeds connection [25]. The homophily property reflects the effect of node attributes on the network edges.
On the other hand, the edges in the network should also reflect the difference between attributes. In this paper, we propose a novel coupled node similarity (CNS) learning method, which involves both node attributes and structure information in an attributed graph. The main idea behind CNS and its contributions to community detection are presented below:

• CNS captures different levels of coupling relationships in an attributed graph, including value-to-value, value-to-node, and attribute-to-structure relationships. To the best of our knowledge, this is the first work that systematically represents the hierarchical interactions in terms of both structural and attribute aspects.

• CNS learns the above respective relationships by calculating and integrating the intra-attribute coupled similarity, the inter-attribute coupled similarity, and the coupled attribute-to-structure similarity. Hence, CNS captures not only the attribute value interactions within and between attributes, but also the interactions between node attributes and structure. This provides a comprehensive means of understanding the intrinsic driving forces and complexity in community formation.

• We incorporate CNS into attributed graphs to generate weighted graphs, combining the topological structure and node attributes in a unified manner to detect communities in attributed networks.

• We also empirically evaluate the effectiveness of CNS in terms of whether node attributes are involved, what types of node interactions are learned, and different levels of network structure complexity.

Learning Coupled Node Similarity
In this section, we introduce the framework and specific similarity measures for learning coupled node similarity.

The CNS Framework
The framework for learning CNS is shown in Figure 2. CNS captures four sources of interactions and similarities: (1) the intra-attribute coupled similarity learns the interactions within a node attribute; (2) the inter-attribute coupled similarity models the interactions between node attributes; (3) the coupled attribute similarity integrates both of them; and (4) the coupled attribute-to-structure similarity captures the interactions between node attributes and network structure. Lastly, CNS integrates the coupled attribute similarity and the coupled attribute-to-structure similarity to represent the overall relationships and similarities in an attributed network.
An attributed network can be modeled as a graph G = (V, E, F), where V is the set of nodes, E is the set of edges, and F is the set of node attribute vectors. All the main notations are described in Table 1.

Table 1. Main notations.
Notation: Description
$M$: The number of node attributes
$unique(F_m)$: The set of all distinct values on the mth attribute
$F_m(i)$: The value of the mth attribute for node i
$K$: The number of communities
$C$: The communities of the network, $C = \bigcup_{k=1}^{K} C_k$
$c_i$: The community to which node i belongs
$\Gamma_i$: The neighbor set of node i
$A(i, j)$: The adjacency relationship between nodes i and j; $A(i, j) = 1$ if nodes i and j are connected, and 0 otherwise
$W(i, j)$: The weight between nodes i and j
$S(i, j)$: The similarity between nodes i and j
$l_r(i)$: The received label of node i
$m_w$: The sum of all edge weights in the network
(symbol missing): The sum of edge weights which are connected to node i
$g_m(x)$: The node set whose mth attribute value is x
$\alpha_n$: The weight parameter for the nth attribute, $\sum_{n=1}^{M} \alpha_n = 1$, $\alpha_n \in [0, 1]$
$\delta_{m|n}(x, y)$: The inter-relative attribute coupled similarity between values x and y of the mth attribute based on the nth attribute ($n \neq m$)
$\bar{B}$: $\bar{B} = F_n \setminus B$, the complement set of B under the complete distinct value set $F_n$ of the nth attribute
$g^*_n(B)$: The node set whose attribute value in the nth attribute is in B
$\delta^{Ia}_m(x, y)$: The intra-attribute coupled similarity between the attribute values x and y of the mth attribute
$\delta^{Ie}_m(x, y)$: The inter-attribute coupled similarity between the attribute values x and y of the mth attribute based on other attributes
$\delta^{A}_m(x, y)$: The coupled attribute similarity between the attribute values x and y of the mth attribute
$\delta^{AS}_m(x, y)$: The coupled attribute-to-structure similarity between the attribute values x and y of the mth attribute
$CAS(i, j)$: The coupled attribute similarity between nodes i and j
$CNS(i, j)$: The coupled node similarity between nodes i and j

Figure 2. The framework for learning CNS. (One symbol indicates the intra-attribute coupled similarity, calculated using the interaction between attribute values within an attribute, and ←→ refers to the inter-attribute coupled similarity, which involves the couplings between attributes. The coupled attribute similarity in the second level integrates both the intra-attribute coupled similarity and the inter-attribute coupled similarity. The coupled attribute-to-structure similarity in the second level captures the interactions between node attributes and network structure. In the last level, CNS integrates the coupled attribute similarity and the coupled attribute-to-structure similarity.)

Coupled Attribute Similarity
Coupled attribute similarity (CAS) is extended from the concept of Coupled Attribute Similarity for Object (CASO) in Wang et al. [18]. CASO is based on the coupled attribute similarity for values. It considers both the intra-coupled and inter-coupled attribute value similarities, which globally capture the attribute value frequency distribution and attribute dependency aggregation with high accuracy and relatively low complexity. CAS combines the intra-attribute coupled similarity (Definition 1) and the inter-attribute coupled similarity (Definition 2) to cater for specific characteristics in network data.

Definition 1. (Intra-Attribute Coupled Similarity) The intra-attribute coupled similarity $\delta^{Ia}_m(x, y)$ between node attribute values x and y, where $x = F_m(i)$ and $y = F_m(j)$ are the values of nodes i and j on the mth attribute, is calculated from the relationship between their occurrence frequencies.
$g_m(x)$ and $g_m(y)$ are the node sets that have the same attribute value as nodes i and j, respectively, on the mth attribute. $|g_m(x)|$ and $|g_m(y)|$ are the numbers of occurrences of node attribute values x and y across all nodes in the network.
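As a minimal sketch of this frequency-based measure, the code below assumes the closed form used by CASO [18], $\delta^{Ia}_m(x, y) = \frac{|g_m(x)| \cdot |g_m(y)|}{|g_m(x)| + |g_m(y)| + |g_m(x)| \cdot |g_m(y)|}$. This form is an assumption rather than a quotation of Equation (1); for two values occurring 2 and 3 times it gives 6/11. The function name `intra_attribute_similarity` is hypothetical.

```python
from collections import Counter
from fractions import Fraction


def intra_attribute_similarity(values, x, y):
    """Frequency-based similarity between values x and y of one attribute.

    `values` lists the attribute value of every node. The closed form is
    an assumed CASO-style formula, not taken verbatim from the paper.
    """
    counts = Counter(values)
    gx, gy = counts[x], counts[y]          # |g_m(x)| and |g_m(y)|
    if gx == 0 or gy == 0:
        return Fraction(0)
    return Fraction(gx * gy, gx + gy + gx * gy)


# Two nodes carry the value "AU" and three carry "CN".
country = ["AU", "AU", "CN", "CN", "CN"]
sim = intra_attribute_similarity(country, "AU", "CN")
print(sim)  # 6/11
```

Exact fractions are used only to make the comparison with hand-computed values easy; floating-point division would serve equally well.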
In the toy example in Figure 1, there are two authors from Australia, {George, Pitt}, and three from China, {Ying, Hua, Jia}, so $\delta^{Ia}_{country}(AU, CN) = 6/11$.
Below, the inter-attribute coupled similarity is defined, which considers the couplings between node attributes when the node attribute value similarity is calculated.

Definition 2. (Inter-Attribute Coupled Similarity) The inter-attribute coupled similarity $\delta^{Ie}_m(x, y)$ between values x and y of the mth attribute based on other attributes is defined as follows.
$$\delta^{Ie}_m(x, y) = \sum_{n=1,\, n \neq m}^{M} \alpha_n \, \delta_{m|n}(x, y) \qquad (2)$$
$\alpha_n$ is the weight parameter for the nth attribute, $\sum_{n=1}^{M} \alpha_n = 1$, $\alpha_n \in [0, 1]$. M is the total number of node attributes. $\delta_{m|n}(x, y)$ is one of the inter-relative attribute coupled similarities between values x and y of the mth attribute based on the nth attribute ($n \neq m$).
$F_n$ represents the attribute values on the nth attribute, and B is a subset of these values. $\bar{B} = F_n \setminus B$ is the complement set of B under the complete distinct value set $F_n$ of the nth attribute. $P_{n|m}(B|x)$ is the information conditional probability (ICP) of B with respect to x, which is defined as follows.
$g^*_n(B)$ is the node set whose value on the nth attribute is in B. Intuitively, given all the objects with the value x on the mth attribute, ICP is the percentage of them whose values on the nth attribute fall in subset B, i.e., the objects common to $g^*_n(B)$ and $g_m(x)$.
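The ICP described above can be sketched in a few lines, reading the prose as $P_{n|m}(B|x) = |g^*_n(B) \cap g_m(x)| \, / \, |g_m(x)|$. This reading, the function name `icp`, and the small `country`/`field` data set are assumptions made for illustration only.

```python
def icp(attr_m, attr_n, x, subset_b):
    """Information conditional probability P_{n|m}(B | x).

    attr_m and attr_n map each node to its value on the m-th and n-th
    attributes; subset_b is a set of values of the n-th attribute.
    """
    g_m_x = {v for v, val in attr_m.items() if val == x}          # g_m(x)
    g_n_b = {v for v, val in attr_n.items() if val in subset_b}   # g*_n(B)
    if not g_m_x:
        return 0.0
    return len(g_m_x & g_n_b) / len(g_m_x)


# Hypothetical data: a country attribute and a research-field attribute.
country = {"a": "AU", "b": "AU", "c": "CN", "d": "CN"}
field = {"a": "ML", "b": "DB", "c": "ML", "d": "ML"}

p_au = icp(country, field, "AU", {"ML"})   # one of the two AU nodes works on ML
p_cn = icp(country, field, "CN", {"ML"})   # both CN nodes work on ML
```

Here `p_au` is 0.5 and `p_cn` is 1.0, so the two country values are distinguished by how their nodes are distributed over the other attribute.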
In the toy example in Figure 1,

Definition 3. (Coupled Attribute Similarity) The coupled attribute similarity $\delta^{A}_m(x, y)$ between values x and y of the mth attribute is the combination of the intra-attribute coupled similarity and the inter-attribute coupled similarity between x and y.
Lastly, the coupled attribute similarity (CAS) for the two nodes i and j is calculated as follows.

Coupled Attribute-to-Structure Similarity
In an attributed network, not all node attributes are equally important for community detection; even within one attribute, two different value pairs may not contribute equally. Based on the homophily property [24] of social networks, i.e., nodes tend to be connected with other nodes that share similar attributes, the consistency between node attributes and structure information could guide the community detection process. Therefore, the coupled attribute-to-structure similarity is proposed to measure the different contributions of different attribute value pairs.
Definition 4. (Coupled Attribute-to-Structure Similarity) The coupled attribute-to-structure similarity $\delta^{AS}_m(x, y)$ between values x and y of the mth attribute is defined as the degree of consistency between the attribute value pair (x, y) and the linkage across all nodes in the network. It is equal to the number of edges between the two node sets whose attribute values are x and y, respectively, on the mth attribute, divided by the total number of possible edges between them.
In the toy example in Figure 1, there are two authors from Australia and three from China, and there are three connections between the authors from these two countries, {George-Ying, Pitt-Ying, Pitt-Hua}, so $\delta^{AS}_{country}(AU, CN) = 3/(2 \times 3) = 0.5$.
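The ratio in Definition 4 can be checked with a short sketch. It assumes an undirected edge list in which each edge appears once, and it handles only the case of two different values x and y; the function name is hypothetical. Only the three AU-CN edges quoted in the text are included, since the remaining edges of Figure 1 do not affect this value pair.

```python
def attribute_to_structure_similarity(attr, edges, x, y):
    """Edges between the x-valued and y-valued node sets, divided by the
    number of possible edges between them (sketch for x != y)."""
    g_x = {v for v, val in attr.items() if val == x}
    g_y = {v for v, val in attr.items() if val == y}
    if not g_x or not g_y:
        return 0.0
    linked = sum(
        1 for u, v in edges
        if (u in g_x and v in g_y) or (u in g_y and v in g_x)
    )
    return linked / (len(g_x) * len(g_y))


country = {"George": "AU", "Pitt": "AU", "Ying": "CN", "Hua": "CN", "Jia": "CN"}
edges = [("George", "Ying"), ("Pitt", "Ying"), ("Pitt", "Hua")]

cass = attribute_to_structure_similarity(country, edges, "AU", "CN")
print(cass)  # 0.5
```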

Coupled Node Similarity
Coupled node similarity is defined as the combination of the coupled attribute similarity and coupled attribute-to-structure similarity.

Definition 5. (Coupled Node Similarity) The coupled node similarity $CNS(i, j)$ between nodes i and j is calculated as follows.
In the toy example in Figure 1, CNS(George, Ying) = 0.41 and CNS(George, Jones) = 0.29. George is more similar to Ying, so he belongs to community $C_2$.

The Algorithm for Learning CNS
Algorithm 1 presents the process of learning coupled node similarity (CNS). It first calculates the coupled attribute similarity and the coupled attribute-to-structure similarity for all attribute value pairs (Lines 1-8) and then computes CNS for all nodes (Lines 9-17). The CASS function computes the coupled attribute-to-structure similarity (Lines 19-24).

Algorithm 1 Learning Coupled Node Similarity
2: for all value pairs $x, y \in unique(F_m)$ do
3: $\delta^{Ia}_m(x, y) = CIAAS(x, y, m)$

Complexity Analysis
CNS integrates three similarities, i.e., the intra-attribute coupled similarity, the inter-attribute coupled similarity, and the coupled attribute-to-structure similarity. Let |V| be the number of nodes in the network, M the number of node attributes, and R the maximal number of values for each attribute. The time complexity is as follows: (1) computing the intra-attribute coupled similarity takes $O(MR^2|V|)$; (2) computing the inter-attribute coupled similarity takes $O(M^2R^22^R|V|)$; (3) computing the coupled attribute-to-structure similarity takes $O(MR^2|V|)$. Therefore, the overall time complexity is $O(M^2R^22^R|V|)$, which is dominated by the inter-attribute step.

Similarity-Based Community Detection
Our proposed method mainly concentrates on unweighted graphs. CNS is used to generate the edge weight $W(i, j)$ for every pair of nodes i and j that are linked structurally.
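The weighting step can be sketched as follows, assuming the rule $W(i, j) = S(i, j)$ when $A(i, j) = 1$ and $W(i, j) = 0$ otherwise. The function name `weight_edges` and the lookup table `scores` are hypothetical; the two scores are the CNS values quoted for the toy example.

```python
def weight_edges(edges, similarity):
    """Return {(i, j): W(i, j)} with W(i, j) = S(i, j) on existing edges.

    Pairs that are not linked get no entry, i.e., an implicit weight of 0.
    """
    return {(i, j): similarity(i, j) for i, j in edges}


# Similarity scores taken from the toy example (CNS values).
scores = {
    frozenset(("George", "Ying")): 0.41,
    frozenset(("George", "Jones")): 0.29,
}
edges = [("George", "Ying"), ("George", "Jones")]

weights = weight_edges(edges, lambda i, j: scores[frozenset((i, j))])
```

Any similarity measure with the same call signature (Adjacency, Cosine, Jaccard, SMC, CAS, or CNS) could be passed in place of the lookup used here.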
$S(i, j)$ represents a similarity metric (e.g., $CNS(i, j)$) used to construct the weighted network. SLPA [26], BGLL [27], and K-medoids [28] then detect communities on the weighted networks.
SLPA is an extension of LPA [29] that can analyze communities in weighted networks. It starts by giving each node a unique label and provides each node with a memory to store received labels. In every iteration, each node receives labels from its neighbors and adds the most popular label to its memory. The most popular label is the one that carries the maximum total weight over the neighbors that send it. Lastly, every node chooses the most frequent label in its memory as its community label, and nodes with the same label are assigned to one community.
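The weighted label choice can be illustrated with a small sketch. It assumes the rule $l_r(i) = \arg\max_l \sum_{j \in \Gamma_i} W(i, j)\, \phi(l_s(j), l)$, which is one reading of the description above; the function name, the data, and the deterministic tie-breaking are assumptions made for illustration.

```python
from collections import defaultdict


def receive_label(node, neighbors, weights, sent_labels):
    """Pick the label with the largest total edge weight among the labels
    sent to `node` by its neighbors."""
    totals = defaultdict(float)
    for j in neighbors[node]:
        # Undirected graph: the edge may be stored in either orientation.
        w = weights.get((node, j), weights.get((j, node), 0.0))
        totals[sent_labels[j]] += w
    if not totals:
        return None
    # Ties are broken by label order here; implementations often pick randomly.
    return max(sorted(totals), key=totals.get)


neighbors = {"i": ["a", "b", "c"]}
weights = {("i", "a"): 0.2, ("i", "b"): 0.3, ("c", "i"): 0.6}
sent_labels = {"a": "L1", "b": "L1", "c": "L2"}

label = receive_label("i", neighbors, weights, sent_labels)
print(label)  # L2
```

Label L1 is sent by two neighbors but carries a total weight of 0.5, whereas L2 is sent by one neighbor with weight 0.6, so the weighted rule prefers L2 where an unweighted vote would choose L1.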
$l_r(i)$ represents the received label of node i and $l_s(j)$ is the label sent by node j. If $l_s(j) = l$, then $\phi(l_s(j), l) = 1$; otherwise $\phi(l_s(j), l) = 0$.
BGLL is an iterative two-phase algorithm based on weighted modularity (WQ) optimization. In the first phase, all nodes are placed into different communities. For each node i, BGLL considers each neighbor j and evaluates the gain of WQ that would take place if i were removed from its community and placed in the community of j. Node i is then placed in the community for which this gain is maximum and positive. The second phase consists of building a new network whose nodes are the communities found during the previous phase; the weight of the edge between two new nodes is the sum of the weights of the edges between nodes in the corresponding two communities.
$c_i$ and $c_j$ respectively denote the communities to which nodes i and j belong. If $c_i = c_j$, then $\phi(c_i, c_j) = 1$; otherwise $\phi(c_i, c_j) = 0$.
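The quantity that BGLL optimizes can be sketched as follows. The code assumes the standard Newman form of weighted modularity, $WQ = \frac{1}{2m_w} \sum_{i,j} \left[ W(i, j) - \frac{s_i s_j}{2m_w} \right] \phi(c_i, c_j)$, with $m_w$ the total edge weight and $s_i$ the sum of edge weights at node i. The symbol $s_i$ and the function name are hypothetical, and the normalization used in the original equation may differ.

```python
def weighted_modularity(weights, community):
    """Newman-style weighted modularity of an undirected weighted graph.

    weights: {(i, j): w}, each undirected edge listed once, no self-loops.
    community: {node: community id}.
    """
    total = sum(weights.values())            # m_w, total edge weight
    if total == 0:
        return 0.0
    intra = {}      # edge weight inside each community
    strength = {}   # summed node strengths of each community
    for (i, j), w in weights.items():
        ci, cj = community[i], community[j]
        strength[ci] = strength.get(ci, 0.0) + w
        strength[cj] = strength.get(cj, 0.0) + w
        if ci == cj:
            intra[ci] = intra.get(ci, 0.0) + w
    # Per-community form of the double sum over node pairs.
    return sum(
        intra.get(c, 0.0) / total - (strength[c] / (2 * total)) ** 2
        for c in strength
    )


# Two unit-weight triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
weights = {e: 1.0 for e in edges}
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
merged = {v: "A" for v in range(6)}

q_split = weighted_modularity(weights, split)
q_merged = weighted_modularity(weights, merged)
```

For this graph `q_split` equals 5/14 (about 0.357) and `q_merged` equals 0, so separating the two triangles is the better partition under this measure.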
K-medoids is a clustering algorithm related to the K-means algorithm [30]. Its inputs are the similarity matrix and the number of clusters K. In our experiments, K is set to the true number of clusters. The similarity between two connected nodes is equal to the edge weight that connects them, and the similarity of two disconnected nodes is 0. First, it selects K initial medoids randomly; clusters are then defined as the subsets of points that are similar to the respective medoids, and the objective function is defined as the similarity between a point and the corresponding medoid. The new medoids are then updated as the object of a cluster whose average similarity to all the objects in the cluster is maximal. This process is repeated until all medoids no longer change.
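A compact sketch of this similarity-based K-medoids loop is given below. To keep the example reproducible, the initial medoids are passed in explicitly instead of being drawn at random; the function name, the `sim` helper, and the six-node graph are hypothetical.

```python
def k_medoids(nodes, sim, medoids, max_iter=100):
    """Similarity-based K-medoids. `sim(u, v)` returns the edge weight of a
    linked pair and 0 for an unlinked pair."""
    medoids = list(medoids)
    clusters = {m: [m] for m in medoids}
    for _ in range(max_iter):
        # Assignment step: each node joins its most similar medoid.
        clusters = {m: [m] for m in medoids}
        for v in nodes:
            if v in clusters:
                continue
            best = max(medoids, key=lambda m: sim(v, m))
            clusters[best].append(v)
        # Update step: within a cluster, maximizing the summed similarity
        # is equivalent to maximizing the average similarity.
        new_medoids = [
            max(members, key=lambda c: sum(sim(c, u) for u in members if u != c))
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return clusters


weights = {
    ("a", "b"): 0.9, ("b", "c"): 0.8, ("a", "c"): 0.7,
    ("d", "e"): 0.9, ("e", "f"): 0.8, ("d", "f"): 0.7,
    ("c", "d"): 0.1,
}


def sim(u, v):
    return weights.get((u, v), weights.get((v, u), 0.0))


clusters = k_medoids(list("abcdef"), sim, medoids=["a", "d"])
```

Starting from medoids a and d, the loop converges to medoids b and e with clusters {a, b, c} and {d, e, f}. A node with zero similarity to every medoid falls into the first medoid's cluster, which is a simplification of this sketch.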

Experiments and Analysis
Similarity measures for comparison. This section compares CNS with several representative node similarity measures including Adjacency, Cosine, Jaccard, SMC, and CAS in terms of community detection performance. Table 2 shows the main formulas.

Table 2. Main formulas of the similarity measures.
Similarity: Formula
Adjacency
CAS: Equation (6)
CNS: Equation (8)

Baseline methods. SLPA, BGLL, and K-medoids are used. Since SLPA and K-medoids are not stable, they are repeated 100 times and the results are averaged. The parameter $\alpha_n$ in CAS and CNS is set to 1/M, where M is the total number of node attributes. We apply the algorithms to both synthetic and real networks to test their community detection performance.
Synthetic networks. The structure-only networks consisting of nodes, edges, and communities are generated according to the LFR benchmark [31], which currently provides the most commonly used synthetic networks in community detection. An LFR network has the following parameters: N is the number of nodes; avgk is the average degree of the nodes; maxk is the maximum degree of the nodes; minc is the number of nodes in the smallest community; maxc is the number of nodes in the largest community; mu is the mixing parameter, i.e., the probability that a node is connected to nodes of an external community. The greater mu is, the more difficult it is to detect the community structure.
In real networks, not all node attributes are equally important for community detection. Some are critical for clustering nodes, and some are less important or even irrelevant. Therefore, three kinds of value distributions are generated as follows. (1) Attribute 1: all of the nodes in a community are assigned the same domain value; (2) Attribute 2: every node in the network is assigned a random domain value; (3) Attribute 3: all of the nodes in each community are assigned the same domain value, and some nodes in the community are then selected to host noise. The noise is a random domain value that is different from the community's domain value. The noise level nl (the percentage of noise nodes) can be varied.
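The three generation rules can be sketched as below. The text does not say how the shared value of a community is chosen or whether it must differ between communities, so drawing it uniformly at random is an assumption; the function name, the seed, and the four-value domain are also hypothetical.

```python
import random


def generate_attributes(communities, domain, noise_level, seed=0):
    """communities: {node: community id}; domain: list of at least two
    categorical values. Returns {node: (attr1, attr2, attr3)}."""
    rng = random.Random(seed)
    members = {}
    for node, c in communities.items():
        members.setdefault(c, []).append(node)
    attrs = {}
    for nodes in members.values():
        value1 = rng.choice(domain)   # Attribute 1: one value per community
        value3 = rng.choice(domain)   # Attribute 3: community value before noise
        n_noisy = round(noise_level * len(nodes))
        noisy = set(rng.sample(nodes, n_noisy))
        for node in nodes:
            a2 = rng.choice(domain)   # Attribute 2: random for every node
            a3 = value3
            if node in noisy:
                # Noise: a random value different from the community value.
                a3 = rng.choice([v for v in domain if v != value3])
            attrs[node] = (value1, a2, a3)
    return attrs


# Three communities of ten nodes each, noise level nl = 0.3.
communities = {i: i // 10 for i in range(30)}
attrs = generate_attributes(communities, ["a", "b", "c", "d"], noise_level=0.3)
```

With nl = 0.3 and ten nodes per community, exactly three nodes of each community carry a noisy value on Attribute 3, while Attribute 1 stays constant within the community.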
Real networks. Experiments are also conducted on three well-known real networks: the lawyer friendship network (Lazega) [32], the researcher relationship network (Research) [33], and the counselor relationship network (Consult) [33]. The detailed information of each network is shown in Table 3. Research is about a research team consisting of 77 employees in a manufacturing company. The dataset contains several attributes of each employee: location (1: Paris; 2: Frankfurt; 3: Warsaw; 4: Geneva), tenure (1: 1-12 months; 2: 13-36 months; 3: 37-60 months; 4: 61+ months), and the organizational level (1: Global Dept Manager; 2: Local Dept Manager; 3: Project Leader; 4: Researcher). Since the network is a weighted and directed network, we first convert it to an unweighted and undirected network.
Evaluation Criteria. For networks with known community structure, we use normalized mutual information (NMI) [34], F-Measure [35] and Accuracy as the evaluation criteria to compare results of different algorithms. The calculation formulas are shown as follows.
$C = \{C_1, C_2, \cdots, C_K\}$ represents a community detection result generated by the evaluated algorithm, and $U = \{U_1, U_2, \cdots, U_R\}$ represents the ground-truth community structure. $|V|$ represents the number of nodes in the network. K and R are the numbers of communities in C and U, respectively.
$P(U_r, C_k) = |U_r \cap C_k| / |C_k|$, and $R(U_r, C_k) = |U_r \cap C_k| / |U_r|$.
TC represents the number of correctly clustered nodes.
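Two of these quantities can be sketched in code. The NMI function assumes the common normalization $NMI = 2\,I(U; C) / (H(U) + H(C))$, which may differ in form from the definition in [34]; the precision and recall helper follows the two ratios given above. Function names and data are hypothetical.

```python
import math
from collections import Counter


def nmi(truth, found):
    """Normalized mutual information between two partitions, each given
    as {node: label}, using 2 * I(U; C) / (H(U) + H(C))."""
    nodes = list(truth)
    n = len(nodes)
    cu = Counter(truth[v] for v in nodes)
    cc = Counter(found[v] for v in nodes)
    joint = Counter((truth[v], found[v]) for v in nodes)
    mutual = sum(
        (nij / n) * math.log(nij * n / (cu[u] * cc[c]))
        for (u, c), nij in joint.items()
    )
    h_u = -sum((k / n) * math.log(k / n) for k in cu.values())
    h_c = -sum((k / n) * math.log(k / n) for k in cc.values())
    if h_u + h_c == 0:
        return 1.0   # both partitions consist of a single community
    return 2 * mutual / (h_u + h_c)


def precision_recall(u_r, c_k):
    """P(U_r, C_k) and R(U_r, C_k) for two node sets."""
    common = len(set(u_r) & set(c_k))
    return common / len(c_k), common / len(u_r)


truth = {1: "a", 2: "a", 3: "b", 4: "b"}
same = {1: "x", 2: "x", 3: "y", 4: "y"}      # same grouping, renamed labels
crossed = {1: "x", 2: "y", 3: "x", 4: "y"}   # independent of the truth

nmi_same = nmi(truth, same)
nmi_crossed = nmi(truth, crossed)
p, r = precision_recall({1, 2, 3}, {2, 3, 4, 5})
```

Here `nmi_same` is 1 up to rounding, `nmi_crossed` is 0, and the example sets give a precision of 0.5 and a recall of 2/3.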

Detection Performance with vs. without Node Attribute Information
This section compares the results of the three algorithms based on different similarity measures that do or do not involve node attribute information. The results are shown in Tables 4-6. Numbers in bold are the largest among the six similarity measures.
Tables 4-6 show that community detection based on CNS achieves better NMI (e.g., maximally 35.45% improvement on the Consult data), F-Measure (e.g., maximally 12.14% improvement on the Consult data), and accuracy (e.g., maximally 15.14% improvement on the Consult data) when compared with the best result of other structure and attribute similarity measures. The results based on SMC are not always better than those based on structure similarities. This illustrates the importance of considering the complex hierarchical interactions within and between node attributes and network structure when calculating node similarity. When the similarity based solely on the node attribute is compared, CAS cannot guarantee better results than SMC. This means the interactions between node attributes and network structure play a vital role in capturing node similarity.

Effect of Differently Integrating Node Similarities
There are different ways to integrate the proposed node similarity components to form the coupled node similarity. Four combinations, denoted CNS1 to CNS4, are used to obtain CNS. These CNSs are then fed into SLPA, BGLL, and K-medoids for community detection. Figures 3-5 show that CNS1 and CNS3 are better in most cases, e.g., CNS1 gains a 42.40% improvement in NMI over CNS4 on the Lazega data, and CNS3 gains a 47.41% improvement over CNS4. However, we cannot tell which works best in all cases. Various combinations of the three types of similarities may lead to different results, and sometimes the difference is significant (e.g., NMI between 54.21% and 76.02% on the Lazega data). This will be further explored in our future work.

Impact of Varying Network Structure Complexity
We generate nine LFR benchmark networks with N = 100, avgk = 5, maxk = 10, minc = 10, and maxc = 30, with mu ranging from 0.1 to 0.9 to form networks with different structure complexities. Three attributes (Attributes 1, 2 and 3) are generated for these LFR networks according to the rules of synthetic node attributes, with the noise level nl = 0.3. Figure 6 reports the accuracy of the community detection results using BGLL on these networks. As mu increases, the level of separation between the communities decreases and the task of community detection becomes more difficult. Therefore, the accuracy of all methods decreases. However, CNS-based BGLL achieves better results than the other similarity measures. Even when mu = 0.9, considering the complex interactions between node attributes and network structure still plays a positive role in the community detection process. BGLL based on Cosine similarity and BGLL based on Jaccard similarity obtain almost the same results (their two lines overlap), and these are the worst results on all nine LFR networks. This verifies that simply considering common neighbors cannot accurately reveal node similarity.

Comparison Against Other Methods
We generate nine LFR benchmark networks with N = 5000, avgk = 5, maxk = 10, minc = 10, and maxc = 30, with mu ranging from 0.1 to 0.9 to form networks with different structure complexities. Three attributes (Attributes 1, 2 and 3) are generated for these LFR networks according to the rules of synthetic node attributes, with the noise level nl = 0.3. We compare the results of CNS-based K-medoids with two other community detection algorithms on attributed networks, namely SA-cluster and CODICIL. The results are shown in Figure 7. From Figure 7, it is observed that the NMI on the nine networks decreases as mu increases, and that the proposed algorithm achieves the best results in most cases.

Conclusions
A novel coupled node similarity (CNS) measure is proposed to capture both explicit and implicit interactions between nodes using network structure and node attribute information in complex networks. Different levels of couplings in categorically attributed networks are learned, from node attribute values to nodes and between node attributes and network structure. Empirical analysis verifies that CNS-based community detection outperforms several benchmark similarity measures, and it examines the strengths of CNS in terms of whether node attributes are involved, which node interactions are learned, and different levels of network structure complexity. However, at present, our proposed method mainly concentrates on unweighted graphs. In the future, we will give rules for combining the new weights with pre-existing weights in order to handle weighted graphs. Our future work will also focus on using non-IID [36] learning on mixed attributed networks, considering the coupling between different types of attributes at the attribute level.