Community Detection Fusing Graph Attention Network

Abstract: It has become common to combine autoencoders and graph neural networks for attribute graph clustering to solve the community detection problem. However, existing methods do not consider the difference in influence between node neighborhood information and high-order neighborhood information, and their fusion of structural and attribute features is insufficient. To make better use of structural and attribute information, we propose a model named community detection fusing graph attention network (CDFG). Specifically, we first use an autoencoder to learn attribute features. Then the graph attention network not only calculates the influence weight of each neighborhood node on the target node but also adds high-order neighborhood information to learn the structural features. After that, the two features are initially fused by a balance parameter. The feature fusion module extracts the hidden layer representation of the graph attention layer to calculate the self-correlation matrix, which is multiplied by the node representation obtained from the preliminary fusion to achieve secondary fusion. Finally, a self-supervision mechanism orients the model toward the community detection task. Experiments are conducted on six real datasets. Across four evaluation metrics, the CDFG model performs better on most datasets, especially for networks with longer average paths and diameters and smaller clustering coefficients.


Introduction
Community detection is a fundamental task in complex network analysis, which aims to partition a network into multiple substructures (communities). Usually, a community is defined as a set of nodes with a different affiliation from the rest of the network [1]. Community detection has been extensively studied and applied in many real-world network problems, such as recommendation [2], anomaly detection [3], and terrorist organization identification [4]. Classical community detection methods usually utilize probabilistic models and statistical inference methods, and employ different kinds of prior knowledge to infer community structure. Examples of classical methods include spectral clustering [5] and the GN algorithm [6]. However, traditional community detection algorithms usually focus only on the network structure and often ignore the attributes of nodes, so the resulting community division lacks semantics. In the real world, node attributes are becoming increasingly rich, and a more reasonable solution is to consider both the relationships between nodes and their semantic information. For community detection in attribute networks, a balance should be achieved between the following two properties: (1) structural closeness, i.e., nodes within a community are structurally close to each other, while nodes in different communities are not, and (2) attribute homogeneity, i.e., nodes in a community have similar attributes, while nodes in different communities do not [7].
Community detection is a typical application of graph clustering. For attributed graph clustering, capturing the network topology and utilizing the content information of nodes is a crucial problem. Methods based on graph embedding obtain a low-dimensional vector representation of each node by learning the network topology and node content [8]. On this basis, current research focuses on applying clustering methods such as K-means to the learned representations to solve the community detection problem. The autoencoder is the mainstream solution for graph embedding-based methods [9], because autoencoder-based representations can be learned in unsupervised scenarios. Inspired by the above methods, we use an autoencoder and a graph neural network as the basic framework for attribute graph clustering.
In this paper, we propose a community detection fusing graph attention network (CDFG) model. The main contributions are: (1) we fuse the autoencoder and the graph attention network with high-order neighborhood information for the first time; (2) we design the feature fusion module. Specifically, the graph attention layer in the graph attention network [10] aggregates node feature information in the neighborhood using trainable weights and accounts for the different degrees of influence that neighborhood nodes have on the target node. We then obtain the high-order neighborhood information of the target node by calculating the topological correlation matrix, so that the correlation between nodes is fully utilized. The feature fusion module calculates the self-correlation matrix from the hidden layer representation of the graph attention network and then, following the principle of skip connection, multiplies the self-correlation matrix with the matrix obtained from the graph attention module to obtain the final node representation. Experiments on six real datasets with four evaluation metrics show that the model performs better than other methods.

Traditional Methods
Traditional methods perform community detection based on network topology and can be divided, according to the principles applied, into graph partitioning, statistical inference, hierarchical clustering, dynamic methods, spectral clustering, density-based methods, and optimization methods [11]. These methods only capture the shallow structure of the network and have a high computational complexity for large-scale network data. With the increasing richness of information in the real world, traditional community detection methods can no longer meet demands.

Graph Embedding Methods
Graph embedding methods can map high-dimensional sparse vectors into low-dimensional dense vectors with the advantage of using high-dimensional nonlinear features (e.g., network topology information) and high-dimensional relational features (e.g., node attribute information) represented by nodes, neighbors, edges, subgraphs (e.g., communities), and encoded features [12]. In this kind of method, the nodes in a complex network are represented by low-dimensional real-valued vectors, and traditional clustering methods can then be used to solve the community detection problem. At present, deep clustering approaches focus on graph convolutional network-based approaches and autoencoder-based approaches.
Clustering methods that use graph convolutional networks (GCN) [13] to learn graph structure and node attributes have been widely studied [14][15][16][17][18][19][20][21][22][23]. For attribute graphs, neighboring nodes and nodes with similar characteristics may gather in the same community. Graph autoencoders (GAE) and variational graph autoencoders (VGAE) [14] integrate graph structures into node attributes by iteratively aggregating the neighborhood representation around each central node. Deep attentional embedded graph clustering (DAEGC) [15] uses the graph structure and node attributes at the same time. It captures the importance of the neighborhood nodes through a graph attention network as an encoder and uses a KL-divergence loss to supervise the training process of graph clustering. On the basis of DAEGC, the deep-neighbor-aware embedding for node clustering in attributed graphs (DNENC) [16] uses GCN as the encoder, which complements the contrast experiments. The experimental results show the effectiveness of the proposed framework. Following the setting of DAEGC, the adversarially regularized graph autoencoder (ARGA) [17] further develops an adversarial regularizer to guide the learning of latent representations. The structural deep clustering network (SDCN) [18] integrates an information transfer operator, a dual self-supervised learning mechanism, an autoencoder, and a graph convolution network into a unified framework for better representation learning. Experiments show that the autoencoder can alleviate the over-fitting phenomenon. The hierarchical attention network (HiAN) [23] designs a hierarchical attentive aggregator to fuse rich interpretable interactive information. These GCN-based methods still suffer from the smoothing problem. Meanwhile, GCN aggregates neighborhood information with equal weights when learning the structural representation.
Many deep clustering methods based on the autoencoder have also been proposed [24][25][26][27]. The autoencoder (AE) [28] is the most commonly used solution for unsupervised community detection. Deep embedded clustering (DEC) [24] first trains the encoder and then uses the pretrained network to iteratively optimize a KL divergence-based clustering loss with the help of self-learning-assisted target distributions, so that the representation learned by the autoencoder is closer to the cluster centers and the clusters become more cohesive. To improve the accuracy of the target distribution, improved deep embedded clustering (IDEC) [25] jointly optimizes the clustering assignment and learns features suitable for clustering while maintaining local structure. These methods help the autoencoder learn data representations with higher relevance to clustering by computing a clustering loss. They exploit the features of the data itself but do not incorporate structural information.

The Proposed Model
In this section, we present the community detection fusing graph attention network (CDFG) shown in Figure 1. We first introduce the notation and the problem definition. In the following subsections, we describe the CDFG model in detail in four modules, i.e., the autoencoder module (AE), the graph attention network module (GAT), the feature fusion module, and the self-supervision learning module.
A complex network whose nodes have attributes is an attribute network. Given an attribute network $G = \{V, E, X\}$, $V = \{v_1, v_2, \ldots, v_N\}$ is the set of nodes, $E$ is the set of edges, and $N$ is the number of nodes. $X = \{x_i\}_{i=1}^{N}$ is the feature matrix, where $x_i \in \mathbb{R}^d$ denotes the attribute vector of node $v_i$ and $d$ is the attribute dimension. The topology of the graph $G$ can be represented by the adjacency matrix $A = (A_{ij}) \in \mathbb{R}^{N \times N}$, where $A_{ij} = 1$ if $(v_i, v_j) \in E$ and $A_{ij} = 0$ otherwise.
Problem Definition. Given an attribute network $G$, it is divided into several disjoint groups, i.e., $\{G_1, G_2, \ldots, G_k\}$. Each community $G_i$ is one part of the partition of the network $G$, and $k$ is the number of communities into which the original network $G$ is divided. Nodes in the same community should be structurally tightly connected and have similar attributes, while nodes in different communities are sparsely connected and have different attributes. $G_i \cap G_j = \emptyset$ for $i \neq j$ means that there is no intersection between communities, indicating non-overlapping community detection.

Figure 1. The CDFG model. $A$ denotes the adjacency matrix of the graph. $X$ is the feature matrix of the nodes. $A$ and $X$ are both used as input to the GAT module. $X$ is used as input to the AE module. $Z_{AE}$ is the hidden layer representation obtained by the AE module. $Z^{(1)}$ denotes the first layer representation obtained by the graph attention module. $Z^{(1)T}$ is the transpose of $Z^{(1)}$. $S$ is the self-correlation matrix. $Z^{(l)}$ is the representation obtained after the fusion of the GAT module and the AE module. $\otimes$ denotes matrix multiplication, and $Z$ is the final representation obtained after fusion. $Q$ is obtained based on the Student's t-distribution and denotes the distance relationship between $Z_{AE}$ and the clustering centre. $P$ is the target distribution calculated from $Q$.

AE Module
A basic autoencoder [28] is used for unsupervised representation learning, from the perspective of generality, to extract valid information from the attribute features of the data itself. The autoencoder consists of an encoder and a decoder. The encoder maps the input data to a particular feature space to obtain the hidden layer representation, and the decoder maps the hidden layer representation back to the input space. Reconstructing the input data makes the hidden layer representation retain the features of the input. We suppose there are $L$ layers in the autoencoder and $l$ denotes the $l$-th layer. The learned representation $H^{(l)}$ of the $l$-th layer of the encoder is formulated as:

$$H^{(l)} = \sigma\left(W_e^{(l)} H^{(l-1)} + b_e^{(l)}\right)$$

where $\sigma$ is the activation function, $W_e^{(l)}$ is the $l$-th layer weight matrix, and $b_e^{(l)}$ is the bias of the $l$-th layer in the encoder.
The decoder and the encoder are symmetric, and the corresponding learned representation $H^{(l)}$ is calculated as:

$$H^{(l)} = \sigma\left(W_d^{(l)} H^{(l-1)} + b_d^{(l)}\right)$$

where $W_d^{(l)}$ is the $l$-th layer weight matrix and $b_d^{(l)}$ is the bias of the $l$-th layer in the decoder. The objective function is obtained by minimizing the loss between the original data and the reconstruction:

$$L_{res} = \frac{1}{2N}\left\| X - \hat{X} \right\|_F^2$$

where $\hat{X}$ is the reconstruction of the original data $X$ and $\|\cdot\|_F$ denotes the Frobenius norm.
The reconstruction loss of the autoencoder is used as part of the global loss of the model.
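The encoder, decoder, and reconstruction loss described in this module can be sketched in a few lines of NumPy. This is only an illustrative forward pass under stated assumptions: the helper names (`init_layers`, `forward`, `reconstruction_loss`), the ReLU activation, the linear last layer, the toy layer sizes, and the row-per-node convention (`H @ W` rather than `W H`) are choices made here, not details given in the paper.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def init_layers(dims, rng):
    """Create one (W, b) pair per consecutive pair of layer sizes (hypothetical helper)."""
    return [(rng.normal(0.0, 0.1, size=(d_in, d_out)), np.zeros(d_out))
            for d_in, d_out in zip(dims[:-1], dims[1:])]

def forward(X, layers):
    """Apply H^(l) = sigma(H^(l-1) W^(l) + b^(l)) layer by layer.

    Rows are nodes. The last layer is kept linear (an assumption).
    Returns the final output and the list of all layer outputs.
    """
    H, hidden = X, []
    for idx, (W, b) in enumerate(layers):
        H = H @ W + b
        if idx < len(layers) - 1:
            H = relu(H)
        hidden.append(H)
    return H, hidden

def reconstruction_loss(X, X_hat):
    """Squared Frobenius norm of the residual, scaled by 1 / (2N)."""
    return float(np.sum((X - X_hat) ** 2) / (2 * X.shape[0]))

rng = np.random.default_rng(0)
X = rng.random((6, 8))                      # 6 nodes, 8 attributes (toy data)
encoder = init_layers([8, 5, 3], rng)       # toy sizes, not the paper's d-500-500-2000-10
decoder = init_layers([3, 5, 8], rng)       # symmetric to the encoder

Z_ae, encoder_hidden = forward(X, encoder)  # hidden representation of the AE
X_hat, _ = forward(Z_ae, decoder)           # reconstruction of X
loss = reconstruction_loss(X, X_hat)
```

In a full model, the per-layer encoder outputs in `encoder_hidden` are the quantities that would be passed to the graph attention layers through the balance parameter.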

GAT Module
The representation learned by the autoencoder captures the attribute features of the data but lacks its structural information. We therefore use the graph attention network to encode the structural features and fuse the representations obtained from the two sub-modules through a balance parameter. The representation of each layer learned by the autoencoder is transferred to the corresponding graph attention layer via this balance parameter, which realizes the fusion of structure information and attribute information.
The essential component of the graph attention network is the graph attention layer, which learns the implicit representation of a node by aggregating its neighbors. The attention mechanism gives each neighbor a different weight to measure the importance of different neighbors. Furthermore, the high-order neighborhood information of the nodes is considered when calculating the topology. We use both the attribute values and the adjacency matrix as the input of the graph attention module. Then, the learned representation of the autoencoder is combined with the learned representation of the graph attention network to obtain more comprehensive information.

The learned representation $z_i^{(l)}$ of node $v_i$ in the $l$-th graph attention layer is calculated as:

$$z_i^{(l)} = \sigma\left(\sum_{j \in N_i} a_{ij} W^{(l-1)} z_j^{(l-1)}\right)$$

where $z_i^{(l)}$ is the hidden layer representation of node $v_i$, $N_i$ denotes the set of neighbors of node $v_i$, $a_{ij}$ is the attention coefficient of node $v_j$ to node $v_i$, and $W^{(l-1)}$ denotes the learnable parameter matrix. The attention coefficients are calculated from the attribute values and the topological distances, respectively. In terms of attributes, the coefficient can be regarded as the output of a single-layer neural network, parameterized by a weight vector, applied to the concatenated attribute representations of the two nodes.
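The neighbor aggregation of a graph attention layer can be illustrated with a small NumPy sketch. It follows the standard single-head graph attention formulation (score $\vec{a}^{T}[W z_i \,\|\, W z_j]$, LeakyReLU, softmax over neighbors) and uses attribute-based scores only. The function name `attention_layer`, the added self-loops, the LeakyReLU slope, and the ReLU output activation are assumptions of this sketch rather than details confirmed by the paper.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def attention_layer(Z_prev, A, W, a_vec):
    """One attention layer: z_i = relu(sum_{j in N_i} a_ij * (W z_j)).

    Z_prev: (N, d_in) node representations, one row per node.
    A:      (N, N) adjacency matrix.
    W:      (d_in, d_out) projection; a_vec: (2 * d_out,) attention weight vector.
    """
    H = Z_prev @ W                              # projected features, (N, d_out)
    d_out = H.shape[1]
    # a^T [h_i || h_j] decomposes into a part for node i plus a part for node j.
    src = H @ a_vec[:d_out]
    dst = H @ a_vec[d_out:]
    scores = leaky_relu(src[:, None] + dst[None, :])
    # Only neighbors take part in the softmax; self-loops are added here
    # (an assumption) so that every row has at least one valid entry.
    mask = (A + np.eye(A.shape[0])) > 0
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    att = weights / weights.sum(axis=1, keepdims=True)    # rows sum to 1
    return np.maximum(att @ H, 0.0), att

rng = np.random.default_rng(0)
# Toy path graph 0 - 1 - 2 - 3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Z_prev = rng.normal(size=(4, 5))
W = rng.normal(size=(5, 3))
a_vec = rng.normal(size=6)
Z_next, att = attention_layer(Z_prev, A, W, a_vec)
```

Non-neighbors receive exactly zero attention because their scores are masked before the softmax, so each node only aggregates information from its own neighborhood.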
From a topological point of view, neighboring nodes influence the target node through connected edges. While the classical GAT considers only first-order neighborhoods, the graph attention layer used in this paper considers high-order neighborhoods of the graph:

$$R = \frac{B + B^2 + \cdots + B^t}{t}$$

where $B$ is the transition matrix, with $B_{ij} = 1/m_i$ if an edge exists between nodes $v_i$ and $v_j$ and $B_{ij} = 0$ otherwise, and $m_i$ is the degree of node $v_i$. $R_{ij}$ denotes the topological correlation of node $v_i$ and node $v_j$ up to $t$ orders. Here, $t$ can be chosen flexibly according to different datasets. The attention coefficients are normalized with the softmax function so that they are comparable between nodes. The final attention coefficient $a_{ij}$ is obtained by incorporating the topological weight $R_{ij}$ into the attribute-based score, applying the activation function, and normalizing over the neighbors of $v_i$ with softmax.
The representation obtained by the autoencoder is passed to the graph attention layer at each layer through the balance parameter $\varepsilon$. The fused representation is obtained by combining, with weights set by $\varepsilon$, the output of the graph attention layer and the output $Z_{AE}^{(l-1)}$ of the $(l-1)$-th autoencoder layer, and the final representation is obtained by fusing the two learned representations layer by layer in this way. As a result, the hidden layer representation inherits more attributes from the attribute space of the original graph, preserving features that can be better clustered.
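A minimal sketch of the high-order topological correlation and of the balance-parameter fusion is given below. The averaging over the $t$ powers of $B$ follows the DAEGC-style proximity matrix and the convex combination $(1-\varepsilon)Z + \varepsilon Z_{AE}$ follows the SDCN-style transfer operator. Both forms, as well as the function names `topological_correlation` and `fuse`, are assumptions made for illustration.

```python
import numpy as np

def topological_correlation(A, t=2):
    """R = (B + B^2 + ... + B^t) / t, with B the row-normalized adjacency matrix."""
    deg = A.sum(axis=1)
    safe_deg = np.where(deg > 0, deg, 1.0)   # isolated nodes keep an all-zero row
    B = A / safe_deg[:, None]                # B_ij = 1 / m_i if edge (i, j) exists
    R = np.zeros_like(B)
    power = np.eye(A.shape[0])
    for _ in range(t):
        power = power @ B                    # B^1, B^2, ..., B^t
        R += power
    return R / t

def fuse(Z_gat, Z_ae, eps=0.5):
    """Blend the two representations with balance parameter eps (assumed convex form)."""
    return (1.0 - eps) * Z_gat + eps * Z_ae

# Toy path graph 0 - 1 - 2 - 3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
R = topological_correlation(A, t=2)

Z_gat = np.ones((4, 3))
Z_ae = np.zeros((4, 3))
Z_fused = fuse(Z_gat, Z_ae, eps=0.5)
```

On this path graph, nodes 0 and 2 are not adjacent, yet `R[0, 2]` is 0.25 because they are two hops apart, while `R[0, 3]` stays 0 because node 3 is three hops away and `t = 2`. This is how second-order neighbors enter the attention computation.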

Feature Fusion Module
This module performs feature fusion of the node representations obtained from the graph attention module using the principle of skip connection.
Firstly, a self-correlated learning mechanism is introduced. The latent representation $Z^{(1)}$ obtained by encoding the first layer of the graph attention module is multiplied with its own transpose. The normalized self-correlation matrix $S$ is obtained using the softmax function:

$$S = \mathrm{softmax}\left(Z^{(1)} Z^{(1)T}\right)$$

Then, $S$ is used as the correlation coefficient and multiplied with the node representation $Z^{(l)}$ obtained from the graph attention module to calculate the node representation $Z_F$:

$$Z_F = S Z^{(l)}$$

Finally, we use the softmax function for multi-class classification:

$$Z = \mathrm{softmax}\left(Z_F\right)$$

The result $z_{ik}$ denotes the probability that node $v_i$ belongs to the $k$-th clustering centre, so $Z$ can be regarded as a probability distribution.
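The secondary fusion step can be sketched directly from its description: a row-wise softmax over $Z^{(1)} Z^{(1)T}$, a multiplication with the fused representation, and a final softmax. The function names and the choice of a row-wise softmax are assumptions of this sketch, and the random inputs are toy data.

```python
import numpy as np

def softmax_rows(M):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    M = M - M.max(axis=1, keepdims=True)
    E = np.exp(M)
    return E / E.sum(axis=1, keepdims=True)

def feature_fusion(Z1, Zl):
    """Return the self-correlation matrix S, the fused Z_F, and the soft assignment Z.

    Z1: (N, d1) first-layer representation of the graph attention module.
    Zl: (N, k)  node representation after the preliminary fusion.
    """
    S = softmax_rows(Z1 @ Z1.T)   # (N, N) normalized node-to-node correlation
    Z_F = S @ Zl                  # every node mixes in the rows of correlated nodes
    Z = softmax_rows(Z_F)         # (N, k) probabilities over k communities
    return S, Z_F, Z

rng = np.random.default_rng(0)
Z1 = rng.normal(size=(5, 4))
Zl = rng.normal(size=(5, 3))
S, Z_F, Z = feature_fusion(Z1, Zl)
```

Because each row of `S` sums to one, every row of `Z_F` is a weighted average of the rows of `Zl`, with larger weights for nodes whose first-layer representations are more strongly correlated.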

Self-Supervision Learning Module
After unifying the above three components in one framework, it is necessary to orient the framework toward the clustering task. We adopt the self-supervised learning mechanism for model optimization.
For the $i$-th node and the $k$-th clustering centre, the similarity between the embedding representation and the clustering centre is measured using the Student's t-distribution as a kernel:

$$q_{ik} = \frac{\left(1 + \|h_i - \mu_k\|^2 / \upsilon\right)^{-\frac{\upsilon + 1}{2}}}{\sum_{k'} \left(1 + \|h_i - \mu_{k'}\|^2 / \upsilon\right)^{-\frac{\upsilon + 1}{2}}}$$

where $h_i$ is the $i$-th row of $Z_{AE}$, $\mu_k$ is the $k$-th clustering centre, initialized by K-means on the representation of the pretrained autoencoder, $\upsilon$ is the degree of freedom of the Student's t-distribution, $q_{ik}$ is viewed as the probability of assigning the $i$-th sample to the $k$-th clustering centre, and $Q = [q_{ik}]$ is the distribution of all samples.
In order to make the obtained embedding representation closer to the cluster centre, the target distribution is calculated as:

$$p_{ik} = \frac{q_{ik}^2 / f_k}{\sum_{k'} q_{ik'}^2 / f_{k'}}, \qquad f_k = \sum_i q_{ik}$$

In the target distribution $P$, the values of $Q$ are squared and normalised so that the resulting assignments have a high confidence level, and the objective function takes the following form:

$$L_{clu} = KL(P \,\|\, Q) = \sum_i \sum_k p_{ik} \log \frac{p_{ik}}{q_{ik}}$$

By minimizing the KL-divergence loss between the $Q$ and $P$ distributions, the target distribution $P$ helps the autoencoder module learn a better representation for the clustering task by bringing the data closer to the cluster centres. Similarly, in order to train the graph attention network, the KL-divergence loss is as follows:

$$L_{gat} = KL(P \,\|\, Z) = \sum_i \sum_k p_{ik} \log \frac{p_{ik}}{z_{ik}}$$

In this way, GAT and AE are optimized on the same objective, making their results converge during the training process. Since the objective of both the AE module and the GAT module is to approximate the target distribution $P$, these two modules can supervise each other's learning.
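The self-supervision quantities can be illustrated with a short NumPy sketch in the style of DEC: a Student's t soft assignment $Q$, the sharpened target distribution $P$, and the KL divergence between them. The function names, the default $\upsilon = 1$, the small constant inside the logarithm, and the toy data are assumptions made here.

```python
import numpy as np

def soft_assignment(H, centers, v=1.0):
    """q_ik: Student's t kernel between embedding h_i and center mu_k, row-normalized."""
    dist2 = ((H[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)   # (N, K)
    q = (1.0 + dist2 / v) ** (-(v + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """p_ik: square q, divide by the soft cluster frequency f_k, renormalize each row."""
    weight = q ** 2 / q.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)

def kl_divergence(p, q, tiny=1e-12):
    """KL(P || Q) summed over all nodes and clusters; tiny avoids log(0)."""
    return float(np.sum(p * np.log((p + tiny) / (q + tiny))))

rng = np.random.default_rng(0)
H = rng.normal(size=(6, 3))      # toy embeddings (rows of Z_AE)
centers = H[:2].copy()           # toy centers; K-means initialization is not shown
Q = soft_assignment(H, centers)
P = target_distribution(Q)
loss_clu = kl_divergence(P, Q)
```

The same `kl_divergence` helper could be applied to `P` and the GAT output distribution `Z` to obtain the second clustering loss, which is what lets the two modules share one target.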
The reconstruction loss of the autoencoder, the cluster learning loss, and the classification loss of the graph attention network are jointly optimized, and the final loss function is:

$$L = L_{res} + \alpha L_{clu} + \beta L_{gat}$$

where $\alpha$ is a hyperparameter to balance the cluster optimisation and local structure preservation, and $\beta$ is a coefficient to control the interference of GAT on the embedding space.
The final clustering result of the nodes is derived from the soft assignment distribution $Z$, i.e., the clustering result of the $i$-th sample is:

$$r_i = \arg\max_k z_{ik}$$
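As a small worked example, the joint objective and the final assignment can be written as follows. The additive form $L_{res} + \alpha L_{clu} + \beta L_{gat}$ is the SDCN-style weighting assumed in this sketch, the function names are hypothetical, and the loss values and the matrix `Z` are made-up numbers used only to show the arithmetic.

```python
import numpy as np

def total_loss(l_res, l_clu, l_gat, alpha=0.1, beta=0.01):
    """Weighted sum of reconstruction, clustering, and GAT losses (assumed form)."""
    return l_res + alpha * l_clu + beta * l_gat

def assign_communities(Z):
    """Hard label per node: the index of the largest probability in each row of Z."""
    return np.argmax(Z, axis=1)

# Made-up soft assignment for 3 nodes and 3 communities.
Z = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6],
              [0.2, 0.5, 0.3]])
labels = assign_communities(Z)               # array([0, 2, 1])
loss = total_loss(1.0, 2.0, 3.0)             # 1.0 + 0.1 * 2.0 + 0.01 * 3.0 = 1.23
```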

Experiments

Datasets
We conduct experiments on six public datasets, whose statistics are shown in Table 1, including three non-graph datasets and three graph datasets. USPS [29], HHAR [30], and REUT [31] are non-graph datasets that lack graph structure information. For these datasets, a graph is constructed with the KNN graph method used in SDCN, with K set to 3, 5, and 1, respectively. ACM, DBLP, and CITE are classical graph datasets. There are significant differences among the six datasets in average path length, clustering coefficient, and diameter.
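Constructing a KNN graph for a non-graph dataset can be sketched as follows. The text only states that the SDCN construction is used with a dataset-specific K, so the cosine similarity, the symmetrization step, and the function name `knn_graph` are assumptions of this sketch.

```python
import numpy as np

def knn_graph(X, k):
    """Connect every sample to its k most similar samples and symmetrize the result."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.where(norms > 0, norms, 1.0)      # unit-length rows
    sim = Xn @ Xn.T                               # cosine similarity (assumed measure)
    np.fill_diagonal(sim, -np.inf)                # a sample is not its own neighbor
    idx = np.argsort(-sim, axis=1)[:, :k]         # indices of the k most similar samples
    n = X.shape[0]
    A = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    A[rows, idx.ravel()] = 1.0
    return np.maximum(A, A.T)                     # undirected graph

# Toy data: two pairs of nearly parallel vectors.
X = np.array([[1.0, 0.0],
              [0.9, 0.1],
              [0.0, 1.0],
              [0.1, 0.9]])
A = knn_graph(X, k=1)
```

On this toy input, samples 0 and 1 are linked to each other and samples 2 and 3 are linked to each other, with no edge between the two groups. Edges produced this way are inferred from feature similarity rather than observed, which is the source of noise discussed for the non-graph datasets.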

Experimental Setup
Baselines. We compare our method with two types of methods: AE-based clustering and GCN-based graph clustering.
• AE [28] is a deep clustering approach that performs a K-means algorithm on the representation learned by the autoencoder.
• DEC [24] designs a clustering target to guide the embedding process.
• IDEC [25] adds reconstruction loss to DEC to learn better embedding results.
• GAE and VGAE [14] are unsupervised graph embedding methods that use GCN to learn data representations.
• DAEGC [15] uses graph attention networks as encoders to learn node representations and uses clustering losses to supervise the graph clustering process.
Parameter Settings. First, we pretrain the autoencoder on all data for 30 iterations with a learning rate of 0.001 to initialize the clustering centres. For fair comparison, the layer sizes of the GAT and the autoencoder are set to d-500-500-2000-10, as in SDCN, with d being the feature dimension of the input data. Structures involving two-hop neighbors are common in graph data, so t is set to 2 for the generality of the model. In the loss function, α is set to 0.1 and β is set to 0.01. The value of the balance parameter ε is kept the same as in SDCN. To ensure the convergence of the clustering results, we train for 500 iterations on all datasets. To guard against outlier runs, we run the model 10 times for each dataset and report the average and standard deviation as the final results.
Evaluation Metrics. We use four common metrics to evaluate the clustering quality: accuracy (ACC), normalized mutual information (NMI), adjusted Rand index (ARI), and F1 score (F1). For each metric, a higher value indicates a better clustering result.
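Because cluster indices are arbitrary, clustering accuracy is computed after finding the best one-to-one mapping between predicted clusters and ground-truth classes. The sketch below shows ACC and NMI in plain NumPy. The brute-force search over permutations is chosen here only to keep the example dependency-free, and is practical only for a small number of clusters. The arithmetic-mean normalization in NMI and all function names are likewise assumptions of this sketch, not the paper's evaluation code.

```python
from itertools import permutations

import numpy as np

def contingency(y_true, y_pred):
    """Count matrix C[c, k]: number of samples of class c placed in cluster k."""
    _, ti = np.unique(np.asarray(y_true).ravel(), return_inverse=True)
    _, pi = np.unique(np.asarray(y_pred).ravel(), return_inverse=True)
    C = np.zeros((ti.max() + 1, pi.max() + 1), dtype=np.int64)
    np.add.at(C, (ti, pi), 1)
    return C

def clustering_accuracy(y_true, y_pred):
    """Best accuracy over all one-to-one cluster-to-class mappings (brute force)."""
    C = contingency(y_true, y_pred)
    n = max(C.shape)
    square = np.zeros((n, n), dtype=np.int64)
    square[:C.shape[0], :C.shape[1]] = C
    best = max(sum(square[i, perm[i]] for i in range(n))
               for perm in permutations(range(n)))
    return best / len(y_true)

def nmi(y_true, y_pred):
    """Mutual information divided by the mean of the two label entropies."""
    pij = contingency(y_true, y_pred).astype(float)
    pij /= pij.sum()
    pi, pj = pij.sum(axis=1), pij.sum(axis=0)
    nz = pij > 0
    mi = np.sum(pij[nz] * np.log(pij[nz] / np.outer(pi, pj)[nz]))
    h_true = -np.sum(pi[pi > 0] * np.log(pi[pi > 0]))
    h_pred = -np.sum(pj[pj > 0] * np.log(pj[pj > 0]))
    denom = (h_true + h_pred) / 2.0
    return float(mi / denom) if denom > 0 else 1.0

y_true = [0, 0, 1, 1, 2, 2]
y_relabelled = [1, 1, 0, 0, 2, 2]     # same partition, different cluster names
y_noisy = [1, 1, 0, 2, 2, 2]          # one sample placed in the wrong cluster

acc_perfect = clustering_accuracy(y_true, y_relabelled)   # 1.0
acc_noisy = clustering_accuracy(y_true, y_noisy)          # 5 / 6
nmi_perfect = nmi(y_true, y_relabelled)                   # 1.0
```

For datasets with many clusters, the permutation search would normally be replaced by the Hungarian algorithm, for example `scipy.optimize.linear_sum_assignment`.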

Clustering Results
The clustering results of our proposed method on the six datasets are shown in Table 2, with bold numbers indicating the best results. From Table 2, we notice that the model with the graph attention layer and feature fusion module performs well on most of the datasets. This is due to GAT's better representation learning ability compared to GCN when learning graph structure. In addition, the feature fusion module further enhances the features. After adding the graph attention layer, the model learns neighborhood information better and also considers the high-order neighborhood information of the nodes. The feature fusion module performs a secondary fusion of structural and attribute features.
For the three non-graph datasets, USPS, HHAR, and REUT, the model performs well on all metrics. In terms of average results, the improvement over the SDCN method is most significant on HHAR: our approach improves ACC by 4.2%, NMI by 2.8%, ARI by 4.7%, and F1 by 5.7%. The HHAR dataset has a longer average path length and network diameter than the other two non-graph datasets, so considering high-order neighborhood information yields more informative node representations. For REUT, the improvement on ACC, NMI, and ARI is more significant than for USPS. From the viewpoint of network features, REUT has a smaller clustering coefficient and a higher feature dimension than USPS.
For DBLP, the improvement is 6.7% on ACC, 2.1% on NMI, 6.5% on ARI, and 6% on F1. For CITE, the improvement is 3.2% on ACC, 3.5% on NMI, and 4.2% on ARI. The improvement on ACC and ARI is greater for DBLP than for CITE because DBLP has a longer average path length and network diameter than CITE, and a smaller clustering coefficient. The results are not improved for the ACM dataset because its clustering coefficient and degree of aggregation are higher than those of the other two graph datasets, so there is little difference between learning with GCN and with GAT. The average path length and network diameter are also smaller for the ACM dataset, so considering high-order information makes little difference.
Across all metrics, the improvement on the graph dataset DBLP is larger than on the non-graph dataset HHAR. In other words, the model performs better on data with a real graph structure than on data for which a KNN graph is constructed. Since the edges in the KNN graph are not real and contain some noise, it is necessary to construct an effective KNN graph to improve the model's effectiveness. In terms of network characteristics, the model proposed in this paper is more suitable for networks with longer average path length and network diameter and smaller clustering coefficient.

Ablation Study
We conduct an ablation study to evaluate the effectiveness of the GAT module and the feature fusion module. The results are reported in Figure 2.
Analysis of the GAT module. From Figure 2, we can see that CDFG-w has a 0.1% to 4.6% improvement over the SDCN method, which shows the effectiveness of the graph attention network module. For non-graph datasets, the CDFG-w method performs better on HHAR than on the other two non-graph datasets. For graph datasets, the CDFG-w method performs better on DBLP and CITE. The improvement is greater for networks with longer average path lengths and network diameters and smaller clustering coefficients, i.e., HHAR, DBLP, and CITE, which means that the graph attention layer considering high-order neighborhood information is more effective for this type of network.
Analysis of the feature fusion module. CDFG has a 0.1% to 4.3% improvement over the CDFG-w method, which demonstrates the effectiveness of the feature fusion module. We find that the CDFG method performs better on graph datasets than on non-graph datasets. For graph datasets, the CDFG method performs better on DBLP than on CITE. Similarly, the feature fusion module is more effective for networks with long average path length and network diameter and a small clustering coefficient. In addition, the feature fusion module improves the datasets with actual graph structure to a greater extent than those with a KNN graph because the actual graph structure reflects the characteristics of the data more accurately. Moreover, the model learns better after further enhancement of the features.
graph datasets.For graph datasets, the CDFG method performs better on DBLP than on CITE.Similarly, the feature fusion module is more effective for networks with long average path length and network diameter and a small clustering coefficient.In addition, the feature fusion module improves the dataset with actual graph structure to a greater extent than the KNN graph because the dataset with actual graph structure reflects the characteristics of the data more accurately.Moreover, the model learns better after further enhancement of the features.CDFG-w is the method with the addition of a graph attention network module.CDFG is the method with the addition of a feature fusion module.

Parameter α Sensitivity Analysis
We conduct parameter sensitivity analysis of  in the loss function, which is an important parameter for balancing the clustering loss and other losses.To evaluate the effect of parameter  on model performance, the CDFG model is experimented with three graph datasets, including ACM, DBLP, and CITE, by setting  = [0.01,0.1, 1, 10, 100] CDFG-w is the method with the addition of a graph attention network module.CDFG is the method with the addition of a feature fusion module.

Parameter α Sensitivity Analysis
We conduct parameter sensitivity analysis of α in the loss function, which is an important parameter for balancing the clustering loss and other losses.To evaluate the effect of parameter α on model performance, the CDFG model is experimented with three graph datasets, including ACM, DBLP, and CITE, by setting α = [0.01, 0.1, 1, 10, 100] with fixed β = 0.01.We do not discuss β here because the CDFG method is insensitive to β.We ran our method 10 times independently for each dataset and report the average results.The results of each metric are shown in Figure 3.
(e) DBLP (f) CITE Comparison of experimental ablation effects of graph attention network module and feature fusion module.SDCN is the method using only graph convolution network and autoencoder.CDFG-w is the method with the addition of a graph attention network module.CDFG is the method with the addition of a feature fusion module.

Parameter α Sensitivity Analysis
We conduct parameter sensitivity analysis of  in the loss function, which is an important parameter for balancing the clustering loss and other To evaluate the effect of parameter  on model performance, the CDFG model is experimented with three graph datasets, including ACM, DBLP, and CITE, by setting  = [0.01,0.1, 1, 10, 100] with fixed  = 0.01.We do not discuss  here because the CDFG method is insensitive to .We ran our method 10 times independently for each dataset and report the average results.The results of each metric are shown in Figure 3. From Figure 3, we can observe that the parameter  has a certain influence on the clustering effect, and all three datasets reach the optimal value when  = 0.1.For ACM and CITE, the trend of changes is gentler with the parameter .However, for DBLP, the variation is more significant compared to ACM and CITE.From the viewpoint of network characteristics, networks with longer average path length and network diameter and smaller clustering coefficient are more sensitive to the change of parameter .

Network Visualisation
In order to verify the validity of the model more intuitively, we conduct an experiment visualizing the clustering results.For the original data, we use PCA firstly to reduce the dimension as the embedding representation obtained by CDFG, and use the t-SNE [32] method for 2D visualization.For the learned embedding results, we directly visualize the data samples in 2D space by using the t-SNE method.The visualization results are shown in Figure 4.The data points of the same color indicate the same category.The clearer the boundary between clusters composed of sample points of different colors, the better the clustering results.From Figure 3, we can observe that the parameter α has a certain influence on the clustering effect, and all three datasets reach the optimal value when α = 0.1.For ACM and CITE, the trend of changes is gentler with the parameter α.However, for DBLP, the variation is more significant compared to ACM and CITE.From the viewpoint of network characteristics, networks with longer average path length and network diameter and smaller clustering coefficient are more sensitive to the change of parameter α.

Conclusions
In this paper, we propose a community detection model that fuses a graph attention layer with an autoencoder. The innovation of the model is that, for the first time, it fuses the autoencoder with a graph attention network that uses high-order neighborhood information. The graph attention layer learns structural features better by considering the importance of neighborhood information, and adding high-order neighborhood information makes the model more robust and effective on networks with longer mean paths and network diameters. The autoencoder learns the attribute features of the data, and a balance parameter fuses the two parts. In addition, a feature fusion module is designed to perform a secondary fusion of the structural and attribute features. The experimental results show that the proposed model performs better on network datasets with longer mean paths and network diameters and smaller clustering coefficients. Compared with various state-of-the-art methods, CDFG performs better at community detection. This is because the graph attention network uses the attention mechanism to calculate the influence weight of each neighborhood node on the target node, taking into account that different neighborhood nodes exert different influence. At the same time, high-order neighborhood information is added to the graph attention layer, so the final node representation contains more information. Furthermore, the feature fusion module further fuses the structural and attribute information and enhances the features. The visualization results show that the original data are more scattered and their boundaries more blurred, whereas after representation learning, samples of the same category are more tightly grouped and the boundaries between categories are clearer, which supports the validity of CDFG.
Currently, the model only supports non-overlapping community detection, and extending it to overlapping community detection is a direction for future research. How to improve the efficiency of the algorithm while maintaining the model's performance is also worth studying.

Figure 1. The architecture of the CDFG. A denotes the adjacency matrix of the graph, and X is the feature matrix of the nodes. A and X are both used as input to the GAT module, and X is also used as input to the AE module. Z_AE is the hidden layer representation obtained by the AE module, and Z^(1) denotes the first layer representation obtained by the graph attention module. The self-correlation matrix is computed from the graph attention representation and its transpose. ⊗ denotes matrix multiplication: the representation obtained after fusing the GAT module and the AE module is multiplied by the self-correlation matrix to give the final representation. A distribution based on the Student's t-distribution describes the distance relationship between the node representation and the clustering centres, and the target distribution is calculated from it.
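The two distributions mentioned in the Figure 1 caption can be illustrated with a short sketch. It assumes the formulation commonly used in self-supervised deep clustering (a Student's t kernel with one degree of freedom for the soft assignment, and a squared, frequency-normalised sharpening for the target). The function names and the toy inputs are hypothetical, and the paper's exact equations may differ.

```python
def soft_assignment(z, centers, v=1.0):
    """Student's t kernel similarity between each embedding and each centre.

    z and centers are lists of equal-length coordinate lists; v is the
    degrees of freedom (assumed to be 1 here).
    """
    q = []
    for zi in z:
        row = []
        for mu in centers:
            dist2 = sum((a - b) ** 2 for a, b in zip(zi, mu))
            row.append((1.0 + dist2 / v) ** (-(v + 1.0) / 2.0))
        total = sum(row)
        q.append([x / total for x in row])  # each row sums to 1
    return q


def target_distribution(q):
    """Sharpen q: square each entry, divide by cluster frequency, renormalise."""
    n_clusters = len(q[0])
    freq = [sum(row[j] for row in q) for j in range(n_clusters)]
    p = []
    for row in q:
        weights = [row[j] ** 2 / freq[j] for j in range(n_clusters)]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p


if __name__ == "__main__":
    embeddings = [[0.0, 0.0], [1.0, 0.0], [4.0, 0.0]]  # toy node representations
    centres = [[0.0, 0.0], [4.0, 0.0]]                 # toy cluster centres
    q = soft_assignment(embeddings, centres)
    p = target_distribution(q)
    for qi, pi in zip(q, p):
        print([round(x, 3) for x in qi], [round(x, 3) for x in pi])
```

Under these assumptions the target distribution puts more weight on each node's most likely cluster than the soft assignment does, which is what lets it act as a self-supervised training signal.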

Figure 2. Comparison of experimental ablation effects of the graph attention network module and the feature fusion module. SDCN is the method using only a graph convolution network and an autoencoder. CDFG-w is the method with the addition of a graph attention network module. CDFG is the method with the addition of a feature fusion module.

Figure 3. The influence of parameter α on the model effect.

Mathematics 2022, 10, x (for peer review).

Figure 4. The visualization results on six datasets. The first and second rows correspond to the original data and CDFG, respectively.

Table 1. The details of the datasets.