MSDA-NMF: A Multilayer Complex System Model Integrating Deep Autoencoder and NMF

Abstract: In essence, a network is a way of encoding the information of the underlying social management system. Social management systems are ubiquitous, rarely exist in isolation, and exhibit dynamic complexity. For complex social management systems, it is difficult to extract and represent multi-angle features of the data using non-negative matrix factorization (NMF) alone. Existing deep NMF models that integrate multi-layer information struggle to explain the results obtained after mid-layer NMF. In this paper, a multi-layer NMF structure is introduced, and the feature representation of the input data is realized by using this complex hierarchical structure. By adding regularization constraints for each layer, the essential features of the data are obtained by characterizing the feature transformation layer by layer. Furthermore, the deep autoencoder and NMF are fused to construct MSDA-NMF, a multi-layer NMF model that integrates the deep autoencoder. On multiple data sets, including HEP-TH, OAG, HEP-PH, Polblog, Orkut and Livejournal, and compared with 8 popular NMF models, the Micro-F1 score increased by 1.83, the NMI value increased by 12%, and link prediction performance improved by 13%. Furthermore, the robustness of the proposed model is verified.


Introduction
Non-negative matrix factorization (NMF) is a relatively recent matrix factorization method [1]. Since D.D. Lee et al. proposed this feature-subspace method in Nature in 1999, NMF [2] has been widely used in image analysis, text clustering, data mining, speech processing and other areas. As research has deepened, applications have developed from the early single-structure feature analysis to the joint mining of multiple network structures and the layered analysis of multi-source information. In addition, abundant data indicate that models based on a single pairwise interaction may not capture complex dependencies between network nodes [3]. The interaction of a user with both a video and its enrichments results in a lot of explicit and implicit relevance feedback, which enables some works to provide personalized and rich multimedia content [4]. Lambiotte, Rosvall et al. [5], in their article published in Nature Physics, described the shortcomings of the traditional network model and of existing idealized high-order models, and argued that multi-layer network models play an important role in analyzing the various types of interactions in many real complex systems.
In the analysis and mining of multi-relational data, methods based on network representation learning have attracted the attention of many scholars because of their excellent performance in many practical tasks. Network representation learning [6] (also known as network embedding) is an effective network analysis method. While preserving network structure information, it embeds graphs into a low-dimensional and compact space. For complex multi-level data, however, a single-layer NMF performs only one decomposition. Although it can reach relatively high accuracy even when a severe proportion of the data is corrupted, it is prone to local minima and cannot express the characteristics of the data from many angles, so the decomposition results are not always satisfactory [7].
To effectively represent high-order and multi-layer complex data, traditional vector-based machine learning and deep learning algorithms can directly use low-dimensional representations to complete network analysis tasks efficiently, which greatly enriches the selection of algorithms and models for network mining tasks [8,9]. Therefore, how to use an effective deep network structure for the hierarchical feature extraction of complex data, and how to combine the advantages of non-negative matrix decomposition and deep networks, have important practical implications for the collaborative discovery of knowledge from multiple information sources. In this paper, a multi-layer NMF structure is introduced, and the feature representation of the input data is realized by using this complex hierarchical structure. By adding regularization constraints for each layer, the essential features of the data are obtained by characterizing the feature transformation layer by layer, and the multi-layer NMF model MSDA-NMF, which fuses a deep autoencoder with NMF, is constructed. The proposed method can effectively improve the detection accuracy and prediction accuracy of social groups in complex social management systems. The main contributions can be summarized as follows:
1. The model integrates the multi-layer structural features of the deep autoencoder.
2. The model introduces a multi-layer NMF structure, which can effectively use a complex hierarchical structure to achieve feature representation of the input data.
3. Evaluation on multiple data sets shows that the proposed method is superior to existing multi-layer NMF methods.
The rest of the paper is organized as follows: Section 2 discusses the related work. Section 3 introduces the proposed method through model description and model optimization. The experimental results are shown in Section 4 and discussed by parametric sensitivity analysis, multiclassification experiment, node clustering experiment, and link prediction experiment. Finally, Section 5 introduces the conclusion.

Study on Non-Negative Matrix Factorization
Non-negative matrix factorization is a popular machine learning technique. The essence of the algorithm is to transform high-dimensional data into low-dimensional data through a linear combination of variables. All values in the decomposition result are non-negative, which dramatically improves the interpretability of the model, and its development has attracted widespread attention. In 2001, Lee and Seung put forward the multiplicative iteration algorithm [10], and since then the NMF algorithm has been widely used in areas such as NLP, CV and recommendation systems. However, the feature matrix and coefficient matrix produced by the decomposition are not sparse enough, so the extracted features are not distinct enough. Researchers have therefore begun to study the sparsity of the feature matrix and coefficient matrix, which also reduces the redundancy of data in these matrices, and many methods have been proposed. For example, Hoyer et al. [11] established the objective function by measuring the Euclidean distance between the original data and the reconstructed data. In addition, an L1 penalty term was added, and the feature matrix and coefficient matrix were optimized iteratively by gradient descent. Ren et al. [12] first used the given reference images to create non-orthogonal and non-negative basis components and then used these components to model the target. A factor then scales the constructed model to compensate for the contribution from the disk. Jia et al. [13] proposed a new semi-supervised model that simultaneously learns a similarity matrix from supervised information and generates clustering results. Huang et al. [14] proposed a robust multi-feature collective non-negative matrix decomposition (RMCNMF) model to handle noise and sample variation in ECG biometrics.
In summary, many studies implement complex function approximation and learn data sets from small sample sets through distributed representation of input data. Therefore, more and more researchers have focused on deep non-negative matrix factorization based on deep learning.
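The multiplicative iteration scheme of Lee and Seung mentioned above can be sketched in a few lines. This is a minimal illustration for the Frobenius-norm objective only, not the model proposed in this paper; the function name `nmf_multiplicative`, the iteration count and the small constant `eps` are assumptions made for the example.

```python
import numpy as np

def nmf_multiplicative(X, rank, n_iter=200, eps=1e-10, seed=0):
    """Factor a non-negative matrix X (m x n) as W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Multiplicative updates for ||X - W H||_F^2. Each factor is
        # rescaled element-wise, so non-negativity is preserved.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy data: an exactly rank-4 non-negative matrix.
rng = np.random.default_rng(1)
X = rng.random((20, 4)) @ rng.random((4, 15))
W, H = nmf_multiplicative(X, rank=4)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

Because the updates only multiply by non-negative ratios, no projection step is needed to keep the factors non-negative, which is the property the sparsity-oriented variants discussed above build on.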

Deep Non-Negative Matrix Decomposition Analysis
The development of deep learning has gone through a long process. SVM, Boosting, Logistic Regression (LR) and other methods were proposed successively. The structures of these methods either have no hidden layer nodes or contain a single hidden layer, so they are collectively referred to as shallow models. In 2006, the Deep Belief Network (DBN) model proposed by Hinton et al. [15] made it possible to construct deep models for learning. Subsequently, deep models such as Deep Boltzmann Machines (DBM) [16] and Fuzzy Deep Belief Networks (FDBN) [17] were successively proposed. These deep networks with multiple hidden layers have excellent feature learning abilities. The learned features capture the essence of the data, which is favourable for visualization and classification applications. Ye et al. [18] proposed a new model for community detection, the deep autoencoder-like NMF (DANMF), based on the deep non-negative matrix method. DANMF adopts an architecture of hierarchical mappings between the original network and the final community assignment, and implicitly learns the hidden attributes of the original network, from low level to high level, in the middle layers. Following the deep decomposition architecture, De et al. [19] reviewed MF models based on deep learning and introduced their algorithms and applications.
Because of deep networks' excellent data learning performance, some scholars have proposed multi-layer non-negative matrix decomposition algorithms based on NMF and related algorithms in recent years [20]. However, the above methods and the deep NMF model integrating multi-layer information struggle to explain the results of the middle layer NMF.

Multilayer Non-Negative Matrix Factorization
Unlike single-layer learning, multi-layer NMF reveals more intuitive feature hierarchies through the relationships between the features of each layer, and layered structures learn meaningful and helpful features. The first model to extend CLRMA to multiple levels was the multi-layer NMF proposed by Cichocki et al. in 2006 [21,22]. At the first level, a low-rank factorization of X is computed. At each subsequent level, the factor obtained at the previous level is factorized again, until the desired number of levels is reached. In 2013, a hierarchical non-negative matrix decomposition algorithm was proposed in [23] for hierarchical data representation. Multi-layer NMF obtains a multi-layer structure through multiple iterations of a nonsmooth non-negative matrix decomposition algorithm. Rajabi et al. [24] proposed using multi-layer NMF (MLNMF) to achieve hyperspectral decomposition, in which the spectral feature matrices are modelled as the product of sparse matrices. Chen et al. [25] proposed a constrained multi-layer NMF method for hyperspectral data processing. In this approach, two constraints are imposed on the objective function at each level: one is sparsity on the abundance matrix, and the other is minimum volume on the spectral matrix. The hierarchical processing decomposes the abundance matrix into a series of matrices, making the sparsity feature more evident and meaningful. Yuan et al. [26] introduced Hoyer projectors to provide iterative directivity for the structures in the decomposition process. They proposed a multi-layer non-negative matrix decomposition framework based on Hoyer projection, HP-MLNMF, which completely reconstructed and enhanced the methods.
However, in the above methods, there are few studies on the description of the middle layer of multi-layer network and the fusion model of NMF.

Encoder-Based Model
The basic idea of encoder-based models is to map the context matrix to the embedding matrix and then use the embedding matrix to reconstruct the original context matrix. Deep neural graph representation (DNGR) [27], Structural Deep Network Embedding (SDNE) [28] and MVC-DNE [29] use deep neural networks to incorporate graph structure directly into the encoder algorithm. The basic idea behind these methods is to use autoencoders to compress information about each node's local neighborhood. The key to SDNE's ability to preserve node neighborhood characteristics is its deep autoencoder with multiple nonlinear layers. The model preserves the first- and second-order proximity of nodes by using Laplacian features in the middle layer of the encoder and a modified autoencoder reconstruction error. In addition to the neighborhood autoencoder methods mentioned above, there are encoder methods that iteratively aggregate neighborhood information to generate node embeddings [30], as well as the GraphSAGE algorithm [31]. These encoders rely on node-local neighborhood characteristics rather than the whole graph, and use node embedding to overcome significant limitations of shallow embedding and autoencoder methods.
To sum up, most of the research on encoder-based models is based on local graphs, and little of it addresses multi-layer global graphs.

Model Description
In classical NMF, there is only one layer of mapping between the original network and the embedded result features. The organization patterns of real-world networks are complex and diverse, and the mapping between the original network and the community member space is likely to contain not only complex hierarchical and structural information, but also low-level hidden features, which cannot be extracted using classical shallow NMF-based methods. Furthermore, deep autoencoders are an excellent scheme for bridging the gap between low-level and high-level abstractions of raw data [32]. Inspired by deep autoencoders, we assume that better structural features between nodes (i.e., a more accurate structural feature matrix V) can be obtained by further decomposing the mapping and extracting features at progressively deeper levels, from low level to high level.
Based on the above discussion, this paper proposes a new model called the multi-layer structure NMF model fused with a deep autoencoder (MSDA-NMF). The model introduces a multi-layer NMF structure and combines a deep autoencoder with NMF. As Figure 1 illustrates, MSDA-NMF has a deep structure consisting of an encoder component and a decoder component. Similar to the deep autoencoder, the encoder component attempts to transform the raw network into a low-dimensional hidden feature matrix captured in the middle layer. Every intermediate layer explains the similarity between nodes at a different order. The decoder is symmetric to the encoder. It reconstructs the original network from the final embedded feature matrix through the hierarchical mappings learned in the encoder component. Unlike traditional NMF-based loss functions that only consider the decoder component, MSDA-NMF integrates the encoder component and the decoder component into a unified loss function. In this way, the autoencoder-like NMF can learn the relationships between cross-layer features and intuitively capture the progression from first-order to high-order similarity of the network structure in complex data. Using this hierarchical feature extraction algorithm, we study the optimal deep autoencoder-like NMF structure suitable for classification tasks. The terms and symbols used in this paper are unified and listed in Table 1.

Table 1. Terms and symbols used in this paper.
Symbol    Description
A         Adjacency matrix of graph G
r_i       Embedding dimension of the i-th NMF layer
Z_i       Auxiliary matrix of layer i
V_i       Embedding feature matrix of layer i
B         Random walk feature matrix
C         Second-order node similarity feature matrix
vol(G)    Volume of graph G

Model Solution
NMF directly learns a single layer consisting of an auxiliary matrix Z and a basis matrix V. However, real-world networks often have complex and diverse organizational patterns. Therefore, the mapping between the original network and the community member space probably contains complicated hierarchical and structural information with implicit low-level hidden attributes. It is well known that deep learning can bridge the gap between low-level and high-level abstractions of raw data. In this sense, we propose to factor the mapping Z further, hoping to add an extra layer of abstraction for each layer of embedding results and to extract the similarity between nodes from low order to high order. To be specific, the adjacency matrix A is decomposed into the product of p + 1 non-negative matrices, as follows:

$$A \approx Z_1 Z_2 \cdots Z_p V_p \qquad (1)$$

The hierarchy in Formula (1), which allows a p-level abstract understanding of the original network, can be written as the following sample decomposition:

$$V_{p-1} \approx Z_p V_p, \quad \ldots, \quad V_1 \approx Z_2 V_2, \quad A \approx Z_1 V_1 \qquad (2)$$

We also retain the non-negativity constraint on $V_i$ (1 ≤ i ≤ p). $V_i$ is the embedded feature matrix of layer i, and $Z_i$ is the auxiliary matrix of layer i. By doing so, each layer of abstraction $V_i$ captures node similarity of a different order, from first-order proximity to structural identity and finally to community-level similarity. To learn the embedding matrix, we derive the following objective function. This paper uses a three-layer NMF structure, so p = 3:

$$\min_{Z_i, V_3} \; \| A - Z_1 Z_2 Z_3 V_3 \|_F^2 \quad \text{s.t. } V_3 \ge 0,\; Z_i \ge 0,\; i = 1, 2, 3 \qquad (3)$$

After the optimization of Formula (3), we obtain the learned network representation of each layer.

Fusion Depth-like Autoencoder Matrix Decomposition
It can be seen that Formula (1) is a reconstruction of the original network and corresponds to the decoder part of an autoencoder. To improve the representation learning capability, the encoder component must be integrated into the NMF-based community detection model to form an autoencoder-like NMF model. The rationale of the autoencoder-like NMF model is quite simple. For the ideal basis matrix V, the original network should be reconstructed through the mapping Z with a small error, $A \approx Z V$; meanwhile, the original network A should be projected precisely into the community membership space with the help of the mapping Z, that is, $V \approx Z^T A$. Integrating the encoder component and the decoder component into a unified loss function lets them guide each other during the learning process, from which we can obtain the ideal community membership of each node. To realize this idea in the deep model, the objective function of the encoder is as follows:

$$\min_{Z_i, V_3} \; \| V_3 - Z_3^T Z_2^T Z_1^T A \|_F^2 \quad \text{s.t. } V_3 \ge 0,\; Z_i \ge 0,\; i = 1, 2, 3$$

The current deep autoencoder-like matrix decomposition model cannot explain what kind of network representation is obtained at the intermediate levels. To enhance the interpretability of this model, we add regularization terms for the different levels.
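To make the roles of the two components concrete, the following sketch evaluates a decoder term and an encoder term for a three-layer factorization. It assumes a DANMF-style encoder [18], in which the network is projected by the transposed mappings; the function names and the toy sizes are illustrative assumptions, and the regularization terms of the full model are not included.

```python
from functools import reduce
import numpy as np

def stacked_mapping(Zs):
    """Return U = Z_1 Z_2 ... Z_p for a list of layer mappings."""
    return reduce(np.matmul, Zs)

def decoder_loss(A, Zs, V):
    """||A - Z_1 ... Z_p V_p||_F^2: rebuild the network from the embedding."""
    return np.linalg.norm(A - stacked_mapping(Zs) @ V, "fro") ** 2

def encoder_loss(A, Zs, V):
    """||V_p - Z_p^T ... Z_1^T A||_F^2: project the network onto the embedding."""
    return np.linalg.norm(V - stacked_mapping(Zs).T @ A, "fro") ** 2

def unified_loss(A, Zs, V):
    """Sum of both terms, so encoder and decoder are optimized jointly."""
    return decoder_loss(A, Zs, V) + encoder_loss(A, Zs, V)

# Toy example with hypothetical sizes n = 6 and (r1, r2, r3) = (5, 4, 3).
rng = np.random.default_rng(0)
n, r1, r2, r3 = 6, 5, 4, 3
A = rng.random((n, n))
A = (A + A.T) / 2
Zs = [rng.random((n, r1)), rng.random((r1, r2)), rng.random((r2, r3))]
V3 = rng.random((r3, n))
# Choosing V3 as the exact projection makes the encoder term vanish.
V3_projected = stacked_mapping(Zs).T @ A
```

A decoder-only model would minimize `decoder_loss` alone; adding `encoder_loss` penalizes embeddings that reconstruct the network well but cannot be obtained by projecting it.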
In [34], the second-order similarity matrix of LINE is expressed in the form:

$$C = \log\left( \mathrm{vol}(G)\, D^{-1} A D^{-1} \right) - \log b$$

where log(·) is the element-wise logarithm, D is the degree matrix of graph G, vol(G) is the volume of G, and b is the number of negative samples. With the final objective function (6), the first-order similarity of the network structure is obtained after the first-layer non-negative matrix decomposition, the second-order similarity is obtained after the second-layer decomposition, and higher-order features are obtained after the third-layer decomposition.
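The closed-form LINE matrix from [34] can be computed directly from the adjacency matrix. The sketch below is an illustration under two assumptions that are not stated in this section: the volume is taken as the sum of all entries of A, and a truncated logarithm (entries below 1 are clipped to 1 before the log, as is done in NetMF) is used so that the result stays non-negative and finite. The function name is hypothetical.

```python
import numpy as np

def line_second_order_matrix(A, b=1.0):
    """Element-wise log(vol(G) D^-1 A D^-1) - log(b), truncated at zero."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)                      # node degrees (diagonal of D)
    vol = A.sum()                            # assumed definition of vol(G)
    inv_deg = np.divide(1.0, deg, out=np.zeros_like(deg), where=deg > 0)
    M = vol * (inv_deg[:, None] * A * inv_deg[None, :]) / b
    # Truncated log: non-edges and weak entries map to 0 instead of -inf.
    return np.log(np.maximum(M, 1.0))

# Path graph 0 - 1 - 2: degrees (1, 2, 1), vol(G) = 4.
A_path = np.array([[0, 1, 0],
                   [1, 0, 1],
                   [0, 1, 0]])
C = line_second_order_matrix(A_path)
```

For the path graph, the entry for edge (0, 1) is log(4 / (1 · 2)) = log 2, while all non-edges are mapped to 0 by the truncation.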

Model Optimization
Optimization problems are highly related to scientific research [35]. In this paper, in order to speed up the model's approximation of the factor matrices, we pretrain each layer to obtain initial approximations of the factor matrices $Z_i$ and $V_i$. The pretraining process can significantly reduce the training time of the model, and its effectiveness has been proven in deep autoencoder networks. For pretraining, we first decompose the adjacency matrix as $A \approx Z_1 V_1$, then decompose $V_1 \approx Z_2 V_2$; the third layer is similar. Then, we fine-tune each layer by alternately minimizing the objective function (6) until the change in the objective function is very small, that is, until convergence. The update procedure is summarized in Algorithm 1.
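The layer-wise pretraining described above can be sketched as repeated shallow factorizations, each applied to the embedding produced by the previous layer. This is a minimal sketch that uses plain multiplicative-update NMF for every layer and ignores the regularization terms of the full objective; the helper names, the toy graph and the ranks (4, 3, 2) are assumptions for illustration.

```python
import numpy as np

def nmf(X, rank, n_iter=300, eps=1e-10, seed=0):
    """Shallow NMF X ~ Z @ V via multiplicative updates (Frobenius loss)."""
    rng = np.random.default_rng(seed)
    Z = rng.random((X.shape[0], rank)) + eps
    V = rng.random((rank, X.shape[1])) + eps
    for _ in range(n_iter):
        V *= (Z.T @ X) / (Z.T @ Z @ V + eps)
        Z *= (X @ V.T) / (Z @ V @ V.T + eps)
    return Z, V

def pretrain_layers(A, ranks):
    """Factor A ~ Z_1 V_1, then V_1 ~ Z_2 V_2, and so on for each rank."""
    Zs, Vs, X = [], [], A
    for layer, rank in enumerate(ranks):
        Z, V = nmf(X, rank, seed=layer)
        Zs.append(Z)
        Vs.append(V)
        X = V  # the next layer factorizes the current embedding
    return Zs, Vs

# Toy undirected graph: two 4-node cliques joined by one edge.
A = np.zeros((8, 8))
A[:4, :4] = 1.0
A[4:, 4:] = 1.0
np.fill_diagonal(A, 0.0)
A[3, 4] = A[4, 3] = 1.0

Zs, Vs = pretrain_layers(A, ranks=(4, 3, 2))
```

The resulting `Zs` and `Vs[-1]` would serve only as starting values; the alternating fine-tuning of the full objective then adjusts all factors jointly.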

12: Update W using Formula (18)
13: Use Formula (6)

The objective function (6) involves the parameter matrices $V_3$, $Z_1$, $Z_2$, $Z_3$ and W; their update rules are introduced in detail below. Here we write $U = Z_1 Z_2 Z_3$.

$V_3$ Subproblem
When $V_3$ is updated, the matrices $Z_1$, $Z_2$, $Z_3$ and W are fixed, and the objective function is Equation (6).
By introducing the Lagrange multiplier matrix Θ for the non-negativity constraint and setting $\partial L(V_3) / \partial V_3 = 0$, we obtain the update rule for $V_3$. $V_3$ is initialized and then updated according to this rule.

$Z_1$ Subproblem
When updating $Z_1$, the other variables are fixed, which gives the objective function for $Z_1$. Similar to the update of $V_3$, the Lagrange multiplier matrix Θ is introduced, and the update rule of $Z_1$ is derived in the same way as for $V_3$.

$Z_2$ Subproblem
When $Z_2$ is updated, the other variables are fixed, which gives the objective function for $Z_2$. Following the optimization of $V_3$, we derive the update rule of $Z_2$.

$Z_3$ Subproblem
When $Z_3$ is updated, the other variables are fixed, which gives the objective function for $Z_3$. Following the optimization of $V_3$, we derive the update rule of $Z_3$.

W Subproblem
When W is updated, the other variables are fixed, which gives the objective function for W. Following the optimization of $V_3$, we derive the update rule of W.

Model Complexity Analysis
The computational complexities of the five update formulas in Algorithm 1 are, respectively, $O(r_1 r_2 r_3 n^2 + r_1 r_2 r_3 n + r_2 r_3 n + r_1^2 r_2^2 r_3^2 n + r_2^2 r_3 n)$, $O(r_3 n^2 + r_2 r_3 n + r_1 r_2 n + n^3 + r_1 n^2)$, $O(r_1 n^2 + r_1 r_3 n + r_1 r_2 r_3 + r_1^2 n + r_1^2 r_2)$, $O(r_1 r_2 n + r_2 n^2 + r_2 r_3 n + r_1 r_2^2 + r_2^2 r_3)$ and $O(r_2 r_3 n)$. Since $r_1$, $r_2$ and $r_3$ can be considered input constants with $r_3 \le r_2 \le r_1$, the overall computational complexity is $O(n^3)$. Most real complex networks are sparse, so only nonzero values are computed in the matrix multiplications, and the cost is simplified to $O(n^2 e)$, where e denotes the number of edges in the complex network. In addition, the parameters of the model are the matrices $Z_1 \in R^{n \times r_1}$, $Z_2 \in R^{r_1 \times r_2}$, $Z_3 \in R^{r_2 \times r_3}$ and $W \in R^{r_3 \times r_3}$, and the space complexity is denoted by $O(n^2 e)$. Since $r_1$, $r_2$ and $r_3$ are much smaller than n, the space complexity simplifies to $O(n^2)$, but the complexity increases as the embedding dimensions increase.

Results and Discussion
In this paper, the MSDA-NMF model is constructed by using a complex hierarchical structure to realize the feature representation of input data. An effective comparison is carried out with 8 different data sets and 6 methods. The experiments are performed on a computer with Windows 7, a 3.10 GHz CPU and 32.00 GB of RAM.

Experimental Objects
The parameters of the MSDA-NMF model in this paper include three hyperparameters α, β and γ, the layer dimensions $r_i$, and the convergence coefficient δ. In this complex network experiment, we first restrict α, β and γ to [1, 101] or [0, 1], and $r_i$ ∈ {100, 200, 300, 400, 500}. Based on these ranges, we obtain the optimal parameters of the model. Here, the number of clusters, K, is a variable that changes according to the labels in the data set.

Data Sets and Comparison Methods
This section provides a brief overview of the open data set and advanced complex network representation learning models used in various fields.

Data Set
We complete the task of multi-label node classification on four popular networks. To better verify the robustness of our model in clustering and link prediction, we also use three data sets with ground truth. The statistical characteristics of the data sets are shown in Table 2:
• GR-QC, Hep-TH, Hep-PH [36]: These collaboration networks are extracted from arXiv and cover papers from three different fields (general relativity and quantum cosmology, high-energy physics theory, and high-energy physics phenomenology). The vertices represent authors, and an edge indicates that two authors co-authored a scientific paper. The GR-QC data set covers the smallest graph, with 19,309 nodes (5855 authors, 13,454 articles) and 26,169 edges. The HEP-TH data set covers papers from January 1993 to April 2003 (124 months). It begins with the inception of arXiv and therefore essentially represents the entire history of the HEP-TH section; the citation graph covers all citations within a data set of N = 29,555 papers with E = 352,807 edges. If a paper i cites a paper j, the graph contains a directed edge from i to j. If a paper cites, or is cited by, a paper outside the data set, the graph does not contain any information about this. The HEP-PH data set is the second citation graph, taken from the HEP-PH section of arXiv; it covers all the citations within a data set of N = 30,501 papers with E = 347,268 edges.

Control Methods
We compared the proposed MSDA-NMF model, based on three NMF layers, with eight state-of-the-art network embedding methods. The comparison methods are as follows.
• M-NMF [41]: M-NMF combines the community structure characteristics and the 2-step proximity of nodes in the NMF framework to learn node embeddings. Node representations are kept consistent with the network community structure, and an auxiliary community representation matrix links the local characteristics (first- and second-order similarity) with the community structure features, so that both are jointly optimized through a single optimization formula. The embedding dimension in the experiment is set to 128, and the other parameters are set according to the original paper.
• NetMF [34]: NetMF proves that models with negative sampling (DeepWalk, PTE and LINE) can be regarded as factorizations of closed-form matrices, and demonstrates its superiority over DeepWalk and LINE in traditional network analysis and mining tasks.
• AROPE [42]: This method is based on the singular value decomposition framework. It shifts the embedding vectors across arbitrary orders and learns the higher-order proximity of nodes, thereby revealing their internal relations.
• DeepWalk [43]: DeepWalk generates random paths for each node and treats these paths as sentences in a language model. It then learns the embedding vectors using the Skip-Gram model. In the experiment, the parameters are set according to the original paper.
• Node2vec [44]: This method extends DeepWalk with a biased random walk. All parameters are the default settings of the algorithm; two bias parameters p and q are introduced to control the random walk.
• LINE [45]: LINE learns node embeddings by defining two loss functions that preserve first- and second-order proximity separately. The default parameter settings are used in this article, with a negative ratio of 5.
• SDNE [28]: SDNE uses a deep autoencoder with a semi-supervised architecture to optimize the first- and second-order similarity of nodes, and designs explicit objective functions that state how the network structure is retained. In the experiment, the parameters are set according to the original paper.
• GAE [30]: The GAE model has some advantages in link prediction tasks on citation networks. The algorithm is based on a variational autoencoder and has the same convolution structure as GCN.

Evaluation Index
The robustness of the proposed model is verified by the following evaluation indicators:
NMI [46]: NMI measures the agreement between the communities produced by an algorithm and the ground-truth communities, and is an important measure in community discovery. Its value generally lies between 0 and 1. The higher the value, the more accurate the detection result of the algorithm; a value of 1 means that the result is fully consistent with the labelled communities. NMI is computed from the mutual information between the two partitions, normalized by their entropies.
AUC [47]: AUC is defined as the area under the receiver operating characteristic (ROC) curve, which was originally used to evaluate the classification performance of a classifier. Specifically, given the ranking of the unobserved edges, the AUC value is the probability that a randomly chosen missing edge (e.g., an edge in $E^P$) receives a higher score than a randomly chosen nonexistent edge (e.g., an edge in $U - E$). In the implementation, considering the time complexity, we usually calculate the score of each unobserved edge instead of producing a sorted list. To estimate the AUC value that the sorted list would give, at each step we randomly select a missing edge and a nonexistent edge and compare their scores. If, in n independent comparisons, the missing edge has the higher score n′ times and the two scores are equal n″ times, then AUC can be defined as:

$$\mathrm{AUC} = \frac{n' + 0.5\, n''}{n}$$
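Both indicators are easy to compute on small examples. The sketch below uses only the Python standard library. The NMI normalization 2·I(X;Y) / (H(X) + H(Y)) is an assumption, since the exact variant used in [46] is not reproduced here; the sampled AUC follows the comparison procedure described above. The function names and the `score` callback are hypothetical.

```python
import math
import random
from collections import Counter

def nmi(labels_true, labels_pred):
    """NMI = 2 I(X;Y) / (H(X) + H(Y)), one common normalization."""
    n = len(labels_true)
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    mutual = sum(
        (c / n) * math.log(c * n / (count_true[t] * count_pred[p]))
        for (t, p), c in joint.items()
    )
    h_true = -sum((c / n) * math.log(c / n) for c in count_true.values())
    h_pred = -sum((c / n) * math.log(c / n) for c in count_pred.values())
    if h_true + h_pred == 0:
        return 1.0  # both partitions consist of a single community
    return 2 * mutual / (h_true + h_pred)

def sampled_auc(score, missing_edges, nonexistent_edges, n=10000, seed=0):
    """Estimate AUC from n random (missing, nonexistent) edge comparisons."""
    rng = random.Random(seed)
    higher = equal = 0
    for _ in range(n):
        s_missing = score(*rng.choice(missing_edges))
        s_nonexistent = score(*rng.choice(nonexistent_edges))
        if s_missing > s_nonexistent:
            higher += 1
        elif s_missing == s_nonexistent:
            equal += 1
    return (higher + 0.5 * equal) / n

# Toy usage: a scorer that ranks every held-out edge above every non-edge.
missing = [(0, 1), (2, 3)]
nonexistent = [(0, 3), (1, 2), (0, 2)]
perfect = lambda u, v: 1.0 if (u, v) in missing else 0.0
auc_perfect = sampled_auc(perfect, missing, nonexistent, n=1000)
auc_constant = sampled_auc(lambda u, v: 0.5, missing, nonexistent, n=1000)
```

A scorer that cannot distinguish the two edge sets yields 0.5 because every comparison is a tie, which is why the tie term carries the weight 0.5.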

Parameter Sensitivity Analysis
This section analyzes the effects of the parameters α, β, γ, $r_1$, $r_2$ and $r_3$ of the MSDA-NMF model on clustering performance on a real network, where $r_i$, i ∈ {1, 2, 3}, is the embedding dimension of layer i. Taking the OAG data set as an example, we examine two parameters at a time: all other parameters are fixed, and the effect of the two varying parameters is observed. For example, we observe the effect of α and β by changing α and β while fixing γ, $r_1$, $r_2$ and $r_3$, and so on. Specifically, we vary $r_i$, i ∈ {1, 2, 3}, over 100, 200, 300, 400 and 500, and require $r_3 < r_2 < r_1$. Figure 2 shows the influence of the embedding dimensions $r_1$, $r_2$ and $r_3$ of the different layers on clustering performance. The lighter the color of a point in the figure, the larger the NMI value at that point: the yellow point has the maximum NMI value and the purple point has the minimum. The size of a point also indicates its NMI value: the larger the point, the greater the NMI value. From the figures, we can see that:
1. In Figure 2, when $r_1$ is approximately 200, $r_2$ is approximately 170 and $r_3$ is 150, NMI reaches its maximum. If $r_2$ and $r_3$ are held fixed, NMI decreases as $r_1$ increases, and the NMI value is higher when the dimension $r_3$ is not too low.
2. In Figure 3a: (1) When α is less than 30, and β is greater than 80 and less than 100, the model has the worst value of robustness.
(2) When α is greater than 40 and less than 80, the NMI value tends to be stable with an increase in β to less than 50.
(3) When α is greater than 50 and β is less than 30, the model can obtain better clustering performance at this time.
3. In Figure 3b, when γ is in [1, 61], NMI does not change much, and when γ is in [81, 101], the clustering performance remains relatively stable as γ increases.
4. In Figure 3c, we notice that in a particular range, when both β and γ increase, NMI tends to be stable, while when γ and β are in the range [61, 81], NMI reaches its maximum value.
As for the relationship between the parameters and the cluster evaluation index NMI, the experimental results show that when $r_1$ = 200, $r_2$ = 170 and $r_3$ = 150, more effective network structure features can be obtained, even in a low-dimensional space. Therefore, we use $r_1$ = 200, $r_2$ = 170 and $r_3$ = 150 in the following experiments.
Table 3 shows the experimental results. On the HEP-TH data set, the MSDA-NMF ($V_2$) model has a micro result of 31.15 and a macro result of 17.01, below the best values of 33.87 and 25.58 among the comparison models, respectively, but better than all the other comparison models. The MSDA-NMF ($V_3$) model has a micro result of 35.7, which is better than all comparison models and 1.83 higher than that of the AROPE model, which has the highest micro result among them. Its macro result of 19.02 is lower than the best value of 25.58, achieved by the GAE model, but still better than those of the other comparison models.

Multiclassification Experiment
The results of the multiclassification experiments of MSDA-NMF (V_2) and MSDA-NMF (V_3) are compared. In the HEP-TH data set, the results of the MSDA-NMF (V_2) model are lower than those of the three-layer model. This suggests that, in the classification experiments, the more intermediate layers there are, the more effective MSDA-NMF becomes. In the other citation network data sets, OAG and HEP-PH, the performance of the three-layer model proposed in this paper is superior to that of the other models.
In addition, the micro and macro results of the MSDA-NMF (V_2) model proposed in this paper also outperform the other models in the HEP-PH data set, and its macro result is higher than those of the comparison models in the OAG data set. In the HEP-TH, OAG and HEP-PH data sets, the micro-F1 and macro-F1 multi-classification performance of MSDA-NMF is better than that of NetMF, GAE and other popular feature models. This demonstrates the effectiveness of our network-embedding model.
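For reference, the two evaluation indexes used above can be computed as in the following sketch. It assumes single-label multi-class predictions, which the text does not state explicitly, and the function name and toy labels are hypothetical. Micro-F1 pools the counts of all classes before computing F1, whereas macro-F1 averages the per-class F1 values, so rare classes weigh more heavily in the macro score.

```python
def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = {c: 0 for c in labels}
    fp = {c: 0 for c in labels}
    fn = {c: 0 for c in labels}
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p wrongly
            fn[t] += 1  # missed true class t

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Toy labels (hypothetical): class 2 is rare and never predicted correctly.
y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 0, 1, 0, 0]
micro, macro = f1_scores(y_true, y_pred)
```

In this toy case the macro score is noticeably lower than the micro score because the missed rare class contributes an F1 of zero to the average, which is one reason a model can rank differently on the two indexes.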
Notably, the AROPE model performs better than the MSDA-NMF method on the micro-F1 and macro-F1 metrics in our comparison on the Wikipedia data set. This is probably because Wikipedia is a dense word co-occurrence network, so a relatively low order is sufficient to characterize its network structure. Therefore, the MSDA-NMF method, which is based only on matrix decomposition, performs poorly in the classification task on such data sets.

Node Clustering Experiment
In this section, node clustering performance is assessed with normalized mutual information (NMI), a typical metric. We use real data sets (Polblog, Livejournal and Orkut) to assess the clustering performance of the model. NMI ranges from 0 to 1, with a larger value indicating better clustering performance. The standard K-means algorithm is used to verify the clustering effect of the model. Because the initial value has a significant impact on the clustering result, the clustering is repeated 50 times and the average value is taken as the result. Figure 4 shows the node clustering performance in terms of NMI. It can be seen from the figure that:
1. In the Polblog and Orkut data sets, MSDA-NMF (V_2) and MSDA-NMF (V_3) obtain better NMI results than all the other models. In particular, in the Polblog data set, the MSDA-NMF (V_2) and MSDA-NMF (V_3) models also have a large advantage in NMI value over DeepWalk, the best of the comparison models. This is because our method integrates lower-order structural features and multi-layer features to capture diverse and comprehensive structural features of the network, and can obtain better NMI values in data sets with a lower number of categories.
2. The model in this paper also achieves better results in the Orkut data set, which has fewer categories. Notably, in the Livejournal data set, which has a relatively high number of categories, the NMI value obtained by this model is slightly lower than that obtained by the GAE model. The main reason for this is that GAE has the same convolution structure as GCN and is based on a variational autoencoder, so GAE has better robustness in link prediction for citation networks.
3. SDNE and LINE only retain the proximity between network nodes and cannot effectively preserve the community structure. The random walk-based DeepWalk and Node2vec can better capture second-order and higher-order similarity. Although AROPE can capture the similarity of different nodes and captures more global structure information as the length increases, its omission of community structure makes the algorithm ignore module information. For sparse networks and networks without a prominent community structure, however, the modularity of the NMF is constrained by the similarity of nodes to each other, so its performance is relatively low.
As in the multiclassification experiments, this section also compares the three-layer and two-layer MSDA-NMF models. The results show that the NMI values obtained by the MSDA-NMF (V_3) model are higher than those obtained by the MSDA-NMF (V_2) model in all data sets. This demonstrates the validity of the MSDA-NMF (V_3) model in network embedding.
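The NMI evaluation protocol used in this section (repeat the clustering 50 times and average the result) can be sketched as follows. The normalization by the arithmetic mean of the two entropies is an assumption, since the text does not say which NMI variant is used, and `toy_clustering` is a hypothetical stand-in for one seeded K-means run on the learned embeddings.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(truth, pred):
    """Normalized mutual information between two label assignments.

    Normalized by the arithmetic mean of the two entropies (an assumption).
    """
    n = len(truth)
    ct, cp = Counter(truth), Counter(pred)
    joint = Counter(zip(truth, pred))
    mi = sum((nij / n) * math.log(nij * n / (ct[t] * cp[p]))
             for (t, p), nij in joint.items())
    h = (entropy(truth) + entropy(pred)) / 2
    return mi / h if h > 0 else 1.0  # both assignments trivial: perfect match


def average_nmi(cluster_fn, truth, runs=50):
    """Average NMI over repeated clusterings; cluster_fn(seed) returns labels."""
    return sum(nmi(truth, cluster_fn(seed)) for seed in range(runs)) / runs


def toy_clustering(seed):
    # Hypothetical stand-in: even seeds recover the truth (up to relabeling),
    # odd seeds return a partition independent of it.
    return [1, 1, 0, 0] if seed % 2 == 0 else [0, 1, 0, 1]


truth = [0, 0, 1, 1]
mean_nmi = average_nmi(toy_clustering, truth, runs=50)
```

Because NMI is invariant to how cluster indices are named, a run that swaps the two labels still counts as a perfect match, which is why the metric suits K-means output.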

Link Prediction Experiment
Link prediction predicts which pairs of nodes may form edges and measures accuracy by comparing the predictions with the edges that were actually removed. In the experiment, we randomly hide 10%, 20%, 30%, 40% and 50% of the edges as test data and keep the other edges. The remaining edges are used to train the node embedding vectors. We evaluate the effectiveness of our model with the typical AUC (Area Under the Curve) evaluation index. To verify the validity of our proposed model, we first remove 10% of the edges in all network data sets. As shown in Table 4:
1. Compared with the other algorithms, MSDA-NMF (V_3) improves the prediction performance by 10% relative to GAE, the best of the comparison models, and by 64% relative to the worst prediction model.
2. MSDA-NMF (V_2) has a lower prediction performance than the best prediction model among the comparison models.
3. In the Orkut and GR-QC data sets: (1) The proposed MSDA-NMF (V_2) and MSDA-NMF (V_3) models obtain the optimal prediction results.
(2) In the GR-QC data set, however, the accuracy of the MSDA-NMF (V_3) model is higher than that of MSDA-NMF (V_2).
Notably, in the Polblog data set, none of the methods proposed in this paper obtains the optimal prediction effect. As shown in Table 2, the Polblog data set has a small number of categories, which prevents the model from obtaining better prediction results in its hierarchical operation.
To further verify the influence of the proportion of training data on the model, this paper conducted tests on the Livejournal and Orkut data sets. The results in Figure 5 show that our method outperforms the mainstream methods across the different proportions of removed edges in the two data sets. Because networks have different structural characteristics, performance on the Livejournal data set is close to its best level when 90% of the edges remain, and MSDA-NMF (V_2) obtains prediction results similar to those of MSDA-NMF (V_3). It can be seen that our method performs well in link prediction, indicating that the network embedding results retain the structural characteristics of the data sets.
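The evaluation protocol of this section (hide a fraction of the edges, train on the rest, score candidate pairs and compute AUC) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, and scoring a node pair by the dot product of its embedding vectors is an assumed choice that the text does not specify.

```python
import random


def hide_edges(edges, fraction, seed=0):
    """Randomly split an edge list into (training edges, hidden test edges)."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_hidden = int(round(fraction * len(shuffled)))
    return shuffled[n_hidden:], shuffled[:n_hidden]


def dot_score(u, v, emb):
    """Assumed scoring rule: dot product of the two node embeddings."""
    return sum(a * b for a, b in zip(emb[u], emb[v]))


def auc(pos_scores, neg_scores):
    """Probability that a hidden edge outscores a non-edge (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy path graph with 20 edges (hypothetical); hide 10% of them for testing.
edges = [(i, i + 1) for i in range(20)]
train_edges, test_edges = hide_edges(edges, fraction=0.1)
```

In a full experiment, the embeddings would be trained on `train_edges` only, `pos_scores` would be the scores of `test_edges`, and `neg_scores` the scores of sampled node pairs that are not connected in the original network.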

Conclusions and Future Work
This paper introduces NMF into a multilayer NMF structure and uses the complex hierarchical structure to realize the feature representation of the input data. By adding a regularization constraint for each layer, the essential features of the data are obtained by characterizing the feature transformation layer by layer. The deep autoencoder and NMF are then fused to construct the multilayer model MSDA-NMF. The model can effectively improve the detection accuracy and prediction accuracy of social groups in complex social management systems. Eight popular models (NetMF, M-NMF, DeepWalk, Node2vec, LINE, SDNE, AROPE and GAE) were compared on eight real data sets to verify the robustness of the algorithm.
Although the proposed method has a good detection effect on the data sets presented in this paper, there are still some problems to be solved; for example, the results of the model are not optimal in networks with a large number of categories. Therefore, handling the multistructure characteristics of multiple categories in real networks and in the model simultaneously is the key to effectively improving the detection performance of the model in all data sets, and it is also challenging and necessary work [48]. We will take these factors into account in our future work.