A Useful Criterion on Studying Consistent Estimation in Community Detection

In network analysis, developing a unified theoretical framework for comparing methods under different models is an interesting problem. This paper proposes a partial solution to this problem. We summarize the idea of using the separation condition of a standard network and the sharp threshold of the Erdös–Rényi random graph to study consistent estimation, and to compare theoretical error rates and requirements on network sparsity of spectral methods under models that can degenerate to a stochastic block model, as a four-step criterion, SCSTC. Using SCSTC, we find some inconsistent phenomena regarding the separation condition and sharp threshold in community detection. In particular, we find that the original theoretical results for the SPACL algorithm, introduced to estimate network memberships under the mixed membership stochastic blockmodel, are sub-optimal. To uncover the mechanism behind these inconsistencies, we re-establish the theoretical convergence rate of this algorithm by applying recent techniques on row-wise eigenvector deviation. The results are further extended to the degree-corrected mixed membership model. By comparison, our results enjoy smaller error rates, weaker dependence on the number of communities, weaker requirements on network sparsity, and so forth. The separation condition and sharp threshold obtained from our theoretical results match the classical results, which guarantees the usefulness of this criterion for studying consistent estimation. Numerical results for computer-generated networks support our finding that the spectral methods considered in this paper achieve the threshold of the separation condition.

1. Introduction

Estimating mixed memberships of networks whose nodes may belong to multiple communities has received a lot of attention [7,9,47,22,6,8,30,38,48,27,34,35,36]. To capture the structure of networks with mixed memberships, [7] proposes the popular mixed membership stochastic blockmodel (MMSB), an extension of the famous stochastic blockmodel (SBM) [24] for non-overlapping networks. It is well known that the degree-corrected stochastic blockmodel (DCSBM) [29] extends SBM by considering degree heterogeneity of nodes so as to fit real-world networks with varying node degrees; similarly, [27] proposes the degree-corrected mixed membership (DCMM) model, which extends MMSB by considering degree heterogeneity of nodes. There are alternative models based on MMSB, such as the OCCAM model of [48] and the stochastic blockmodel with overlap (SBMO) proposed in [30], which can also model networks with mixed memberships. As discussed in Section 5, OCCAM equals DCMM while SBMO is a special case of DCMM. For these models, many researchers focus on designing algorithms with provable consistent theoretical guarantees. [33] studies the consistency of two spectral clustering algorithms under SBM and DCSBM. [36] designs the SPACL algorithm based on the finding that there exists a simplex structure in the eigen-decomposition of the population adjacency matrix, and studies SPACL's theoretical properties under MMSB. To fit DCMM, [27] designs the Mixed-SCORE algorithm based on the finding that there exists a simplex structure in the entry-wise ratio matrix obtained from the eigen-decomposition of the population adjacency matrix, where the entry-wise ratio idea comes from [26], which designs the SCORE algorithm with theoretical guarantees under DCSBM. [35] finds the cone structure inherent in the normalization of the eigenvectors of the population adjacency matrix under DCMM as well as OCCAM, and develops an algorithm to hunt corners in the cone structure.

Table 1 (see Section 4) summarizes the separation conditions and sharp thresholds obtained under MMSB&DCMM [35,36] and under SBM&DCSBM [33], both with the original bound ‖A_re − Ω‖ ≤ C√(ρn) and with ‖A − Ω‖ ≤ C√(ρn log(n)); the classical sharp threshold references are [18], Theorem 4.6 of [12], and the first bullet in Section 2.5 of [1]. Throughout, n is the number of nodes in a network, A is the adjacency matrix, Ω is the expectation of A under some model, A_re is a regularization of A, ρ is the sparsity parameter such that ρ ≥ max_{i,j} Ω(i, j) (it controls the overall sparsity of a network), ‖·‖ denotes the spectral norm, and ξ > 1.
In this paper, we focus on the consistency of spectral methods in community detection. The study of consistency proceeds by obtaining a theoretical upper bound on the error rate of a spectral method through analyzing the properties of the population adjacency matrix under a statistical model. To compare consistency results under different models, it is meaningful to study whether the separation condition of a balanced network and the sharp threshold of the Erdös–Rényi (ER) random graph G(n, p) [18], obtained from the upper bounds of theoretical error rates for different methods under different models, are consistent or not. Meanwhile, the separation condition and sharp threshold can also be seen as alternative unified theoretical frameworks to compare all methods and model parameters mentioned in the concluding remarks of [33]. Furthermore, when Method a and Method b are designed under the framework of the same model, theoretical error-rate results developed for Method a should be consistent with those developed for Method b, at least under mild conditions. Based on these three ideas, we are now ready to describe some phenomena of inconsistency in the community detection literature. We find that the separation conditions of a balanced network obtained from the error rates developed in [27,36,35] under DCMM or MMSB are not consistent with that obtained from the main results of [33] under SBM, and the sharp thresholds obtained from the main results of [36,35] do not match classical results. Meanwhile, though both [27] and [35] study the consistency of their spectral algorithms under DCMM, their theoretical upper bounds on error rates do not match even under mild conditions. A summary of these inconsistencies is provided in Tables 1 and 2. Furthermore, after delicate analysis, we find that the requirements on network sparsity of [36,35] are stronger than those of [27,33], and [32] also finds that [36]'s requirement on network sparsity is sub-optimal.
For the reader's convenience in understanding Tables 1 and 2, the definitions of separation condition, sharp threshold, and alternative separation condition are given here. Consider a network with K communities and n nodes in which the community sizes are of the same order, nodes have close degrees, and K is small. Such a network is called a standard network (or balanced network) in this paper. In a standard network, nodes connect with probability p_in within clusters and p_out across clusters. When K ≥ 2, the lower bound requirement on |p_in − p_out|/√p_in for consistent estimation by spectral methods is called the separation condition; when K = 1, so that p = p_in = p_out, the network degenerates to the Erdös–Rényi (ER) random graph G(n, p). The lower bound requirement on p for generating a connected ER random graph is the sharp threshold. Let p_in = α_in log(n)/n and p_out = α_out log(n)/n. The alternative separation condition is defined as the lower bound requirement on |α_in − α_out|/√α_in for consistent estimation when K ≥ 2.
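The two forms of the separation condition are linked by an elementary computation (a sketch under the parametrization $p_{\mathrm{in}} = \alpha_{\mathrm{in}}\log(n)/n$, $p_{\mathrm{out}} = \alpha_{\mathrm{out}}\log(n)/n$):

```latex
\frac{|p_{\mathrm{in}}-p_{\mathrm{out}}|}{\sqrt{p_{\mathrm{in}}}}
=\frac{|\alpha_{\mathrm{in}}-\alpha_{\mathrm{out}}|\log(n)/n}{\sqrt{\alpha_{\mathrm{in}}\log(n)/n}}
=\frac{|\alpha_{\mathrm{in}}-\alpha_{\mathrm{out}}|}{\sqrt{\alpha_{\mathrm{in}}}}\,\sqrt{\frac{\log(n)}{n}},
```

so the separation condition $|p_{\mathrm{in}}-p_{\mathrm{out}}|/\sqrt{p_{\mathrm{in}}} \gg \sqrt{\log(n)/n}$ holds exactly when the alternative separation condition $|\alpha_{\mathrm{in}}-\alpha_{\mathrm{out}}|/\sqrt{\alpha_{\mathrm{in}}} \gg 1$ holds.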
The separation condition of a standard network under SBM has been studied in [37,33,36,41,40] for their spectral methods. In particular, [37] finds that for a large enough constant c, spectral methods can exactly recover communities with high probability as n → ∞ if (p_in − p_out)/√p_in ≥ c√(log(n)/n) (i.e., (p_in − p_out)/√p_in ≫ √(log(n)/n)) when K = 2 in the case p_in > p_out, and this condition is the same as requiring (α_in − α_out)/√α_in ≥ c (i.e., (α_in − α_out)/√α_in ≫ 1). The sharp threshold of the ER random graph G(n, p) has been studied in [18,12,1,41,40]. In particular, [18] finds that the ER random graph is connected with high probability if p ≥ log(n)/n. Instead of showing or designing algorithms that can exactly recover labels with high probability when α_in and α_out are close to the above limits, we find that the separation condition |p_in − p_out|/√p_in ≫ √(log(n)/n) (or the alternative separation condition |α_in − α_out|/√α_in ≫ 1) and the sharp threshold p ≥ log(n)/n are useful for comparing different spectral methods under various models. In this paper, we summarize the idea of using the separation condition and sharp threshold to study consistency and to compare the error rates and network-sparsity requirements of different spectral methods under different models as a four-step criterion, which we call the separation condition and sharp threshold criterion (SCSTC for short). With an application of this criterion, this paper attempts to answer the following questions: how do the above inconsistency phenomena occur, and how can one obtain consistency results with weaker requirements on network sparsity than [36] and [35]? To answer these two questions, we use the recent techniques on row-wise eigenvector deviation developed in [16] and [15] to obtain consistent theoretical results directly related to model parameters for the SPACL algorithm of [36] and the SVM-cone-DCMMSB algorithm of [35]. The two questions are then answered by delicate analysis with an application of SCSTC to the theoretical upper bounds
of error rates in this paper and those of some previous spectral methods. The main contributions of this paper are as follows: (i) We summarize the idea of using the separation condition of a standard network and the sharp threshold of the ER random graph G(n, p) to study consistent estimation for different spectral methods (methods designed via eigen-decomposition or singular value decomposition of the adjacency matrix or its variants) under different models that can degenerate to SBM under mild conditions, as the four-step criterion SCSTC. The separation condition is used to study the consistency of the theoretical upper bound for a spectral method, and the sharp threshold can be used to study the network sparsity. Theoretical upper bounds for different spectral methods can be compared via SCSTC. Using this criterion, a few inconsistent phenomena in some previous works are found. (ii) Under MMSB and DCMM, we study the consistency of the SPACL algorithm proposed in [36] and its extended version, using recent techniques on row-wise eigenvector deviation developed in [16,15]. Compared with the original results of [36,35], our main theoretical results enjoy smaller error rates with weaker dependence on K and log(n). Meanwhile, our main theoretical results have weaker requirements on the network sparsity and on the lower bound of the smallest nonzero singular value of the population adjacency matrix.
For details, see Tables 3 and 4. (iii) Our results for DCMM are consistent with those for MMSB when DCMM degenerates to MMSB under mild conditions. Using SCSTC, under mild conditions, our main theoretical results under DCMM are consistent with those of [27]. This shows that the mismatch between the main results of [36,35] and those of [27] occurs because the error-rate bounds of [36] and [35] are sub-optimal. We also find that our theoretical results (as well as those of [27]) under both MMSB and DCMM match the classical results on separation condition and sharp threshold. Using the bound on ‖A − Ω‖ instead of ‖A_re − Ω‖ to establish the upper bound of the error rate under SBM in [33], the separation condition of a standard network obtained from [33]'s error rate matches the classical results; this answers the question of why the separation condition obtained from the error rate of [27] does not match that obtained from the error rate of [33]. Using ‖A_re − Ω‖ or ‖A − Ω‖ influences the row-wise eigenvector deviations in Theorem 3.1 of [36] and Theorem I.3 of [35]; therefore, whether ‖A_re − Ω‖ or ‖A − Ω‖ is used influences the separation conditions and sharp thresholds of [35,36]. In comparison, our bound on the row-wise eigenvector deviation is obtained using techniques developed in [16,15], and that of [27] is obtained by applying the modified Theorem 2.1 of [4]; therefore, whether ‖A_re − Ω‖ or ‖A − Ω‖ is used has no influence on the separation conditions and sharp thresholds of ours or of [27].
For details, see Tables 1 and 2.
The article is organized as follows. In Section 2, we give a formal introduction to the mixed membership stochastic blockmodel and review the SPACL algorithm considered in this paper. The theoretical consistency results under the mixed membership stochastic blockmodel are presented and compared to related works in Section 3. After delicate analysis, the separation condition and sharp threshold criterion is presented in Section 4. Based on an application of this criterion, improved consistent estimation results for the extended version of SPACL under the degree-corrected mixed membership model are provided in Section 5. Conclusions are given in Section 6.
Notations. We use the following general notations in this paper. Write [m] := {1, 2, . . . , m} for any positive integer m. For a vector x and fixed q > 0, ‖x‖_q denotes its ℓ_q-norm; we occasionally drop the subscript when q = 2. For a matrix M, M′ denotes its transpose, ‖M‖ denotes the spectral norm, ‖M‖_F denotes the Frobenius norm, ‖M‖_{2→∞} denotes the maximum ℓ_2-norm among the rows of M, and ‖M‖_∞ := max_i Σ_j |M(i, j)| denotes the maximum absolute row sum of M. Let rank(M) denote the rank of M, σ_i(M) the i-th largest singular value of M, λ_i(M) the i-th largest eigenvalue of M ordered by magnitude, and κ(M) the condition number of M. M(i, :) and M(:, j) denote the i-th row and the j-th column of M, respectively. M(S_r, :) and M(:, S_c) denote the rows and columns of M indexed by the sets S_r and S_c, respectively. For any matrix M, we simply write Y = max(0, M) for the matrix with Y(i, j) = max(0, M(i, j)) for all i, j. For any matrix M ∈ R^{m×m}, diag(M) is the m × m diagonal matrix whose i-th diagonal entry is M(i, i). 1 and 0 are column vectors with all entries equal to one and zero, respectively. e_i is a column vector whose i-th entry is 1 and whose other entries are zero. Throughout, C is a positive constant which may vary from line to line. f(n) = O(g(n)) means there exists a constant c > 0 such that |f(n)| ≤ c|g(n)| for all sufficiently large n. x ≳ y means there exists a constant c > 0 such that |x| ≥ c|y|.

2. Mixed membership stochastic blockmodel
Let A ∈ {0, 1}^{n×n} be a symmetric adjacency matrix such that A(i, j) = 1 if there is an edge between node i and node j, and A(i, j) = 0 otherwise. The mixed membership stochastic blockmodel (MMSB) [7] generates A as follows:

Ω := ρΠP̃Π′,  A(i, j) = A(j, i) ∼ Bernoulli(Ω(i, j)) independently for i, j ∈ [n],  (2.1)

where Π ∈ R^{n×K} is the membership matrix with Π(i, k) ≥ 0 and Σ_{k=1}^K Π(i, k) = 1 for i ∈ [n] and k ∈ [K], P̃ ∈ R^{K×K} is a nonnegative symmetric matrix with max_{k,l∈[K]} P̃(k, l) = 1 for model identifiability under MMSB, ρ is the sparsity parameter which controls the sparsity of the network, and Ω ∈ R^{n×n} is called the population adjacency matrix since E[A] = Ω. As mentioned in [27,36], σ_K(P̃) is a measure of the separation between communities, and we call it the separation parameter in this paper. ρ and σ_K(P̃) are two important model parameters directly related to the separation condition and sharp threshold, and they will be considered throughout this paper. DEFINITION 2.1. Call model (2.1) the mixed membership stochastic blockmodel (MMSB), and denote it by MMSB_n(K, P̃, Π, ρ).
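As a minimal illustration of model (2.1), the following sketch samples A from MMSB (all parameter values are arbitrary choices for illustration; self-loops are excluded for simplicity):

```python
import numpy as np

def sample_mmsb(Pi, P, rho, rng):
    """Sample a symmetric adjacency matrix A with E[A(i, j)] = rho * (Pi P Pi')(i, j)."""
    Omega = rho * Pi @ P @ Pi.T
    upper = np.triu(rng.random(Omega.shape) < Omega, 1)  # independent Bernoulli draws for i < j
    A = upper.astype(int)
    return A + A.T                                       # symmetrize; diagonal stays zero

rng = np.random.default_rng(42)
n, K = 200, 2
Pi = rng.dirichlet(np.ones(K), size=n)        # membership rows are nonnegative and sum to 1
P = np.array([[1.0, 0.3], [0.3, 1.0]])        # symmetric, max entry 1 for identifiability
A = sample_mmsb(Pi, P, rho=0.2, rng=rng)
```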
Unless specified, we treat conditions (I1) and (I2) as default from now on.
For k ∈ [K], let I^{(k)} = {i ∈ [n] : Π(i, k) = 1} be the set of pure nodes in community k. Select one node from each I^{(k)} to construct the index set I; i.e., I collects the indices of K pure nodes, one from each community. Without loss of generality, let Π(I, :) = I_K, where I_K is the K × K identity matrix. Recall that rank(Ω) = K. Let Ω = UΛU′ be the compact eigen-decomposition of Ω such that U ∈ R^{n×K}, Λ ∈ R^{K×K}, and U′U = I_K. Lemma 2.1 of [36] gives that U = ΠU(I, :); this form is called the Ideal Simplex (IS for short) [27,36], since all rows of U lie in a K-simplex in R^K whose vertices are the K rows of U(I, :). Given Ω and K, as long as we know U(I, :), we can exactly recover Π by Π = UU^{-1}(I, :), since U(I, :) ∈ R^{K×K} is a full rank matrix. As mentioned in [27,36], for such an IS, the successive projection (SP) algorithm [21] (i.e., Algorithm 3) can be applied to U with K communities to exactly find the corner matrix U(I, :). For convenience, set Z = UU^{-1}(I, :). Since Π = Z, we have Π(i, :) = Z(i, :)/‖Z(i, :)‖_1 for i ∈ [n]. Based on the above analysis, we are now ready to give the ideal SPACL algorithm. Input: Ω, K. Output: Π.
• Run SP algorithm on the rows of U assuming that there are K communities to obtain I.
• Recover Π by setting Π(i, :) = Z(i, :)/‖Z(i, :)‖_1 for i ∈ [n].

With given U and K, since the SP algorithm returns U(I, :), we see that the ideal SPACL exactly returns Π (for details, see Appendix A). Now we review the SPACL algorithm of [36]. Let Ã = ÛΛ̂Û′ be the top-K eigen-decomposition of A such that Û ∈ R^{n×K}, Λ̂ ∈ R^{K×K}, Û′Û = I_K, and Λ̂ contains the top K eigenvalues of A. For the real case, use Ẑ and Π̂ given in Algorithm 1 to estimate Z and Π, respectively. Algorithm 1 is the SPACL algorithm of [36], where we only care about the estimation of the membership matrix Π and omit the estimation of P̃ and ρ. Meanwhile, Algorithm 1 is a direct extension of the ideal SPACL algorithm from the oracle case to the real case, and we omit the prune step of the original SPACL algorithm in [36].
Algorithm 1 SPACL [36]. Input: the adjacency matrix A ∈ R^{n×n} and the number of communities K. Output: the estimated n × K membership matrix Π̂.
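The ideal SPACL pipeline above, together with the SP step it relies on, can be sketched as follows (a simplified illustration, not [36]'s reference implementation; replacing Ω by A and U by Û gives the empirical Algorithm 1 without the prune step):

```python
import numpy as np

def successive_projection(U, K):
    """SP algorithm: greedily pick K rows of U acting as simplex vertices."""
    Y = U.astype(float).copy()
    corners = []
    for _ in range(K):
        i = int(np.argmax(np.linalg.norm(Y, axis=1)))  # max-norm row is a vertex
        corners.append(i)
        u = Y[i] / np.linalg.norm(Y[i])
        Y -= np.outer(Y @ u, u)        # project rows onto u's orthogonal complement
    return corners

def ideal_spacl(Omega, K):
    """Ideal SPACL: recover Pi exactly from the population matrix Omega."""
    vals, vecs = np.linalg.eigh(Omega)
    top = np.argsort(np.abs(vals))[::-1][:K]
    U = vecs[:, top]                   # compact eigen-decomposition Omega = U Lam U'
    I = successive_projection(U, K)    # indices of one pure node per community
    Z = np.maximum(U @ np.linalg.inv(U[I, :]), 0)   # Z = U U^{-1}(I, :)
    return Z / Z.sum(axis=1, keepdims=True)         # Pi(i, :) = Z(i, :)/||Z(i, :)||_1

# Toy check: 2 communities, pure nodes at indices 0, 1, 4, 5
Pi = np.array([[1, 0], [0, 1], [.5, .5], [.7, .3], [1, 0], [0, 1.]])
P = np.array([[1.0, 0.2], [0.2, 1.0]])
Omega = 0.5 * Pi @ P @ Pi.T
Pi_hat = ideal_spacl(Omega, 2)
# Pi_hat equals Pi up to a column permutation
err = min(np.abs(Pi_hat - Pi).max(), np.abs(Pi_hat - Pi[:, ::-1]).max())
```

Running the toy check recovers the planted memberships exactly (up to machine precision), which is the content of the "exact recovery in the oracle case" claim above.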

3. Consistency under MMSB

Our main result under MMSB provides an upper bound on the estimation error of each node's membership in terms of several model parameters. Throughout this paper, K is a known positive integer. Assume that

(A1) ρn ≥ log(n).
Assumption (A1) provides a requirement on the lower bound of the sparsity parameter ρ: it should be at least log(n)/n. Then we have the following lemma.

LEMMA 3.1. Under MMSB_n(K, P̃, Π, ρ), when Assumption (A1) holds, with probability at least 1 − o(n^{−α}) for any α > 0, we have ‖A − Ω‖ ≤ C_α√(ρn log(n)), where C_α is an explicit constant depending on α (in Lemma 3.1, instead of simply writing a constant C_α, we keep its explicit form).

REMARK 3.2. When Assumption (A1) holds, the upper bound of ‖A − Ω‖ in Lemma 3.1 is consistent with Corollary 6.5 in [13], since Var(A(i, j)) ≤ ρ under MMSB_n(K, P̃, Π, ρ). Lemma 3.1 is obtained via Theorem 1.4 (Bernstein inequality) in [45]. For comparison, [36] applies Theorem 5.2 of [33] to bound ‖A − Ω‖ (see, e.g., Eq (14) of [36]) and obtains a bound C√(ρn) for some C > 0. However, C√(ρn) is the bound between a regularization of A and Ω, as stated in the proof of Theorem 5.2 of [33], where such a regularization A_re is obtained from A with some constraints given in Lemmas 4.1 and 4.2 of the supplementary material of [33] and Theorem 2 of [50], as long as ρ ≥ max_{i,j} Ω(i, j) (here, letting Ω = E[A] without considering models, a ρ satisfying ρ ≥ max_{i,j} Ω(i, j) is also a sparsity parameter controlling the overall sparsity of a network). Note that A_re is not Ã, where Ã = ÛΛ̂Û′ is obtained from the top-K eigen-decomposition of A; A_re is obtained by adding constraints on the degrees of A, see Theorem 2 of [50] for detail.
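As an illustrative numerical check (not part of any proof), one can simulate a small planted-partition Ω and verify that the spectral deviation ‖A − Ω‖ stays well below a constant multiple of √(ρn log(n)); the block structure, the constant 3, and all parameter values below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, rho = 600, 3, 0.05

# Planted partition: three equal blocks, P with unit diagonal
Pi = np.zeros((n, K))
Pi[np.arange(n), np.arange(n) % K] = 1.0
P = np.full((K, K), 0.3) + 0.7 * np.eye(K)
Omega = rho * Pi @ P @ Pi.T                # E[A]; max entry equals rho

# Sample a symmetric adjacency matrix with E[A] = Omega (upper triangle, then mirror)
R = rng.random((n, n))
A = (np.triu(R, 1) < np.triu(Omega, 1)).astype(float)
A = A + A.T

dev = np.linalg.norm(A - Omega, 2)         # spectral norm of the noise
bound = 3 * np.sqrt(rho * n * np.log(n))
print(dev < bound)                          # True with high probability for this seed
```

Here `dev` concentrates around 2√(ρn) ≈ 11 while `bound` ≈ 41, so the inequality holds with a comfortable margin.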
In [27,35,36], the main theoretical results for the proposed membership estimation methods hinge on a row-wise deviation bound for the eigenvectors of the adjacency matrix, whether under MMSB or DCMM. Different from the theoretical technique applied in Theorem 3.1 of [36], which yields sub-optimal dependencies on log(n) and K and needs sub-optimal requirements on the sparsity parameter ρ and the lower bound of σ_K(Ω), we use Theorem 4.2.1 of [16] and Theorem 4.2 of [15] to obtain the row-wise deviation bound for the eigenvectors of Ω.

LEMMA 3.3. (Row-wise eigenspace error) Under MMSB_n(K, P̃, Π, ρ), when Assumption (A1) holds, suppose σ_K(Ω) ≥ C√(ρn log(n)); then, with probability at least 1 − o(n^{−α}),
• when we apply Theorem 4.2.1 of [16], we obtain a row-wise eigenspace error bound ϖ_1;
• when we apply Theorem 4.2 of [15], we obtain a bound ϖ_2.

Note that when ρn ≥ log(n), ϖ_2 and ϖ_1 are of the same order; therefore we simply let ϖ_2 be the bound, since its form is slightly simpler than that of ϖ_1.
Compared with Theorem 3.1 of [36]: since we apply Theorem 4.2.1 of [16] and Theorem 4.2 of [15] to obtain the bound on the row-wise eigenspace error under MMSB, our bounds do not rely on min(K², κ²(Ω)) while Theorem 3.1 of [36] does. Meanwhile, our bound in Lemma 3.3 is sharper, with weaker dependence on K and log(n), and has weaker requirements on the lower bounds of σ_K(Ω) and λ_K(Π′Π) and on the sparsity parameter ρ. The details are given below:
• We'd emphasize that the bound of Theorem 3.1 of [36] should be O(ψ(Ω)√(Kn) log^ξ(n)/(σ_K(P̃)λ_K^{1.5}(Π′Π))) for ξ > 1, where the function ψ is defined in Eq (7) of [36]; this is also pointed out by Table 2 of [32]. The reason is: in the proof of Theorem 3.1 of [36], from their step (iii) to step (iv), they should keep the term log^ξ(n), since this term is much larger than 1. One can also see directly from Theorem VI.1 of [36] that the bound in their Theorem 3.1 should be multiplied by log^ξ(n). This bound is K^{0.5} log^{ξ−0.5}(n) times larger than our bound in Lemma 3.3. Meanwhile, by the proof of Theorem 3.1 of [36], the bound depends on the upper bound of ‖A − Ω‖, and [36] applies Theorem 5.2 of [33], which gives ‖A_re − Ω‖ ≤ C√(ρn) with high probability; C√(ρn) is in fact the upper bound of the difference between a regularization of A and Ω. Therefore, if we are only interested in bounding ‖A − Ω‖ instead of ‖A_re − Ω‖, the upper bound of Theorem 3.1 of [36] should be O(ψ(Ω)√(Kn) log^{ξ+0.5}(n)/(σ_K(P̃)λ_K^{1.5}(Π′Π))), which is at least K^{0.5} log^ξ(n) times larger than our bound in Lemma 3.3. Furthermore, the upper bound of the row-wise eigenspace error in Lemma 3.3 does not rely on the upper bound of ‖A − Ω‖ as long as σ_K(Ω) ≥ C√(ρn log(n)) holds. Therefore, whether ‖A_re − Ω‖ ≤ C√(ρn) or ‖A − Ω‖ ≤ C√(ρn log(n)) is used does not change the bound in Lemma 3.3.
• Since ‖Ω‖ = ρ‖ΠP̃Π′‖ ≤ Cρn by basic algebra, the lower bound requirement on σ_K(Ω) in Assumption 3.1 of [36] suggests that Theorem 3.1 of [36] requires ρn ≥ C log^{2ξ}(n), which also matches the requirement on ρn in Theorem VI.1 of [36] (and this is also pointed out by Table 1 of [32]). For comparison, our requirement on sparsity given in Assumption (A1) is ρn ≥ log(n); under a mild additional condition on ρn, our row-wise eigenvector deviation is consistent with [32]'s result shown in their Table 2. The next theorem gives theoretical bounds on the estimation of memberships under MMSB.

THEOREM 3.4. Under MMSB_n(K, P̃, Π, ρ), suppose the conditions in Lemma 3.3 hold. Then there exists a permutation matrix P ∈ R^{K×K} such that, with probability at least 1 − o(n^{−α}), max_{i∈[n]} ‖Π̂(i, :) − Π(i, :)P‖_1 is bounded in terms of ϖ, K, σ_K(P̃), and λ_K(Π′Π).

REMARK 3.5. (Comparison to Theorem 3.2 of [36]) Consider a special case with κ(Π′Π) = O(1). We focus on comparing the dependencies on K in the bounds of our Theorem 3.4 and Theorem 3.2 of [36]. In this case, the bound of our Theorem 3.4 is proportional to K² by basic algebra; checking the bound of Theorem 3.2 of [36] (Eq (45) of [36]), the power of K there is 2. Meanwhile, note that our bound in Theorem 3.4 is an ℓ_1 bound while the bound in Theorem 3.2 of [36] is an ℓ_2 bound; when we translate the ℓ_2 bound of Theorem 3.2 of [36] into an ℓ_1 bound, the power of K becomes 2.5 for Theorem 3.2 of [36]. Hence, our bound in Theorem 3.4 has weaker dependence on K than that of Theorem 3.2 of [36], which is also consistent with the first bullet given after Lemma 3.3.
Table 3 summarizes the necessary conditions and the dependence on model parameters of the rates in our Theorem 3.4 and Theorem 3.2 of [36] for comparison. The following corollary is obtained by adding conditions on model parameters similar to those of Corollary 3.1 in [36].

TABLE 3
Comparison of error rates between our Theorem 3.4 and Theorem 3.2 of [36] under MMSB_n(K, P̃, Π, ρ). The dependence on K is obtained when κ(Π′Π) = O(1). For comparison, we have translated the ℓ_2 error rates of Theorem 3.2 of [36] into ℓ_1 error rates. Note that, as analyzed in the first bullet after Lemma 3.3, whether ‖A_re − Ω‖ ≤ C√(ρn) or ‖A − Ω‖ ≤ C√(ρn log(n)) is used does not change our ϖ and has no influence on the bound in Theorem 3.4. For [36]: using ‖A_re − Ω‖ ≤ C√(ρn), the power of log(n) in their Theorem 3.2 is ξ; using ‖A − Ω‖ ≤ C√(ρn log(n)), the power is ξ + 0.5.

COROLLARY 3.6. Under MMSB_n(K, P̃, Π, ρ), suppose the conditions of Theorem 3.4 and the additional conditions on model parameters hold; then, with probability at least 1 − o(n^{−α}), the error bound is O(log(n)/(σ_K²(P̃)ρn)).

REMARK 3.7. Consider the special case of Corollary 3.6 in which σ_K(P̃) is a constant. The error bound O(log(n)/(ρn)) in Corollary 3.6 is then directly related to Assumption (A1): for consistent estimation, ρ should grow faster than log(n)/n.

4. Separation condition and sharp threshold criterion
Since the error bound for Eq (3) of [36] is O(log^ξ(n)/(σ_K(P̃)√(ρn))) when ‖A_re − Ω‖ ≤ C√(ρn) is used, following similar analysis we see that the separation condition for [36] is log^ξ(n)/√n; when ‖A − Ω‖ ≤ C√(ρn log(n)) is used, the bound becomes O(log^{ξ+0.5}(n)/(σ_K(P̃)√(ρn))), and the separation condition for [36] is log^{ξ+0.5}(n)/√n. For comparison, the error bound of Corollary 3.2 of [33], built under SBM for community detection, is O(1/(σ_K²(P̃)ρn)) when κ(Π′Π) = O(1) and K = O(1); following similar analysis, the separation condition for [33] should grow faster than 1/√n. However, as analyzed in the first bullet after Lemma 3.3, [33] applies ‖A_re − Ω‖ ≤ C√(ρn) to build their consistency results. If we instead apply ‖A − Ω‖ ≤ C√(ρn log(n)) to build [33]'s theoretical results, the error bound of Corollary 3.2 of [33] is O(log(n)/(σ_K²(P̃)ρn)), which returns the same separation condition as our Corollary 3.6 and [27]'s Theorem 2.2. As analyzed in the first bullet after Lemma 3.3, whether ‖A_re − Ω‖ ≤ C√(ρn) or ‖A − Ω‖ ≤ C√(ρn log(n)) is used does not change our error rates. By carefully analyzing the proof of Theorem 2.1 of [27], we see that this choice also does not change their row-wise large deviation bound, and hence does not influence the upper bound of the error rate of their Mixed-SCORE.
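Concretely, under the settings $p_{\mathrm{in}}=\rho$, $p_{\mathrm{out}}=\rho(1-\omega)$, and $\sigma_K(\tilde P)=\omega$ used below in step 3, requiring an error bound of the form $O(\log^{\xi}(n)/(\sigma_K(\tilde P)\sqrt{\rho n}))$ to vanish gives the stated separation condition:

```latex
\frac{\log^{\xi}(n)}{\omega\sqrt{\rho n}}=o(1)
\;\Longleftrightarrow\;
\omega\sqrt{\rho}\gg\frac{\log^{\xi}(n)}{\sqrt{n}}
\;\Longleftrightarrow\;
\frac{p_{\mathrm{in}}-p_{\mathrm{out}}}{\sqrt{p_{\mathrm{in}}}}=\omega\sqrt{\rho}\gg\frac{\log^{\xi}(n)}{\sqrt{n}},
```

and the $\log^{\xi+0.5}(n)/\sqrt{n}$ version follows in the same way from the bound carrying the extra $\log^{0.5}(n)$ factor.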
Under the alternative parametrization, when ‖A_re − Ω‖ ≤ C√(ρn) is used, the alternative separation condition for [36] is log^{ξ−0.5}(n), and when ‖A − Ω‖ ≤ C√(ρn log(n)) is used it is log^ξ(n), with κ(Π′Π) = O(1) and K = O(1). Following similar analysis, the alternative separation condition for [33] is 1/√(log(n)); applying ‖A − Ω‖ ≤ C√(ρn log(n)) instead, the error bound of Corollary 3.2 of [33] is O(log(n)/(σ_K²(P̃)ρn)), which returns the same alternative separation condition as our Corollary 3.6 and [27]'s Theorem 2.2.

REMARK 4.1. A large body of literature in statistics and computer science [3,2,23,5,10] has focused on detecting communities in networks with 2 equal-size clusters under SBM, and finds that exactly recovering the communities is possible when √α_in − √α_out > √2. This threshold can be achieved by semidefinite relaxations [2,23,5,10] and spectral methods with local refinements [3,19]. Our alternative separation condition (α_in − α_out)/√α_in ≫ 1, though rougher in form than √α_in − √α_out > √2, is useful for studying optimality of estimation and comparing the error rates of different spectral methods under different models, as shown in Table 2.
Since the error rate is O(log(n)/(pn)), for consistent estimation we see that p should grow faster than log(n)/n, which is exactly the sharp threshold in [18], Theorem 4.6 of [12], the strong consistency result of [49], and the first bullet in Section 2.5 of [1] (we call the lower bound requirement on p for the ER random graph to enjoy consistent estimation the sharp threshold). The sharp threshold is obtained when K = 1, which corresponds to a connected ER random graph G(n, p), and this is also consistent with the connectivity requirement in Table 2 of [2]. Meanwhile, since our Assumption (A1) requires ρn ≥ log(n), it gives that p should grow faster than log(n)/n (since p = ρ under G(n, p)), which is consistent with the sharp threshold. Since [27]'s Theorem 2.2 enjoys the same error rate as ours under the settings of Corollary 3.6, [27] also reaches the sharp threshold log(n)/n. Furthermore, Remark 3.9 says that the bound on the error rate in Eq (3) of [36] should be O(log^ξ(n)/(σ_K(P̃)√(ρn))) when ‖A_re − Ω‖ ≤ C√(ρn) is used; following similar analysis, we see that the sharp threshold for [36] is log^{2ξ}(n)/n, which is sub-optimal compared with ours. When ‖A − Ω‖ ≤ C√(ρn log(n)) is used, the sharp threshold for [36] is log^{2ξ+1}(n)/n. Similarly, the error bound of Corollary 3.2 of [33] is O(1/(pn)); hence the sharp threshold obtained from the theoretical upper bound on the error rate of [33] is 1/n, which does not match the classical result. If we instead apply ‖A − Ω‖ ≤ C√(ρn log(n)) with high probability to build [33]'s theoretical results, the error bound of Corollary 3.2 of [33] is O(log(n)/(pn)), which returns the classical sharp threshold log(n)/n. Table 1 summarizes the comparisons of separation condition and sharp threshold, and Table 2 records the respective alternative separation conditions. The delicate analysis given above supports our statement that the separation condition of a standard network and the sharp threshold of the ER random graph G(n, p) can be seen as unified criteria to compare theoretical results of spectral methods under different models. To conclude
the above analysis, we here summarize the main steps for applying the separation condition and sharp threshold criterion (SCSTC for short) to check the consistency of theoretical results, or to compare results of spectral methods under different models, where spectral methods means methods developed from the eigenvectors or singular vectors of the adjacency matrix or its variants for community detection. The four-step SCSTC is given below:

step 1 Check whether the theoretical upper bound of the error rate contains σ_K(P̃), where the separation parameter σ_K(P̃) always appears when considering the lower bound of σ_K(Ω). If it contains σ_K(P̃), move to the next step. Otherwise, this suggests possible improvements to the consistency result by considering σ_K(P̃) in the proofs.

step 2 Let the number of communities be O(1) and let the network degenerate to a standard network whose community sizes are of the same order and can be seen as O(n/K). Let the model degenerate to SBM and obtain the new theoretical upper bound of the error rate. Note that if the model does not consider degree heterogeneity, the sparsity parameter ρ should already appear in the theoretical upper bound of the error rate in step 1; if the model considers degree heterogeneity, ρ appears at this step when the model degenerates to SBM. Meanwhile, if ρ is not contained in the error rate of step 1 when the model does not consider degree heterogeneity, this suggests possible improvements by considering ρ.

step 3 Let P̃ = ωI_K + (1 − ω)11′ for 0 < ω < 1 (note that σ_K(P̃) = ω), and set P = ρP̃ as the probability matrix when the model degenerates to SBM. Next, compute the lower bound requirement on ω for consistent estimation by analyzing the bound obtained in the last step (note that p_in = ρ, p_out = ρ(1 − ω), and p_in − p_out = ρω under the above settings of SCSTC). Compute the separation condition |p_in − p_out|/√p_in = ω√ρ using the lower bound requirement on ω. The sharp threshold for the ER random graph G(n, p) is obtained from the lower bound requirement on ρ for consistent estimation under the setting K = 1, σ_K(P̃) = 1, and p = ρ.

step 4 Compare the separation condition and sharp threshold obtained in the last step with the classical results in Corollary 1 of [37] and the first bullet in Section 2.5 of [1] (or our results given in Table 1), respectively. If the sharp threshold is ≫ log(n)/n or the separation condition is ≫ √(log(n)/n), this leaves room for improvement on the network sparsity or the theoretical upper bound of the error rate. If the sharp threshold is log(n)/n and the separation condition is √(log(n)/n), the optimality of the theoretical results, on both error rates and the requirement on network sparsity, is guaranteed. Finally, if the sharp threshold is ≪ log(n)/n or the separation condition is ≪ √(log(n)/n), this suggests that the theoretical result was obtained based on ‖A_re − Ω‖ instead of ‖A − Ω‖.
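The identity σ_K(P̃) = ω used in step 3 can be checked numerically for P̃ = ωI_K + (1 − ω)11′ (a quick sketch; K and ω are arbitrary choices):

```python
import numpy as np

K, omega = 4, 0.3
P_tilde = omega * np.eye(K) + (1 - omega) * np.ones((K, K))
# Spectrum: omega with multiplicity K - 1, plus omega + (1 - omega) * K
# along the all-ones direction, so the smallest singular value is exactly omega.
sigma = np.linalg.svd(P_tilde, compute_uv=False)
print(sigma[-1])   # ≈ 0.3
```

This also makes the step 3 bookkeeping concrete: with P = ρP̃ we get p_in = ρ on the diagonal and p_out = ρ(1 − ω) off the diagonal, hence p_in − p_out = ρω.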
The remarks below give some explanations of the four steps of SCSTC. REMARK 4.2.
• In step 1, we give a few examples. When applying SCSTC to the main results of [39,44,48], we stop at step 1, as analyzed in Remark 4.3, suggesting possible improvements for these works by considering σ_K(P̃). Meanwhile, for a theoretical result that does not consider σ_K(P̃), we can still move to step 2 to obtain a new theoretical upper bound of the error rate in terms of ρ and n; discussions on the theoretical upper bounds of error rates of [26,46] are given in Remark 4.3.
• In step 2, for DCMM and DCSBM [39,26,33], setting Θ = √ρ I makes DCMM and DCSBM degenerate to SBM when all nodes are pure; similar arguments apply to the ScBM and DCScBM considered in [44,46,50,41], the OCCAM model of [48], the stochastic blockmodel with overlap proposed in [30], the BiMMSB model in [42], the DiDCMM model in [40], the extensions of SBM and DCSBM for hypergraph networks considered in [20,31,17], and so forth. Meanwhile, when we say that a model degenerates to SBM, we mean that the model degenerates to a special case of SBM, not that it degenerates to SBM exactly. For example, the DCMM_n(K, P̃, Π, Θ) considered in Section 5 requires P̃ to have unit diagonal entries, which suggests that all diagonal entries of the P considered in step 3 should be the same, while SBM can model networks whose probability matrix has varying entries.
• In step 3, the probability matrix P has diagonal entries p_in and off-diagonal entries p_out. When p_in > p_out, such a P is always of full rank, and it is considered by various models that can degenerate to SBM (to name a few: DCSBM [29], MMSB [7], OCCAM [48], DCMM [27], ScBM and DCScBM [44], BiMMSB [42], DiDCMM [40], and so forth). Meanwhile, P̃ is set to have unit diagonals and 1 − ω as off-diagonals for ω ∈ (0, 1] because we have assumed that the maximum entry of P̃ is 1 under MMSB for model identifiability. Actually, for the case that P̃ has unit diagonals and β − 1 > 1 as off-diagonals, so that the diagonal entries of P̃ are smaller than the off-diagonal entries, we can also obtain a similar separation condition; see the discussion after Corollary 5.11. Of course, in step 3 and step 4, the separation condition can be replaced by the alternative separation condition. Furthermore, when we say "a model degenerates to SBM", we do not mean that the model can degenerate to SBM exactly; instead, we mean that when an SBM models a network generated by the above P, the model can degenerate to such an SBM.
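The claim about the case with off-diagonals larger than diagonals can be checked numerically. A minimal sketch (the values of K and β are our own illustrative choices):

```python
import numpy as np

def sigma_K_unit_diag(K, beta):
    """P_tilde = (2-beta)*I_K + (beta-1)*11' has unit diagonals and
    beta-1 as off-diagonals; for beta > 2 its smallest singular value
    is beta - 2, so P_tilde stays full rank even though its diagonal
    entries are smaller than its off-diagonal entries."""
    P = (2 - beta) * np.eye(K) + (beta - 1) * np.ones((K, K))
    assert np.allclose(np.diag(P), 1.0)  # unit diagonals by construction
    return np.linalg.svd(P, compute_uv=False)[-1]

print(np.isclose(sigma_K_unit_diag(K=3, beta=2.5), 0.5))  # beta - 2 = 0.5
```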
The above analysis shows that SCSTC can be used to study the consistent estimation of model-based spectral methods. Using SCSTC, the following remark lists a few works whose main theoretical results leave possible improvements. REMARK 4.3. The unknown separation condition, sub-optimal error rates, or lack of a requirement on network sparsity in some previous works suggest possible improvements of their theoretical results. Here, we list a few works whose main results can possibly be improved by considering the separation condition.
• Theorem 4.4 of [39] proposes an upper bound on the error rate of their regularized spectral clustering algorithm RSC under DCSBM. However, since [39] does not study the lower bounds (in [39]'s language) of λ_K and m, we cannot directly obtain the separation condition from their main theorem. Meanwhile, the main result of [39] does not consider the requirement on network sparsity, which leaves room for improvement. • [43] and [28] study two algorithms designed based on the Laplacian matrix and its regularized version under SBM. They obtain meaningful results but do not consider the network sparsity parameter ρ and the separation parameter σ_K(P̃). • Theorem 2.2 of [26] provides an upper bound for their SCORE algorithm under DCSBM.
However, since they do not consider the influence of σ_K(P̃), we cannot directly obtain the separation condition from their main result. Meanwhile, setting their Θ = √ρ I makes their DCSBM degenerate to SBM, under their assumption Eq (2.9). Hence, when Θ = √ρ I, the upper bound of Theorem 2.2 in [26] involves log³(n)/(ρ²n), while the upper bound of the error rate in Corollary 3.2 of [33] is O(√(log(n)/(ρn))) when using ‖A − Ω‖ ≤ C√(ρn log(n)) under the setting κ(Π′Π) = O(1), K = O(1) and σ_K(P̃) = O(1). We see that log³(n)/(ρ²n) grows faster than log(n)/(ρn), which suggests that there is room to improve the main result of [26] in the aspects of separation condition and error rates.
• [44] proposes two models, ScBM and DCScBM, to model directed networks, and an algorithm DiSIM based on the directed regularized Laplacian matrix to fit DCScBM. However, similar to [39], their main theoretical result in their Theorem C.1 does not consider the lower bounds of (in [44]'s language) σ_K, m_y, m_z and γ_z, so we cannot obtain the separation condition when DCScBM degenerates to SBM. Meanwhile, their Theorem C.1 also lacks a lower bound requirement on network sparsity. Hence, there is room to improve [44]'s theoretical guarantees. • [46] mainly studies the theoretical guarantee of the D-SCORE algorithm proposed by [25] to fit a special case of the DCScBM model for directed networks. Setting their θ(i) = √ρ, δ(j) = √ρ for i, j ∈ [n] makes their directed-DCBM degenerate to SBM. Their error rate then matches that of [33] under SBM when T_n is set as a constant. However, if T_n is set as log(n), the error rate is O(√(log³(n)/(ρn))), which is sub-optimal compared with that of [33]. Meanwhile, similar to [26], [46]'s main result does not consider the influences of K and σ_K(P̃), causing a lack of a separation condition. Hence, the main results of [46] can be improved by considering K, σ_K(P̃), or a more optimal choice of T_n to make their main results comparable with those of [33] when directed-DCBM degenerates to SBM.

5. Degree corrected mixed membership model
Applying SCSTC to Theorem 3.2 of [35], as shown in Tables 1 and 2, the results of Theorem 3.2 in [35] are sub-optimal. To obtain improved theoretical results, we first give a formal introduction of the degree corrected mixed membership (DCMM) model proposed in [27], then review the SVM-cone-DCMMSB algorithm of [35] and provide improved theoretical results. A DCMM for generating A is as follows.
where Θ ∈ R^{n×n} is a diagonal matrix whose i-th diagonal entry is the degree heterogeneity of node i, for i ∈ [n]. Note that if we set Π̄ = ΘΠ and choose Θ such that Π̄ ∈ {0, 1}^{n×K}, then we have Ω = Π̄P̃Π̄′, which means that the stochastic blockmodel with overlap (SBMO) proposed in [30] is just a special case of DCMM. Meanwhile, if we write Θ as Θ = Θ̄D_o, where Θ̄ and D_o are two positive diagonal matrices, and let Π_o = D_oΠ, then we can choose D_o such that ‖Π_o(i, :)‖_F = 1. By Ω = ΘΠP̃Π′Θ = Θ̄Π_oP̃Π′_oΘ̄, we see that the OCCAM model proposed in [48] actually equals the DCMM model. By Eq (1.3) and Proposition 1.1 of [27], the following conditions are sufficient for the identifiability of DCMM when θ_max P̃_max ≤ 1: • (II1) rank(P̃) = K and P̃ has unit diagonals. • (II2) There is at least one pure node for each of the K communities.
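To make the generating process concrete, here is a small simulation sketch of DCMM (the parameter values, and the use of only pure nodes in the demo, are our own illustrative choices):

```python
import numpy as np

def sample_dcmm(Pi, P_tilde, theta, rng):
    """Draw a symmetric adjacency matrix A with E[A(i,j)] = Omega(i,j)
    for i != j, where Omega = Theta Pi P_tilde Pi' Theta is the DCMM
    expectation and Theta = diag(theta) carries degree heterogeneity."""
    Omega = theta[:, None] * (Pi @ P_tilde @ Pi.T) * theta[None, :]
    upper = np.triu(rng.random(Omega.shape) < Omega, 1)  # Bernoulli draws
    A = (upper + upper.T).astype(float)                  # symmetric, no self-loops
    return A, Omega

rng = np.random.default_rng(0)
n, K = 200, 2
Pi = np.zeros((n, K)); Pi[: n // 2, 0] = 1.0; Pi[n // 2 :, 1] = 1.0  # pure nodes
P_tilde = np.array([[1.0, 0.3], [0.3, 1.0]])  # unit diagonals (Condition II1)
theta = rng.uniform(0.2, 0.5, size=n)         # theta_max * P_max <= 1
A, Omega = sample_dcmm(Pi, P_tilde, theta, rng)
```

Here θ_max P̃_max = 0.5 ≤ 1, so all entries of Ω are valid probabilities.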
Note that though the diagonal entries of P̃ are ones, P̃_max may be larger than 1 as long as θ_max P̃_max ≤ 1 under DCMM; this is slightly different from the setting max_{k,l∈[K]} P̃(k, l) = 1 under MMSB. Similar to Eq (2.14) of [27], let P̃_max ≤ C for convenience. Meanwhile, by Condition (II1), though DCMM is an extension of SBM, MMSB and DCSBM, it can only model networks whose probability matrix has equal diagonal entries.
Without causing confusion, under DCMM_n(K, P̃, Π, Θ), we still let Ω = UΛU′ be the top-K eigendecomposition of Ω such that U ∈ R^{n×K}, Λ ∈ R^{K×K} and U′U = I_K. Set U_* ∈ R^{n×K} by U_*(i, :) = U(i, :)/‖U(i, :)‖_F, and let N_U ∈ R^{n×n} be the diagonal matrix with N_U(i, i) = 1/‖U(i, :)‖_F, so that U_* can be rewritten as U_* = N_U U. The existence of the Ideal Cone (IC for short) structure inherent in U_*, mentioned in [35], is guaranteed by the following lemma, in which N_M is an n × n positive diagonal matrix. With given Ω and K, we can obtain U, U_* and Λ. The above analysis shows that once U_*(I, :) is known, we can exactly recover Π by Eq (5.3) and Eq (5.4). From Lemma 5.2, we know that U_* = YU_*(I, :) forms the IC structure. [35] proposes the SVM-cone algorithm (i.e., Algorithm 4), which can exactly obtain U_*(I, :) from the Ideal Cone U_* = YU_*(I, :) with inputs U_* and K.
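The construction of U_* from Ω can be sketched as follows (a toy rank-K Ω built from pure nodes without degree heterogeneity; the names and values are ours):

```python
import numpy as np

def row_normalized_eigvecs(Omega, K):
    """Top-K eigendecomposition Omega = U Lambda U' followed by row
    normalization U_*(i,:) = U(i,:)/||U(i,:)||, i.e. U_* = N_U U with
    N_U(i,i) = 1/||U(i,:)||."""
    vals, vecs = np.linalg.eigh(Omega)
    order = np.argsort(np.abs(vals))[::-1][:K]  # K leading eigenpairs by magnitude
    U = vecs[:, order]
    U_star = U / np.linalg.norm(U, axis=1, keepdims=True)
    return U, U_star

# toy example: a rank-2 Omega from two equal-size communities of pure nodes
n, K = 6, 2
Pi = np.zeros((n, K)); Pi[:3, 0] = 1; Pi[3:, 1] = 1
Omega = Pi @ np.array([[0.9, 0.2], [0.2, 0.8]]) @ Pi.T
U, U_star = row_normalized_eigvecs(Omega, K)
```

By construction every row of `U_star` has unit norm, and `U` keeps the orthonormality U′U = I_K.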
Based on the above analysis, we are now ready to give the ideal SVM-cone-DCMMSB algorithm. Input: Ω, K. Output: Π.
, where N_U is the n × n diagonal matrix whose i-th diagonal entry is 1/‖Z_*(i, :)‖_1 for i ∈ [n]. With given U_* and K, since the SVM-cone algorithm returns U_*(I, :), the ideal SVM-cone-DCMMSB exactly returns Π (for detail, see Appendix A). Now we review the SVM-cone-DCMMSB algorithm of [35], which can be seen as an extension of SPACL, designed under MMSB, to fit DCMM. For the real case, use Ŷ_*, Ĵ_*, Ẑ_*, Π̂_* given in Algorithm 2 to estimate Y_*, J_*, Z_*, Π, respectively.
Output: the estimated n × K membership matrix Π̂_*.
Apply the SVM-cone algorithm (i.e., Algorithm 4) on the rows of Û_*, assuming there are K communities, to obtain Î_*, the index set returned by SVM-cone. When all nodes are pure, DCMM degenerates to DCSBM [29], and the upper bound of ‖A − Ω‖ in Lemma 5.3 is then also consistent with Lemma 2.2 of [26]. Meanwhile, this bound is also consistent with Eq (6.34) in the first version of [27], which likewise applies the Bernstein inequality to bound ‖A − Ω‖. However, the bound is C√(θ_max‖θ‖_1) in Eq (C.53) of the latest version of [27], which applies Corollary 3.12 and Remark 3.13 of [11] to obtain the bound. Though the bound in Eq (C.53) of the latest version of [27] is sharper by a log(n) term, Corollary 3.12 of [11] has constraints on W(i, j) (here, W = A − Ω) such that W(i, j) can be written as W(i, j) = ξ_ij b_ij, where {ξ_ij : i ≥ j} are independent symmetric random variables with unit variance and {b_ij : i ≥ j} are given scalars; see the proof of Corollary 3.12 of [11] for detail. Therefore, without causing confusion, we also use A_re to denote the constrained A used in [27], so that ‖A_re − Ω‖ ≤ C√(θ_max‖θ‖_1). Furthermore, if we set ρ ≥ max_{i,j} Ω(i, j) such that ρ ≥ θ²_max, the bound in Lemma 5.3 also reads ‖A − Ω‖ ≤ C√(ρn log(n)), and Assumption (A2) reads P̃_max ρn ≥ log(n). The bound ‖A_re − Ω‖ ≤ C√(θ_max‖θ‖_1) in Eq (C.53) of [27] then reads ‖A_re − Ω‖ ≤ C√(ρn).
REMARK 5.7. (Comparison to Theorem I.3 of [35]) Note that the ρ in [35] is θ²_max, which determines the row-wise eigenspace concentration in Theorem I.3 of [35]. By Lemma II.1 of [35] and σ_K(Ω) ≥ θ²_min σ_K(P̃)λ_K(Π′Π) from the proof of Lemma 5.5, we see that the upper bound of Theorem I.3 of [35] is a multiple of our ϖ. Again, Theorem I.3 of [35] has stronger requirements on the sparsity of θ_max‖θ‖_1 and on the lower bound of σ_K(Ω) than our Lemma 5.5. When using the bound on ‖A − Ω‖ in our Lemma 5.3 to obtain the row-wise eigenspace concentration in Theorem I.3 of [35], their upper bound is √K log^ξ(n) times larger than our ϖ. Similar to the first bullet given after Lemma 3.3, whether using ‖A − Ω‖ ≤ C√(θ_max‖θ‖_1 log(n)) or ‖A_re − Ω‖ ≤ C√(θ_max‖θ‖_1) does not change our ϖ.

REMARK 5.8. (Comparison to Lemma 2.1 of [27]) The fourth bullet of Lemma 2.1 of [27] is the row-wise deviation bound for the eigenvectors of the adjacency matrix, under some assumptions that translate to our κ(Π′Π) = O(1), Assumption (A2) and the lower bound requirement on σ_K(Ω), since they apply Lemma C.2 of [27]. The row-wise deviation bound in the fourth bullet of Lemma 2.1 of [27] has a denominator built from σ_K(P̃)‖θ‖²_F/K, which [27] uses to roughly estimate σ_K(Ω), instead of our θ³_min σ_K(P̃)λ_K^{1.5}(Π′Π), since we apply θ²_min σ_K(P̃)λ_K(Π′Π) to strictly control the lower bound of σ_K(Ω). Therefore, the row-wise deviation bound in the fourth bullet of Lemma 2.1 of [27] is consistent with our bounds in Lemma 5.5 when κ(Π′Π) = O(1), while our row-wise eigenspace errors in Lemma 5.5 are more widely applicable than those of [27], since we need no constraint on Π′Π such that κ(Π′Π) = O(1). The upper bound on ‖A − Ω‖ of [27] is C√(θ_max‖θ‖_1), given in their Eq (C.53) under DCMM_n(K, P̃, Π, Θ), while ours is C√(θ_max‖θ‖_1 log(n)) in Lemma 5.3; since our bound on the row-wise eigenspace error in Lemma 5.5 is consistent with the fourth bullet of Lemma 2.1 of [27], this supports the statement, given in the first bullet after Lemma 3.3, that the row-wise eigenspace error does not rely on ‖A − Ω‖.
Let π_min = min_{1≤k≤K} 1′Πe_k, where π_min measures the minimum summation of nodes belonging to a certain community. Increasing π_min makes the network tend to be more balanced, and vice versa. Meanwhile, the term π_min appears when we propose a lower bound of η, defined in Lemma C.2, to keep track of model parameters in our main theorem under DCMM_n(K, P̃, Π, Θ). The next theorem gives theoretical bounds on the estimation of memberships under DCMM. THEOREM 5.9. Under DCMM_n(K, P̃, Π, Θ), suppose the conditions in Lemma 5.5 hold; then there exists a permutation matrix P_* ∈ R^{K×K} such that, with probability at least 1 − o(n^{−α}), the stated error bound holds. For comparison, Table 4 summarizes the necessary conditions and the dependence on model parameters of the rates in Theorem 5.9 and Theorem 3.2 of [35], where the dependence on K and log(n) is analyzed in Remark 5.10 below. TABLE 4: Comparison of error rates between our Theorem 5.9 and Theorem 3.2 of [35] under DCMM_n(K, P̃, Π, Θ). The dependence on K is obtained when κ(Π′Π) = O(1). For comparison, we have adjusted the l_2 error rates of Theorem 3.2 of [35] into l_1 error rates. Since Theorem 5.9 enjoys the same separation condition and sharp threshold as Theorem 3.4, and Theorem 3.2 of [35] enjoys the same separation condition and sharp threshold as Theorem 3.2 of [36], we do not report them in this table. Note that, as analyzed in Remark 5.7, whether using ‖A − Ω‖ ≤ C√(θ_max‖θ‖_1 log(n)) or ‖A_re − Ω‖ ≤ C√(θ_max‖θ‖_1) does not change our ϖ under DCMM, and it has no influence on the results in Theorem 5.9. For [35]: using ‖A_re − Ω‖ ≤ C√(θ_max‖θ‖_1), the power of log(n) in their Theorem 3.2 is ξ; using ‖A − Ω‖ ≤ C√(θ_max‖θ‖_1 log(n)), the power of log(n) in their Theorem 3.2 is ξ + 0.5.
REMARK 5.10. (Comparison to Theorem 3.2 of [35]) Our bound in Theorem 5.9 is written as a combination of model parameters, and Π can follow any distribution as long as Condition (II2) holds; such a model-parameter form of the estimation bound is convenient for further theoretical analysis (see Corollary 5.11). In contrast, the bound in Theorem 3.2 of [35] is built when Π follows a Dirichlet distribution and κ(Π′Θ²Π) = O(1). Meanwhile, since Theorem 3.2 of [35] applies Theorem I.3 of [35] to obtain the row-wise eigenspace error, the bound in Theorem 3.2 of [35] should be multiplied by log^ξ(n) by Remark 5.7; this is also supported by the fact that in the proof of Theorem 3.1 of [35], when computing the bound of ǫ_0 (in [35]'s language), [35] ignores the log^ξ(n) term.
Consider a special case by setting Θ = √ρ I, which matches the setting κ(Π′Θ²Π) = O(1) in Theorem 3.2 of [35]. Now we focus on analyzing the powers of K in our Theorem 5.9 and in Theorem 3.2 of [35]. In this case, the power of K in the estimation bound of our Theorem 5.9 is 6 by basic algebra, since min(K², κ²(Ω)) = min(K², O(1)) = O(1), where η in Lemma C.2 follows the same definition as that of Theorem 3.2 of [35]. The bound in Theorem 3.2 of [35] should be multiplied by √K because the bound used for ‖(Ŷ_C Ŷ′_C)^{−1}‖_F (in [35]'s language) in the proof of Theorem 2.8 of [35] misses a √K factor; the power of K is then 6 by checking the bound of Theorem 3.2 of [35]. Meanwhile, note that our bound in Theorem 5.9 is an l_1 bound while the bound in Theorem 3.2 of [35] is an l_2 bound; when we translate the l_2 bound of Theorem 3.2 of [35] into an l_1 bound, the power of K is 6.5 for Theorem 3.2 of [35], suggesting that our bound in Theorem 5.9 has less dependence on K than that of Theorem 3.2 of [35].
The following corollary is obtained by adding some conditions on model parameters. COROLLARY 5.11. Under DCMM_n(K, P̃, Π, Θ), when the conditions of Lemma 5.5 hold, then with probability at least 1 − o(n^{−α}), the stated bound holds. Meanwhile, when β ∈ [1, 2), this P̃ is the standard setting considered for the separation condition in Section 4 (with ω = 2 − β). Instead, we consider the case β ∈ (2, ∞) here. Then we have σ_K(P̃) = β − 2, and it should grow faster than √(log(n)/(ρn)) for consistent estimation. Setting P = ρP̃ as the probability matrix for such P̃, we have p_out = ρ(β − 1), p_in = ρ, and the diagonal entries of P are p_in (note that p_out > p_in when β > 2). Since p_in = ρ, we see that (p_out − p_in)/√p_in should grow faster than √(log(n)/n) for consistent estimation.

For the alternative separation condition, set p_in = α_in log(n)/n and p_out = α_out log(n)/n (note that α_in < α_out when β > 2). For consistent estimation, (α_out − α_in) log(n)/n should grow faster than √(ρ log(n)/n), which gives (α_out − α_in)/√α_in ≫ 1. Following a similar analysis for the separation condition and the alternative separation condition, we obtain the results in Tables 1 and 2.
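The translation between the two parameterizations can be checked numerically. In the sketch below (our own values of n, ρ and β), the alternative separation condition is exactly the separation condition |p_in − p_out|/√p_in rescaled by √(n/log(n)):

```python
import numpy as np

def alt_separation(p_in, p_out, n):
    """Write p_in = alpha_in*log(n)/n and p_out = alpha_out*log(n)/n and
    return (alpha_out - alpha_in)/sqrt(alpha_in); consistency requires
    the absolute value of this ratio to diverge."""
    scale = np.log(n) / n
    alpha_in, alpha_out = p_in / scale, p_out / scale
    return (alpha_out - alpha_in) / np.sqrt(alpha_in)

n, rho, beta = 10_000, 0.05, 2.5
p_in, p_out = rho, rho * (beta - 1)          # diagonal vs off-diagonal of P
ratio = alt_separation(p_in, p_out, n)
direct = (p_out - p_in) / np.sqrt(p_in) * np.sqrt(n / np.log(n))
print(np.isclose(ratio, direct))
```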

Conclusion
In this paper, the four-step separation condition and sharp threshold criterion SCSTC is summarized as a unified framework to study consistency and compare theoretical error rates of spectral methods under models that can degenerate to SBM in the community detection area. With an application of this criterion, we find some inconsistent phenomena in a few previous works. In particular, using SCSTC we find that the original theoretical upper bounds on the error rates of the SPACL algorithm under MMSB and its extended version under DCMM are sub-optimal in both error rates and requirements on network sparsity. To find how the inconsistent phenomena occur, we re-establish theoretical upper bounds on the error rates of both SPACL and its extended version by using recent techniques on row-wise eigenvector deviation. The resulting error bounds explicitly keep track of seven independent model parameters (K, ρ, σ_K(P̃), λ_K(Π′Π), λ_1(Π′Π), θ_min, θ_max), which allows further delicate analysis. Compared with the original theoretical results, ours have smaller error rates with less dependence on K and log(n), and weaker requirements on the network sparsity and the lower bound of the smallest nonzero singular value of the population adjacency matrix, under both MMSB and DCMM. For DCMM, we put no constraint on the distribution of the membership matrix as long as it satisfies the identifiability condition. When considering the separation condition of a standard network and the probability of generating a connected Erdös-Rényi (ER) random graph by using SCSTC, our theoretical results match the classical results. Meanwhile, our theoretical results also match those of Theorem 2.2 of [27] under mild conditions, and when DCMM degenerates to MMSB, the theoretical results under DCMM are consistent with those under MMSB. Using the SCSTC criterion, we find that the reasons behind the inconsistent phenomena are the sub-optimality of the original theoretical upper bounds on the error rates of SPACL and its extended version, and whether a regularized version of the adjacency matrix is used when building theoretical results for spectral methods designed to detect node labels in non-mixed networks. The process of finding these inconsistent phenomena, the sub-optimal theoretical results on error rates, and the formation mechanism of these inconsistent phenomena guarantees the usefulness of the SCSTC criterion. As shown in Remark 4.3, the theoretical results of some previous works can be improved by applying this criterion. A limitation of this criterion is that it only applies to studying the consistency of spectral methods for a standard network with a constant number of communities. It would be interesting to develop a more general criterion that can study the consistency of all methods besides spectral methods, and of models besides those that can degenerate to SBM, for non-standard networks with large K.

APPENDIX A: VERTEX HUNTING ALGORITHMS
The SP algorithm is written as below.

Based on Algorithm 3, the following theorem is Theorem 1.1 in [21], and it is also Lemma VII.1 in [36]. This theorem bounds the distance between the corner matrix S_sp and its estimated version, returned by feeding Y_sp to the SP algorithm, when M′_sp S′_sp enjoys the ideal simplex structure. THEOREM A.1. Fix m ≥ r and n ≥ r. Consider a matrix Y_sp = S_sp M_sp + Z_sp, where S_sp ∈ R^{m×r} has full column rank, M_sp ∈ R^{r×n} is a nonnegative matrix such that the sum of each column is at most 1, and Z_sp is a noise matrix whose columns are controlled in terms of ǫ, σ_min(S_sp) and κ(S_sp), where σ_min(S_sp) and κ(S_sp) are the minimum singular value and condition number of S_sp, respectively. If we apply the SP algorithm to the columns of Y_sp, then it outputs an index set K ⊂ {1, 2, . . . , n} such that |K| = r and max_{1≤k≤r} min_{j∈K} ‖S_sp(:, k) − Y_sp(:, j)‖_F = O(ǫκ²(S_sp)), where S_sp(:, k) is the k-th column of S_sp.
For the ideal SPACL algorithm, since its inputs are Ω and K, the inputs of the SP algorithm are U and K, with S_sp = U(I, :)′ and M_sp = Π′. By Theorem A.1, the SP algorithm returns I up to permutation when the input is U, assuming there are K communities. Since U = ΠU(I, :) under MMSB_n(K, P, Π, ρ), we see that U(i, :) = U(j, :) as long as Π(i, :) = Π(j, :). Therefore, though I may differ up to permutation, U(I, :) is unchanged. Hence, following the four steps of the ideal SPACL algorithm, it exactly returns Π.
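The ideal recovery described above can be sketched end to end: run successive projection (SP) on the rows of U to find the corner rows, then invert. This is an illustrative implementation of the SP idea under assumed toy values (the matrix B standing in for U(I, :), and the Dirichlet memberships, are our own choices), not the exact pseudocode of Algorithm 3.

```python
import numpy as np
from itertools import permutations

def sp_vertex_hunting(X, r):
    """Successive Projection on the rows of X: repeatedly pick the row of
    largest norm, then project all rows onto the orthogonal complement
    of the picked row. Returns r row indices (the simplex corners)."""
    R = X.astype(float).copy()
    picked = []
    for _ in range(r):
        j = int(np.argmax(np.linalg.norm(R, axis=1)))
        picked.append(j)
        u = R[j] / np.linalg.norm(R[j])
        R = R - np.outer(R @ u, u)   # remove the component along the new corner
    return picked

# ideal setting: U = Pi @ U(I,:), so Pi = U @ inv(U(I,:))
rng = np.random.default_rng(0)
n, K = 40, 2
Pi = rng.dirichlet(alpha=[0.5] * K, size=n)
Pi[0] = np.array([1.0, 0.0]); Pi[1] = np.array([0.0, 1.0])  # pure nodes
B = np.array([[1.0, 0.2], [0.1, 0.9]])   # stands in for U(I,:); full rank
U = Pi @ B
I_hat = sp_vertex_hunting(U, K)
Pi_hat = U @ np.linalg.inv(U[I_hat])
# Pi_hat equals Pi up to a column permutation
ok = any(np.allclose(Pi_hat[:, list(p)], Pi, atol=1e-6)
         for p in permutations(range(K)))
print(ok)
```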
Algorithm 4 below is the SVM-cone algorithm provided in [35].

Algorithm 4 SVM-cone [35]. Input: Ŝ ∈ R^{n×m} whose rows have unit l_2 norm, the number of corners K, and the estimated distance γ of the corners from the hyperplane. Output: the near-corner index set Î.
1: Run one-class SVM on the rows Ŝ(i, :) to get ŵ and b.
2: Run the K-means algorithm, with K clusters, on the set {Ŝ(i, :) | Ŝ(i, :)ŵ ≤ b + γ} of rows close to the hyperplane.
3: Pick one point from each cluster to get the near-corner set Î.

As suggested in [35], we can start with γ = 0 and incrementally increase it until K distinct clusters are found. Meanwhile, for the ideal SVM-cone-DCMMSB algorithm, when setting U_* and K as the inputs of the SVM-cone algorithm, since ‖U_* − U_*‖_{2→∞} = 0, Lemma F.1 of [35] guarantees that the SVM-cone algorithm returns I up to permutation. Since U_* = YU_*(I, :) by Lemma 5.2 under DCMM_n(K, P̃, Π, Θ), we have U_*(i, :) = U_*(j, :) when Π(i, :) = Π(j, :) by basic algebra, which gives that U_*(I, :) is unchanged though I may differ up to permutation. Therefore, the ideal SVM-cone-DCMMSB exactly recovers Π.

PROOF. We apply Theorem 1.4 (Bernstein inequality) of [45] to bound ‖A − Ω‖; this theorem is stated below. THEOREM B.1. Consider a finite sequence {X_k} of independent, random, self-adjoint matrices with dimension d. Assume that each random matrix satisfies E[X_k] = 0 and ‖X_k‖ ≤ R almost surely. Then, for all t ≥ 0, P(‖Σ_k X_k‖ ≥ t) ≤ d · exp(−(t²/2)/(σ² + Rt/3)), where σ² = ‖Σ_k E[X_k²]‖. Let e_i be the n × 1 vector with e_i(i) = 1 and 0 elsewhere, for i ∈ [n]. For convenience, set W = A − Ω. Then we can write W as W = Σ_{i=1}^n Σ_{j=1}^n W(i, j)e_i e′_j. Set W^{(i,j)} as the n × n matrix W^{(i,j)} = W(i, j)(e_i e′_j + e_j e′_i), which gives W = Σ_{1≤i<j≤n} W^{(i,j)}, where E[W^{(i,j)}] = 0. Next we bound the variance parameter σ² ≤ ρn. Combining Theorem B.1 with σ² ≤ ρn, R = 1, d = n and t = C√(ρn log(n)) for any α > 0 gives the claimed bound, where we have used Assumption (A1), and where µ is the incoherence parameter defined through ‖U‖_{2→∞}.
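The three steps of SVM-cone can be sketched with scikit-learn's one-class SVM and k-means. This is a rough sketch under our own choices (the `nu` value, the γ increment, and picking the point nearest to the hyperplane within each cluster are our assumptions; [35]'s implementation may differ):

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.cluster import KMeans

def svm_cone(S, K, gamma=0.0, step=1e-3):
    """Sketch of SVM-cone corner hunting. A linear one-class SVM supplies a
    supporting hyperplane w'x = b for the unit-norm rows of S; rows with
    S(i,:) @ w <= b + gamma are near-corner candidates, grouped into K
    clusters, and the candidate closest to the hyperplane is taken from
    each cluster (our tie-breaking choice)."""
    svm = OneClassSVM(kernel="linear", nu=0.5).fit(S)
    w = svm.coef_.ravel()
    b = -float(svm.intercept_[0])          # hyperplane offset: w'x = b
    scores = S @ w
    while True:
        near = np.where(scores <= b + gamma)[0]
        if len(near) >= K:
            labels = KMeans(n_clusters=K, n_init=10,
                            random_state=0).fit_predict(S[near])
            corners = []
            for k in range(K):
                members = near[labels == k]
                corners.append(int(members[np.argmin(scores[members])]))
            if len(set(corners)) == K:
                return sorted(corners)
        gamma += step                      # grow gamma until K corners emerge

# toy cone: unit-norm rows between two extreme rays; corners at indices 0 and 4
angles = np.deg2rad([0, 30, 45, 60, 90])
S = np.column_stack([np.cos(angles), np.sin(angles)])
corners = svm_cone(S, K=2)
print(corners)
```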
Note that when λ_K(Π′Π) = O(n/K), the above bound simplifies accordingly and is consistent with that of Eq (B.1). Also note that this bound is consistent with that of [36] by Assumption (A1).
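A quick empirical sanity check of the Bernstein-type bound ‖A − Ω‖ ≤ C√(ρn log(n)) (the constant C = 3 and the parameter values are our own illustrative choices):

```python
import numpy as np

def spectral_deviation(n, rho, seed=0):
    """Draw an Erdos-Renyi graph G(n, rho), i.e. Omega = rho*(11' - I),
    and compare the spectral norm of A - Omega with sqrt(rho*n*log(n))."""
    rng = np.random.default_rng(seed)
    Omega = rho * (np.ones((n, n)) - np.eye(n))
    upper = np.triu(rng.random((n, n)) < Omega, 1)
    A = (upper + upper.T).astype(float)
    dev = np.linalg.norm(A - Omega, 2)      # spectral norm of the noise
    return dev, np.sqrt(rho * n * np.log(n))

dev, scale = spectral_deviation(n=400, rho=0.1)
print(dev < 3 * scale)
```

At n = 400 and ρ = 0.1 the observed deviation sits well below 3√(ρn log(n)), as the theory predicts.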
Similar to the proof of Lemma 3.3, we obtain the stated bound, where the last inequality holds by the lower bound of σ_K(Ω) if we use Theorem 4.2 of [15].
C.4. Proof of Theorem 5.9. PROOF. We have Π̂_*(i, :) = Ẑ_*(i, :)/‖Ẑ_*(i, :)‖_1, where N_M and M are defined in the proof of Lemma 5.2. Now, we provide a lower bound of ‖e′_i Z_*‖_1 as below. Therefore, by Lemma C.4, the claimed bound follows.

C.5. Proof of Corollary 5.11. PROOF. Under the conditions of Corollary 5.11, the bound of Theorem 5.9 simplifies, and by basic algebra this corollary follows.
where x is a K × 1 vector whose l_2 norm is 1. Then, for i ∈ [n], we obtain the stated bound, where we use the fact that min_i ‖Π(i, :)‖_F ≥ 1/√K, since Σ_{k=1}^K Π(i, k) = 1 and all entries of Π are nonnegative.
Since U_* = N_U U, we obtain the stated bound, where we set N_{U,max} = max_{i∈[n]} N_U(i, i) and use the facts that N_U and Θ are diagonal matrices.

C.7. Bounds between Ideal SVM-cone-DCMMSB and SVM-cone-DCMMSB
The next lemma focuses on the second step of SVM-cone-DCMMSB and is the cornerstone for characterizing the behavior of SVM-cone-DCMMSB. LEMMA C.3. Under DCMM_n(K, P̃, Π, Θ), when the conditions of Lemma 5.5 hold, there exists a permutation matrix P_* ∈ R^{K×K} such that, with probability at least 1 − o(n^{−α}), the stated bound holds, where U_{*,2} and Û_{*,2} are the row-normalized versions of UU′ and ÛÛ′, respectively. PROOF. Lemma G.1 of [35] says that using Û_{*,2} as the input of the SVM-cone algorithm returns the same result as using Û_*. By Lemma F.1 of [35], there exists a permutation matrix P_* ∈ R^{K×K} such that the stated bound holds, where the last inequality holds by Lemma C.2 and by the proof of Lemma 5.5. Combining the above results, the claimed bound follows.

method | model | alternative separation condition
Ours, using ‖A_re − Ω‖ ≤ C√(ρn) | MMSB & DCMM | 1
Ours, using ‖A − Ω‖ ≤ C√(ρn log(n)) | MMSB & DCMM | 1
[27] (original), using ‖A_re − Ω‖ ≤ C√(ρn) | DCMM | 1
[27], using ‖A − Ω‖ ≤ C√(ρn log(n)) | DCMM | 1
[35, 36], using ‖A_re − Ω‖ ≤ C√(ρn) | |

After obtaining Corollary 3.6 under MMSB, we are now ready to give our criterion, after introducing the separation condition of a standard network and the sharp threshold of the ER random graph G(n, p) in this section. Separation condition. Consider a standard network by setting P̃ = ωI_K + (1 − ω)11′ for ω ∈ (0, 1] (so that σ_K(P̃) = ω) under the settings of Corollary 3.6. Note that we have Ω = ΠρP̃Π′ = ΠPΠ′, where P = ρP̃ is the probability matrix. For convenience, set p_in = ρ and p_out = ρ(1 − ω) (note that p_in > p_out when ω ∈ (0, 1]). (a) Under such P and the settings in Corollary 3.6, since the error rate is O((1/ω)√(log(n)/(ρn))), to obtain consistent estimation, ω should grow faster than √(log(n)/(ρn)). Therefore, the separation condition |p_in − p_out|/√p_in = ω√ρ (also known as the relative edge probability gap) should grow faster than √(log(n)/n). We can obtain an alternative version of the separation condition, (α_in − α_out)/√α_in, such that if (α_in − α_out)/√α_in ≫ 1, recovering the memberships with high probability is possible, and vice versa. In this paper, we call |α_in − α_out|/√α_in the alternative separation condition. Now we provide the details. Since p_in = α_in log(n)/n = ρ and p_out = α_out log(n)/n = ρ(1 − ω), we have p_in − p_out = (α_in − α_out) log(n)/n = ρω and ρ = α_in log(n)/n. (a′) Under such P and the settings in our Corollary 3.6, since the error rate is O((1/ω)√(log(n)/(ρn))), for consistent estimation, ω should grow faster than √(log(n)/(ρn)). Hence, ρω should grow faster than √(ρ log(n)/n).
APPENDIX B: PROOF OF CONSISTENCY UNDER MMSB

B.1. Proof of Lemma 3.1

TABLE 1
Comparison of separation condition and sharp threshold. Details of this table are given in Section 4. The classical result on the separation condition is given in Corollary 1 of [37].

TABLE 2
Comparison of alternative separation condition.
Meanwhile, Theorem 2 of [50] also gives that the bound between a regularization of A and Ω is C√(ρn), where such a regularization of A should satisfy a few constraints; see Theorem 2 of [50] for detail. Instead of bounding the difference between a regularization of A and Ω, we are interested in bounding ‖A − Ω‖ by the Bernstein inequality, which puts no constraints on A. For convenience, we use A_re to denote the regularization of A in this paper. Hence, ‖A_re − Ω‖ ≤ C√(ρn) with high probability, and this bound is model independent, as shown by Theorem 5.2. • In step 2, letting K = O(1) and the network be balanced always simplifies the theoretical upper bound of the error rate, as shown by our Corollaries 3.6 and 5.11. Here, we provide some examples of how to make a model degenerate to SBM. For MMSB_n(K, P̃, Π, ρ) in this paper, when all nodes are pure, MMSB degenerates to SBM; for the DCMM_n(K, P̃, Π, Θ) model introduced in Section 5 or the DCSBM, the analogous degeneration applies, and the condition of Lemma 5.5 holds naturally. By the proof of Lemma 5.5, σ_K(Ω) has the lower bound θ²_min σ_K(P̃)λ_K(Π′Π) = O(θ²_min σ_K(P̃)n). To make the requirement σ_K(Ω) ≥ Cθ_max√(P̃_max n log(n)) always hold, we just need θ²_min σ_K(P̃)n ≥ Cθ_max√(P̃_max n log(n)), which gives σ_K(P̃) ≥ C√(log(n)/(ρn)), matching the requirement for consistent estimation in Corollary 5.11. Applying SCSTC to Corollary 5.11 and letting Θ = √ρ I, so that DCMM degenerates to MMSB, it is easy to see that the bound in Corollary 5.11 is consistent with that of Corollary 3.6. Therefore, the separation condition, alternative separation condition and sharp threshold obtained from Corollary 5.11 for the extended version of SPACL under DCMM are consistent with the classical results, as shown in Tables 1 and 2.
Meanwhile, when θ_max = O(√ρ), the same conclusion follows from our bound in Corollary 5.11. Consider a mixed membership network under the settings of Corollary 5.11 with Θ = √ρ I, so that DCMM degenerates to MMSB. By Corollary 5.11, σ_K(P̃) should grow faster than √(log(n)/(ρn)). We further assume that P̃ = (2 − β)I_K + (β − 1)11′ for β ∈ [1, 2) ∪ (2, ∞); this P̃, with unit diagonals and β − 1 as off-diagonal entries, still satisfies Condition (II1), and σ_K(P̃) = |2 − β|.

B.2. Proof of Lemma 3.3. PROOF. Let H = Û′U, and let H = U_H Σ_H V′_H be the SVD of H, with U_H, V_H ∈ R^{K×K}, where U_H and V_H represent the left and right singular matrices of H, respectively. Define sgn(H) = U_H V′_H. Similar to the proof of Lemma 3.3, by Theorem 4.2 of [15], when σ_K(Ω) satisfies the required lower bound.