Estimating Mixed Memberships in Directed Networks by Spectral Clustering

Huan Qing

doi:10.3390/e25020345

School of Mathematics, China University of Mining and Technology, Xuzhou 221116, China

Entropy2023, 25(2), 345;https://doi.org/10.3390/e25020345

This article belongs to the Section Complexity

Version Notes

Order Reprints

Abstract

Community detection is an important and powerful way to understand the latent structure of complex networks in social network analysis. This paper considers the problem of estimating community memberships of nodes in a directed network, where a node may belong to multiple communities. For such a directed network, existing models either assume that each node belongs solely to one community or ignore variation in node degree. Here, a directed degree corrected mixed membership (DiDCMM) model is proposed by considering degree heterogeneity. An efficient spectral clustering algorithm with a theoretical guarantee of consistent estimation is designed to fit DiDCMM. We apply our algorithm to a small scale of computer-generated directed networks and several real-world directed networks.

Keywords:

community detection; directed networks; overlapping networks; spectral clustering

1. Introduction

Many real-world complex networks have community structure such that nodes within the same community (also known as cluster or module) have more links than across communities. For example, in social networks, communities can be groups of students in the same department; in co-authorship networks, a community can be formed by researchers in the same field. However, community structure for a real-world network is usually not directly observable. To process this problem, community detection, also known as graph clustering, is a popular tool for uncovering a latent community structure in a network [1,2]. For decades, many community detection methods have been proposed for non-overlapping undirected networks in which each node belongs to a single community, and the interactions between two nodes are symmetric or undirected. The stochastic block model (SBM) [3] is a popular generative model for non-overlapping undirected networks. In SBM, it is assumed that each node only belongs to one community and that nodes in the same community have the same expectation degrees. Ref. [4] proposes the classical degree corrected stochastic block model (DCSBM) which extends SBM by considering variation in node degree. In recent years, numerous algorithms have been developed to estimate node community for non-overlapping undirected networks generated from SBM and DCSBM, see [5,6,7,8,9,10,11,12,13,14,15]. For recent developments about SBM, see the wonderful review paper [16].

However, in most real-world networks, a node may belong to more than one community at a time. In recent years, the problem of estimating mixed memberships for the undirected network has received a lot of attention [17,18,19,20,21,22,23,24,25,26,27,28,29], and references therein. Ref. [17] extends the SBM model from non-overlapping undirected networks to mixed membership undirected networks and designs the mixed membership stochastic block (MMSB) model. Based on the MMSB model, ref. [24] designs a model called the degree corrected mixed membership (DCMM) model by considering degree heterogeneity, where DCMM can also be seen as an extension of the non-overlapping model DCSBM, and ref. [24] also develops an efficient and provably consistent spectral algorithm. Ref. [27] presents a spectral algorithm under MMSB and establishes per-node rates for mixed memberships by sharp row-wise eigenvector deviation. Ref. [29] proposes an overlapping continuous community assignment model (OCCAM), which is also an extension of MMSB, by considering degree heterogeneity. To fit OCCAM, ref. [29] develops a spectral algorithm requiring a relatively small fraction of mixed nodes when building theoretical frameworks. Ref. [26] finds the cone structure inherent in the normalization of the eigen-decomposition of the population adjacency matrix under DCMM and develops a spectral algorithm to hunt corners in the cone structure.

Though the above works are encouraging and appealing, they focus on undirected networks. In reality, there exist substantial directed networks, such as citation networks, protein–protein interaction networks, and the hyperlink network of websites. In recent years, a lot of works with encouraging results have been developed for directed networks. Ref. [30] proposes a stochastic co-block model (ScBM) and its extension DC-ScBM by considering degree heterogeneity to model non-overlapping directed networks, where ScBM and DC-ScBM can model directed networks whose row nodes may be different from column nodes, and the number of row communities may also be different from the number of column communities. Ref. [31] studies the theoretical guarantees for the algorithm DSCORE [32] and its variants designed under DC-ScBM. Ref. [33] studies the spectral clustering algorithms designed by a data-driven regularization of the adjacency matrix under ScBM. Ref. [34] studies higher-order spectral clustering of directed graphs by designing a nearly linear time algorithm. Based on the fact that the above works only consider non-overlapping directed networks, ref. [35] develops a directed mixed membership stochastic block model (DiMMSB), which is an extension of ScBM, and models directed networks with mixed memberships. DiMMSB can also be seen as a direct extension of MMSB from an undirected network to a directed network.

Recall that DCSBM, DCMM, and DCScBM are extensions of SBM, MMSB, and ScBM by considering node degree variation, respectively, this paper aims at proposing a model as an extension of DiMMSB by considering node degree heterogeneity and building an efficient spectral algorithm to fit the proposed model. In this paper, we focus on the directed network with mixed membership. Our contributions are as follows:

(i): We propose a novel generative model for directed networks with a mixed membership, the directed degree corrected mixed membership (DiDCMM) model. DiDCMM models a directed network with mixed memberships when row nodes have degree heterogeneities, while column nodes do not. We present the identifiability of DiDCMM under popular conditions which are also required by models modeling mixed membership networks when considering degree heterogeneity. Meanwhile, our results also show that modeling a directed network with mixed membership when considering degree heterogeneity for both row and column nodes needs nontrivial conditions. DiDCMM can be seen as an extension of the DCScBM model from a non-overlapping directed network to an overlapping directed network. DiDCMM also extends the DCMM model from an undirected network to a directed network and extends the DiMMSB model by considering node degree heterogeneity. For a detailed comparison of our DiDCMM with previous models, see Remark 2.
(ii): To fit DiDCMM, we present a spectral algorithm called DiMSC, which is designed based on the investigation that there exists an ideal cone structure inherent in the normalized version of the left singular vectors and an ideal simplex structure inherent in the right singular vectors of the population adjacency matrix. We prove that our DiMSC exactly recovers the membership matrices for both row and column nodes in the oracle case under DiDCMM, and this also supports the identifiability of DiDCMM. We obtain the upper bounds of error rates for each row (and column) node and show that our method produces asymptotically consistent parameter estimations under mild conditions. Our theoretical results are consistent with classical results when DiDCMM degenerates to SBM and MMSB under mild conditions. Numerical results of simulated directed networks support our theoretical results and show that our approach outperforms its competitors. We also apply our algorithm to several real-world directed networks to test the existence of highly mixed nodes and asymmetric structures between row and column communities.

Notations.

We take the following general notations in this paper. For a vector x and fixed

q > 0

,

{∥ x ∥}_{q}

denotes its

l_{q}

-norm. For a matrix M,

M^{'}

denotes the transpose of the matrix M,

∥ M ∥

denotes the spectral norm,

{∥ M ∥}_{F}

denotes the Frobenius norm, and

{∥ M ∥}_{2 \to \infty}

denotes the maximum

l_{2}

-norm of all the rows of M. Let

rank (M)

denote the rank of matrix M. Let

σ_{i} (M)

be the i-th largest singular value of matrix M, and

λ_{i} (M)

denote the i-th largest eigenvalue of the matrix M ordered by the magnitude.

M (i, :)

and

M (:, j)

denote the i-th row and the j-th column of matrix M, respectively.

M (S_{r}, :)

and

M (:, S_{c})

denote the rows and columns in the index sets

S_{r}

and

S_{c}

of matrix M, respectively. For any matrix M, we simply use

Y = \max (0, M)

to represent

Y_{i j} = \max (0, M_{i j})

for any

i, j

. For any matrix

M \in R^{m \times m}

, let

diag (M)

be the

m \times m

diagonal matrix whose i-th diagonal entry is

M (i, i)

. Here,

1

and

0

are column vectors with all entries being ones and zeros, respectively;

e_{i}

is a column vector whose i-th entry is one, while other entries are zero. C is a positive constant that may vary occasionally.

2. The Directed Degree Corrected Mixed Membership Model

Consider a directed network

N = (V_{r}, V_{c}, E)

, where

V_{r} = {1, 2, \dots, n_{r}}

is the set of row nodes,

V_{c} = {1, 2, \dots, n_{c}}

is the set of column nodes (

n_{r}

and

n_{c}

indicate the number of row nodes and the number of column nodes, respectively), and

E

is the set of edges. Note that when

V_{r} = V_{c}

such that row nodes are the same as column nodes,

N

is a traditional directed network [31,36,37,38,39,40,41,42]; when

V_{r} \neq V_{c}

,

N

is a bipartite network (also known as a bipartite graph) [30,33,35,43,44,45]; see Figure 1 for illustrations of the topological structures for a directed network and a bipartite network. Without confusion, we also call bipartite networks directed networks occasionally in this paper.

Figure 1. Illustration for directed network and bipartite network. Panel (a): directed network; Panel (b): bipartite network.

We assume that the row nodes of the directed network

N

belong to K perceivable communities (called row communities in this paper)

\begin{matrix} C_{r}^{(1)}, C_{r}^{(2)}, \dots, C_{r}^{(K)}, \end{matrix}

(1)

and the column nodes of the directed network

N

belong to K perceivable communities (called column communities in this paper)

\begin{matrix} C_{c}^{(1)}, C_{c}^{(2)}, \dots, C_{c}^{(K)} . \end{matrix}

(2)

Define an

n_{r} \times K

row nodes membership matrix

Π_{r}

and an

n_{c} \times K

column nodes membership matrix

Π_{c}

such that

Π_{r} (i, :)

is a

1 \times K

probability mass function (PMF) for row node i,

Π_{c} (j, :)

is a

1 \times K

PMF for column node j, and

\begin{matrix} Π_{r} (i, k) is the weight of row node i on C_{r}^{(k)}, 1 \leq k \leq K, \end{matrix}

(3)

\begin{matrix} Π_{c} (j, k) is the weight of column node j on C_{c}^{(k)}, 1 \leq k \leq K . \end{matrix}

(4)

Call row node i ‘pure’ if

Π_{r} (i, :)

is degenerate (i.e., one entry is 1, all other

K - 1

entries are 0) and ‘mixed’ otherwise. The same definitions hold for column nodes. Note that mixed nodes considered in this article are not the boundary nodes introduced in [46] since boundary nodes are defined based on non-overlapping networks, while mixed nodes belong to multiple communities.

Let

A \in {0, 1}^{n_{r} \times n_{c}}

be the bi-adjacency matrix of

N

such that for each entry,

A (i, j) = 1

if there is a directional edge from row node i to column node j, and

A (i, j) = 0

otherwise. So, the i-th row of A records how row node i sends edges, and the j-th column of A records how column node j receives edges. Let P be a

K \times K

matrix such that

\begin{matrix} P (k, l) \geq 0 for 1 \leq k, l \leq K . \end{matrix}

(5)

Note that since we consider a directed network in this paper, P may be asymmetric.

Without loss of generality, suppose that row nodes have degree heterogeneities, while column nodes do not i.e., row nodes have variation in degree, while column nodes do not. Note that in a directed network, if column nodes have degree heterogeneities while row nodes do not, to detect memberships of both row nodes and column nodes, we set the transpose of the adjacency matrix as input when applying our algorithm DiMSC. Meanwhile, in a directed network, if both row and column nodes have degree heterogeneity, to model such a directed network with mixed memberships, we need nontrivial constraints on the degree heterogeneities between row nodes and column nodes for model identifiability, for detail, see Remark 1.

Let

θ_{r}

be an

n_{r} \times 1

vector whose i-th entry is the positive degree heterogeneity of row node i. For all pairs of

(i, j)

with

1 \leq i \leq n_{r}, 1 \leq j \leq n_{c}

, DiDCMM models the entries of A such that

A (i, j)

are independent Bernoulli random variables satisfying

\begin{matrix} P (A (i, j) = 1) = θ_{r} (i) \sum_{k = 1}^{K} \sum_{l = 1}^{K} Π_{r} (i, k) Π_{c} (j, l) P (k, l) . \end{matrix}

(6)

Equation (6) means that

P (A (i, j) = 1) = θ_{r} (i) Π_{r} (i, :) P Π_{c}^{'} (j, :)

, i.e., the probability of generating a directional edge from row node i to column node j is

θ_{r} (i) Π_{r} (i, :) P Π_{c}^{'} (j, :)

, and this probability is controlled by the degree heterogeneity parameter

θ_{r} (i)

of row node i, the connecting matrix P, and the memberships of nodes i and j. Equation (6) functions similarly to Equation (1.4) in [24], and both equations define the probability of generating an edge. For comparison, Equation (6) defines the probability of generating a directional edge under DiDCMM for a directed network, while Equation (1.4) in [24] defines the probability of generating an edge under DCMM for an undirected network, i.e., DiDCMM can be seen as an extension of DCMM from an undirected network to a directed network.

Introduce the degree heterogeneity diagonal matrix

Θ_{r} \in R^{n_{r} \times n_{r}}

for row nodes such that

\begin{matrix} Θ_{r} (i, i) = θ_{r} (i) for 1 \leq i \leq n_{r} . \end{matrix}

(7)

Equation (7) uses a diagonal matrix

Θ_{r}

to contain all degree heterogeneities, and

Θ_{r}

is useful for further theoretical analysis through Equation (8).

Definition 1.

Call model (1)–(6) the directed degree corrected mixed membership (DiDCMM) model, and denote it by

D i D C M M_{n_{r}, n_{c}} (K, P, Π_{r}, Π_{c}, Θ_{r})

.

The following conditions are sufficient for the identifiability of DiDCMM:

(I1) $rank (P) = K$ , and P has unit diagonals.
(I2) There is at least one pure node for each of the K row and K column communities.

When building statistical models for a network in which nodes can belong to multiple communities, the full rank requirement of connecting matrix P and pure nodes condition are always necessary for model identifiability, see models for an undirected network such as MMSB considered in [23,27], DCMM considered in [24,26], and OCCAM considered in [26,29]. Meanwhile, if models modeling networks with mixed memberships consider degree heterogeneity, the unit diagonals requirement on connecting matrix P is also necessary for model identifiability, see the identifiability requirement of DCMM and OCCAM considered in [24,26,29]. Furthermore, based on the fact that DiDCMM, DCMM, and OCCAM can include the well-known model SBM, letting P have unit diagonals is not a serious problem since many wonderful works study a special case of SBM when P has unit diagonals and a network has K equal size clusters (this special case of SBM is also known as a planted partition model), see [12,47,48,49,50,51,52].

Let

Ω = E [A]

be the expectation of the adjacency matrix A. Under DiDCMM, we have

\begin{matrix} Ω = Θ_{r} Π_{r} P Π_{c}^{'} . \end{matrix}

(8)

We refer to

Ω

as the population adjacency matrix. Since

rank (Θ_{r}) = K, rank (Π_{r}) = K, rank (Π_{c}) = K

and

rank (P) = K

by Equation (7) and Conditions (I1) and (I2), the rank of

Ω

is K. Recall that K is the number of communities, and it is much smaller than network size. We see that

Ω

has a low dimensional structure. The form of

Ω

given in Equation (8) is powerful to build the spectral algorithm developed in this paper to fit DiDCMM. Analyzing properties of the population adjacency matrix to build a spectral algorithm fitting statistical model is a common strategy in community detection, for example, references [24,26,27,35] also use this strategy to design their algorithms fitting DCMM, MMSB, and DiDCMM.

For

1 \leq k \leq K

, let

I_{r}^{(k)} = {i \in {1, 2, \dots, n_{r}} : Π_{r} (i, k) = 1}

and

I_{c}^{(k)} = {j \in {1, 2, \dots, n_{c}} : Π_{c} (j, k) = 1}

. By Condition (I2),

I_{r}^{(k)}

and

I_{c}^{(k)}

are nonempty for all

1 \leq k \leq K

. For

1 \leq k \leq K

, select one row node from

I_{r}^{(k)}

to construct the index set

I_{r}

, i.e.,

I_{r}

is the indices of row nodes corresponding to K pure row nodes, one from each community, and

I_{c}

is defined similarly. W.L.O.G., let

Π_{r} (I_{r}, :) = I_{K}

and

Π_{c} (I_{c}, :) = I_{K}

(Lemma 2.1 [27] has a similar setting to design their spectral algorithm under MMSB.), where

I_{K}

is the

K \times K

identity matrix. The proposition below shows that the DiDCMM model is identifiable.

Proposition 1.

(Identifiability). When Conditions (I1) and (I2) hold, DiDCMM is identifiable: for eligible

(P, Π_{r}, Π_{c}, Θ_{r})

and

(\tilde{P}, {\tilde{Π}}_{r}, {\tilde{Π}}_{c}, {\tilde{Θ}}_{r})

, set

Ω = Θ_{r} Π_{r} P Π_{c}^{'}

and

\tilde{Ω} = {\tilde{Θ}}_{r} {\tilde{Π}}_{r} \tilde{P} {\tilde{Π}}_{c}^{'}

. If

Ω = \tilde{Ω}

, then

Θ_{r} = {\tilde{Θ}}_{r}, Π_{r} = {\tilde{Π}}_{r}, Π_{c} = {\tilde{Π}}_{c}

and

P = \tilde{P}

.

Remark 1.

(The reason that we do not model a directed network with mixed memberships where both row and column nodes have degree heterogeneities). Suppose both row and column nodes have degree heterogeneities in a mixed membership directed network. To model such a directed network, the probability of generating an edge from row node i to column node j is

\begin{matrix} P (A (i, j) = 1) = θ_{r} (i) θ_{c} (j) \sum_{k = 1}^{K} \sum_{l = 1}^{K} Π_{r} (i, k) Π_{c} (j, l) P (k, l), \end{matrix}

where

θ_{c}

is an

n_{r} \times 1

vector whose j-th entry is the degree heterogeneity of column node j. Set

Ω = E [A]

, then

Ω = Θ_{r} Π_{r} P Π_{c}^{'} Θ_{c}

, where

Θ_{c} \in R^{n_{c} \times n_{c}}

is a diagonal matrix whose j-th diagonal entry

θ_{c} (j)

. Set

Ω = U Λ V^{'}

as the compact SVD of Ω. Follow similar analysis as Lemma 1, we see that

U = Θ_{r} Π_{r} B_{r}

and

V = Θ_{c} Π_{c} B_{c}

(without causing confusion, we still use

B_{c}

here for convenience.). For model identifiability, follow similar analysis as the proof of Proposition 1, since

Ω (I_{r}, I_{c}) = Θ_{r} (I_{r}, I_{r}) Π_{r} (I_{r},;) P Π_{c}^{'} (I_{c}, :) Θ_{c} (I_{c}, I_{c}) = Θ_{r} (I_{r}, I_{r}) P Θ_{c} (I_{c}, I_{c}) = U (I_{r}, :) Λ V^{'} (I_{c}, :)

, we see that

Θ_{r} (I_{r}, I_{r}) P Θ_{c} (I_{c}, I_{c}) = U (I_{r}, :) Λ V^{'} (I_{c}, :)

. To obtain

Θ_{r} (I_{r}, I_{r})

and

Θ_{c} (I_{c}, I_{c})

from

U (I_{r}, :) Λ V^{'} (I_{c}, :)

, when P has unit diagonals, we see that it is impossible to recover

Θ_{r} (I_{r}, I_{r})

and

Θ_{c} (I_{c}, I_{c})

unless we add a condition that

Θ_{r} (I_{r}, I_{r}) = Θ_{c} (I_{c}, I_{c})

. Now, suppose

Θ_{r} (I_{r}, I_{r}) = Θ_{c} (I_{c}, I_{c})

holds and call it Condition (I3); we have

Θ_{r} (I_{r}, I_{r}) P Θ_{r} (I_{r}, I_{r}) = U (I_{r}, :) Λ V^{'} (I_{c}, :)

; hence,

Θ_{r} (I_{r}, I_{r}) = Θ_{c} (I_{c}, I_{c}) = \sqrt{diag (U (I_{r}, :) Λ V^{'} (I_{c}, :))}

when P has unit diagonals. However, Condition (I3) is nontrivial since it requires

Θ_{r} (I_{r}, I_{r}) = Θ_{c} (I_{c}, I_{c})

, and we always prefer a directed network in which there are no connections between row nodes degree heterogeneities and column nodes degree heterogeneities. For example, when all nodes are pure in a directed network, ref. [30] models such directed network using model DC-ScBM such that

Ω = Θ_{r} Π_{r} P Π_{c}^{'} Θ_{c}

when all nodes are pure, and

Θ_{r}

and

Θ_{c}

are independent under DC-ScBM. Because Condition (I3) is nontrivial, we do not model a mixed membership directed network with all nodes having degree heterogeneities.

For DiDCMM’s identifiability, the number of row communities should equal that of column communities when both row and column nodes may belong to more than one community. However, when only row nodes have mixed memberships while column nodes do not, the number of row communities can be lesser than that of column communities, and this is also discussed in [53]. All proofs of our theoretical results are provided in the Appendix A.1.

Unless specified, we treat Conditions (I1) and (I2) as default from now on. Proposition 1 is important since it guarantees that our model DiDCMM is well-defined, and we can design efficient spectral algorithms to fit DiDCMM based on its identifiability. The reason that we do not consider degree heterogeneity for column nodes for our DiDCMM is mainly for its identifiability. As analyzed in Remark 1, considering degree heterogeneity for both row and column nodes make the model unidentifiable unless adding some nontrivial conditions on model parameters. Meanwhile, many previous statistical models in the community detection areas are identifiable, and spectral algorithms can be applied to fit them. For examples, SBM [3], DCSBM [4], MMSB [17], DCMM [24], OCCAM [29], ScBM (and DCScBM), [30], and DiMMSB [35] are identifiable. Especially, though different statistical models may have different requirements on model parameters for identifiability, the proof of identifiability enjoys a similar idea as that of Proposition 1, for instance, Proposition 1.1 [24] and Theorem 2.1 [27] build theoretical guarantees on identifiability for DCMM and MMSB, respectively.

Remark 2.

We compare our DiDCMM with some previous models in this remark.

When $Θ_{r} = ρ I$ for $ρ > 0$ , Equation (8) gives $Ω = ρ Π_{r} P Π_{c}^{'}$ and DiDCMM degenerates to DiMMSB [35], where ρ is known as a sparsity parameter [9,27,35]. So, DiDCMM includes DiMMSB as a special case, and the relationship between DiDCMM and DiMMSB is similar to that between DCSBM [3,4]. Meanwhile, DiDCMM considers degree heterogeneity parameter $Θ_{r}$ at the cost that DiDCMM requires P to have unit diagonals for model identifiability, while there is no such requirement for P on DiMMSB’s identifiability. Note that both DiDCMM and DiMMSB are identifiable only when P is a full-rank square matrix.
When $Θ_{r} = ρ I$ for $ρ > 0$ and all nodes are pure, DiDCMM reduces to ScBM [30]. DiDCMM can model a directed network in which nodes enjoy overlapping memberships, while ScBM cannot. Meanwhile, DiDCMM enjoys this advantage at the cost of requiring $rank (P) = K$ for model identifiability, while ScBM is identifiable even when P is not a square matrix, i.e., ScBM can model a directed network in which the number of row communities can be different from the number of column communities. A comparison between DiDCMM and DCScBM [30] is similar.
When $Θ_{r} = ρ I$ and the network is undirected, DiDCMM reduces to MMSB [17]. However, DiDCMM models directed networks with mixed memberships, while MMSB only models undirected networks with mixed memberships. Again, DiDCMM enjoys its advantage at the cost of P having unit diagonals for its identifiability (not that DiDCMM allows P to be asymmetric since DiDCMM models directed networks), while MMSB is identifiable even when P has non-unit diagonals (note that P is symmetric under MMSB since it models undirected networks). Meanwhile, the identifiability of both DiDCMM and MMSB requires the square matrix P to have full rank.
When $Θ_{r} = ρ I$ , the network is undirected and all nodes are pure, DiDCMM reduces to SBM [3]. For comparison, DiDCMM models directed networks and allows nodes to belong to multiple communities, while SBM only models undirected networks in which a node only belongs to one community. Meanwhile, DiDCMM enjoys these advantages at the cost of requiring P to be full rank with unit diagonals for its identifiability, while SBM is identifiable even when P is not full rank and P has non-unit diagonals. Note that DiDCMM allows P to be asymmetric, while P must be symmetric for SBM since DiDCMM models directed networks, while SBM models undirected networks. Comparison between DiDCMM and DCSBM [4] is similar.
Compared with DCMM introduced in [24] and OCCAM introduced in [29], DCMM, and OCCAM model undirected networks with mixed memberships, while DiDCMM models directed networks with mixed memberships. DiDCMM, DCMM, and OCCAM all consider degree heterogeneity for overlapping networks, and they are identifiable only when the full rank matrix P has unit diagonals. These three models are identifiable only when the square matrix P is full rank. Meanwhile, DiDCMM allows P to be asymmetric, while P must be symmetric for DCMM and OCCAM since DiDCMM models directed networks, while DCMM and OCCAM model undirected networks.

3. Algorithm

The primary goal of the proposed algorithm is to estimate the row membership matrix

Π_{r}

and column membership matrix

Π_{c}

from the observed adjacency matrix A with given K. We start by considering the ideal case when

Ω

is known, and then we extend what we learn in the ideal case to the real case.

3.1. The Ideal Simplex (IS), the Ideal Cone (IC), and the Ideal DiMSC

Recall that

rank (Ω) = K

under Conditions (I1) and (I2), and K is much smaller than

\min {n_{r}, n_{c}}

. Let

Ω = U Λ V^{'}

be the compact singular value decomposition of

Ω

such that

U \in R^{n_{r} \times K}, Λ \in R^{K \times K}, V \in R^{n_{c} \times K}

,

U^{'} U = I_{K}, V^{'} V = I_{K}

. The goal of the ideal case is to use

U, Λ

, and V to exactly recover

Π_{r}

and

Π_{c}

. As stated in [8,24],

θ_{r}

is one of the major nuisances, and similar to [7], we remove the effect of

θ_{r}

by normalizing each row of U to have a unit

l_{2}

norm. Set

U_{*} \in R^{n_{r} \times K}

by

U_{*} (i, :) = \frac{U (i, :)}{{∥ U (i, :) ∥}_{F}}

, and let

N_{U}

be the

n_{r} \times n_{r}

diagonal matrix such that

N_{U} (i, i) = \frac{1}{{∥ U (i, :) ∥}_{F}}

for

1 \leq i \leq n_{r}

. Then,

U_{*}

can be rewritten as

U_{*} = N_{U} U

. The existences of the ideal cone (IC for short) structure inherent in

U_{*}

and the ideal simplex (IS for short) structure inherent in V are guaranteed by the following lemma.

Lemma 1. (Ideal Simplex and Ideal Cone). Under

D i D C M M_{n_{r}, n_{c}} (K, P, Π_{r}, Π_{c}, Θ_{r})

, there exist a unique

K \times K

matrix

B_{r}

and a unique

K \times K

matrix

B_{c}

such that

$U = Θ_{r} Π_{r} B_{r}$ , where $B_{r} = Θ_{r}^{- 1} (I_{r}, I_{r}) U (I_{r}, :)$ , and $U_{*} = Y U_{*} (I_{r}, :)$ where
$Y = N_{M} Π_{r} Θ_{r}^{- 1} (I_{r}, I_{r}) N_{U}^{- 1} (I_{r}, I_{r})$ with $N_{M}$ being an $n_{r} \times n_{r}$ diagonal matrix whose diagonal entries are positive. Meanwhile, $U_{*} (i, :) = U_{*} (\bar{i}, :)$ if $Π_{r} (i, :) = Π_{r} (\bar{i}, :)$ for $1 \leq i, \bar{i} \leq n_{r}$ .
$V = Π_{c} B_{c}$ , where $B_{c} = V (I_{c}, :)$ . Meanwhile, $V (j, :) = V (\bar{j}, :)$ if $Π_{c} (j, :) = Π_{c} (\bar{j}, :)$ for $1 \leq j, \bar{j} \leq n_{c}$ .

Lemma 1 says that the rows of V form a K-simplex in

R^{K}

which we call the ideal simplex (IS), with the K rows of

B_{c}

being the vertices. Such IS is also found in [24,27,35]. Lemma 1 also shows that the form of

U_{*} = Y U_{*} (I_{r}, :)

is actually the ideal cone structure mentioned in [26]. Meanwhile, we remove the influence of

θ_{r}

by normalizing each row of U to have a unit norm in this paper. Using the idea of the entry-wise ratio in [8] also works, where ref. [24] develops their spectral algorithms to fit DCMM using the idea of entry-wise ratio. Designing algorithms based on the nonnegative matrix factorization [25] to fit DiDCMM by adding some constraints on

Ω

may also work. We leave the study of using these ideas to fit DiDCMM or its submodels for our future work.

For column nodes (recall that column nodes have no degree heterogeneities), since

B_{c}

is full rank if V and

B_{c}

are known in advance, ideally we can exactly recover

Π_{c}

by setting

Π_{c} = V B_{c}^{'} {(B_{c} B_{c}^{'})}^{- 1} \equiv V B_{c}^{- 1}

. For convenience, to transfer the ideal case to the real case, set

Z_{c} = V B_{c}^{- 1}

. Since

Z_{c} \equiv Π_{c}

, we have

\begin{matrix} Π_{c} (j, :) = \frac{Z_{c} (j, :)}{∥ Z_{c} {(j, :) ∥}_{1}}, 1 \leq j \leq n_{c} . \end{matrix}

With given V, since it enjoys IS structure

V = Π_{c} B_{c} \equiv Π_{c} V (I_{c}, :)

, as long as we can obtain

V (I_{c}, :)

(i.e.,

B_{c}

), we can recover

Π_{c}

exactly. As mentioned in [24,27], for such IS, the successive projection (SP) algorithm [54] (i.e., Algorithm A2 in the Appendix E) can be applied to V with K column communities to find the column corner matrix

B_{c}

. The above analysis gives how to recover

Π_{c}

with given

Ω

and K under DiDCMM ideally.

Next, we aim to recover

Π_{r}

from U with the given K. Since

rank (U_{*}) = K

,

rank (U_{*} (I_{r}, :)) = K

. As

U_{*} (I_{r}, :) \in R^{K \times K}

, the inverse of

U_{*} (I_{r}, :)

exists. Therefore, Lemma 1 also gives that

\begin{matrix} Y = U_{*} U_{*}^{- 1} (I_{r}, :) . \end{matrix}

(9)

Equation (9) holds because

U_{*} = Y U_{*} (I_{r}, :)

and

U_{*} (I_{r}, :)

is a nonsingular matrix. By Lemma 1, we know that for row nodes, their membership matrix

Π_{r}

appears in the expression of Y. Therefore, we aim to use Equation (9) to find the exact expression of

Π_{r}

using

U, V

, and

Λ

by putting Y at the left-hand side of equality. For our next step, we aim at finding

Π_{r}

using Equation (9). Since

Y = N_{M} Π_{r} Θ_{r}^{- 1} (I_{r}, I_{r}) N_{U}^{- 1} (I_{r}, I_{r})

by Lemma 1 and

U_{*} = N_{U} U

, using

N_{M} Π_{r} Θ_{r}^{- 1} (I_{r}, I_{r}) N_{U}^{- 1} (I_{r}, I_{r})

and

N_{U} U

to replace Y and

U_{*}

in Equation (9), respectively, we have

N_{U}^{- 1} N_{M} Π_{r} Θ_{r}^{- 1} (I_{r}, I_{r}) N_{U}^{- 1} (I_{r}, I_{r}) = U U_{*}^{- 1} (I_{r}, :)

, which gives

\begin{matrix} N_{U}^{- 1} N_{M} Π_{r} = U U_{*}^{- 1} (I_{r}, :) N_{U} (I_{r}, I_{r}) Θ_{r} (I_{r}, I_{r}) . \end{matrix}

(10)

From Equation (10), we have found the expression of

Π_{r}

as a function of

U, U_{*}, Θ_{r}, N_{U}

, and

I_{r}

, where we do not move

N_{U}^{- 1} N_{M}

to the right-hand side of Equation (10) because it is a diagonal matrix and does not influence the expression of

Π_{r}

, see our next step for details. When designing a spectral algorithm in the ideal case with given

Ω

and K, we aim at recovering

Π_{r}

and

Π_{c}

by taking advantage of the singular value decomposition of

Ω

. We find that though Equation (10) provides an expression for

Π_{r}

by

Ω

’s SVD, there is a term

Θ_{r} (I_{r}, I_{r})

which relates to degree heterogeneity, and we aim at expressing

Θ_{r} (I_{r}, I_{r})

through

Ω

’s SVD. By the proof of Lemma 1, we know that

Θ_{r} (I_{r}, I_{r}) = diag (U (I_{r}, :) Λ V^{'} (I_{c}, :))

when Condition (I1) holds. Thus, substituting

diag (U (I_{r}, :) Λ V^{'} (I_{c}, :))

for

Θ_{r} (I_{r}, I_{r})

in Equation (10), we obtain an expression of

Π_{r}

such that this expression is directly related to

Ω

’s SVD and two index set

I_{r}

and

I_{c}

. For convenience, set

J_{*} = N_{U} (I_{r}, I_{r}) Θ_{r} (I_{r}, I_{r}) \equiv diag (U_{*} (I_{r}, :) Λ V^{'} (I_{c}, :)), Z_{r} = N_{U}^{- 1} N_{M} Π_{r}, Y_{*} = U U_{*}^{- 1} (I_{r}, :)

. By Equation (10), we have

\begin{matrix} Z_{r} = Y_{*} J_{*} \equiv U U_{*}^{- 1} (I_{r}, :) diag (U_{*} (I_{r}, :) Λ V^{'} (I_{c}, :)) . \end{matrix}

(11)

Equation (11) looks similar to Equation (7) of [55]. However, Equation (11) is related to two index sets

I_{r}

and

I_{c}

, while Equation (7) of [55] is only related to one index set because Equation (11) aims at designing a spectral algorithm for directed network generated under DiDCMM and Equation (7) of [55] aims at reviewing the generation of the SVM-cone-DCMMSB algorithm proposed in [26] for undirected network generated under DCMM. Meanwhile, since

N_{U}^{- 1} N_{M}

is an

n_{r} \times n_{r}

positive diagonal matrix, we have

\begin{matrix} Π_{r} (i, :) = \frac{Z_{r} (i, :)}{∥ Z_{r} {(i, :) ∥}_{1}}, 1 \leq i \leq n_{r} . \end{matrix}

(12)

With given

Ω

and K, we can obtain

U, V

; thus, the above analysis shows that once the two index sets

I_{r}

and

I_{c}

are known, we can exactly recover

Π_{r}

by Equations (11) and (12). Meanwhile, from Equation (10), we see that it is important to express

Θ_{r} (I_{r}, I_{r})

as a combination of

U, V, Λ

, and the two index sets

I_{r}

and

I_{c}

, where we successfully obtain an expression of

Θ_{r} (I_{r}, I_{r})

by Condition (I1), the unit diagonal constraint on P. Otherwise, if P has no unit diagonals, we cannot obtain an expression of

Θ_{r} (I_{r}, I_{r})

unless adding some nontrivial conditions on model parameters, just as analyzed in Remark 1. Similarly, references [24,26] also design their spectral algorithms to fit DCMM by using the unit diagonal constraint on P to obtain an expression of a sub-matrix of degree heterogeneity matrix, see Equations (6)–(8) of [55] as an example.

Given

Ω

and K, to recover

Π_{r}

in the ideal case, we need to obtain

Z_{r}

by Equation (11), which means that the only difficulty is in finding the index set

I_{r}

since

V (I_{c}, :)

can be obtained by SP algorithm from the IS structure

V = Π_{c} V (I_{c}, :)

. From Lemma 1, we know that

U_{*} = Y U_{*} (I_{r}, :)

forms the IC structure. In [26], their SVM-cone algorithm (i.e., Algorithm A3 in the Appendix F) can exactly obtain the row nodes corner matrix

U_{*} (I_{r}, :)

from the ideal cone

U_{*} = Y U_{*} (I_{r}, :)

as long as the Condition

{(U_{*} (I_{r}, :) U_{*}^{'} (I_{r}, :))}^{- 1} 1 > 0

holds (see Lemma 2).

Lemma 2.

Under

D i D C M M_{n_{r}, n_{c}} (K, P, Π_{r}, Π_{c}, Θ_{r})

,

{(U_{*} (I_{r}, :) U_{*}^{'} (I_{r}, :))}^{- 1} 1 > 0

holds.

Based on the above analysis, we are now ready to give the following four-stage algorithm which we call ideal DiMSC. Input

Ω, K

. Output:

Π_{r}

and

Π_{c}

.

Let $Ω = U Λ V^{'}$ be the compact SVD of $Ω$ such that $U \in R^{n_{r} \times K}, V \in R^{n_{c} \times K}, Λ \in R^{K \times K}, U^{'} U = I, V^{'} V = I$ . Let $U_{*} = N_{U} U$ , where $N_{U}$ is an $n_{r} \times n_{r}$ diagonal matrix whose i-th diagonal entry is $\frac{1}{{∥ U (i, :) ∥}_{F}}$ for $1 \leq i \leq n_{r}$ .
Run the SP algorithm on V assuming that there are K column communities to obtain the column corner matrix $V (I_{c}, :)$ (i.e., $B_{c}$ ). Run the SVM-cone algorithm on $U_{*}$ assuming that there are K row communities to obtain $I_{r}$ .
Set $J_{*} = diag (U_{*} (I_{r}, :) Λ V^{'} (I_{c}, :)), Y_{*} = U U_{*}^{- 1} (I_{r}, :), Z_{r} = Y_{*} J_{*}$ and $Z_{c} = V V^{- 1} (I_{c}, :)$ .
Recover $Π_{r}$ and $Π_{c}$ by setting $Π_{r} (i, :) = \frac{Z_{r} (i, :)}{∥ Z_{r} {(i, :) ∥}_{1}}$ for $1 \leq i \leq n_{r}$ , and $Π_{c} (j, :) = \frac{Z_{c} (j, :)}{∥ Z_{c} {(j, :) ∥}_{1}}$ for $1 \leq j \leq n_{c}$ .

The following theorem guarantees that ideal DiMSC exactly recovers nodes memberships, and this verifies the identifiability of DiDCMM in turn. Meanwhile, it should be noted that many spectral algorithms designed to fit identifiable statistical models in the community detection area can exactly recover node memberships for the ideal case. For example, the spectral clustering for K many clusters algorithm addressed in [5] under SBM, the regularized spectral clustering designed in [7] under DCSBM, the SCORE algorithm designed in [8] under DCSBM, the two algorithms designed and studied in [9] under SBM and DCSBM, the RSC-

τ

algorithm studied in [11] under SBM, the mixed-SCORE algorithm designed in [24] under DCMM, the DI-SIM algorithm designed in [30] under DCScBM, the D-SCORE algorithm studied in [31,32] under DCScBM, the SVM-cone-DCMMSB algorithm designed in [26] under DCMM, and the SPACL algorithm designed in [27] under MMSB can exactly recover membership matrices under respective models for the ideal case by using the population adjacency matrix to replace the adjacency matrix in the input of these algorithms. The fact that ideal cases for the above spectral algorithms can return community information also supports the identifiability of the above models.

Theorem 1.

Under

D i D C M M_{n_{r}, n_{c}} (K, P, Π_{r}, Π_{c}, Θ_{r})

, the ideal DiMSC exactly recovers the row nodes membership matrix

Π_{r}

and the column nodes membership matrix

Π_{c}

.

To demonstrate that

U_{*}

has the ideal cone structure, we drew Panel (a) of Figure 2. The simulated data used for Panel (a) is generated from

D i D C M M_{n_{r}, n_{c}} (K, P, Π_{r}, Π_{c}, Θ_{r})

with

n_{r} = 600, n_{c} = 400, K = 3

; each row (and column) community has 120 pure nodes. For the 240 mixed row nodes, we set

Π_{r} (i, 1) = rand (1) / 2, Π_{r} (i, 2) = rand (1) / 2, Π_{r} (i, 3) = 1 - Π_{r} (j, 1) - Π_{r} (j, 2)

, where

rand (1)

is any random number in

(0, 1)

,

Figure 2. Panel (a): plot of

U_{*}

and the hyperplane formed by

U_{*} (I_{r}, :)

. Blue points denote rows respective to mixed row nodes of

U_{*}

, and black points denote the K rows of the corner matrix

U_{*} (I_{r}, :)

. The plane in Panel (a) is the hyperplane formed by the triangle of the 3 rows of

U_{*} (I_{r}, :)

. Panel (b): plot of V and the ideal simplex formed by

V (I_{c}, :)

. Blue points denote rows respective to mixed column nodes of V, and black points denote the K rows of the corner matrix

V (I_{c}, :)

. Since

K = 3

, for visualization, we have projected these points from

R^{3}

to

R^{2}

.

and i is a mixed row node. For the 40 mixed column nodes, set

Π_{c} (j, 1) = rand (1) / 2

,

Π_{c} (j, 2) = rand (1) / 2, Π_{c} (j, 3) = 1 - Π_{c} (j, 1) - Π_{c} (j, 2)

. For the degree heterogeneity parameter, set

θ_{r} (i) = rand (1)

for all row nodes i. The matrix P is set as

P = [\begin{matrix} 1 & 0.4 & 0.3 \\ 0.2 & 1 & 0.1 \\ 0.1 & 0.4 & 1 \end{matrix}] .

Under such a setting, after computing

Ω

and obtaining

U_{*}, V

from

Ω

, we can plot Figure 2. Panel (a) shows that all rows respective to mixed row nodes of

U_{*}

are located at one side of the hyperplane formed by the K rows of

U_{*} (I_{r}, :)

, and this phenomenon occurs since each row of

U_{*}

is a scaled convex combination of the K rows of

U_{*} (I_{r}, :)

guaranteed by the IC structure

U_{*} = Y U_{*} (I_{r},;)

. Thus Panel (a) shows the existence of the ideal cone structure formed by

U_{*}

. Similarly, to demonstrate that V has the ideal simplex structure, we drew Panel (b) of Figure 2, where Panel (b) is obtained under the same setting as Panel (a). Panel (b) shows that rows respective to mixed column nodes of V are located inside of the simplex formed by the K rows of

V (I_{c}, :)

, and this phenomenon occurs since each row of V is a convex linear combination of the K rows of

V (I_{c}, :)

guaranteed by the IS structure

V = Π_{c} V (I_{c},;)

. Thus Panel (b) shows the existence of the ideal simplex structure formed by V.

3.2. Dimsc Algorithm

We now extend the ideal case to the real case. Set

\tilde{A} = \hat{U} \hat{Λ} {\hat{V}}^{'}

to be the top-K-dimensional SVD of A such that

\hat{U} \in R^{n_{r} \times K}, \hat{V} \in R^{n_{c} \times K}, \hat{Λ} \in R^{K \times K}, {\hat{U}}^{'} \hat{U} = I_{K}, {\hat{V}}^{'} \hat{V} = I_{K}

, and

\hat{Λ}

contains the top K singular values of A. Let

{\hat{U}}_{*}

be the row-wise normalization of

\hat{U}

such that

\hat{U} = N_{\hat{U}} \hat{U}

, where

N_{\hat{U}} \in R^{n_{r} \times n_{r}}

is a diagonal matrix whose i-th diagonal entry is

\frac{1}{∥ \hat{U} {(i, :) ∥}_{F}}

. For the real case, we use

{\hat{J}}_{*}, {\hat{Y}}_{*}, {\hat{Z}}_{r}, {\hat{Z}}_{c}, {\hat{Π}}_{r}, {\hat{Π}}_{c}

given in Algorithm 1 to estimate

J_{*}, Y_{*}, Z_{r}, Z_{c}, Π_{r}, Π_{c}

, respectively. Algorithm 1 called directed mixed simplex and cone (DiMSC for short) algorithm is a natural extension of the ideal DiMSC to the real case.

Algorithm 1: Directed Mixed Simplex and Cone (DiMSC) algorithm

Require: The adjacency matrix

A \in R^{n_{r} \times n_{c}}

of a directed network, the number of row (column) communities K.
Ensure: The estimated

n_{r} \times K

row membership matrix

{\hat{Π}}_{r}

and the estimated

n_{c} \times K

column membership matrix

{\hat{Π}}_{c}

.

1:: Obtain $\tilde{A} = \hat{U} \hat{Λ} {\hat{V}}^{'}$ , the top-K-dimensional SVD of A. Compute ${\hat{U}}_{*}$ from $\hat{U}$ .
2:: Apply SP algorithm (i.e., Algorithm A2) on the rows of $\hat{V}$ assuming there are K column communities to obtain ${\hat{I}}_{c}$ , the index set returned by SP algorithm.
3:: Similarly, apply SVM-cone algorithm (i.e., Algorithm 3) on the rows of ${\hat{U}}_{*}$ with K row communities to obtain ${\hat{I}}_{r}$ , the index set returned by SVM-cone algorithm.
4:: Set ${\hat{J}}_{*} = diag ({\hat{U}}_{*} ({\hat{I}}_{r}, :) \hat{Λ} {\hat{V}}^{'} ({\hat{I}}_{c}, :)), {\hat{Y}}_{*} = \hat{U} {\hat{U}}_{*}^{- 1} ({\hat{I}}_{r}, :), {\hat{Z}}_{r} = {\hat{Y}}_{*} {\hat{J}}_{*}$ and ${\hat{Z}}_{c} = \hat{V} {\hat{V}}^{- 1} ({\hat{I}}_{c}, :)$ . Then, set ${\hat{Z}}_{r} = \max (0, {\hat{Z}}_{r})$ and ${\hat{Z}}_{c} = \max (0, {\hat{Z}}_{c})$ .
5:: Estimate $Π_{r} (i, :)$ by ${\hat{Π}}_{r} (i, :) = {\hat{Z}}_{r} (i, :) / {∥ {\hat{Z}}_{r} (i, :) ∥}_{1}, 1 \leq i \leq n_{r}$ and estimate $Π_{c} (j, :)$ by ${\hat{Π}}_{c} (j, :) = {\hat{Z}}_{c} (j, :) / {∥ {\hat{Z}}_{c} (j, :) ∥}_{1}, 1 \leq j \leq n_{c}$ .

In the third step, we set the negative entries of

{\hat{Z}}_{r}

as 0 by setting

{\hat{Z}}_{r} = \max (0, {\hat{Z}}_{r})

for the reason that weights for any row node should be nonnegative, while there may exist some negative entries of

{\hat{Y}}_{*} {\hat{J}}_{*}

. A similar argument holds for

{\hat{Z}}_{c}

. The flowchart of DiMSC is displayed in Figure 3. Meanwhile, in community detection, researchers often use top-K-dimensional SVD of A or its variants such as Laplacian matrix or regularized Laplacian matrix to design their spectral clustering algorithms to fit identifiable statistical models such as spectral methods designed or studied in [5,7,8,9,11,24,26,27,29,31,33,35,56]. Furthermore, as discussed in [57], the

{SVS}^{+}

and

{SVS}^{*}

algorithms may be used as substitutions of the SP algorithm in our DiMSC for a better estimation of

Π_{r}

. When applying the entry-wise normalization idea developed in [8] to deal with U, as analyzed in [24], we obtain a simplex structure, and we can use the SP algorithm (or the combinatorial vertex search and sketched vertex search approaches developed in [24]) to hunt for the corners. The above ideas suggest that we can design different spectral algorithms to fit our model DiDCMM. We leave them for our future work. In particular, in this paper, we apply the SVM-cone algorithm to hunt for the corners of the cone structure inherent in

U_{*}

mainly for the theoretical convenience of the SVM-cone algorithm because ref. [26] has developed a nice theoretical framework on the performance for the SVM-cone algorithm.

Figure 3. Flowchart of Algorithm 1.

3.3. Computational Complexity

The computing cost of DiMSC mainly comes from SVD, SP, and SVM-cone. The computational complexity of SVD is

O (\max (n_{r}, n_{c}) \min (n_{r}^{2}, n_{c}^{2}))

. Since the adjacency matrix A for real-world network data sets is usually sparse, using the power method discussed in [58], the computation complexity for obtaining the top-K-dimensional SVD of A is only slightly larger than

O (\max (n_{r}^{2}, n_{c}^{2}) K)

[8,24]. The SP algorithm step in DiMSC has a complexity of

O (\max (n_{r}, n_{c}) K^{2})

[24]. The complexity of the one-class SVM step for SVM-cone algorithm is

O (\max (n_{r}, n_{c}) K^{2})

[26,59]. The complexity of the K-means step for SVM-cone algorithm is

O (\max (n_{r}, n_{c}) K^{2})

[60]. Since the number of communities K considered in this paper is much smaller than the network size, the total complexity of DiMSC is

O (\max (n_{r}^{2}, n_{c}^{2}) K)

. Results in Section 5 show that, for a computer-generated network with 15,000 nodes under SBM, DiMSC takes hundreds of seconds to process a standard personal computer (Thinkpad X1 Carbon Gen 8) using MATLAB R2021b. Meanwhile, many spectral methods developed under models SBM, DCSBM, MMSB, ScBM, DCScBM, OCCAM, DCMM, and DiMMSB for community detection also have complexity

O (\max (n_{r}^{2}, n_{c}^{2}) K)

, see spectral algorithms designed or studied in [5,7,8,9,11,24,26,27,29,30,31,33,35,61,62]. Researchers design spectral algorithms for community detection under various identifiable statistical models mainly for their convenience on building a theoretical guarantee of consistent estimation, and we also provide a theoretical guarantee on DiMSC’s estimation consistency in next section.

4. Consistency Results

In this section, we show the consistency of our algorithm for fitting the DiDCMM by proving that the sample-based estimates

{\hat{Π}}_{r}

and

{\hat{Π}}_{c}

concentrate around the true mixed membership matrices

Π_{r}

and

Π_{c}

. Throughout this paper, K is a known positive integer. Set

θ_{r, \max} = \max_{1 \leq i \leq n_{r}} θ_{r} (i)

and

θ_{r, \min} = \min_{1 \leq i \leq n_{r}} θ_{r} (i)

. Assume that

Assumption 1.

P_{\max} \max (∥ θ_{r} ∥_{1}, θ_{r, \max} n_{c}) \geq \log (n_{r} + n_{c})

.

Assumption 1 means that the network cannot be too sparse, and it also means that we allow

θ_{r, \max}

to go to zero with increasing numbers of row nodes and column nodes. When building theoretical guarantees on consistent estimation, controlling network sparsity is popular in the community detection area. For examples, Condition (2.9) of [8], Theorem 3.1 of [9], Condition (2.13) of [24], Assumption 3.1 of [27], and Assumption 2 of [31] all control network sparsity for their theoretical analysis. Especially, when DiDCMM reduces to SBM by letting

Θ_{r} = ρ I, n = n_{r} = n_{c}, Π_{r} = Π_{c}

, and all nodes are pure for

ρ > 0

, Assumption A1 requires that

ρ n ≫ \log (n)

, which is consistent with the sparsity requirement in [8,9,24,31]. As analyzed in [55], we know that our requirement on network sparsity is optimal since it matches the sharp threshold of obtaining a connected Erdös–Rényi (ER) random graph [63] when SBM reduces to an ER random graph by letting

K = 1

.

For notation convenience, set

ϖ = \max (∥ \hat{U} {\hat{U}}^{'} - U U^{'} ∥_{2 \to \infty}, ∥ \hat{V} {\hat{V}}^{'} - V V^{'} ∥_{2 \to \infty}), {\hat{f}}_{r}

= \max_{1 \leq i \leq n_{r}} ∥ e_{i}^{'} ({\hat{Π}}_{r} - Π_{r} P_{r}) ∥_{1}, {\hat{f}}_{c} = \max_{1 \leq j \leq n_{c}} {∥ e_{j}^{'} ({\hat{Π}}_{c} - Π_{c} P_{c}) ∥}_{1}

, and

π_{r, \min} = \min_{1 \leq k \leq K}

1^{'} Π_{r} e_{k}

, where

ϖ

is the row-wise singular vector deviation which can be bounded by Theorem 4.4 of [64],

{\hat{f}}_{r}

and

{\hat{f}}_{c}

measures per node clustering error of DiMSC, and

π_{r, \min}

measures the minimum summation of row nodes belonging to a certain row community. Increasing

π_{r, \min}

makes the network tend to be more balanced and vice versa. Meanwhile, row-wise singular vector deviation is important when building a theoretical guarantee of spectral methods fitting models for a network with mixed memberships, for example, refs. [24,26,27,35] also consider

ϖ

when building consistent estimation for their spectral methods.

The next theorem gives theoretical bounds on estimations of memberships for both row and column nodes, which is the main theoretical result for our DiMSC method.

Theorem 2.

Under

D i D C M M_{n_{r}, n_{c}} (K, P, Π_{r}, Π_{c}, Θ_{r})

, let

{\hat{Π}}_{r}

and

{\hat{Π}}_{c}

be obtained from Algorithm 1, when Assumption 1 holds, suppose

σ_{K} (Ω) \geq C \sqrt{θ_{r, \max} P_{\max} (n_{r} + n_{c}) \log (n_{r} + n_{c})}

, with probability at least

1 - o ({(n_{r} + n_{c})}^{- 3})

, we have

\begin{matrix} {\hat{f}}_{r} = O (\frac{K^{5.5} θ_{r, \max}^{15} ϖ κ^{4.5} (Π_{r}^{'} Π_{r}) κ (Π_{c}) λ_{1}^{1.5} (Π_{r}^{'} Π_{r})}{θ_{r, \min}^{15} π_{r, \min}}), {\hat{f}}_{c} = O (ϖ K κ (Π_{c}^{'} Π_{c}) \sqrt{λ_{1} (Π_{c}^{'} Π_{c})}) . \end{matrix}

In Theorem 2, the Condition

σ_{K} (Ω) \geq C \sqrt{θ_{r, \max} P_{\max} (n_{r} + n_{c}) \log (n_{r} + n_{c})}

is necessary when applying Theorem 4.4 [64] to obtain a theoretical upper bound of

ϖ

. When building a theoretical guarantee on estimation consistency for spectral methods fitting models modeling network with mixed memberships, it is necessary to have a lower bound requirement on

σ_{K} (Ω)

, see [24,26,27,35]. Actually, this requirement matches with the consistent requirement on

\frac{σ_{K} (P)}{\sqrt{P_{\max}}}

obtained from the theoretical upper bound of error rates for a balanced network, see Remark 4 for details. Meanwhile, similar to [7,11,30], we can design a spectral algorithm via an application of regularized Laplacian matrix to fit DiDCMM.

The following corollary is obtained by adding conditions on model parameters similar to Corollary 3.1 in [27], where these conditions give a directed network in which each community has the same order of size, and each node has the same order of degree, i.e., a balanced network.

Corollary 1.

Under

D i D C M M_{n_{r}, n_{c}} (K, P, Π_{r}, Π_{c}, Θ_{r})

, when conditions of Theorem 2 hold, suppose

λ_{K} (Π_{r}^{'} Π_{r}) = O (\frac{n_{r}}{K}), λ_{K} (Π_{c}^{'} Π_{c}) = O (\frac{n_{c}}{K}), π_{r, \min} = O (\frac{n_{r}}{K})

and

K = O (1)

, with probability at least

1 - o ({(n_{r} + n_{c})}^{- 3})

, we have

\begin{matrix} {\hat{f}}_{r} = O ({(\frac{θ_{r, \max}}{θ_{r, \min}})}^{15.5} \frac{1}{σ_{K} (P)} \sqrt{\frac{P_{\max} \log (n_{r} + n_{c})}{θ_{r, \min} n_{c}}}), {\hat{f}}_{c} = O ({(\frac{θ_{r, \max}}{θ_{r, \min}})}^{0.5} \frac{1}{σ_{K} (P)} \sqrt{\frac{P_{\max} \log (n_{r} + n_{c})}{θ_{r, \min} n_{r}}}) . \end{matrix}

Meanwhile,

when $θ_{r, \max} = O (ρ), θ_{r, \min} = O (ρ)$ (i.e., $\frac{θ_{r, \min}}{θ_{r, \max}} = O (1)$ ), we have

$\begin{matrix} {\hat{f}}_{r} = O (\frac{1}{σ_{K} (P)} \sqrt{\frac{P_{\max} \log (n_{r} + n_{c})}{ρ n_{c}}}), {\hat{f}}_{c} = O (\frac{1}{σ_{K} (P)} \sqrt{\frac{P_{\max} \log (n_{r} + n_{c})}{ρ n_{r}}}) . \end{matrix}$
when $n_{r} = O (n), n_{c} = O (n)$ and $θ_{r, \max} = O (ρ), θ_{r, \min} = O (ρ)$ , we have

$\begin{matrix} {\hat{f}}_{r} = O (\frac{1}{σ_{K} (P)} \sqrt{\frac{P_{\max} \log (n)}{ρ n}}), {\hat{f}}_{c} = O (\frac{1}{σ_{K} (P)} \sqrt{\frac{P_{\max} \log (n)}{ρ n}}) . \end{matrix}$

Consider a directed mixed membership network under the settings of Corollary 1 when

θ_{r, \max} = O (ρ), θ_{r, \min} = O (ρ)

for

ρ > 0

, to obtain consistent estimations for both row nodes and column nodes, by Corollary 1,

\frac{σ_{K} (P)}{\sqrt{P_{\max}}}

should shrink slower than

\sqrt{\frac{\log (n_{r} + n_{c})}{ρ \min (n_{r}, n_{c})}}

, where consistent estimation means that the theoretical upper bound of error rate goes to zero when increasing network size. Especially, when

n_{r} = O (n)

and

n_{c} = O (n)

,

\frac{σ_{K} (P)}{\sqrt{P_{\max}}}

should shrink slower than

\sqrt{\frac{\log (n)}{n}}

. We further assume that

P = (2 - β) I_{K} + (β - 1) 1 1^{'}

for

β \in [1, 2) \cup (2, \infty)

and let

\tilde{P} = ρ P

(note that for this P, we have

σ_{K} (P) = | β - 2 |

and

P_{\max} = \max (1, β - 1)

). So the diagonal elements for

\tilde{P}

are

ρ

and non-diagonal elements are

ρ (β - 1)

. Set

p_{in}

as the diagonal entries of

\tilde{P}

, and

p_{out}

as the non-diagonal entries of

\tilde{P}

, we have

p_{in} = ρ

,

p_{out} = ρ (β - 1)

, and

\frac{| p_{in} - p_{out} |}{\sqrt{\max (p_{in}, p_{out})}} = \frac{\sqrt{ρ} | β - 2 |}{\sqrt{\max (1, β - 1)}} = \frac{\sqrt{ρ} σ_{K} (P)}{\sqrt{P_{\max}}}

. Hence, for consistent estimation, we see that

\frac{| p_{in} - p_{out} |}{\sqrt{\max (p_{in}, p_{out})}}

should shrink slower than

\sqrt{\frac{\log (n_{r} + n_{c})}{\min (n_{r}, n_{c})}}

by Corollary 1 and should shrink slower than

\sqrt{\frac{\log (n)}{n}}

when

n_{r} = O (n)

and

n_{c} = O (n)

, where this result is consistent with classical separation condition for a standard network with two equal-sized clusters by applying the separation condition and sharp threshold criterion developed in [55].

Remark 3.

When the network is undirected (i.e.,

n_{r} = n_{c} = n, Π_{r} = Π_{c}

) with

K = O (1)

by setting

θ_{r} (i) = ρ

for

1 \leq i \leq n_{r}

, DiDCMM degenerates to MMSB considered in [27], the upper bound of error rate for DiMSC is

O (\frac{1}{σ_{K} (P)} \sqrt{\frac{\log (n)}{ρ n}})

when

P_{\max} = 1

. Replacing the Θ in [24] by

Θ = \sqrt{ρ} I

, their DCMM model degenerates to MMSB. Then, their conditions in Theorem 2.2 are our Assumption 1 and

λ_{K} (Π^{'} Π) = O (\frac{n}{K})

, where

Π = Π_{r} = Π_{c}

for MMSB. When

K = O (1)

, the error bound in Theorem 2.2 in [24] is

O (\frac{1}{σ_{K} (P)} \sqrt{\frac{\log (n)}{ρ n}})

, which is consistent with ours.

Remark 4.

By Lemma A5 in the Appendix D, we know

σ_{K} (Ω) \geq θ_{r, \min} σ_{K} (P) σ_{K} (Π_{r}) σ_{K} (Π_{c})

. To ensure the Condition

σ_{K} (Ω) \geq C {(θ_{r, \max} P_{\max} (n_{r} + n_{c}) \log (n_{r} + n_{c}))}^{1 / 2}

in Theorem 2 holds, we need

\begin{matrix} \frac{σ_{K} (P)}{\sqrt{P_{\max}}} \geq C {(\frac{θ_{r, \max} (n_{r} + n_{c}) \log (n_{r} + n_{c})}{θ_{r, \min}^{2} λ_{K} (Π_{r}^{'} Π_{r}) λ_{K} (Π_{c}^{'} Π_{c})})}^{1 / 2} . \end{matrix}

(13)

When

K = O (1), n_{r} = O (n), n_{c} = O (n), λ_{K} (Π_{r}^{'} Π_{r}) = O (\frac{n_{r}}{K}), λ_{K} (Π_{c}^{'} Π_{c}) = O (\frac{n_{c}}{K})

and

θ_{r, \max} = O (ρ), θ_{r, \min} = O (ρ)

, Equation (13) gives that

\frac{σ_{K} (P)}{\sqrt{P_{\max}}}

should shrink slower than

\sqrt{\frac{\log (n)}{ρ n}}

, which matches with the consistency requirement on

\frac{σ_{K} (P)}{\sqrt{P_{\max}}}

of Corollary 1.

For convenience, we need the following definition.

Definition 2.

Let

D i D C M M (n, K, Π_{r}, Π_{c}, α_{in}, α_{out})

be a special case of

D i M M D F_{n_{r}, n_{c}} (K, P, Π_{r}, Π_{c}, Θ_{r})

when

Θ_{r} = ρ I, n_{r} = n_{c} = n, λ_{K} (Π_{r}^{'} Π_{r}) = O (n / K), λ_{K} (Π_{c}^{'} Π_{c}) = O (n / K), π_{r, \min} = O (n / K), K = O (1)

, and

\tilde{P} = ρ P

has diagonal entries

p_{in} = α_{in} \frac{\log (n)}{n}

and non-diagonal entries

p_{out} = α_{out} \frac{\log (n)}{n}

.

D i D C M M (n, K, Π_{r}, Π_{c}, α_{in}, α_{out})

denotes a special directed network such that row communities have nearly equal sizes since

λ_{K} (Π_{r}^{'} Π_{r}) = O (n / K)

, and column communities also have nearly equal sizes. By Corollary 1, for consistent estimation, we need

\frac{| p_{in} - p_{out} |}{\sqrt{\max (p_{in}, p_{out})}} ≫ \sqrt{\frac{\log (n)}{n}}

under

D i D C M M (n, K, Π_{r}, Π_{c}, α_{in}, α_{out})

. Since

\frac{| p_{in} - p_{out} |}{\sqrt{\max (p_{in}, p_{out})}} = \frac{| α_{in} - α_{out} | \sqrt{\frac{\log (n)}{n}}}{\sqrt{\max (α_{in}, α_{out})}}

, for consistent estimation, we need

\begin{matrix} \frac{| α_{in} - α_{out} |}{\sqrt{\max (α_{in}, α_{out})}} ≫ 1 \end{matrix}

(14)

Our numerical results in Section 5 support that DiMSC can estimate memberships for both row and column nodes when the threshold

\frac{| α_{in} - α_{out} |}{\sqrt{\max (α_{in}, α_{out})}} ≫ 1

holds under

D i D C M M (n, K, Π_{r}, Π_{c}, α_{in}, α_{out})

.

Remark 5.

When

K = 2

, the network is undirected (i.e.,

Π_{r} = Π_{c}

), all nodes are pure, and each community has an equal size,

D i D C M M (n, K, Π_{r}, Π_{c}, α_{in}, α_{out})

reduces to the SBM case such that nodes connect with probability

p_{in}

within clusters and

p_{out}

across clusters. This case has been well studied in recent years, see [50] and references therein. Especially, for this case, ref. [50] finds that exact recovery is possible if

| \sqrt{α_{in}} - \sqrt{α_{out}} | > \sqrt{2}

and impossible if

| \sqrt{α_{in}} - \sqrt{α_{out}} | < \sqrt{2}

. For convenience, we use

S B M (n, p_{in}, p_{out})

to denote this case. Our numerical results in Section 5 show that DiMSC return consistent estimation under

S B M (n, p_{in}, p_{out})

when

α_{in}

and

α_{out}

are set in the impossible region of exact recovery but satisfy Equation (14).

Remark 6.

In information theory, Shannon entropy [65] quantifies the amount of information in a variable, and it is a measure of uncertainty information of a probability distribution. We use a node membership entropy (NME) derived from Shannon theory to measure the node’s uncertainty about the node and all communities [66,67]. For row node i with membership

Π_{r} (i, :)

, since

\sum_{k = 1}^{K} Π_{r} (i, k) = 1

and

Π_{r} (i, k)

can be seen as the probability that row node i belongs to row cluster k for

1 \leq k \leq K

, NME of row node i is the Shannon entropy related to

Π_{r} (i, :)

:

\begin{matrix} N M E (i) = - \sum_{k = 1}^{K} Π_{r} (i, k) \log (Π_{r} (i, k)) . \end{matrix}

(15)

For column node j with membership

Π_{c} (j, :)

, we can also obtain its NME by Equation (15). In particular, if a node belongs to each cluster with equal probability

\frac{1}{K}

, its NME is

\log (K)

which is the maximum among all NME; if a node belongs to two clusters with equal probability

\frac{1}{2}

, its NME is

\log (2)

which is less than

\log (K)

when

K \geq 3

. Generally, we see that recovering memberships for mixed nodes is harder than for pure nodes since NME is 0 for pure nodes, while NME is larger than 0 for mixed nodes by the definition of NME.

5. Simulations

In this section, several experiments are conducted to investigate the performance of our DiMSC under DiDCMM. We compare our DiMSC with three model-based methods that can be thought of as special cases of our model DiDCMM. Model-based methods we compare include the DISIM algorithm proposed in [30], the DSCORE algorithm studied in [31], and the DiPCA algorithm which is obtained by using the adjacency matrix A to replace the regularized graph Laplacian matrix in the DISIM algorithm. Similar to [24,27], for simulations, we measure the errors for the inferred community membership matrices instead of simply each node. We measure the performance of DiMSC and its competitors by the mixed Hamming error rate (MHamm for short) defined below

\begin{matrix} MHamm = \max (\frac{\min_{P \in S_{P}} {∥ {\hat{Π}}_{r} P - Π_{r} ∥}_{1}}{n_{r}}, \frac{\min_{P \in S_{P}} {∥ {\hat{Π}}_{c} P - Π_{c} ∥}_{1}}{n_{c}}), \end{matrix}

(16)

where

S_{P}

is the set of

K \times K

permutation matrices.

For all simulations in this section, unless specified, we set the parameters

(n_{r}, n_{c}, K, P, Π_{r}, Π_{c}, Θ_{r})

under DiDCMM as follows: let each row community and each column community have

n_{0}

pure nodes; let all mixed row nodes (and mixed column nodes) have membership

(1 / K, 1 / K, \dots, 1 / K)

; for

z \geq 1

, we generate the degree parameters for row nodes as below: let

{\bar{θ}}_{r} \in R^{n_{r} \times 1}

such that

1 / {\bar{θ}}_{r} (i) \overset{i i d}{\sim} U (1, z)

for

1 \leq i \leq n_{r}

, where

U (1, z)

denotes the uniform distribution on

[1, z]

, and set

θ_{r} = ρ {\bar{θ}}_{r}

, where we use

ρ

to control the sparsity of the network; when

K = 2

, P is set as

\begin{matrix} P_{1} & = [\begin{matrix} 1 & 0.1 \\ 0.2 & 1 \end{matrix}] or P_{2} = [\begin{matrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{matrix}]; \end{matrix}

when

K = 3

,

\begin{matrix} P_{3} & = [\begin{matrix} 1 & 0.1 & 0.3 \\ 0.2 & 1 & 0.4 \\ 0.5 & 0.2 & 1 \end{matrix}] or P_{4} = [\begin{matrix} 0.8 & 0.1 & 0.3 \\ 0.2 & 0.9 & 0.4 \\ 0.5 & 0.2 & 1 \end{matrix}]; \end{matrix}

where

P_{2}

and

P_{4}

have non-unit diagonals, and we consider the two cases because we want to investigate DiMSC’s sensitivity when P has non-unit diagonals such that P disobeys Condition (I1).

After obtaining

P, Π_{r}, Π_{c}, θ_{r}

, similar to the five simulation steps in [8], each simulation experiment contains the following steps:

(a) Let

Θ_{r}

be the

n_{r} \times n_{r}

diagonal matrix such that

Θ_{r} (i, i) = θ_{r} (i), 1 \leq i \leq n_{r}

. Set

Ω = Θ_{r} Π_{r} P Π_{c}^{'}

.

(b) Let W be an

n_{r} \times n_{c}

matrix such that

W (i, j)

are independent centered-Bernoulli with parameters

Ω (i, j)

. Let

\tilde{A} = Ω + W

.

(c) Set

{\tilde{S}}_{r} = {i : \sum_{j = 1}^{n_{c}} \tilde{A} (i, j) = 0}

and

{\tilde{S}}_{c} = {j : \sum_{i = 1}^{n_{r}} \tilde{A} (i, j) = 0}

, i.e.,

{\tilde{S}}_{r}

(

{\tilde{S}}_{c}

) is the set of row (column) nodes with 0 edges. Let A be the adjacency matrix obtained by removing rows respective to nodes in

{\tilde{S}}_{r}

and removing columns respective to nodes in

{\tilde{S}}_{c}

from

\tilde{A}

. Similarly, update

Π_{r}

by removing nodes in

{\tilde{S}}_{r}

and update

Π_{c}

by removing nodes in

{\tilde{S}}_{c}

.

(d) Apply the DiMSC algorithm (and its competitors) to A. Record MHamm under investigations.

(e) Repeat (b)–(d) 50 times, and report the averaged MHamm over the 50 repetitions.

Let

n_{r, A}

be the number of rows of A and

n_{c, A}

be the number of columns of A. In our experiments,

n_{r, A}

and

n_{c, A}

are usually very close to

n_{r}

and

n_{c}

; therefore we do not report the exact values of

n_{r, A}

and

n_{c, A}

. After providing the above steps about how to generate A numerically under DiDCMM and how to record the error rates, now we describe our experiments in detail. We consider six experiments here. In experiments 1–6, we study the influence of the fraction of pure nodes, degree heterogeneity, connectivity across communities, sparsity, phase transition, and network size on performances of these methods, respectively.

Experiment 1 (a): Fraction of pure nodes. Set

n_{r} = 200, n_{c} = 300, z = 5, ρ = 1

and P as

P_{1}

. Let

n_{0}

range in

{10, 20, 30, \dots, 100}

. The numerical results are shown in Panel (a) of Figure 4. The results show that as the fraction of pure nodes increases for both row and column communities, all approaches perform better. Meanwhile, DiMSC performs best among all methods in Experiment 1 (a).

Figure 4. Errors against increasing

n_{0}

. y-axis: MHamm. Panel (a): Experiment 1 (a); Panel (b): Experiment 1 (b); Panel (c): Experiment 1 (c); Panel (d): Experiment 1 (d).

Experiment 1 (b): Fraction of pure nodes. All parameters are set the same as Experiment 1 (a) except that we set P as

P_{2}

here. The numerical results are shown in Panel (b) of Figure 4. The results show that all methods perform better as

n_{0}

increases, DiMSC outperforms its competitors, and DiMSC enjoys satisfactory performance even when P has non-unit diagonals.

Experiment 1 (c): Fraction of pure nodes. Set

n_{r} = 600, n_{c} = 900, z = 5

,

ρ = 1

, and P as

P_{3}

. Let

n_{0}

range in

{20, 40, 60, \dots, 200}

. The numerical results are shown in Panel (c) of Figure 4, and we see that all methods perform better when there are more pure nodes and our DiMSC performs best.

Experiment 1 (d): Fraction of pure nodes. All parameters are set the same as Experiment 1 (c) except that we set P as

P_{4}

here. The numerical results are shown in Panel (d) of Figure 4, and the analysis is similar to that of Experiment 1 (b).

Experiment 2 (a): Degree heterogeneity. Set

n_{r} = 200, n_{c} = 300, n_{0} = 80, ρ = 1

, and P as

P_{1}

. Let z range in

{2, 3, 4, \dots, 12}

. A lager z generates lesser edges. The results are displayed in Panel (a) of Figure 5. The results suggest that the error rates of DiMSC for both row and column nodes tend to increase as z increases. This phenomenon happens because decreasing degree heterogeneities for row nodes lowers the number of edges in the directed network; thus the network becomes harder to be detected for both row and column nodes. Meanwhile, DiMSC outperforms its competitors in this experiment, and it is interesting to see that the error rates of DI-SIM, DiPCA, and DSCORE are almost the same for this experiment.

Figure 5. Errors against increasing z. y-axis: MHamm. Panel (a): Experiment 2 (a); Panel (b): Experiment 2 (b); Panel (c): Experiment 2 (c); Panel (d): Experiment 2 (d); Panel (e): Experiment 2 (e); Panel (f): Experiment 2 (f); Panel (g): Experiment 2 (g); Panel (h): Experiment 2 (h).

Experiment 2 (b): Degree heterogeneity. All parameters are set the same as Experiment 2 (a) except that we set P as

P_{2}

here. The results are displayed in Panel (b) of Figure 5, and we see that DiMSC performs satisfactorily when the directed network is not too sparse (i.e., a small z case) even when P has non-unit diagonals. Meanwhile, DiMSC significantly outperforms its competitors in this experiment.

Experiment 2 (c): Degree heterogeneity. Set

n_{r} = 600, n_{c} = 900, n_{0} = 150

,

ρ = 1

, and P as

P_{3}

. Let z range in

{2, 3, 4, \dots, 12}

. The results are shown in Panel (c) of Figure 5 and can be analyzed similarly to Experiment 2 (a).

Experiment 2 (d): Degree heterogeneity. All parameters are set the same as Experiment 2 (c) except that we set P as

P_{4}

here. The results are displayed in Panel (d) of Figure 5 and are similar to that of Experiment 2 (b).

Experiment 2 (e): Degree heterogeneity. All parameters are set the same as Experiment 2(a) except that we set

n_{0} = 0

(so there are no pure nodes in both row and column communities), and all mixed row nodes have two different memberships (0.9, 0.1) and (0.1, 0.9), each with

\frac{n_{r}}{K} = 100

number of row nodes, and all mixed column nodes also have the above two memberships, each with

\frac{n_{c}}{K} = 150

number of column nodes. Panel (e) of Figure 5 shows the results, and we see that DiMSC performs satisfactorily for a small z even for the case when there are no pure nodes for both row and column communities. Meanwhile, DiMSC performs better than its competitors when

z < 7

, and it perform poorer than its competitors when

z \geq 8

for this experiment. Furthermore, compared with numerical results of Experiment 2 (a), we see that DI-SIM, DiPCA, and DSCORE have better performances in Experiment 2 (e). The possible reason is the memberships

(0.9, 0.1)

and

(0.1, 0.9)

are close to

(1, 0)

and

(0, 1)

somewhat.

Experiment 2 (f): Degree heterogeneity. All parameters are set the same as Experiment 2 (b) except that we set

Π_{r}

and

Π_{c}

the same as Experiment 2 (e). The results are shown in Panel (f) of Figure 5 and are similar to that of Experiment 2 (e).

Experiment 2 (g): Degree heterogeneity. All parameters are set the same as Experiment 2 (c) except that we set

n_{0} = 0

, all mixed row nodes have three different memberships (0.8, 0.1, 0.1), (0.1, 0.8, 0.1), and

(0.1, 0.1, 0.8)

, each with

\frac{n_{r}}{K} = 200

number of row nodes, and all mixed column nodes also have the above four memberships, each with

\frac{n_{c}}{K} = 300

number of column nodes. The results are displayed in Panel (g) of Figure 5 and are similar to that of Experiment 2 (e).

Experiment 2 (h): Degree heterogeneity. All parameters are set the same as Experiment 2 (d) except that we set

Π_{r}

and

Π_{c}

the same as Experiment 2 (g). The results are shown in Panel (h) of Figure 5 and are similar to that of Experiment 2 (e).

Experiment 3 (a): Connectivity across communities. Set

n_{r} = 200, n_{c} = 300

,

n_{0} = 80

,

z = 5, ρ = 1

. Set

P = [\begin{matrix} 1 & β - 1 \\ β - 1 & 1 \end{matrix}] .

and let

β

range in

{1, 1.2, 1.4, \dots, 4}

. Decreasing

| β - 2 |

increases the hardness of detecting such directed networks. Note that

P (A (i, j) = 1) = Ω (i, j) = θ_{r} (i) Π_{r} (i, :) P Π_{c}^{'} (j, :)

gives

\max_{i, j} Ω (i, j) = θ_{r, \max} P_{\max}

should be no larger than 1. Since

P_{\max}

may be larger than one in this experiment, after obtaining

θ_{r}

, we need to update

θ_{r}

as

θ_{r} / P_{\max}

. The results are displayed in Panel (a) of Figure 6, and they support the arguments given after Corollary 1 such that DiMSC performs better when

| β - 2 |

increases and vice versa. Meanwhile, our DiMSC outperforms its competitors in this experiment.

Figure 6. Errors against increasing

β

. y-axis: MHamm. Panel (a): Experiment 3 (a); Panel (b): Experiment 3 (b); Panel (c): Experiment 3 (c); Panel (d): Experiment 3 (d); Panel (e): Experiment 3 (e); Panel (f): Experiment 3 (f); Panel (g): Experiment 3 (g); Panel (h): Experiment 3 (h).

Experiment 3 (b): Connectivity across communities. All parameters are set the same as Experiment 3 (a) except that we set

P = [\begin{matrix} 0.8 & β - 1 \\ β - 1 & 0.9 \end{matrix}] .

The results are displayed in Panel (b) of Figure 6, and we see that DiMSC performs better when

| β - 2 |

increases even for the case that P has non-unit diagonals.Meanwhile, our DiMSC performs better than its competitors here.

Experiment 3 (c): Connectivity across communities. Set

n_{r} = 600, n_{c} = 900, n_{0} = 150, z = 5, ρ = 1

. Set

P = [\begin{matrix} 1 & β - 1 & β - 1 \\ β - 1 & 1 & β - 1 \\ β - 1 & β - 1 & 1 \end{matrix}] .

and let

β

range in

{1, 1.2, 1.4, \dots, 4}

. The results are displayed in Panel (c) of Figure 6 and can be analyzed similarly to Experiment 3 (a).

Experiment 3 (d): Connectivity across communities. All parameters are set the same as Experiment 3(c) except that we set

P = [\begin{matrix} 0.8 & β - 1 & β - 1 \\ β - 1 & 0.9 & β - 1 \\ β - 1 & β - 1 & 1 \end{matrix}] .

The results are displayed in Panel (d) of Figure 6 and can be analyzed similarly to Experiment 3 (b).

Experiment 3 (e): Connectivity across communities. All parameters are set the same as Experiment 3(a) except that we let

Π_{r}

and

Π_{c}

be the same as that of Experiment 2 (e) (so there are no pure nodes in both row and column communities.). Panel (e) of Figure 6 shows the results, and we see that DiMSC enjoys better performance when

| β - 2 |

increases even in the case that there are no pure nodes for both row and column communities. Meanwhile, all methods have competitive performances for this experiment, and the possible reason that DiMSC’s competitors enjoy better performances here than in Experiment 3 (a) is analyzed in Experiment 2 (e).

Experiment 3 (f): Connectivity across communities. All parameters are set the same as Experiment 3 (b) except that we set

Π_{r}

and

Π_{c}

the same as Experiment 2 (e). The results are displayed in Panel (f) of Figure 6 and can be analyzed similarly to Experiment 3 (e).

Experiment 3 (g): Connectivity across communities. All parameters are set the same as Experiment 3 (c) except that we let

Π_{r}

and

Π_{c}

be the same as that of Experiment 2 (g) (so there are no pure nodes). Panel (g) of Figure 6 shows the results, and the analysis is similar to that of Experiment 3 (b).

Experiment 3 (h): Connectivity across communities. All parameters are set the same as Experiment 3 (d) except that we set

Π_{r}

and

Π_{c}

the same as Experiment 2 (g). Panel (h) of Figure 6 shows the results, and the analysis is similar to that of Experiment 3 (b).

Experiment 4 (a): Sparsity. Set

n_{r} = 200, n_{c} = 300, n_{0} = 80, z = 5,

and P as

P_{1}

. Let

ρ

range in

{0.2, 0.3, \dots, 1}

. A larger

ρ

indicates a denser network. Panel (a) in Figure 7 displays the simulation results of this experiment. We see that DiMSC performs better as the simulated directed network becomes denser, and DiMSC significantly outperforms its competitors in this experiment.

Figure 7. Errors against increasing

ρ

. y-axis: MHamm: Panel (a): Experiment 4 (a); Panel (b): Experiment 4 (b); Panel (c): Experiment 4 (c); Panel (d): Experiment 4 (d); Panel (e): Experiment 4 (e); Panel (f): Experiment 4 (f); Panel(g): Experiment 4 (g); Panel (h): Experiment 4 (h).

Experiment 4 (b): Sparsity. All parameters are set the same as Experiment 4 (a) except that P is set as

P_{2}

. Panel (b) of Figure 7 shows the results, and the analysis is similar to that of Experiment 2 (b).

Experiment 4 (c): Sparsity. Set

n_{r} = 600, n_{c} = 900, n_{0} = 150, z = 5,

and P as

P_{3}

. Let

ρ

range in

{0.2, 0.3, \dots, 1}

. Panel (c) of Figure 7 shows the results, and the analysis is similar to that of Experiment 4 (a).

Experiment 4 (d): Sparsity. All parameters are set the same as Experiment 4 (c) except that P is set as

P_{4}

. Panel (d) of Figure 7 displays the results, and the analysis is similar to that of Experiment 4 (b).

Experiment 4 (e): Sparsity. All parameters are set the same as Experiment 4 (a) except that we let

Π_{r}

and

Π_{c}

be the same as that of Experiment 2 (e). Panel (e) of Figure 7 shows the results, and we see that DiMSC’s error rates decrease for a denser directed network even when all nodes are mixed. Meanwhile, all methods enjoy similar performances in this experiment.

Experiment 4 (f): Sparsity. All parameters are set the same as Experiment 4 (b) except that we set

Π_{r}

and

Π_{c}

the same as Experiment 2(e). Panel (f) of Figure 7 shows the results, and the analysis is similar to that of Experiment 4 (e).

Experiment 4 (g): Sparsity. All parameters are set the same as Experiment 4 (c) except that we let

Π_{r}

and

Π_{c}

be the same as that of Experiment 2 (g). Panel (g) of Figure 7 shows the results, and the analysis is similar to that of Experiment 4 (e).

Experiment 4 (h): Sparsity. All parameters are set the same as Experiment 4 (d) except that we set

Π_{r}

and

Π_{c}

the same as Experiment 2(g). Panel (h) of Figure 7 shows the results, and the analysis is similar to that of Experiment 4 (e).

Experiment 5 (a): Phase transition. Under

D i D C M M (n, K, Π_{r}, Π_{c}, α_{in}, α_{out})

, set

K = 2, n = n_{r} = n_{c} = 300

. Let each row community have 100 pure nodes, each column community have 120 pure nodes, and all mixed nodes have membership

(1 / 2, 1 / 2)

. Since

\max (p_{in}, p_{out}) = \max (α_{in}, α_{out}) \frac{\log (n)}{n} \leq 1

,

α_{in}

and

α_{out}

should be set in

(0, \frac{n}{\log (n)}]

. We let

α_{in}

and

α_{out}

be in the range of

{2.5, 5, 7.5, \dots, 50}

. Panel (a) of Figure 8 displays the results. We see that DiMSC performs satisfactorily when

α_{in}

and

α_{out}

satisfy Equation (14), and this means that DiMSC achieves the threshold provided in Equation (14) under

D i D C M M (n, K, Π_{r}, Π_{c}, α_{in}, α_{out})

.

Figure 8. Phase transition for DiMSC: darker pixels represent lower error rates. The red lines represent

\frac{| α_{in} - α_{out} |}{\sqrt{\max (α_{in}, α_{out})}} = 1

. Panel (a): Experiment 5 (a); Panel (b): Experiment 5 (b).

Experiment 5 (b): Phase transition. Under

D i D C M M (n, K, Π_{r}, Π_{c}, α_{in}, α_{out})

, set

K = 3, n = n_{r} = n_{c} = 300

. Let each row community have 60 pure nodes, each column community have 80 pure nodes, and all mixed nodes have membership

(1 / 3, 1 / 3, 1 / 3)

. We also let

α_{in}

and

α_{out}

be in the range of

{2.5, 5, 7.5, \dots, 50}

. Panel (b) of Figure 8 displays the results, and the analysis is similar to that of Experiment 5 (a).

For Experiments 1–5, we can conclude that DiMSC outperforms its competitors, and this supports our analysis in Remark 6 because DiMSC is designed to estimate mixed memberships, while its competitors are designed for community partition of pure nodes.

Experiment 6: Network size. Under

S B M (n, p_{in}, p_{out})

, let

α_{in} = 2

and

α_{out} = 0.0001

. On the one hand, we have

\sqrt{α_{in}} - \sqrt{α_{out}} = \sqrt{2} - 0.01 < \sqrt{2}

, i.e.,

α_{in}

and

α_{out}

locates in the impossible region of exact recovery introduced in [50]. On the other hand, we have

\frac{α_{in} - α_{out}}{\sqrt{α_{in}}} > 1

, i.e.,

α_{in}

and

α_{out}

satisfy Equation (14) for DiMSC’s consistent estimation. Let n range in

{1000, 2000, 3000, \dots, 15000}

. For each n in this experiment, we report the averaged error rate and running time of DiMSC over 10 independent repetitions. The results are shown in Figure 9. From Panel (a) of Figure 9, we see that DiMSC enjoys satisfactory performance with a small error rate for this experiment. Panels (b) of Figure 9 says that DiMSC processes computer-generated networks of up to 15,000 nodes within hundreds of seconds.

Figure 9. Numerical results for Experiment 6. Panel (a): MHamm; Panel (b): running time.

Remark 7.

For visuality, we provide some examples of different types of directed networks generated under DiDCMM in this remark. Let

θ_{r} (i) = 0.9 + \frac{i^{2}}{9 n_{r}^{2}}

for

1 \leq i \leq n_{r}

. Let each row community has

n_{r, 0}

pure nodes, and each column community has

n_{c, 0}

pure nodes. Let all mixed nodes have membership

(1 / K, \dots, 1 / K)

. For the setting of P, we set it as

\begin{matrix} P_{a} & = [\begin{matrix} 0.9 & 0.05 \\ 0.1 & 0.95 \end{matrix}] or P_{b} = [\begin{matrix} 0.1 & 0.95 \\ 0.9 & 0.05 \end{matrix}] or P_{c} = [\begin{matrix} 12 & 1 \\ 0 & 12 \end{matrix}] \frac{\log (n_{r})}{n_{r}} or P_{d} = [\begin{matrix} 0 & 12 \\ 12 & 1 \end{matrix}] \frac{\log (n_{r})}{n_{r}} or \\ P_{e} & = [\begin{matrix} 12 & 1 & 0 \\ 0 & 12 & 0 \\ 1 & 0 & 12 \end{matrix}] \frac{\log (n_{r})}{n_{r}} or P_{f} = [\begin{matrix} 1 & 0 & 12 \\ 12 & 0 & 0 \\ 0 & 12 & 1 \end{matrix}] \frac{\log (n_{r})}{n_{r}}, \end{matrix}

where

K = 2

when P is

P_{a}, P_{b}, P_{c}

or

P_{d}

, and

K = 3

when P is

P_{e}

or

P_{f}

. Meanwhile, we can generate different types of directed networks under DiDCMM by considering the above six different settings of P, where these different types are also considered in Experiments 1–6, and we mainly provide the visuality for these directed networks with different structures provided in different P for this remark. Note that we allow P to have non-unit diagonals here because Condition (I1) is mainly for our theoretical buildings, and results for previous experiments show that DiMSC performs stable even when P has non-unit diagonals. We consider below eight settings.

Model Setup 1: Set

n_{r} = 16, n_{r, 0} = 6, n_{c} = 16, n_{c, 0} = 7

, and P as

P_{a}

. For this setup, a directed network with 16 row nodes and 16 column nodes is generated from DiDCMM. Figure 10 shows a directed network

N

generated under Model Setup 1, where we also report DiMSC’s error rate. Figure 10 says that there are more directed edges sent from row nodes 1–6 to column nodes 1–7 than from row nodes 7–12 to column nodes 1–7 for

P_{a}

. With given adjacency matrix A and known memberships

Π_{r}

and

Π_{c}

for this setup, readers can apply our DiMSC directly to A given in Panel (a) of Figure 10 to check the effectiveness of DiMSC.

Figure 10. Illustration for a directed network under Model Setup 1. Panel (a): Adjacency matrix of

N

, where black square denotes 1; Panel (b): directed network

N

, where red (blue) points indicate row (column) nodes. The error rate MHamm defined in Equation (16) of our DiSMC algorithm for this directed network

N

is 0.0377.

Model Setup 2: All settings are the same as Model Setup 1 except that we let P be

P_{b}

. The directed network

N

and its adjacency matrix are shown in Figure 11. We see that there are more directed edges sent from row nodes 1–6 to column nodes 10–16 than from row nodes 7–12 to column nodes 10–16 for

P_{b}

, which means that directed network generated using

P_{b}

and directed network from

P_{a}

has different structures.

Figure 11. Illustration for a directed network under Model Setup 2. Panel (a): Adjacency matrix A; Panel (b): directed network

N

. MHamm of DiMSC for this directed network

N

is 0.0424.

Model Setup 3: Set

n_{r} = 32, n_{r, 0} = 14, n_{c} = 28, n_{c, 0} = 12

, and P as

P_{a}

. For this setup, a bipartite network with 32 row nodes and 28 column nodes are generated from DiDCMM. Figure 12 shows this bipartite network and its adjacency matrix.

Figure 12. Illustration for a bipartite network under Model Setup 3. Panel (a): Adjacency matrix A; Panel (b): bipartite network

N

. MHamm of DiMSC for this bipartite network

N

is 0.0313.

Model Setup 4: All settings are the same as Model Setup 3 except that we let P be

P_{b}

. Figure 13 displays the results, and we see that the bipartite network from

P_{b}

also has a different structure compared with the one generated from using

P_{a}

under DiDCMM.

Figure 13. Illustration for a bipartite network under Model Setup 4. Panel (a): Adjacency matrix A; Panel (b): bipartite network

N

. MHamm of DiMSC for this bipartite network

N

is 0.0320.

Model Setup 5: Set

n_{r} = 100, n_{r, 0} = 48, n_{c} = 100, n_{c, 0} = 45

, and P as

P_{c}

. Figure 14 shows the row and column communities for a directed network generated from Setup 5 under DiDCMM, where we plot the directed network directly.

Figure 14. Illustration for a directed network under Model Setup 5. Panels (a,b) show the row and column communities, respectively. In these two panels, dots in the same color are pure nodes in the same communities, and a square indicates mixed nodes. MHamm of DiMSC for this directed network

N

is 0.0181.

Model Setup 6: All settings are the same as Model Setup 5 except that we let P be

P_{d}

. Figure 15 shows a directed network obtained from this setup, and we see that the structure of the directed network from

P_{d}

in Figure 15 differs a lot from that of the directed network from

P_{c}

shown in Figure 14.

Figure 15. Illustration for a directed network under Model Setup 6. Panels (a,b) show the row and column communities, respectively. MHamm of DiMSC for this directed network

N

is 0.0185.

Model Setup 7: Set

n_{r} = 100, n_{r, 0} = 30, n_{c} = 100, n_{c, 0} = 32

, and P as

P_{e}

. Figure 16 shows a directed network generated from this setup.

Figure 16. Illustration for a directed network under Model Setup 7. Panels (a,b) show the row and column communities, respectively. MHamm of DiMSC for this directed network

N

is 0.0266.

Model Setup 8: All settings are the same as Model Setup 7 except that we let P be

P_{f}

. Figure 17 displays a directed network generated from this setup, and we see that directed networks from

P_{f}

and

P_{e}

have different structures by comparing Figure 16 and Figure 17.

Figure 17. Illustration for a directed network under Model Setup 8. Panels (a,b) show the row and column communities, respectively. MHamm of DiMSC for this directed network

N

is 0.0279.

6. Application to Real-World Directed Networks

For the empirical directed networks considered here, row nodes are always the same as column nodes, so we let

n_{r} = n_{c} = n

. For

{\hat{Π}}_{r}

, we call node i highly mixed node if

0.8 \geq \max_{1 \leq k \leq K} {\hat{Π}}_{r} (i, k)

, similar for

{\hat{Π}}_{c}

. A highly mixed node tells us whether a node has mixed memberships and belongs to multiple communities. Let

τ_{r} = \frac{| i : 0.8 \geq \max_{1 \leq k \leq K} {\hat{Π}}_{r} (i, k) |}{n}

be the proportion of highly mixed nodes among all nodes to measure the mixability of all row communities. Define

τ_{c}

similar to

τ_{r}

. Let

{\hat{ℓ}}_{r}

be a vector such that

{\hat{ℓ}}_{r} (i) = {argmax}_{1 \leq k \leq K} {\hat{Π}}_{r} (i, k)

for

1 \leq i \leq n

, where we use

{\hat{ℓ}}_{r} (i)

to denote the home base row community of node i. Define

{\hat{ℓ}}_{c}

similar to

{\hat{ℓ}}_{r}

. To measure the asymmetric structure of a directed network, we use

\begin{matrix} {Hamm}_{r c} = \frac{\min_{P \in S_{P}} {∥ {\hat{Π}}_{c} P - {\hat{Π}}_{r} ∥}_{1}}{n}, \end{matrix}

where a large

{Hamm}_{r c}

means that the structure of row clusters differs a lot from that of column clusters. For

1 \leq i \leq n

, let

d_{r} (i) = \sum_{j = 1}^{n} A (i, j)

be the number of edges sent by node i,

d_{c} (i) = \sum_{j = 1}^{n} A (j, i)

be the number of edges received by node i, where

d_{r} (i)

(and

d_{c} (i)

) is the out degree (in degree) of node i. Since there are many nodes with zero in degree or out degree for real-world directed network, we need the below pre-processing: for any directed network

N

, we let

A_{m}

be its adjacency matrix for any positive integer m such that

A_{m}

is connected, and every node has at least m in degree and m out degree in

A_{m}

.

We apply DiMSC to the following real-world directed networks to discover their mixability, asymmetries, and directional communities.

Political blogs: This is a directed network of hyperlinks between weblogs on US politics [68]. In this data, node means a blog, and edge means a hyperlink. This data can be downloaded from http://www-personal.umich.edu/~mejn/netdata/ (accessed on 28 August 2022). It is well-known that there are two parties, “liberal” and “conservative”, so

K = 2

for this data. The are 1490 nodes in the original data. After pre-processing,

A_{1} \in {0, 1}^{813 \times 813}, A_{3} \in {0, 1}^{495 \times 495}, A_{6} \in {0, 1}^{285 \times 285}, A_{9} \in {0, 1}^{158 \times 158}

, where we focus on the cases when

m = 1, 3, 6, 9

for this data here. Meanwhile, we use political blogs

A_{m}

to denote this network when its adjacency matrix is

A_{m}

, where every node has a degree at least m. Similar notations hold for other real-world directed networks used in this paper.

Wikipedia links (gan): This directed network consists of the Wikilinks of Wikipedia in the Gan Chinese language (gan). In this data, node means an article, and the directed edge is a Wikilink [69]. This data can be downloaded from http://konect.cc/networks/wikipedia_link_gan (accessed on 28 August 2022). There are 9189 nodes in the original data. After pre-processing,

A_{1} \in {0, 1}^{6012 \times 6012}, A_{30} \in {0, 1}^{820 \times 820}, A_{60} \in {0, 1}^{559 \times 569}, A_{90} \in {0, 1}^{240 \times 240}

, where we study the cases

m = 1, 30, 60, 90

for this data. The leading 20 singular values of

A_{1}, A_{30}, A_{60}, A_{90}

shown in Panels (e)–(h) of Figure 18 suggest

K = 2

for these four adjacency matrices, where [30] also uses eigengap to estimate K.

Figure 18. Leading 20 singular values of real-world directed networks used in this paper. Panel (a): political blogs

A_{1}

; Panel (b): political blogs

A_{3}

; Panel (c): political blogs

A_{6}

; Panel (d): political blogs

A_{9}

; Panel (e): Wikipedia links (gan)

A_{1}

; Panel (f): Wikipedia links (gan)

A_{30}

; Panel (g): Wikipedia links (gan)

A_{60}

; Panel (h): Wikipedia links (gan)

A_{90}

; Panel (i): Wikipedia links (nah)

A_{1}

; Panel (j): Wikipedia links (nah)

A_{20}

; Panel (k): Wikipedia links (nah)

A_{30}

; Panel (l): Wikipedia links (nah)

A_{40}

.

Wikipedia links (nah): This network consists of the Wikilinks of the N

\bar{a}

huatl language (nah) [69] and can be downloaded from http://konect.cc/networks/wikipedia_link_nah/ (accessed on 28 August 2022). The original data has 10285 nodes. After pre-processing,

A_{1} \in {0, 1}^{6924 \times 6924}, A_{20} \in {0, 1}^{1057 \times 1057}, A_{30} \in {0, 1}^{486 \times 486}, A_{40} \in {0, 1}^{136 \times 136}

. Panel (i) of Figure 18 suggests

K = 4

for

A_{1}

, and Panels (j)–(l) of Figure 18 suggest

K = 2

for

A_{20}, A_{30}

, and

A_{40}

. Note that it only takes around 4 seconds for DiMSC to estimate memberships of Wikipedia links (nah)

A_{1}

.

The proportions of highly mixed nodes and

{Hamm}_{r c}

when applying DiMSC on the above real-world directed networks are reported in Table 1. For the political blogs network, small

τ_{r}, τ_{c}

, and

{Hamm}_{r c}

indicate that there are only a few highly mixed nodes, and the structure of row communities is similar to that of column communities, i.e., there is a slight asymmetry for this data. For Wikipedia links (gan)

A_{1}

and Wikipedia links (nah)

A_{1}

, they have a large proposition of highly mixed nodes in both row and column communities, and the row communities differ a lot from column communities, suggesting heavy asymmetric structure between row and column communities for these two data. For Wikipedia links (gan)

A_{30}, A_{60}

, and Wikipedia links (nah)

A_{20}

, we see that the proportion of highly mixed nodes for row (column) communities is small (large), and there is a slight asymmetric for these data. For Wikipedia links (gan)

A_{90}

and Wikipedia links (nah)

A_{30}, A_{40}

, there is no highly mixed node, and the structure of row clusters is similar to that of column clusters. For visualization, we plot the row and column communities as well as highly mixed nodes by applying DiMSC to some of these directed networks in Figure 19 and Figure 20.

Table 1.

τ_{r}, τ_{c}

, and

{Hamm}_{r c}

obtained from DiMSC for real-world directed networks used in this paper.

Figure 19. Row and column communities detected by DiMSC for political blogs. Colors indicate clusters, and a green square indicates highly mixed nodes, where the row and column communities are obtained from

{\hat{ℓ}}_{r}

and

{\hat{ℓ}}_{c}

, respectively. Panel (a): political blogs

A_{1}

; Panel (b): political blogs

A_{1}

; Panel (c): political blogs

A_{3}

; Panel (d): political blogs

A_{3}

; Panel (e): political blogs

A_{6}

; Panel (f): political blogs

A_{6}

; Panel (g): political blogs

A_{9}

; Panel (h): political blogs

A_{9}

.

Figure 20. Row and column communities detected by DiMSC for Wikipedia links (gan)

A_{90}

and Wikipedia links (nah)

A_{40}

. Colors indicate clusters, where the row and column communities are obtained from

{\hat{ℓ}}_{r}

and

{\hat{ℓ}}_{c}

, respectively. Panel (a): Wikipedia links (gan)

A_{90}

; Panel (b): Wikipedia links (gan)

A_{90}

; Panel (c): Wikipedia links (nah)

A_{40}

; Panel (d): Wikipedia links (nah)

A_{40}

.

7. Discussion and Conclusions

In this paper, we propose a novel directed degree corrected mixed membership (DiDCMM) model. DiDCMM models a directed network with mixed memberships for row nodes with degree heterogeneities and column nodes without degree heterogeneities. DiDCMM is identifiable when the two well-used Conditions (I1) and (I2) hold. It should be mentioned that a model modeling a directed network with mixed memberships for both row and column nodes with degree heterogeneities is unidentifiable unless considering some nontrivial conditions. To fit the model, we propose a provably consistent spectral algorithm called DiMSC to infer community memberships for both row and column nodes in a directed network generated by DiDCMM. DiMSC is designed based on the SVD of the adjacency matrix, where we apply the SP algorithm to hunt for the corners in the simplex structure and the SVM-cone algorithm to hunt for the corners in the cone structure. The theoretical results of DiMSC show that it consistently recovers memberships of both row nodes and column nodes under mild conditions. Meanwhile, when DiDCMM degenerates to MMSB, our theoretical results match that of Theorem 2.2 [24] when their DCMM degenerates to MMSB under mild conditions. Experiments conducted on synthetic directed networks generated from DiDCMM verify the effectiveness and the stability of Conditions (I1) and (I2) of DiMSC. Results for real-world directed networks show that DiMSC reveals highly mixed nodes and asymmetries in the structure of row and column communities. The model DiDCMM and the algorithm DiMSC developed in this paper are useful to discover asymmetry for a directed network with mixed memberships. DiDCMM can also generate an artificially directed network with mixed memberships as a benchmark directed network for research purposes. We wish that DiDCMM and DiMSC can be widely applied in social network analysis.

The proposed model DiDCMM and the algorithm DiMSC can be extended in many ways. Similar to [24,57], we may obtain an ideal simplex from U using the idea of the entry-wise ratio proposed in [8]. Meanwhile, DiMSC is designed based on the SVD of the adjacency matrix, and similar to [5,7,11,30], we may design spectral algorithms based on the regularized Laplacian matrix under DiDCMM. Extending DiDCMM from an un-weighted directed network to a weighted directed network with an application of the distribution-free idea introduced in [62] is one of our future research directions. The SVD step of DiMSC can be accelerated by the random projection and random sampling ideas introduced in [70] to process large-scale directed networks. Instead of simply using eigengap to find K, in our future work, it is worth focusing on estimating the number of communities in a directed network generated under ScBM (and DCScBM) [30] and DiDCMM. Ref. [46] proposes an algorithm to uncover boundary nodes that spread information between communities in undirected social networks. It is an interesting topic to extend works in [46] to directed networks generated from ScBM, DCScBM, and DiDCMM. We leave them for our future work.

Funding

This research was funded by the Scientific research start-up fund of CUMT NO. 102520253, the High-level personal project of Jiangsu Province NO. JSSCBS20211218.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The author declares no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

SBM	Stochastic Blockmodel
DCSBM	Degree Corrected Stochastic Blockmodel
MMSB	Mixed Membership Stochastic Blockmodel
DCMM	Degree Corrected Mixed Membership model
OCCAM	Overlapping Continuous Community Assignment model
ScBM	Stochastic co-Blockmodel
DC-ScBM	Degree Corrected Stochastic co-Blockmodel
DiMMSB	Directed Mixed Membership Stochastic Blockmodel
DiDCMM	Directed Degree Corrected Mixed Membership model
SP	Successive projection algorithm
SVD	Singular value decomposition
DiMSC	Directed Mixed Simplex & Cone algorithm

Appendix A. Proof for Identifiability

Appendix A.1. Proof of Proposition 1

Proof.

Let

Ω = U Λ V^{'}

be the compact singular value decomposition of

Ω

. Lemma 1 gives

V = Π_{c} B_{c} \equiv Π_{c} V (I_{c}, :)

. Since

Ω = \tilde{Ω}

, V also equals to

{\tilde{Π}}_{c} V (I_{c}, :)

, which gives that

Π_{c} = {\tilde{Π}}_{c}

.

Since

Ω (I_{r}, I_{c}) = Θ_{r} (I_{r}, I_{r}) Π_{r} (I_{r}, :) P Π_{c}^{'} (I_{c}, :) = Θ_{r} (I_{r}, I_{r}) P = U (I_{r}, :) Λ V^{'} (I_{c}, :)

by Condition (I2), we have

Θ_{r} (I_{r}, I_{r}) P = U (I_{r}, :) Λ V^{'} (I_{c}, :)

, which gives that

Θ_{r} (I_{r}, I_{r}) = diag (U (I_{r}, :) Λ V^{'} (I_{c}, :))

. From this step, we see that if P’s diagonal entries are not ones, we cannot obtain

Θ_{r} (I_{r}, I_{r}) = diag (U (I_{r}, :) Λ V^{'} (I_{c}, :))

which leads to a consequence that

Θ_{r} (I_{r}, I_{r})

does not equal to

{\tilde{Θ}}_{r} (I_{r}, I_{r})

; hence Condition (I1) is necessary by Condition (I1). Since

Ω = \tilde{Ω}

, we also have

{\tilde{Θ}}_{r} (I_{r}, I_{r}) = diag (U (I_{r}, :) Λ V^{'} (I_{c}, :))

, which gives that

Θ_{r} (I_{r}, I_{r}) = {\tilde{Θ}}_{r} (I_{r}, I_{r})

. Since

{\tilde{Θ}}_{r} (I_{r}, I_{r}) \tilde{P}

also equals to

U (I_{r}, :) Λ V^{'} (I_{c}, :)

, we have

P = \tilde{P}

.

Lemma 1 gives that

U = Θ_{r} Π_{r} B_{r}

, where

B_{r} = Θ_{r}^{- 1} (I_{r}, I_{r}) U (I_{r}, :)

. Since

Ω = \tilde{Ω}

, we also have

U = {\tilde{Θ}}_{r} {\tilde{Π}}_{r} {\tilde{B}}_{r}

. Since

{\tilde{B}}_{r} = {\tilde{Θ}}_{r}^{- 1} (I_{r}, I_{r}) U (I_{r}, :) = Θ_{r}^{- 1} (I_{r}, I_{r}) U (I_{r}, :)

, we have

{\tilde{B}}_{r} = B_{r}

. Since

U = Θ_{r} Π_{r} B_{r} = {\tilde{Θ}}_{r} {\tilde{Π}}_{r} {\tilde{B}}_{r} = {\tilde{Θ}}_{r} {\tilde{Π}}_{r} B_{r}

, we have

Θ_{r} Π_{r} = {\tilde{Θ}}_{r} {\tilde{Π}}_{r}

. Since each row of

Π_{r}

or

{\tilde{Π}}_{r}

is a PMF,

Θ_{r} = {\tilde{Θ}}_{r}, Π_{r} = {\tilde{Π}}_{r}

, and the claim follows. □

Appendix B. Ideal Simplex, Ideal Cone

Appendix B.1. Proof of Lemma 1

Proof.

First, we consider U and V. Since

Ω = U Λ V^{'}

, we have

U = Ω V Λ^{- 1}

since

V^{'} V = I_{K}

. Recall that

Ω = Θ_{r} Π_{r} P Π_{c}^{'}

, we have

U = Θ_{r} Π_{r} P Π_{c}^{'} V Λ^{- 1} = Θ_{r} Π_{r} B_{r}

, where we set

B_{r} = P Π_{c}^{'} V Λ^{- 1}

and sure it is unique. Since

U (I_{r}, :) = Θ_{r} (I_{r}, I_{r}) Π_{r} (I_{r}, :) B_{r} = Θ_{r} (I_{r}, I_{r}) B_{r}

, we have

B_{r} = Θ_{r}^{- 1} (I_{r}, I_{r}) U (I_{r}, :)

.

Similarly, since

Ω = U Λ V^{'}

, we have

V^{'} = Λ^{- 1} U^{'} Ω

since

U^{'} U = I_{K}

, hence

V = Ω^{'} U Λ^{- 1}

. Recall that

Ω = Θ_{r} Π_{r} P Π_{c}^{'}

, we have

V = {(Θ_{r} Π_{r} P Π_{c}^{'})}^{'} U Λ^{- 1} = Π_{c} P^{'} Π_{r}^{'} Θ_{r} U Λ^{- 1} = Π_{c} B_{c}

, where we set

B_{c} = P^{'} Π_{r}^{'} Θ_{r} U Λ^{- 1}

and sure it is unique. Since

V (I_{c}, :) = Π_{c} (I_{c}, :) B_{c} = B_{c}

, we have

B_{c} = V (I_{c}, :)

. Meanwhile, for

1 \leq j \leq n_{c}

, we have

V (j, :) = e_{j}^{'} Π_{c} B_{c} = Π_{c} (j, :) B_{c}

. Hence, we have

V (j, :) = V (\bar{j}, :)

as long as

Π_{c} (j, :) = Π_{c} (\bar{j}, :)

.

Now, we show the ideal cone structure that appears in

U_{*}

. For convenience, set

M = Π_{r} B_{r}

, hence

U = Θ_{r} Π_{r} B_{r}

gives

U = Θ_{r} M

. Hence, we have

U (i, :) = e_{i}^{'} U = Θ_{r} (i, i) M (i, :)

. Therefore,

U_{*} (i, :) = \frac{U (i, :)}{{∥ U (i, :) ∥}_{F}} = \frac{M (i, :)}{{∥ M (i, :) ∥}_{F}}

, combine it with the fact that

B_{r} = Θ_{r}^{- 1} (I_{r}, I_{r}) U (I_{r}, :)

, we have

\begin{matrix} U_{*} & = [\begin{matrix} \frac{1}{{∥ M (1, :) ∥}_{F}} \\ \frac{1}{{∥ M (2, :) ∥}_{F}} \\ ⋱ \\ \frac{1}{∥ M (n_{r}, :) ∥_{F}} \end{matrix}] Π_{r} B_{r} \\ = [\begin{matrix} Π_{r} (1, :) / {∥ M (1, :) ∥}_{F} \\ Π_{r} (2, :) / {∥ M (2, :) ∥}_{F} \\ ⋮ \\ Π_{r} (n_{r}, :) / {∥ M (n_{r}, :) ∥}_{F} \end{matrix}] Θ_{r}^{- 1} (I_{r}, I_{r}) N_{U}^{- 1} (I_{r}, I_{r}) U_{*} (I_{r}, :) . \end{matrix}

Therefore, we have

\begin{matrix} Y = [\begin{matrix} Π_{r} (1, :) / {∥ M (1, :) ∥}_{F} \\ Π_{r} (2, :) / {∥ M (2, :) ∥}_{F} \\ ⋮ \\ Π_{r} (n_{r}, :) / {∥ M (n_{r}, :) ∥}_{F} \end{matrix}] Θ_{r}^{- 1} (I_{r}, I_{r}) N_{U}^{- 1} (I_{r}, I_{r}) = N_{M} Π_{r} Θ_{r}^{- 1} (I_{r}, I_{r}) N_{U}^{- 1} (I_{r}, I_{r}), \end{matrix}

where

N_{M}

is a diagonal matrix with

N_{M} (i, i) = \frac{1}{{∥ M (i, :) ∥}_{F}}

for

1 \leq i \leq n_{r}

. All entries of Y are nonnegative, and since we assume that each community has at least one pure node, no row of Y is 0.

Then, we prove that

U_{*} (i, :) = U_{*} (\bar{i}, :)

when

Π_{r} (i, :) = Π_{r} (\bar{i}, :)

. For

1 \leq i \leq n_{r}

, we have

\begin{matrix} U_{*} (i, :) & = e_{i}^{'} U_{*} = \frac{1}{{∥ M (i, :) ∥}_{F}} e_{i}^{'} M = \frac{1}{∥ Π_{r} (i, :) B_{r} ∥_{F}} Π_{r} (i, :) B_{r}, \end{matrix}

and the claim follows immediately. □

Appendix B.2. Proof of Lemma 2

Proof.

Since

I = U^{'} U = B_{r}^{'} Π_{r}^{'} Θ_{r}^{2} Π_{r} B_{r} = U^{'} (I_{r}, :) Θ_{r}^{- 1} (I_{r}, I_{r}) Π_{r}^{'} Θ_{r}^{2} Π_{r} Θ^{- 1} (I_{r}, I_{r}) U (I_{r}, :)

and

rank (U (I_{r}, :)) = K

(i.e., the inverse of

U (I_{r}, :)

exists), we have

{(U (I_{r}, :) U^{'} (I_{r}, :))}^{- 1} = Θ_{r}^{- 1} (I_{r}, I_{r}) Π_{r}^{'} Θ_{r}^{2} Π_{r} Θ_{r}^{- 1} (I_{r}, I_{r})

. Since

U_{*} (I_{r}, :) = N_{U} (I_{r}, I_{r}) U (I_{r}, :)

, we have

\begin{matrix} {(U_{*} (I_{r}, :) U_{*}^{'} (I_{r}, :))}^{- 1} = N_{U}^{- 1} (I_{r}, I_{r}) Θ^{- 1} (I_{r}, I_{r}) Π_{r}^{'} Θ_{r}^{2} Π_{r} Θ_{r}^{- 1} (I_{r}, I_{r}) N_{U}^{- 1} (I_{r}, I_{r}) . \end{matrix}

Since all entries of

N_{U}^{- 1} (I_{r}, I_{r}), Π_{r}, Θ_{r}

and nonnegative and

N, Θ_{r}

are diagonal matrices, we see that all entries of

{(U_{*} (I_{r}, :) U_{*}^{'} (I_{r}, :))}^{- 1}

are nonnegative, and its diagonal entries are strictly positive, hence we have

{(U_{*} (I_{r}, :) U_{*}^{'} (I_{r}, :))}^{- 1} 1 > 0

. □

Appendix B.3. Proof of Theorem 1

Proof.

For column nodes, Remark A1 guarantees that SP algorithm returns

I_{c}

when the input is V with K column communities, hence ideal DiMSC recovers

Π_{c}

exactly. For row nodes, Remark A2 guarantees that SVM-cone algorithm returns

I_{r}

when the input is

U_{*}

with K row communities, hence ideal DiMSC recovers

Π_{r}

exactly, and this theorem follows. □

Appendix C. Equivalence Algorithm

In this subsection, we design one algorithm DiMSC-equivalence which returns the same estimations as DiMSC. Set

U_{2} = U U^{'} \in R^{n_{r} \times n_{r}}, {\hat{U}}_{2} = \hat{U} {\hat{U}}^{'} \in R^{n_{r} \times n_{r}}, V_{2} = V V^{'} \in R^{n_{c} \times n_{c}}, {\hat{V}}_{2} = \hat{V} {\hat{V}}^{'} \in R^{n_{c} \times n_{c}}

. Set

U_{*, 2} \in R^{n_{r} \times n_{r}}

as

U_{*, 2} (i, :) = \frac{U_{2} (i, :)}{∥ U_{2} {(i, :) ∥}_{F}}

for

1 \leq i \leq n_{r}

.

{\hat{U}}_{*, 2}

is defined similarly. The next lemma guarantees that

V_{2}

enjoys IS structure, and

U_{*, 2}

enjoys IC structure.

Lemma A1.

Under

D i D C M M_{n_{r}, n_{c}} (K, P, Π_{r}, Π_{c}, Θ_{r})

, we have

V_{2} = Π_{c} V_{2} (I_{c}, :)

, and

U_{*, 2} = Y U_{*, 2} (I_{r}, :)

.

Proof.

By Lemma 1, we know that

V = Π_{c} V (I_{c}, :)

, which gives that

V_{2} = V V^{'} = Π_{c} V (I_{c}, :) V^{'} = Π_{c} (V V^{'}) (I_{c}, :) = Π_{c} V_{2} (I_{c}, :)

. For U, since

U = Θ_{r} Π_{r} Θ_{r}^{- 1} (I_{r}, I_{r}) U (I_{r}, :)

by Lemma 1, we have

U_{2} = U U^{'} = Θ_{r} Π_{r} Θ_{r}^{- 1} (I_{r}, I_{r}) U (I_{r}, :) U^{'} = Θ_{r} Π_{r} Θ_{r}^{- 1} (I_{r}, I_{r}) (U U^{'}) (I_{r}, :) = Θ_{r} Π_{r} Θ_{r}^{- 1} (I_{r}, I_{r}) U_{2} (I_{r}, :)

. Set

M_{2} = Π_{r} Θ_{r}^{- 1} (I_{r}, I_{r}) U_{2} (I_{r}, :)

, we have

U_{2} = Θ_{r} M_{2}

. Then, follow similar proof as Lemma 1, we have

U_{*, 2} = Y_{2} U_{*, 2} (I_{r}, :)

, where

Y_{2} = N_{M_{2}} Π_{r} Θ_{r}^{- 1} (I_{r}, I_{r}) N_{U_{2}}^{- 1} (I_{r}, I_{r})

, and

N_{M_{2}}, N_{U_{2}}

are

n_{r} \times n_{r}

diagonal matrices whose i-th diagonal entries are

\frac{1}{∥ M_{2} {(i, :) ∥}_{F}}, \frac{1}{∥ U_{2} {(i, :) ∥}_{F}}

, respectively. Since

∥ U_{2} {(i, :) ∥}_{F} = ∥ U (i, :) U^{'} ∥_{F} = {∥ U (i, :) ∥}_{F}

, we have

N_{U_{2}} = N_{U}

. Since

∥ M_{2} {(i, :) |}_{F} = ∥ Π_{r} Θ_{r}^{- 1} (I_{r}, I_{r}) U_{2} (I_{r}, :) ∥_{F} = ∥ Π_{r} Θ_{r}^{- 1} (I_{r}, I_{r}) U (I_{r}, :) U^{'} ∥_{F} = {∥ M (i, :) ∥}_{F}

, we have

N_{M_{2}} = N_{M}

. Hence,

Y_{2} \equiv Y

and the claim follows. □

Since

U_{*, 2} (I_{r}, :) \in R^{K \times n_{r}}

and

V_{2} (I_{c}, :) \in R^{K \times n_{c}}

,

U_{*, 2} (I_{r}, :)

and

V_{2} (I_{c}, :)

are singular matrix with rank K by Condition (I1), while the inverses of

U_{*, 2} (I_{r}, :) U_{*, 2}^{'} (I_{r}, :)

and

V_{2} (I_{c}, :) V_{2}^{'} (I_{c}, :)

exist. Therefore, Lemma A1 gives that

\begin{matrix} Y = U_{*, 2} U_{*, 2}^{'} (I_{r}, :) {(U_{*, 2} (I_{r}, :) U_{*, 2}^{'} (I_{r}, :))}^{- 1}, Π_{c} = V_{2} V_{2}^{'} (I_{c}, :) {(V_{2} (I_{c}, :) V_{2}^{'} (I_{c}, :))}^{- 1} . \end{matrix}

Since

U_{*, 2} = N_{U} U_{2}

and

Y = N_{M} Π_{r} Θ_{r}^{- 1} (, I_{r}, I_{r}) N_{U}^{- 1} (I_{r}, I_{r})

, we see that

Y_{*}

also equals to

U_{2} U_{*, 2}^{'} (I_{r}, :) {(U_{*, 2} (I_{r}, :) U_{*, 2}^{'} (I_{r}, :))}^{- 1}

by basic algebra.

Based on the above analysis, we are now ready to give the ideal DiMSC-equivalence. Input

Ω

. Output:

Π_{r}

and

Π_{c}

.

Obtain $U, Λ, V, U_{*, 2}, V_{2}$ from $Ω$ .
Run SP algorithm on $V_{2}$ with K column communities to obtain $V_{2} (I_{c}, :)$ . Run SVM-cone algorithm on $U_{*, 2}$ with K row communities to obtain $I_{r}$ .
Set $J_{*} = diag (U_{*} (I_{r}, :) Λ V^{'} (I_{c}, :)), Y_{*} = U_{2} U_{*, 2}^{'} (I_{r}, :) {(U_{*, 2} (I_{r}, :) U_{*, 2}^{'} (I_{r}, :))}^{- 1}, Z_{r} = Y_{*} J_{*}$ and $Z_{c} = V_{2} V_{2}^{'} (I_{c}, :) {(V_{2} (I_{c}, :) V_{2}^{'} (I_{c}, :))}^{- 1}$ .
Recover $Π_{r}$ and $Π_{c}$ by setting $Π_{r} (i, :) = \frac{Z_{r} (i, :)}{∥ Z_{r} {(i, :) ∥}_{1}}$ for $1 \leq i \leq n_{r}$ , and $Π_{c} (j, :) = \frac{Z_{c} (j, :)}{∥ Z_{c} {(j, :) ∥}_{1}}$ for $1 \leq j \leq n_{c}$ .

For the real case, set

{\hat{U}}_{2} = \hat{U} {\hat{U}}^{'}, {\hat{V}}_{2} = \hat{V} {\hat{V}}^{'}, {\hat{U}}_{*, 2} = N_{\hat{U}} {\hat{U}}_{2}

. We now extend the ideal case to the real one given by Algorithm A1.

Algorithm A1: DiMSC-equivalence

Require: The adjacency matrix

A \in R^{n_{r} \times n_{c}}

of a directed network, the number of row communities (column communities) K.
Ensure: The estimated

n_{r} \times K

row membership matrix

{\hat{Π}}_{r, 2}

and the estimated

n_{c} \times K

column membership matrix

{\hat{Π}}_{c, 2}

.

1:: Obtain $\tilde{A} = \hat{U} \hat{Λ} {\hat{V}}^{'}$ , the top-K-dimensional SVD of A. Compute ${\hat{U}}_{*}, {\hat{U}}_{2}, {\hat{V}}_{2}, {\hat{U}}_{*, 2}$ .
2:: Apply SP algorithm on the rows of ${\hat{V}}_{2}$ assuming there are K column communities to obtain ${\hat{I}}_{c, 2}$ , the index set returned by SP algorithm.
3:: Apply SVM-cone algorithm on the rows of ${\hat{U}}_{*, 2}$ with K row communities to obtain ${\hat{I}}_{r, 2}$ , the index set returned by SVM-cone algorithm.
4:: Set ${\hat{J}}_{*, 2} = diag ({\hat{U}}_{*} ({\hat{I}}_{r, 2}, :) \hat{Λ} {\hat{V}}^{'} ({\hat{I}}_{c, 2}, :)), {\hat{Y}}_{*, 2} = {\hat{U}}_{2} {\hat{U}}_{*, 2}^{'} ({\hat{I}}_{r, 2}, :) {({\hat{U}}_{*, 2} ({\hat{I}}_{r, 2}, :) {\hat{U}}_{*, 2}^{'} ({\hat{I}}_{r, 2}, :))}^{- 1}, {\hat{Z}}_{r, 2} = {\hat{Y}}_{*, 2} {\hat{J}}_{*, 2}$ and ${\hat{Z}}_{c, 2} = {\hat{V}}_{2} {\hat{V}}_{2}^{'} ({\hat{I}}_{c, 2}, :) {({\hat{V}}_{2} ({\hat{I}}_{c, 2}, :) {\hat{V}}_{2}^{'} ({\hat{I}}_{c, 2}, :))}^{- 1}$ . Then, set ${\hat{Z}}_{r, 2} = \max (0, {\hat{Z}}_{r, 2})$ and ${\hat{Z}}_{c, 2} = \max (0, {\hat{Z}}_{c, 2})$ .
5:: Estimate $Π_{r} (i, :)$ by ${\hat{Π}}_{r, 2} (i, :) = {\hat{Z}}_{r, 2} (i, :) / {∥ {\hat{Z}}_{r, 2} (i, :) ∥}_{1}, 1 \leq i \leq n_{r}$ and estimate $Π_{c} (j, :)$ by ${\hat{Π}}_{c, 2} (j, :) = {\hat{Z}}_{c, 2} (j, :) / {∥ {\hat{Z}}_{c, 2} (j, :) ∥}_{1}, 1 \leq j \leq n_{c}$ .

Lemma A2.(Equivalence). For the empirical case, we have

{\hat{I}}_{r, 2} \equiv {\hat{I}}_{r}, {\hat{I}}_{c, 2} \equiv {\hat{I}}_{c}, {\hat{U}}_{*, 2} ({\hat{I}}_{r, 2}, :) {\hat{U}}_{*, 2}^{'} ({\hat{I}}_{r, 2}, :) \equiv {\hat{U}}_{*} ({\hat{I}}_{r}, :) {\hat{U}}_{*}^{'} ({\hat{I}}_{r}, :), {\hat{Y}}_{*, 2} \equiv {\hat{Y}}_{*}, {\hat{J}}_{*, 2} \equiv {\hat{J}}_{*}, {\hat{Z}}_{r, 2} \equiv {\hat{Z}}_{r}, {\hat{Z}}_{c, 2} \equiv {\hat{Z}}_{c}, {\hat{Π}}_{r, 2} \equiv {\hat{Π}}_{r}

and

{\hat{Π}}_{c, 2} \equiv {\hat{Π}}_{c}

.

Proof.

For column nodes, Lemma 3.2 [27] gives

{\hat{I}}_{c} = {\hat{I}}_{c, 2}

(i.e., SP algorithm will return the same indices on both

\hat{V}

and

{\hat{V}}_{2}

.), which gives that

{\hat{V}}_{2} {\hat{V}}_{2}^{'} ({\hat{I}}_{c, 2}, :) = {\hat{V}}_{2} {\hat{V}}_{2}^{'} ({\hat{I}}_{c}, :) = \hat{V} {\hat{V}}^{'} {((\hat{V} {\hat{V}}^{'}) ({\hat{I}}_{c}, :))}^{'} = \hat{V} {\hat{V}}^{'} {(\hat{V} ({\hat{I}}_{c}, :) {\hat{V}}^{'})}^{'} = \hat{V} {\hat{V}}^{'} \hat{V} {\hat{V}}^{'} ({\hat{I}}_{c}, :) = \hat{V} {\hat{V}}^{'} ({\hat{I}}_{c}, :)

, and

{\hat{V}}_{2} ({\hat{I}}_{c, 2}, :) {\hat{V}}_{2}^{'} ({\hat{I}}_{c, 2}, :) = {\hat{V}}_{2} ({\hat{I}}_{c}, :) {\hat{V}}_{2}^{'} ({\hat{I}}_{c}, :) = \hat{V} ({\hat{I}}_{c}, :) {\hat{V}}^{'} {(\hat{V} ({\hat{I}}_{c}, :) {\hat{V}}^{'})}^{'} = \hat{V} ({\hat{I}}_{c}, :) {\hat{V}}^{'} ({\hat{I}}_{c}, :)

. Therefore, we have

{\hat{Z}}_{c, 2} = {\hat{Z}}_{c}, {\hat{Π}}_{c, 2} = {\hat{Π}}_{c}

.

For row nodes, Lemma G.1 [26] guarantees that

{\hat{I}}_{r} = {\hat{I}}_{r, 2}

(i.e., SVM-cone algorithm will return the same indices on both

{\hat{U}}_{*}

and

{\hat{U}}_{*, 2}

.), so immediately we have

{\hat{J}}_{*, 2} = {\hat{J}}_{*}

. Since

{\hat{U}}_{*, 2} ({\hat{I}}_{r, 2}, :) = {\hat{U}}_{*, 2} ({\hat{I}}_{r}, :) = N_{\hat{U}} ({\hat{I}}_{r}, {\hat{I}}_{r}) {\hat{U}}_{2} ({\hat{I}}_{r}, :) = N_{\hat{U}} ({\hat{I}}_{r}, {\hat{I}}_{r}) \hat{U} ({\hat{I}}_{r}, :) {\hat{U}}^{'} = {\hat{U}}_{*} ({\hat{I}}_{r}, :) {\hat{U}}^{'}

, we have

{\hat{U}}_{2} {\hat{U}}_{*, 2}^{'} ({\hat{I}}_{r, 2}, :) = {\hat{U}}_{2} {\hat{U}}_{*, 2}^{'} ({\hat{I}}_{r}, :) = \hat{U} {\hat{U}}^{'} \hat{U} {\hat{U}}_{*}^{'} ({\hat{I}}_{r}, :) = \hat{U} {\hat{U}}_{*}^{'} ({\hat{I}}_{r}, :)

and

{({\hat{U}}_{*, 2} ({\hat{I}}_{r, 2}, :) {\hat{U}}_{*, 2}^{'} ({\hat{I}}_{r, 2}, :))}^{- 1} = {({\hat{U}}_{*, 2} ({\hat{I}}_{r}, :) {\hat{U}}_{*, 2}^{'} ({\hat{I}}_{r}, :))}^{- 1} = {({\hat{U}}_{*} ({\hat{I}}_{r}, :) {\hat{U}}_{*}^{'} ({\hat{I}}_{r}, :))}^{- 1}

, which give that

{\hat{Y}}_{*, 2} = {\hat{Y}}_{*}

, and the claim follows immediately. □

Lemma A2 guarantees that the DiMSC and DiMSC-equivalence return same estimations for both row and column nodes’s memberships. In this article, we introduce the DiMSC-equivalence algorithm since it is helpful to build a theoretical framework for DiMSC, see Remark A3 and A4 for detail.

Appendix D. Basic Properties of Ω

Lemma A3.

Under

D i D C M M_{n_{r}, n_{c}} (K, P, Π_{r}, Π_{c}, Θ_{r})

, we have

\begin{matrix} \frac{θ_{r, \min}}{θ_{r, \max} \sqrt{K λ_{1} (Π_{r}^{'} Π_{r})}} \leq {∥ U (i, :) ∥}_{F} \leq \frac{θ_{r, \max}}{θ_{r, \min} \sqrt{λ_{K} (Π_{r}^{'} Π_{r})}}, 1 \leq i \leq n_{r}, \\ \sqrt{\frac{1}{K λ_{1} (Π_{c}^{'} Π_{c})}} \leq {∥ V (j, :) ∥}_{F} \leq \sqrt{\frac{1}{λ_{K} (Π_{c}^{'} Π_{c})}}, 1 \leq j \leq n_{c} . \end{matrix}

Proof.

Since

I = U^{'} U = U^{'} (I_{r}, :) Θ_{r}^{- 1} (I_{r}, I_{r}) Π_{r}^{'} Θ_{r}^{2} Π_{r} Θ^{- 1} (I_{r}, I_{r}) U (I_{r}, :)

, we have

\begin{matrix} ((Θ_{r}^{- 1} (I_{r}, I_{r}) U (I_{r}, :)) {({(Θ_{r}^{- 1} (I_{r}, I_{r}) U (I_{r}, :))}^{'})}^{- 1} = Π_{r}^{'} Θ_{r}^{2} Π_{r}, \end{matrix}

which gives that

\begin{matrix} \max_{k} {∥ e_{k}^{'} (Θ_{r}^{- 1} (I_{r}, I_{r}) U (I_{r}, :)) ∥}_{F}^{2} & = \max_{k} e_{k}^{'} (Θ_{r}^{- 1} (I_{r}, I_{r}) U (I_{r}, :)) {(Θ_{r}^{- 1} (I_{r}, I_{r}) U (I_{r}, :))}^{'} e_{k} \\ \leq \max_{{∥ x ∥}_{F} = 1} x^{'} (Θ_{r}^{- 1} (I_{r}, I_{r}) U (I_{r}, :)) {(Θ_{r}^{- 1} (I_{r}, I_{r}) U (I_{r}, :))}^{'} x \\ = λ_{1} ((Θ_{r}^{- 1} (I_{r}, I_{r}) U (I_{r}, :)) {(Θ_{r}^{- 1} (I_{r}, I_{r}) U (I_{r}, :))}^{'}) \\ = \frac{1}{λ_{K} (Π_{r}^{'} Θ_{r}^{2} Π_{r})} \leq \frac{1}{θ_{r, \min}^{2} λ_{K} (Π_{r}^{'} Π_{r})} . \end{matrix}

Similarly, we have

\begin{matrix} \min_{k} {∥ e_{k}^{'} (Θ_{r}^{- 1} (I_{r}, I_{r}) U (I_{r}, :)) ∥}_{F}^{2} \geq \frac{1}{λ_{1} (Π_{r}^{'} Θ_{r}^{2} Π_{r})} \geq \frac{1}{θ_{r, \max}^{2} λ_{1} (Π_{r}^{'} Π_{r})} . \end{matrix}

Since

U (i, :) = e_{i}^{'} U = e_{i}^{'} Θ_{r} Π_{r} Θ_{r}^{- 1} (I_{r}, I_{r}) U (I_{r}, :) = θ_{r} (i) Π_{r} (i, :) Θ_{r}^{- 1} (I_{r}, I_{r}) U (I_{r}, :)

for

1 \leq i \leq n_{r}

, we have

\begin{matrix} {∥ U (i, :) ∥}_{F} & = ∥ θ_{r} (i) Π_{r} (i, :) Θ_{r}^{- 1} (I_{r}, I_{r}) U (I_{r}, :) ∥_{F} = θ_{r} (i) {∥ Π_{r} (i, :) Θ_{r}^{- 1} (I_{r}, I_{r}) U (I_{r}, :) ∥}_{F} \\ \leq θ_{r} (i) \max_{i} ∥ Π_{r} {(i, :) ∥}_{F} \max_{i} {∥ e_{i}^{'} (Θ_{r}^{- 1} (I_{r}, I_{r}) U (I_{r}, :)) ∥}_{F} \\ \leq θ_{r} (i) \max_{i} {∥ e_{i}^{'} (Θ_{r}^{- 1} (I_{r}, I_{r}) U (I_{r}, :)) ∥}_{F} \leq \frac{θ_{r, \max}}{θ_{r, \min} \sqrt{λ_{K} (Π_{r}^{'} Π_{r})}} . \end{matrix}

Similarly, we have

\begin{matrix} {∥ U (i, :) ∥}_{F} & \geq θ_{r} (i) \min_{i} ∥ Π_{r} {(i, :) ∥}_{F} \min_{i} {∥ e_{i}^{'} (Θ_{r}^{- 1} (I_{r}, I_{r}) U (I_{r}, :)) ∥}_{F} \\ \geq θ_{r} (i) \min_{i} {∥ e_{i}^{'} (Θ_{r}^{- 1} (I_{r}, I_{r}) U (I_{r}, :)) ∥}_{F} / \sqrt{K} \geq \frac{θ_{r, \min}}{θ_{r, \max} \sqrt{K λ_{1} (Π_{r}^{'} Π_{r})}} . \end{matrix}

For

{∥ V (j, :) ∥}_{F}

, since

V = Π_{c} B_{c}

, we have

\begin{matrix} \min_{j} {∥ e_{j}^{'} V ∥}_{F}^{2} & = \min_{j} e_{j}^{'} V V^{'} e_{j} = \min_{j} Π_{c} (j, :) B_{c} B_{c}^{'} Π_{c}^{'} (j, :) \\ = \min_{j} {∥ Π_{c} (j, :) ∥}_{F}^{2} \frac{Π_{c} (j, :)}{∥ Π_{c} {(j, :) ∥}_{F}} B_{c} B_{c}^{'} \frac{Π_{c}^{'} (j, :)}{∥ Π_{c} {(j, :) ∥}_{F}} \\ \geq \min_{j} ∥ Π_{c} {(j, :) ∥}_{F}^{2} \min_{{∥ x ∥}_{F} = 1} x^{'} B_{c} B_{c}^{'} x = \min_{j} {∥ Π_{c} (j, :) ∥}_{F}^{2} λ_{K} (B_{c} B_{c}^{'}) \\ \overset{By Lemma A 4}{=} \frac{\min_{j} {∥ Π_{c} (j, :) ∥}_{F}^{2}}{λ_{1} (Π_{c}^{'} Π_{c})} \geq \frac{1}{K λ_{1} (Π_{c}^{'} Π_{c})} . \end{matrix}

Meanwhile,

\begin{matrix} \max_{j} {∥ e_{j}^{'} V ∥}_{F}^{2} & = \max_{j} {∥ Π_{c} (j, :) ∥}_{F}^{2} \frac{Π_{c} (j, :)}{∥ Π_{c} {(j, :) ∥}_{F}} B_{c} B_{c}^{'} \frac{Π_{c}^{'} (j, :)}{∥ Π_{c} {(j, :) ∥}_{F}} \\ \leq \max_{j} ∥ Π_{c} {(j, :) ∥}_{F}^{2} \max_{{∥ x ∥}_{F} = 1} x^{'} B_{c} B_{c}^{'} x = \max_{j} {∥ Π_{c} (j, :) ∥}_{F}^{2} λ_{K} (B_{c} B_{c}^{'}) \\ \overset{By Lemma A 4}{=} \frac{\max_{j} {∥ Π_{c} (j, :) ∥}_{F}^{2}}{λ_{K} (Π_{c}^{'} Π_{c})} \leq \frac{1}{λ_{K} (Π_{c}^{'} Π_{c})} . \end{matrix}

□

Lemma A4.

Under

D i D C M M_{n_{r}, n_{c}} (K, P, Π_{r}, Π_{c}, Θ_{r})

, we have

\begin{matrix} \frac{θ_{r, \min}^{2} λ_{K} (Π_{r}^{'} Π_{r})}{θ_{r, \max}^{2} λ_{1} (Π_{r}^{'} Π_{r})} \leq λ_{K} (U_{*} (I_{r}, :) U_{*}^{'} (I_{r}, :)), λ_{1} (U_{*} (I_{r}, :) U_{*}^{'} (I_{r}, :)) \leq \frac{θ_{r, \max}^{2} K λ_{1} (Π_{r}^{'} Π_{r})}{θ_{r, \min}^{2} λ_{K} (Π_{r}^{'} Π_{r})}, \\ and λ_{1} (B_{c} B_{c}^{'}) = \frac{1}{λ_{K} (Π_{c}^{'} Π_{c})}, λ_{K} (B_{c} B_{c}^{'}) = \frac{1}{λ_{1} (Π_{c}^{'} Π_{c})} . \end{matrix}

Proof.

Recall that

V = Π_{c} B_{c}

and

V^{'} V = I

, we have

I = B_{c}^{'} Π_{c}^{'} Π_{c} B_{c}

. As

B_{c}

is full rank, we have

Π_{c}^{'} Π_{c} = {(B_{c} B_{c}^{'})}^{- 1}

, which gives

\begin{matrix} λ_{1} (B_{c} B_{c}^{'}) = \frac{1}{λ_{K} (Π_{c}^{'} Π_{c})}, λ_{K} (B_{c} B_{c}^{'}) = \frac{1}{λ_{1} (Π_{c}^{'} Π_{c})} . \end{matrix}

By the proof of Lemma 2, we know that

\begin{matrix} {(U_{*} (I_{r}, :) U_{*}^{'} (I_{r}, :))}^{- 1} = N_{U}^{- 1} (I_{r}, I_{r}) Θ^{- 1} (I_{r}, I_{r}) Π_{r}^{'} Θ_{r}^{2} Π_{r} Θ_{r}^{- 1} (I_{r}, I_{r}) N_{U}^{- 1} (I_{r}, I_{r}), \end{matrix}

which gives that

\begin{matrix} U_{*} (I_{r}, :) U_{*}^{'} (I_{r}, :) = N_{U} (I_{r}, I_{r}) Θ (I_{r}, I_{r}) {(Π_{r}^{'} Θ_{r}^{2} Π_{r})}^{- 1} Θ_{r} (I_{r}, I_{r}) N_{U} (I_{r}, I_{r}) . \end{matrix}

Then, we have

\begin{matrix} λ_{1} (U_{*} (I_{r}, :) U_{*}^{'} (I_{r}, :)) = λ_{1} (N_{U} (I_{r}, I_{r}) Θ (I_{r}, I_{r}) {(Π_{r}^{'} Θ_{r}^{2} Π_{r})}^{- 1} Θ_{r} (I_{r}, I_{r}) N_{U} (I_{r}, I_{r})) \\ = λ_{1} (N_{U}^{2} (I_{r}, I_{r}) Θ_{r}^{2} (I_{r}, I_{r}) {(Π_{r}^{'} Θ_{r}^{2} Π_{r})}^{- 1}) \leq λ_{1}^{2} (N_{U} (I_{r}, I_{r}) Θ_{r} (I_{r}, I_{r})) λ_{1} ({(Π_{r}^{'} Θ_{r}^{2} Π_{r})}^{- 1}) \\ = λ_{1}^{2} (N_{U} (I_{r}, I_{r}) Θ_{r} (I_{r}, I_{r})) / λ_{K} (Π_{r}^{'} Θ_{r}^{2} Π_{r}) \leq (\max_{i \in I_{r}} θ_{r} (i) / ∥ U (i, :) {∥_{F})}^{2} / λ_{K} (Π_{r}^{'} Θ_{r}^{2} Π_{r}) \\ \leq \frac{θ_{r, \max}^{2} K λ_{1} (Π_{r}^{'} Π_{r})}{λ_{K} (Π_{r}^{'} Θ_{r}^{2} Π_{r})} \leq \frac{θ_{r, \max}^{2} K λ_{1} (Π_{r}^{'} Π_{r})}{θ_{r, \min}^{2} λ_{K} (Π_{r}^{'} Π_{r})} . \end{matrix}

Similarly, we have

\begin{matrix} λ_{K} (U_{*} (I_{r}, :) U_{*}^{'} (I_{r}, :)) = λ_{K} (N_{U} (I_{r}, I_{r}) Θ (I_{r}, I_{r}) {(Π_{r}^{'} Θ_{r}^{2} Π_{r})}^{- 1} Θ_{r} (I_{r}, I_{r}) N_{U} (I_{r}, I_{r})) \\ = λ_{K} (N_{U}^{2} (I_{r}, I_{r}) Θ_{r}^{2} (I_{r}, I_{r}) {(Π_{r}^{'} Θ_{r}^{2} Π_{r})}^{- 1}) \geq λ_{K}^{2} (N_{U} (I_{r}, I_{r}) Θ_{r} (I_{r}, I_{r})) λ_{K} ({(Π_{r}^{'} Θ_{r}^{2} Π_{r})}^{- 1}) \\ = λ_{K}^{2} (N_{U} (I_{r}, I_{r}) Θ_{r} (I_{r}, I_{r})) / λ_{1} (Π_{r}^{'} Θ_{r}^{2} Π_{r}) \geq (\min_{i \in I_{r}} θ_{r} (i) / ∥ U (i, :) {∥_{F})}^{2} / λ_{1} (Π_{r}^{'} Θ_{r}^{2} Π_{r}) \\ \geq \frac{θ_{r, \min}^{2} λ_{K} (Π_{r}^{'} Π_{r})}{λ_{1} (Π_{r}^{'} Θ_{r}^{2} Π_{r})} \geq \frac{θ_{r, \min}^{2} λ_{K} (Π_{r}^{'} Π_{r})}{θ_{r, \max}^{2} λ_{1} (Π_{r}^{'} Π_{r})} . \end{matrix}

□

Lemma A5.

Under

D i D C M M_{n_{r}, n_{c}} (K, P, Π_{r}, Π_{c}, Θ_{r})

, we have

\begin{matrix} σ_{K} (Ω) \geq θ_{r, \min} σ_{K} (P) σ_{K} (Π_{r}) σ_{K} (Π_{c}), σ_{1} (Ω) \leq θ_{r, \max} σ_{1} (P) σ_{1} (Π_{r}) σ_{1} (Π_{c}) . \end{matrix}

Proof.

For

σ_{K} (Ω)

, we have

\begin{matrix} σ_{K}^{2} (Ω) = λ_{K} (Ω Ω^{'}) = λ_{K} (Θ_{r} Π_{r} P Π_{c}^{'} Π_{c} P^{'} Π_{r}^{'} Θ_{r}) = λ_{K} (Θ_{r}^{2} Π_{r} P Π_{c}^{'} Π_{c} P^{'} Π_{r}^{'}) \\ \geq θ_{r, \min}^{2} λ_{K} (Π_{r}^{'} Π_{r} P Π_{c}^{'} Π_{c} P^{'}) \geq θ_{r, \min}^{2} λ_{K} (Π_{r}^{'} Π_{r}) λ_{K} (P Π_{c}^{'} Π_{c} P^{'}) = θ_{r, \min}^{2} λ_{K} (Π_{r}^{'} Π_{r}) λ_{K} (Π_{c}^{'} Π_{c} P^{'} P) \\ \geq θ_{r, \min}^{2} λ_{K} (Π_{r}^{'} Π_{r}) λ_{K} (Π_{c}^{'} Π_{c}) λ_{K} (P P^{'}) = θ_{r, \min}^{2} σ_{K}^{2} (Π_{r}) σ_{K}^{2} (Π_{c}) σ_{K}^{2} (P), \end{matrix}

where we have used the fact for any matrices

X, Y

, the nonzero eigenvalues of

X Y

are the same as the nonzero eigenvalues of

Y X

. Following a similar analysis, the lemma follows. □

Lemma A6.

Under

D i D C M M_{n_{r}, n_{c}} (K, P, Π_{r}, Π_{c}, Θ_{r})

, when Assumption A1 holds, with probability at least

1 - o ({(n_{r} + n_{c})}^{- 3})

, we have

\begin{matrix} ∥ A - Ω ∥ = O (\sqrt{P_{\max} \max (∥ θ_{r} ∥_{1}, θ_{r, \max} n_{c}) \log (n_{r} + n_{c})}) . \end{matrix}

Proof.

Since the proof is similar to that of Lemma 7 [35], we omit most of the details. Let

e_{i}

be an

n_{r} \times 1

vector, where

e_{i} (i) = 1

and 0 elsewhere, for row nodes

1 \leq i \leq n_{r}

, and

{\tilde{e}}_{j}

be an

n_{c} \times 1

vector, where

{\tilde{e}}_{j} (j) = 1

and 0 elsewhere, for column nodes

1 \leq j \leq n_{c}

. Set

W = \sum_{i = 1}^{n^{r}} \sum_{j = 1}^{n_{c}} W (i, j) e_{i} {\tilde{e}}_{j}^{'}

, where

W = A - Ω

. Set

W^{(i, j)} = W (i, j) e_{i} {\tilde{e}}_{j}^{'}

, for

1 \leq i \leq n_{r}, 1 \leq j \leq n_{c}

. Then, we have

E (W^{(i, j)}) = 0

. For

1 \leq i \leq n_{r}, 1 \leq j \leq n_{c}

, we have

\begin{matrix} ∥ W^{(i, j)} ∥ & = ∥ W (i, j) e_{i} {\tilde{e}}_{j}^{'} ∥ = | A (i, j) - Ω (i, j) | \leq 1 . \end{matrix}

Next, we consider the variance parameter

\begin{matrix} σ^{2} : = \max (∥ \sum_{i = 1}^{n_{r}} \sum_{j = 1}^{n_{c}} E (W^{(i, j)} {(W^{(i, j)})}^{'}) ∥, ∥ \sum_{i = 1}^{n_{r}} \sum_{j = 1}^{n_{c}} E ({(W^{(i, j)})}^{'} W^{(i, j)}) ∥) . \end{matrix}

Since

\begin{matrix} E (W^{2} (i, j)) = E ({(A (i, j) - Ω (i, j))}^{2}) = Var (A (i, j)), \end{matrix}

where

Var (A (i, j))

denotes the variance of the Bernoulli random variable

A (i, j)

, we have

\begin{matrix} E (W^{2} (i, j)) & = Var (A (i, j)) = P (A (i, j) = 1) (1 - P (A (i, j) = 1)) \\ \leq P (A (i, j) = 1) = Ω (i, j) = e_{i}^{'} Θ_{r} Π_{r} P Π_{c}^{'} {\tilde{e}}_{j} = θ_{r} (i) e_{i}^{'} Π_{r} P Π_{c}^{'} {\tilde{e}}_{j} \leq θ_{r} (i) P_{\max} . \end{matrix}

Since

e_{i} e_{i}^{'}

is an

n_{r} \times n_{r}

diagonal matrix with

(i, i)

-th entry being one and other entries being zero, we have

\begin{matrix} ∥ \sum_{i = 1}^{n_{r}} \sum_{j = 1}^{n_{c}} E (W^{(i, j)} {(W^{(i, j)})}^{'}) ∥ = ∥ \sum_{i = 1}^{n_{r}} \sum_{j = 1}^{n_{c}} E (W^{2} (i, j)) e_{i} e_{i}^{'} ∥ = \max_{1 \leq i \leq n_{r}} | \sum_{j = 1}^{n_{c}} E (W^{2} (i, j)) | \leq θ_{r, \max} P_{\max} n_{c} . \end{matrix}

Similarly, we have

∥ \sum_{i = 1}^{n_{r}} \sum_{j = 1}^{n_{c}} E ({(W^{(i, j)})}^{'} W^{(i, j)}) ∥ \leq P_{\max} {∥ θ_{r} ∥}_{1}

, which gives that

\begin{matrix} σ^{2} = \max (∥ \sum_{i = 1}^{n_{r}} \sum_{j = 1}^{n_{c}} E (W^{(i, j)} {(W^{(i, j)})}^{'}) ∥, ∥ \sum_{i = 1}^{n_{r}} \sum_{j = 1}^{n_{c}} E ({(W^{(i, j)})}^{'} W^{(i, j)}) ∥) \leq P_{\max} \max (∥ θ_{r} ∥_{1}, θ_{r, \max} n_{c}) . \end{matrix}

By the rectangular version of the Bernstein inequality [71], combining with

σ^{2} \leq P_{\max} \max (∥ θ_{r} ∥_{1}, θ_{r, \max} n_{c}), R = 1, d_{1} + d_{2} = n_{r} + n_{c}

, set

t = \frac{α + 1 + \sqrt{α^{2} + 20 α + 19}}{3} \sqrt{P_{\max} \max (∥ θ_{r} ∥_{1}, θ_{r, \max} n_{c}) \log (n_{r} + n_{c})}

for any

α > 0

, we have

\begin{matrix} P (∥ W ∥ \geq t) = P (∥ \sum_{i = 1}^{n^{r}} \sum_{j = 1}^{n_{c}} W^{(i, j)} ∥ \geq t) \leq (n_{r} + n_{c}) \exp (- \frac{t^{2} / 2}{σ^{2} + \frac{R t}{3}}) \\ \leq (n_{r} + n_{c}) \exp (- \frac{t^{2} / 2}{P_{\max} \max (∥ θ_{r} ∥_{1}, θ_{r, \max} n_{c}) + t / 3}) \\ = (n_{r} + n_{c}) \exp (- (α + 1) \log (n_{r} + n_{c}) \cdot \frac{1}{\frac{18}{{(\sqrt{α + 19} + \sqrt{α + 1})}^{2}} + \frac{2 \sqrt{α + 1}}{\sqrt{α + 19} + \sqrt{α + 1}} \sqrt{\frac{\log (n_{r} + n_{c})}{P_{\max} \max (∥ θ_{r} ∥_{1}, θ_{r, \max} n_{c})}}}) \\ \leq (n_{r} + n_{c}) \exp (- (α + 1) \log (n_{r} + n_{c})) = \frac{1}{{(n_{r} + n_{c})}^{α}}, \end{matrix}

where we have used Assumption 1 in the last inequality. Set

α = 3

, and the claim follows. □

Appendix E. Proof of Consistency for DiMSC

Similar to [24,26,27], for our DiMSC, the main theoretical results (i.e., Theorem 2) rely on the row-wise singular vector deviation bounds for the singular eigenvectors of the adjacency matrix.

Lemma A7. (Row-wise singular vector deviation) Under

D i D C M M_{n_{r}, n_{c}} (K, P, Π_{r}, Π_{c}, Θ_{r})

, when Assumption 1 holds, suppose

σ_{K} (Ω) \geq C \sqrt{θ_{r, \max} (n_{r} + n_{c}) \log (n_{r} + n_{c})}

, with probability at least

1 - o ({(n_{r} + n_{c})}^{- 3})

, we have

\begin{matrix} \max (∥ \hat{U} {\hat{U}}^{'} - U U^{'} ∥_{2 \to \infty}, ∥ \hat{V} {\hat{V}}^{'} - V V^{'} ∥_{2 \to \infty}) = O (\frac{\sqrt{P_{\max} θ_{r, \max} K \log (n_{r} + n_{c})}}{θ_{r, \min} σ_{K} (P) σ_{K} (Π_{r}) σ_{K} (Π_{c})}) . \end{matrix}

Proof.

Let

H_{\hat{U}} = {\hat{U}}^{'} U

, and

H_{\hat{U}} = U_{H_{\hat{U}}} Σ_{H_{\hat{U}}} V_{H_{\hat{U}}}^{'}

be the SVD decomposition of

H_{\hat{U}}

with

U_{H_{\hat{U}}}, V_{H_{\hat{U}}} \in R^{n_{r} \times K}

, where

U_{H_{\hat{U}}}

and

V_{H_{\hat{U}}}

represent, respectively, the left and right singular matrices of

H_{\hat{U}}

. Define

sgn (H_{\hat{U}}) = U_{H_{\hat{U}}} V_{H_{\hat{U}}}^{'}

;

sgn (H_{\hat{V}})

is defined similarly. Since

E (A (i, j) - Ω (i, j)) = 0

,

E [{(A (i, j) - Ω (i, j))}^{2}] \leq θ_{r} (i) P_{\max} \leq θ_{r, \max} P_{\max}

by the proof of Lemma A6,

\frac{1}{\sqrt{θ_{r, \max} P_{\max} \min (n_{r}, n_{c}) / (μ \log (n_{r} + n_{c}))}} \leq O (1)

holds by Assumption 1, where

μ

is the incoherence parameter defined as

μ = \max (\frac{n_{r} {∥ U ∥}_{2 \to \infty}^{2}}{K}, \frac{n_{c} {∥ V ∥}_{2 \to \infty}^{2}}{K})

. By Theorem 4.4 [64], with high probability, we have below row-wise singular vector deviation

\begin{matrix} \max (∥ \hat{U} sgn (H_{\hat{U}}) - U ∥_{2 \to \infty}, ∥ \hat{V} sgn (H_{\hat{V}}) - V ∥_{2 \to \infty}) \\ \leq C \frac{\sqrt{P_{\max} θ_{r, \max} K} (κ (Ω) \sqrt{\frac{\max (n_{r}, n_{c}) μ}{\min (n_{r}, n_{c})}} + \sqrt{\log (n_{r} + n_{c})})}{σ_{K} (Ω)} \\ \leq C \frac{\sqrt{P_{\max} θ_{r, \max} K \log (n_{r} + n_{c})}}{σ_{K} (Ω)}, \end{matrix}

(A1)

provided that

c_{1} σ_{K} (Ω) \geq \sqrt{θ_{r, \max} P_{\max} (n_{r} + n_{c}) \log (n_{r} + n_{c})}

for some sufficiently small constant

c_{1}

, and here we set

\sqrt{\frac{\max (n_{r}, n_{c}) μ}{\min (n_{r}, n_{c})}} = O (1)

for convenience since this term has little effect on the error bounds of DiMSC, especially for the case when

\frac{n_{r}}{n_{c}} = O (1)

.

Since

U^{'} U = I, {\hat{U}}^{'} \hat{U} = I

, we have

∥ \hat{U} {\hat{U}}^{'} - U U^{'} ∥_{2 \to \infty} \leq 2 {∥ U - \hat{U} sgn (H_{\hat{U}}) ∥}_{2 \to \infty}

by basic algebra. Now, we are ready to bound

∥ \hat{U} {\hat{U}}^{'} - U U^{'} ∥_{2 \to \infty}

:

\begin{matrix} ∥ \hat{U} {\hat{U}}^{'} - U U^{'} ∥_{2 \to \infty} = \max_{1 \leq i \leq n_{r}} ∥ e_{i}^{'} (U U^{'} - \hat{U} {\hat{U}}^{'}) ∥_{F} \leq 2 {∥ U - \hat{U} sgn (H_{\hat{U}}) ∥}_{2 \to \infty} \\ \leq C \frac{\sqrt{P_{\max} θ_{r, \max} K \log (n_{r} + n_{c})}}{σ_{K} (Ω)} \overset{By Lemma A 5}{\leq} C \frac{\sqrt{P_{\max} θ_{r, \max} K \log (n_{r} + n_{c})}}{θ_{r, \min} σ_{K} (P) σ_{K} (Π_{r}) σ_{K} (Π_{c})} . \end{matrix}

The lemma holds by following similar proof for

∥ \hat{V} {\hat{V}}^{'} - V V^{'} ∥_{2 \to \infty}

. □

When

Θ_{r} = ρ I, n_{r} = n_{c}, Π_{r} = Π_{c} = Π, P_{\max} = O (1)

, and DiCCMM degenerates to MMSB, the bound in Lemma A7 is

O (\frac{\sqrt{K \log (n)}}{σ_{K} (P) \sqrt{ρ} λ_{K} (Π^{'} Π)})

. if we further assume that

λ_{K} (Π^{'} Π) = O (\frac{n}{K})

and

K = O (1)

, the bound is of order

O (\frac{1}{σ_{K} (P)} \frac{1}{\sqrt{n}} \sqrt{\frac{\log (n)}{ρ n}})

. Set the

Θ

in [24] as

\sqrt{ρ} I

, their DCMM degenerates to MMSB, their assumptions are translated to our

λ_{K} (Π^{'} Π) = O (\frac{n}{K})

, when

K = O (1)

, the row-wise singular vector deviation bound in the fourth bullet of Lemma 2.1 [24] is

O (\frac{1}{σ_{K} (P)} \frac{1}{\sqrt{n}} \sqrt{\frac{\log (n)}{ρ n}})

, which is consistent with ours. Meanwhile, if we further assume that

σ_{K} (P) = O (1)

, the bound is of order

\frac{1}{\sqrt{n}} \sqrt{\frac{\log (n)}{ρ n}}

.

The next lemma is the cornerstone to characterizing the behaviors of DiMSC.

Lemma A8.

Under

D i D C M M_{n_{r}, n_{c}} (K, P, Π_{r}, Π_{c}, Θ_{r})

, when conditions of Lemma A7 hold, there exist two permutation matrices

P_{r}, P_{c} \in R^{K \times K}

such that with probability at least

1 - o ({(n_{r} + n_{c})}^{- 3})

, we have

\begin{matrix} \max_{1 \leq k \leq K} {∥ e_{k}^{'} ({\hat{U}}_{*, 2} ({\hat{I}}_{r}, :) - P_{r}^{'} U_{*, 2} (I_{r}, :)) ∥}_{F} = O (\frac{K^{3} θ_{r, \max}^{11} ϖ κ^{3} (Π_{r}^{'} Π_{r}) λ_{1}^{1.5} (Π_{r}^{'} Π_{r})}{θ_{r, \min}^{11} π_{r, \min}}), \\ \max_{1 \leq k \leq K} {∥ e_{k}^{'} ({\hat{V}}_{2} ({\hat{I}}_{c}, :) - P_{c}^{'} V_{2} (I_{c}, :)) ∥}_{F} = O (ϖ κ (Π_{c}^{'} Π_{c})) . \end{matrix}

Proof.

First, we consider column nodes. The detail of the SP algorithm is in Algorithm A2.

Algorithm A2 Successive Projection (SP) [54]

Require: Near-separable matrix

Y_{s p} = S_{s p} M_{s p} + Z_{s p} \in R_{+}^{m \times n}

, where

S_{s p}, M_{s p}

should satisfy Assumption 1 [54], the number r of columns to be extracted.
Ensure: Set of indices

K

such that

Y (K, :) \approx S

(up to permutation)

1:: Compute ${\hat{U}}_{r} \in R^{n_{r} \times K_{r}}$ and ${\hat{U}}_{c} \in R^{n_{c} \times K_{r}}$ from the top- $K_{r}$ -dimensional SVD of A.
2:: Let $R = Y_{s p}, K = {}, k = 1$ .
3:: While $R \neq 0$ and $k \leq r$ do
4:: $k_{*} = {argmax}_{k} {∥ R (k, :) ∥}_{F}$ .
5:: $u_{k} = R (k_{*}, :)$ .
6:: $R \leftarrow (I - \frac{u_{k} u_{k}^{'}}{∥ u_{k} ∥_{F}^{2}}) R$ .
7:: $K = K \cup {k_{*}}$ .
8:: k=k+1.
9:: end while

Based on Algorithm A2, the following theorem is Theorem 1.1 in [54].

Theorem A1. Fix

m \geq r

and

n \geq r

. Consider a matrix

Y_{s p} = S_{s p} M_{s p} + Z_{s p}

, where

S_{s p} \in R^{m \times r}

has a full column rank,

M_{s p} \in R^{r \times n}

is a nonnegative matrix such that the sum of each column is at most 1, and

Z_{s p} = [Z_{s p, 1}, \dots, Z_{s p, n}] \in R^{m \times n}

. Suppose

M_{s p}

has a submatrix equal to

I_{r}

. Write

ϵ \leq \max_{1 \leq i \leq n} {∥ Z_{s p, i} ∥}_{F}

. Suppose

ϵ = O (\frac{σ_{\min} (S_{s p})}{\sqrt{r} κ^{2} (S_{s p})})

, where

σ_{\min} (S_{s p})

and

κ (S_{s p})

are the minimum singular value and condition number of

S_{s p}

, respectively. If we apply the SP algorithm to columns of

Y_{s p}

, then it outputs an index set

K \subset {1, 2, \dots, n}

such that

| K | = r

and

\max_{1 \leq k \leq r} \min_{j \in K} {∥ S_{s p} (:, k) - Y_{s p} (:, j) ∥}_{F} = O (ϵ κ^{2} (S_{s p}))

, where

S_{s p} (:, k)

is the k-th column of

S_{s p}

.

Let

m = K, r = K, n = n_{c}, Y_{s p} = {\hat{V}}_{2}^{'}, Z_{s p} = {\hat{V}}_{2}^{'} - V_{2}^{'}, S_{s p} = V_{2}^{'} (I_{c}, :),

and

M_{s p} = Π_{c}^{'}

. By Condition (I2),

M_{s p}

has an identity submatrix

I_{K}

. By Lemma A7, we have

\begin{matrix} ϵ_{c} = \max_{1 \leq j \leq n_{c}} ∥ {\hat{V}}_{2} (j, :) - V_{2} {(j, :) ∥}_{F} = {∥ {\hat{V}}_{2} (j, :) - V_{2} (j, :) ∥}_{2 \to \infty} \leq ϖ . \end{matrix}

By Theorem A1, there exists a permutation matrix

P_{c}

such that

\begin{matrix} \max_{1 \leq k \leq K} {∥ e_{k}^{'} ({\hat{V}}_{2} ({\hat{I}}_{c}, :) - P_{c}^{'} V_{2} (I_{c}, :)) ∥}_{F} = O (ϵ_{c} κ^{2} (V_{2} (I_{c}, :)) \sqrt{K}) = O (ϖ κ^{2} (V_{2} (I_{c}, :))) . \end{matrix}

Since

κ^{2} (V_{2} (I_{c}, :)) = κ (V_{2} (I_{c}, :) V_{2}^{'} (I_{c}, :)) = κ (V (I_{c}, :) V^{'} (I_{c}, :)) = κ (Π_{c}^{'} Π_{c})

, where the last equality holds by Lemma A4, we have

\begin{matrix} \max_{1 \leq k \leq K} {∥ e_{k}^{'} ({\hat{V}}_{2} ({\hat{I}}_{c}, :) - P_{c}^{'} V_{2} (I_{c}, :)) ∥}_{F} = O (ϖ κ (Π_{c}^{'} Π_{c})) . \end{matrix}

Remark A1.For the ideal case, let

m = K, r = K, n = n_{c}, Y_{s p} = V^{'}, Z_{s p} = V^{'} - V^{'} \equiv 0, S_{s p} = V^{'} (I_{c}, :),

and

M_{s p} = Π_{c}^{'}

. Then, we have

\max_{1 \leq j \leq n_{c}} {∥ V (j, :) - V (j, :) ∥}_{F} = 0

. By Theorem A1, SP algorithm returns

I_{c}

when the input is V assuming there are K column communities.

Now, we consider row nodes. From Lemma 2, we see that

U_{*} (I_{r}, :)

satisfies Condition 1 in [26]. Meanwhile, since

{(U_{*} (I_{r}, :) U_{*}^{'} (I_{r}, :))}^{- 1} 1 > 0

, we have

{(U_{*} (I_{r}, :) U_{*}^{'} (I_{r}, :))}^{- 1} 1 \geq η 1

, hence

U_{*} (I_{r}, :)

satisfies Condition 2 in [26]. Now, we give a lower bound for

η

to show that

η

is strictly positive. By the proof of Lemma A4, we have

\begin{matrix} {(U_{*} (I_{r}, :) U_{*}^{'} (I_{r}, :))}^{- 1} & = N_{U}^{- 1} (I_{r}, I_{r}) Θ^{- 1} (I_{r}, I_{r}) Π_{r}^{'} Θ_{r}^{2} Π_{r} Θ_{r}^{- 1} (I_{r}, I_{r}) N_{U}^{- 1} (I_{r}, I_{r}) \\ \geq \frac{θ_{r, \min}^{2}}{θ_{r, \max}^{2} N_{U, \max}^{2}} Π_{r}^{'} Π_{r} \geq \frac{θ_{r, \min}^{4}}{θ_{r, \max}^{4} K λ_{1} (Π_{r}^{'} Π_{r})} Π_{r}^{'} Π_{r}, \end{matrix}

where we set

N_{U, \max} = \max_{1 \leq i \leq n_{r}} N_{U} (i, i)

, and we have used the facts that

N_{U}, Θ_{r}

are diagonal matrices, and

N_{U, \max} \leq \frac{θ_{r, \max} \sqrt{K λ_{1} (Π_{r}^{'} Π_{r})}}{θ_{r, \min}}

by Lemma A3. Then, we have

\begin{matrix} η = \min_{1 \leq k \leq K} ({(U_{*} (I_{r}, :) U_{*}^{'} (I_{r}, :))}^{- 1} 1) (k) \geq \frac{θ_{r, \min}^{4}}{θ_{r, \max}^{4} K λ_{1} (Π_{r}^{'} Π_{r})} \min_{1 \leq k \leq K} e_{k}^{'} Π_{r}^{'} Π_{r} 1 \\ = \frac{θ_{r, \min}^{4}}{θ_{r, \max}^{4} K λ_{1} (Π_{r}^{'} Π_{r})} \min_{1 \leq k \leq K} e_{k}^{'} Π_{r}^{'} 1 = \frac{θ_{r, \min}^{4} π_{r, \min}}{θ_{r, \max}^{4} K λ_{1} (Π_{r}^{'} Π_{r})}, \end{matrix}

i.e.,

η

is strictly positive. Since

U_{*, 2} (I_{r}, :) U_{*, 2}^{'} (I_{r}, :) \equiv U_{*} (I_{r}, :) U_{*}^{'} (I_{r}, :)

, we have

U_{*, 2} (I_{r}, :)

also satisfies Conditions 1 and 2 in [26]. The above analysis shows that we can directly apply Lemma F.1 of [26] since the ideal DiMSC algorithm satisfies Conditions 1 and 2 in [26], therefore there exists a permutation matrix

P_{r} \in R^{K \times K}

such that

\begin{matrix} \max_{1 \leq k \leq K} {∥ e_{k}^{'} ({\hat{U}}_{*, 2} ({\hat{I}}_{r}, :) - P_{r}^{'} U_{*, 2} (I_{r}, :)) ∥}_{F} = O (\frac{\sqrt{K} ζ ϵ_{r}}{λ_{K}^{1.5} (U_{*, 2} (I_{r}, :)) U_{*, 2}^{'} (I_{r}, :)}), \end{matrix}

where

ζ \leq \frac{4 K}{η λ_{K}^{1.5} (U_{*, 2} (I_{r}, :) U_{*, 2}^{'} (I_{r}, :))} = O (\frac{K}{η λ_{K}^{1.5} (U_{*} (I_{r}, :) U_{*}^{'} (I_{r}, :))})

, and

ϵ_{r} = \max_{1 \leq i \leq n_{r}} ∥ {\hat{U}}_{*, 2} (i, :) - U_{*, 2} (i, :) ∥

. Next, we bound

ϵ_{r}

as below

\begin{matrix} ∥ {\hat{U}}_{*, 2} (i, :) - U_{*, 2} {(i, :) ∥}_{F} = {∥ \frac{{\hat{U}}_{2} (i, :) ∥ U_{2} {(i, :) ∥}_{F} - U_{2} (i, :) {∥ {\hat{U}}_{2} (i, :) ∥}_{F}}{∥ {\hat{U}}_{2} {(i, :) ∥}_{F} {∥ U_{2} (i, :) ∥}_{F}} ∥}_{F} \leq \frac{2 ∥ {\hat{U}}_{2} (i, :) - U_{2} {(i, :) ∥}_{F}}{∥ U_{2} {(i, :) ∥}_{F}} \\ \leq \frac{2 ∥ {\hat{U}}_{2} - U_{2} ∥_{2 \to \infty}}{∥ U_{2} {(i, :) ∥}_{F}} \leq \frac{2 ϖ}{∥ U_{2} {(i, :) ∥}_{F}} = \frac{2 ϖ}{∥ (U U^{'}) (i, :) ∥_{F}} = \frac{2 ϖ}{∥ U (i, :) U^{'} ∥_{F}} = \frac{2 ϖ}{{∥ U (i, :) ∥}_{F}} \\ \leq 2 ϖ \frac{θ_{r, \max} \sqrt{K λ_{1} (Π_{r}^{'} Π_{r})}}{θ_{r, \min}}, \end{matrix}

where the last inequality holds by Lemma A3. Then, we have

ϵ_{r} = O (ϖ \frac{θ_{r, \max} \sqrt{K λ_{1} (Π_{r}^{'} Π_{r})}}{θ_{r, \min}})

. Finally, by Lemma A4, we have

\begin{matrix} \max_{1 \leq k \leq K} {∥ e_{k}^{'} ({\hat{U}}_{*, 2} ({\hat{I}}_{r}, :) - P_{r}^{'} U_{*, 2} (I_{r}, :)) ∥}_{F} = O (\frac{K^{3} θ_{r, \max}^{11} ϖ κ^{3} (Π_{r}^{'} Π_{r}) λ_{1}^{1.5} (Π_{r}^{'} Π_{r})}{θ_{r, \min}^{11} π_{r, \min}}) . \end{matrix}

Remark A2.For the ideal case, when setting

U_{*}

as the input of the SVM-cone algorithm assuming there are K row communities, since

∥ U_{*} - U_{*} ∥_{2 \to \infty} = 0

, Lemma F.1; [26] guarantees that SVM-cone algorithm returns

I_{r}

exactly. Meanwhile, another view to see that the SVM-cone algorithm exactly obtains

I_{r}

when the input is

U_{*}

(also

U_{2, *}

) is given in Appendix F, which focuses on following the three steps of SVM-cone algorithm to show that it returns

I_{r}

with input

U_{*}

(also

U_{*, 2}

), instead of simply applying Lemma F.1. [26].

□

Lemma A9.

Under

D i D C M M_{n_{r}, n_{c}} (K, P, Π_{r}, Π_{c}, Θ_{r})

, when conditions of Lemma A7 hold, with probability at least

1 - o ({(n_{r} + n_{c})}^{- 3})

, we have

\begin{matrix} \max_{1 \leq i \leq n_{r}} {∥ e_{i}^{'} ({\hat{Z}}_{r} - Z_{r} P_{r}) ∥}_{F} = O (\frac{K^{5} θ_{r, \max}^{15} ϖ κ^{4.5} (Π_{r}^{'} Π_{r}) κ (Π_{c}) λ_{1}^{1.5} (Π_{r}^{'} Π_{r})}{θ_{r, \min}^{14} π_{r, \min}}), \\ \max_{1 \leq j \leq n_{c}} {∥ e_{j}^{'} ({\hat{Z}}_{c} - Z_{c} P_{c}) ∥}_{F} = O (ϖ κ (Π_{c}^{'} Π_{c}) \sqrt{K λ_{1} (Π_{c}^{'} Π_{c})}) . \end{matrix}

Proof.

First, we consider column nodes. Recall that

V (I_{c}, :) = B_{c}

. For convenience, set

\hat{V} ({\hat{I}}_{c}, :) = {\hat{B}}_{c}, V_{2} (I_{c}, :) = B_{2 c}, {\hat{V}}_{2} ({\hat{I}}_{c}, :) = {\hat{B}}_{2 c}

. We bound

∥ e_{j}^{'} ({\hat{Z}}_{c} - Z_{c} P_{c}) ∥_{F}

when the input is

\hat{V}

in the SP algorithm. Recall that

Z_{c} = \max (V V^{'} (I_{c}, :) {(V (I_{c}, :) V^{'} (I_{c}, :))}^{- 1}, 0) \equiv Π_{c}

, for

1 \leq j \leq n_{c}

, we have

\begin{matrix} ∥ e_{j}^{'} ({\hat{Z}}_{c} - Z_{c} P_{c}) ∥_{F} = {∥ e_{j}^{'} (\max (0, \hat{V} {\hat{B}}_{c}^{'} {({\hat{B}}_{c} {\hat{B}}_{c}^{'})}^{- 1}) - V B_{c}^{'} {(B_{c} B_{c}^{'})}^{- 1} P_{c}) ∥}_{F} \\ \leq ∥ e_{j}^{'} (\hat{V} {\hat{B}}_{c}^{'} {({\hat{B}}_{c} {\hat{B}}_{c}^{'})}^{- 1} - V B_{c}^{'} {(B_{c} B_{c}^{'})}^{- 1} P_{c}) ∥_{F} \\ = ∥ e_{j}^{'} (\hat{V} - V (V^{'} \hat{V})) {\hat{B}}_{c}^{'} {({\hat{B}}_{c} {\hat{B}}_{c}^{'})}^{- 1} + e_{j}^{'} (V (V^{'} \hat{V}) {\hat{B}}_{c}^{'} {({\hat{B}}_{c} {\hat{B}}_{c}^{'})}^{- 1} - V (V^{'} \hat{V}) {(P_{c}^{'} (B_{c} B_{c}^{'}) {(B_{c}^{'})}^{- 1} (V^{'} \hat{V}))}^{- 1}) ∥_{F} \\ \leq ∥ e_{j}^{'} (\hat{V} - V (V^{'} \hat{V})) {\hat{B}}_{c}^{'} {({\hat{B}}_{c} {\hat{B}}_{c}^{'})}^{- 1} ∥_{F} + {∥ e_{j}^{'} V (V^{'} \hat{V}) ({\hat{B}}_{c}^{'} {({\hat{B}}_{c} {\hat{B}}_{c}^{'})}^{- 1} - {(P_{c}^{'} (B_{c} B_{c}^{'}) {(B_{c}^{'})}^{- 1} (V^{'} \hat{V}))}^{- 1}) ∥}_{F} \\ \leq ∥ e_{j}^{'} (\hat{V} - V (V^{'} \hat{V})) ∥_{F} ∥ {\hat{B}}_{c}^{- 1} ∥_{F} + {∥ e_{j}^{'} V (V^{'} \hat{V}) ({\hat{B}}_{c}^{'} {({\hat{B}}_{c} {\hat{B}}_{c}^{'})}^{- 1} - {(P_{c}^{'} (B_{c} B_{c}^{'}) {(B_{c}^{'})}^{- 1} (V^{'} \hat{V}))}^{- 1}) ∥}_{F} \\ \leq \sqrt{K} ∥ e_{j}^{'} (\hat{V} - V (V^{'} \hat{V})) ∥_{F} / \sqrt{λ_{K} ({\hat{B}}_{c} {\hat{B}}_{c}^{'})} + {∥ e_{j}^{'} V (V^{'} \hat{V}) ({\hat{B}}_{c}^{- 1} - {(P_{c}^{'} B_{c} (V^{'} \hat{V}))}^{- 1}) ∥}_{F} \\ = \sqrt{K} ∥ e_{j}^{'} (\hat{V} {\hat{V}}^{'} - V V^{'}) \hat{V} ∥_{F} O (\sqrt{λ_{1} (Π_{c}^{'} Π_{c})}) + {∥ e_{j}^{'} V (V^{'} \hat{V}) ({\hat{B}}_{c}^{- 1} - {(P_{c}^{'} B_{c} (V^{'} \hat{V}))}^{- 1}) ∥}_{F} \\ \leq \sqrt{K} ∥ e_{j}^{'} (\hat{V} {\hat{V}}^{'} - V V^{'}) ∥_{F} O (\sqrt{λ_{1} (Π_{c}^{'} Π_{c})}) + {∥ e_{j}^{'} V (V^{'} \hat{V}) ({\hat{B}}_{c}^{- 1} - {(P_{c}^{'} B_{c} (V^{'} \hat{V}))}^{- 1}) ∥}_{F} \\ \leq \sqrt{K} ϖ O (\sqrt{λ_{1} (Π_{c}^{'} Π_{c})}) + {∥ e_{j}^{'} V (V^{'} \hat{V}) ({\hat{B}}_{c}^{- 1} - {(P_{c}^{'} B_{c} (V^{'} \hat{V}))}^{- 1}) ∥}_{F} \\ = O (ϖ \sqrt{K λ_{1} (Π_{c}^{'} Π_{c})}) + {∥ e_{j}^{'} V (V^{'} \hat{V}) ({\hat{B}}_{c}^{- 1} - {(P_{c}^{'} B_{c} (V^{'} \hat{V}))}^{- 1}) ∥}_{F}, \end{matrix}

where we have used similar idea in the proof of Lemma VII.3 in [27] such that apply

O (\frac{1}{λ_{K} (B_{c} B_{c}^{'})})

to estimate

\frac{1}{λ_{K} ({\hat{B}}_{c} {\hat{B}}_{c}^{'})}

, then by Lemma A4, we have

\frac{1}{λ_{K} ({\hat{B}}_{c} {\hat{B}}_{c}^{'})} = O (λ_{1} (Π_{c}^{'} Π_{c}))

.

Now, we aim to bound

∥ e_{j}^{'} V (V^{'} \hat{V}) ({\hat{B}}_{c}^{- 1} - {(P_{c}^{'} B_{c} (V^{'} \hat{V}))}^{- 1}) ∥_{F}

. For convenience, set

T_{c} = V^{'} \hat{V}, S_{c} = P_{c}^{'} B_{c} T_{c}

. We have

\begin{matrix} ∥ e_{j}^{'} V (V^{'} \hat{V}) ({\hat{B}}_{c}^{- 1} - {(P_{c}^{'} B_{c} (V^{'} \hat{V}))}^{- 1}) ∥_{F} = {∥ e_{j}^{'} V T_{c} S_{c}^{- 1} (S_{c} - {\hat{B}}_{c}) {\hat{B}}_{c}^{- 1} ∥}_{F} \\ \leq ∥ e_{j}^{'} V T_{c} S_{c}^{- 1} (S_{c} - {\hat{B}}_{c}) ∥_{F} ∥ {\hat{B}}_{c}^{- 1} ∥_{F} \leq {∥ e_{j}^{'} V T_{c} S_{c}^{- 1} (S_{c} - {\hat{B}}_{c}) ∥}_{F} \frac{\sqrt{K}}{| λ_{K} ({\hat{B}}_{c}) |} \\ = ∥ e_{j}^{'} V T_{c} S_{c}^{- 1} (S_{c} - {\hat{B}}_{c}) ∥_{F} \frac{\sqrt{K}}{\sqrt{λ_{K} ({\hat{B}}_{c} {\hat{B}}_{c}^{'})}} \leq {∥ e_{j}^{'} V T_{c} S_{c}^{- 1} (S_{c} - {\hat{B}}_{c}) ∥}_{F} O (\sqrt{K λ_{1} (Π_{c}^{'} Π_{c})}) \\ = ∥ e_{j}^{'} V T_{c} T_{c}^{- 1} B_{c}^{'} {(B_{c} B_{c}^{'})}^{- 1} P_{c} (S_{c} - {\hat{B}}_{c}) ∥_{F} O (\sqrt{K λ_{1} (Π_{c}^{'} Π_{c})}) \\ = ∥ e_{j}^{'} V B_{c}^{'} {(B_{c} B_{c}^{'})}^{- 1} P_{c} (S_{c} - {\hat{B}}_{c}) ∥_{F} O (\sqrt{K λ_{1} (Π_{c}^{'} Π_{c})}) \\ = ∥ e_{j}^{'} Z_{c} P_{c} (S_{c} - {\hat{B}}_{c}) ∥_{F} O (\sqrt{K λ_{1} (Π_{c}^{'} Π_{c})}) \overset{By Z_{c} = Π_{c}}{\leq} \max_{1 \leq k \leq K} {∥ e_{k}^{'} (S_{c} - {\hat{B}}_{c}) ∥}_{F} O (\sqrt{K λ_{1} (Π_{c}^{'} Π_{c})}) \\ = \max_{1 \leq k \leq K} {∥ e_{k}^{'} ({\hat{B}}_{c} - P_{c}^{'} B_{c} V^{'} \hat{V}) ∥}_{F} O (\sqrt{K λ_{1} (Π_{c}^{'} Π_{c})}) \\ = \max_{1 \leq k \leq K} {∥ e_{k}^{'} ({\hat{B}}_{c} {\hat{V}}^{'} - P_{c}^{'} B_{c} V^{'}) \hat{V} ∥}_{F} O (\sqrt{K λ_{1} (Π_{c}^{'} Π_{c})}) \\ \leq \max_{1 \leq k \leq K} {∥ e_{k}^{'} ({\hat{B}}_{c} {\hat{V}}^{'} - P_{c}^{'} B_{c} V^{'}) ∥}_{F} O (\sqrt{K λ_{1} (Π_{c}^{'} Π_{c})}) \\ = \max_{1 \leq k \leq K} {∥ e_{k}^{'} ({\hat{B}}_{2 c} - P_{c}^{'} B_{2 c}) ∥}_{F} O (\sqrt{K λ_{1} (Π_{c}^{'} Π_{c})}) \\ = O (ϖ κ (Π_{c}^{'} Π_{c}) \sqrt{K λ_{1} (Π_{c}^{'} Π_{c})}) . \end{matrix}

(A2)

Remark A3. Equation (A2) supports our statement that building the theoretical framework of DiMSC benefits a lot by introducing the DiMSC-equivalence algorithm since

∥ {\hat{B}}_{2 c} - P_{c}^{'} B_{2 c} ∥_{2 \to \infty}

is obtained from DiMSC-equivalence (i.e., inputing

{\hat{V}}_{2}

in the SP algorithm obtains

∥ {\hat{B}}_{2 c} - P_{c}^{'} B_{2 c} ∥_{2 \to \infty}

).

Then, we have

\begin{matrix} ∥ e_{j}^{'} ({\hat{Z}}_{c} - Z_{c} P_{c}) ∥_{F} \leq O (ϖ \sqrt{K λ_{1} (Π_{c}^{'} Π_{c})}) + {∥ e_{j}^{'} V (V^{'} \hat{V}) ({\hat{B}}_{c}^{- 1} - {(P_{c}^{'} B_{c} (V^{'} \hat{V}))}^{- 1}) ∥}_{F} \\ \leq O (ϖ \sqrt{K λ_{1} (Π_{c}^{'} Π_{c})}) + O (ϖ κ (Π_{c}^{'} Π_{c}) \sqrt{K λ_{1} (Π_{c}^{'} Π_{c})}) = O (ϖ κ (Π_{c}^{'} Π_{c}) \sqrt{K λ_{1} (Π_{c}^{'} Π_{c})}) . \end{matrix}

Next, we consider row nodes. For

1 \leq i \leq n_{r}

, since

Z_{r} = Y_{*} J_{*}, {\hat{Z}}_{r} = {\hat{Y}}_{*} {\hat{J}}_{*}

, we have

\begin{matrix} ∥ e_{i}^{'} ({\hat{Z}}_{r} - Z_{r} P_{r}) ∥_{F} = ∥ e_{i}^{'} (\max (0, {\hat{Y}}_{*} {\hat{J}}_{*}) - Y_{*} J_{*} P_{r}) ∥_{F} \leq {∥ e_{i}^{'} ({\hat{Y}}_{*} {\hat{J}}_{*} - Y_{*} J_{*} P_{r}) ∥}_{F} \\ = ∥ e_{i}^{'} ({\hat{Y}}_{*} - Y_{*} P_{r}) {\hat{J}}_{*} + e_{i}^{'} Y_{*} P_{r} ({\hat{J}}_{*} - P_{r}^{'} J_{*} P_{r}) ∥_{F} \\ \leq ∥ e_{i}^{'} ({\hat{Y}}_{*} - Y_{*} P_{r}) ∥_{F} ∥ {\hat{J}}_{*} ∥_{F} + ∥ e_{i}^{'} Y_{*} P_{r} ∥_{F} {∥ {\hat{J}}_{*} - P_{r}^{'} J_{*} P_{r} ∥}_{F} \\ = ∥ e_{i}^{'} ({\hat{Y}}_{*} - Y_{*} P_{r}) ∥_{F} ∥ {\hat{J}}_{*} ∥_{F} + ∥ e_{i}^{'} Y_{*} ∥_{F} {∥ {\hat{J}}_{*} - P_{r}^{'} J_{*} P_{r} ∥}_{F} . \end{matrix}

Next, we consider row nodes. For

1 \leq i \leq n_{r}

, since

Z_{r} = Y_{*} J_{*}, {\hat{Z}}_{r} = {\hat{Y}}_{*} {\hat{J}}_{*}

, we have

\begin{matrix} ∥ e_{i}^{'} ({\hat{Z}}_{r} - Z_{r} P_{r}) ∥_{F} = ∥ e_{i}^{'} (\max (0, {\hat{Y}}_{*} {\hat{J}}_{*}) - Y_{*} J_{*} P_{r}) ∥_{F} \leq {∥ e_{i}^{'} ({\hat{Y}}_{*} {\hat{J}}_{*} - Y_{*} J_{*} P_{r}) ∥}_{F} \\ = ∥ e_{i}^{'} ({\hat{Y}}_{*} - Y_{*} P_{r}) {\hat{J}}_{*} + e_{i}^{'} Y_{*} P_{r} ({\hat{J}}_{*} - P_{r}^{'} J_{*} P_{r}) ∥_{F} \\ \leq ∥ e_{i}^{'} ({\hat{Y}}_{*} - Y_{*} P_{r}) ∥_{F} ∥ {\hat{J}}_{*} ∥_{F} + ∥ e_{i}^{'} Y_{*} P_{r} ∥_{F} {∥ {\hat{J}}_{*} - P_{r}^{'} J_{*} P_{r} ∥}_{F} \\ = ∥ e_{i}^{'} ({\hat{Y}}_{*} - Y_{*} P_{r}) ∥_{F} ∥ {\hat{J}}_{*} ∥_{F} + ∥ e_{i}^{'} Y_{*} ∥_{F} {∥ {\hat{J}}_{*} - P_{r}^{'} J_{*} P_{r} ∥}_{F} . \end{matrix}

Therefore, the bound of

∥ e_{i}^{'} ({\hat{Z}}_{r} - Z_{r} P_{r}) ∥_{F}

can be obtained as long as we bound

∥ e_{i}^{'} ({\hat{Y}}_{*} - Y_{*} P_{r}) ∥_{F}, ∥ {\hat{J}}_{*} ∥_{F}, {∥ e_{i}^{'} Y_{*} ∥}_{F}

and

∥ {\hat{J}}_{*} - P_{r}^{'} J_{*} P_{r} ∥_{F}

. We bound the four terms as below:

We bound $∥ e_{i}^{'} ({\hat{Y}}_{*} - Y_{*} P_{r}) ∥_{F}$ first. Similar as bounding $∥ e_{j}^{'} ({\hat{Z}}_{c} - Z_{c} P_{c}) ∥$ , we set $U_{*} (I_{r}, :) = B_{R}, {\hat{U}}_{*} ({\hat{I}}_{r}, :) = {\hat{B}}_{R}, U_{*, 2} (I_{r}, :) = B_{2 R}, {\hat{U}}_{*, 2} ({\hat{I}}_{r}, :) = {\hat{B}}_{2 R}$ for convenience. We bound $∥ e_{i}^{'} ({\hat{Y}}_{*} - Y_{*} P_{r}) ∥_{F}$ when the input is ${\hat{U}}_{*}$ in the SVM-cone algorithm. For $1 \leq i \leq n_{r}$ , we have

$\begin{matrix} ∥ e_{i}^{'} ({\hat{Y}}_{*} - Y_{*} P_{r}) ∥_{F} = {∥ e_{i}^{'} (\hat{U} {\hat{B}}_{R}^{'} {({\hat{B}}_{R} {\hat{B}}_{R}^{'})}^{- 1} - U B_{R}^{'} {(B_{R} B_{R}^{'})}^{- 1} P_{r}) ∥}_{F} \\ = ∥ e_{i}^{'} (\hat{U} - U (U^{'} \hat{U})) {\hat{B}}_{R}^{'} {({\hat{B}}_{R} {\hat{B}}_{R}^{'})}^{- 1} + e_{i}^{'} (U (U^{'} \hat{U}) {\hat{B}}_{R}^{'} {({\hat{B}}_{R} {\hat{B}}_{R}^{'})}^{- 1} \\ - U (U^{'} \hat{U}) {(P_{r}^{'} (B_{R} B_{R}^{'}) {(B_{R}^{'})}^{- 1} (U^{'} \hat{U}))}^{- 1} {) ∥}_{F} \\ \leq ∥ e_{i}^{'} (\hat{U} - U (U^{'} \hat{U})) {\hat{B}}_{R}^{'} {({\hat{B}}_{R} {\hat{B}}_{R}^{'})}^{- 1} ∥_{F} + ∥ e_{i}^{'} U (U^{'} \hat{U}) ({\hat{B}}_{R}^{'} {({\hat{B}}_{R} {\hat{B}}_{R}^{'})}^{- 1} \\ - {(P_{r}^{'} (B_{R} B_{R}^{'}) {(B_{R}^{'})}^{- 1} (U^{'} \hat{U}))}^{- 1} {) ∥}_{F} \\ \leq ∥ e_{i}^{'} (\hat{U} - U (U^{'} \hat{U})) ∥_{F} ∥ {\hat{B}}_{R}^{- 1} ∥_{F} + {∥ e_{i}^{'} U (U^{'} \hat{U}) ({\hat{B}}_{R}^{'} {({\hat{B}}_{R} {\hat{B}}_{R}^{'})}^{- 1} - {(P_{r}^{'} (B_{R} B_{R}^{'}) {(B_{R}^{'})}^{- 1} (U^{'} \hat{U}))}^{- 1}) ∥}_{F} \\ \leq \sqrt{K} ∥ e_{i}^{'} (\hat{U} - U (U^{'} \hat{U})) ∥_{F} / \sqrt{λ_{K} ({\hat{B}}_{R} {\hat{B}}_{R}^{'})} + {∥ e_{i}^{'} U (U^{'} \hat{U}) ({\hat{B}}_{R}^{- 1} - {(P_{r}^{'} B_{R} (U^{'} \hat{U}))}^{- 1}) ∥}_{F} \\ \overset{(i)}{=} \sqrt{K} ∥ e_{i}^{'} (\hat{U} {\hat{U}}^{'} - U U^{'}) \hat{U} ∥_{F} O (\frac{θ_{r, \max} \sqrt{κ (Π_{r}^{'} Π_{r})}}{θ_{r, \min}}) + {∥ e_{i}^{'} U (U^{'} \hat{U}) ({\hat{B}}_{R}^{- 1} - {(P_{r}^{'} B_{R} (U^{'} \hat{U}))}^{- 1}) ∥}_{F} \\ \leq \sqrt{K} ∥ e_{i}^{'} (\hat{U} {\hat{U}}^{'} - U U^{'}) ∥_{F} O (\frac{θ_{r, \max} \sqrt{κ (Π_{r}^{'} Π_{r})}}{θ_{r, \min}}) + {∥ e_{i}^{'} U (U^{'} \hat{U}) ({\hat{B}}_{R}^{- 1} - {(P_{r}^{'} B_{R} (U^{'} \hat{U}))}^{- 1}) ∥}_{F} \\ \leq \sqrt{K} ϖ O (\frac{θ_{r, \max} \sqrt{κ (Π_{r}^{'} Π_{r})}}{θ_{r, \min}}) + {∥ e_{i}^{'} U (U^{'} \hat{U}) ({\hat{B}}_{R}^{- 1} - {(P_{r}^{'} B_{R} (U^{'} \hat{U}))}^{- 1}) ∥}_{F} \\ = O (ϖ \frac{θ_{r, \max} \sqrt{K κ (Π_{r}^{'} Π_{r})}}{θ_{r, \min}}) + {∥ e_{i}^{'} U (U^{'} \hat{U}) ({\hat{B}}_{R}^{- 1} - {(P_{r}^{'} B_{R} (U^{'} \hat{U}))}^{- 1}) ∥}_{F}, \end{matrix}$

where we have used similar idea in the proof of Lemma VII.3 in [27] such that we apply $O (\frac{1}{λ_{K} (B_{R} B_{R}^{'})})$ to estimate $\frac{1}{λ_{K} ({\hat{B}}_{R} {\hat{B}}_{R}^{'})}$ , hence (i) holds by Lemma A4.

Now, we aim to bound

∥ e_{i}^{'} U (U^{'} \hat{U}) ({\hat{B}}_{R}^{- 1} - {(P_{r}^{'} B_{R} (U^{'} \hat{U}))}^{- 1}) ∥_{F}

. For convenience, set

T_{r} = U^{'} \hat{U}, S_{r} = P_{r}^{'} B_{R} T_{r}

. We have

\begin{matrix} ∥ e_{i}^{'} U (U^{'} \hat{U}) ({\hat{B}}_{R}^{- 1} - {(P_{r}^{'} B_{R} (U^{'} \hat{U}))}^{- 1}) ∥_{F} = {∥ e_{i}^{'} U T_{r} S_{r}^{- 1} (S_{r} - {\hat{B}}_{R}) {\hat{B}}_{R}^{- 1} ∥}_{F} \\ \leq ∥ e_{i}^{'} U T_{r} S_{r}^{- 1} (S_{r} - {\hat{B}}_{R}) ∥_{F} ∥ {\hat{B}}_{R}^{- 1} ∥_{F} \leq {∥ e_{i}^{'} U T_{r} S_{r}^{- 1} (S_{r} - {\hat{B}}_{R}) ∥}_{F} \frac{\sqrt{K}}{| λ_{K} ({\hat{B}}_{R}) |} \\ = ∥ e_{i}^{'} U T_{r} S_{r}^{- 1} (S_{r} - {\hat{B}}_{R}) ∥_{F} \frac{\sqrt{K}}{\sqrt{λ_{K} ({\hat{B}}_{R} {\hat{B}}_{R}^{'})}} \leq {∥ e_{i}^{'} U T_{r} S_{r}^{- 1} (S_{r} - {\hat{B}}_{R}) ∥}_{F} O (\frac{θ_{r, \max} \sqrt{K κ (Π_{r}^{'} Π_{r})}}{θ_{r, \min}}) \\ = ∥ e_{i}^{'} U T_{r} T_{r}^{- 1} B_{R}^{'} {(B_{R} B_{R}^{'})}^{- 1} P_{r} (S_{r} - {\hat{B}}_{R}) ∥_{F} O (\frac{θ_{r, \max} \sqrt{K κ (Π_{r}^{'} Π_{r})}}{θ_{r, \min}}) \\ = ∥ e_{i}^{'} U B_{R}^{'} {(B_{R} B_{R}^{'})}^{- 1} P_{r} (S_{r} - {\hat{B}}_{R}) ∥_{F} O (\frac{θ_{r, \max} \sqrt{K κ (Π_{r}^{'} Π_{r})}}{θ_{r, \min}}) \\ = ∥ e_{i}^{'} Y_{*} P_{r} (S_{r} - {\hat{B}}_{R}) ∥_{F} O (\frac{θ_{r, \max} \sqrt{K λ_{1} (Π_{r}^{'} Π_{r})}}{θ_{r, \min}}) \leq ∥ e_{i}^{'} Y_{*} ∥_{F} {∥ S_{r} - {\hat{B}}_{R} ∥}_{F} O (\frac{θ_{r, \max} \sqrt{K λ_{1} (Π_{r}^{'} Π_{r})}}{θ_{r, \min}}) \\ \overset{By Equation (A 4)}{\leq} \frac{θ_{r, \max}^{2} \sqrt{K λ_{1} (Π_{r}^{'} Π_{r})}}{θ_{r, \min}^{2} λ_{K} (Π_{r}^{'} Π_{r})} \max_{1 \leq k \leq K} {∥ e_{k}^{'} (S_{r} - {\hat{B}}_{R}) ∥}_{F} O (\frac{θ_{r, \max} K \sqrt{κ (Π_{r}^{'} Π_{r})}}{θ_{r, \min}}) \\ = \max_{1 \leq k \leq K} {∥ e_{k}^{'} ({\hat{B}}_{R} - P_{r}^{'} B_{R} U^{'} \hat{U}) ∥}_{F} O (\frac{θ_{r, \max}^{3} K^{1.5} κ (Π_{r}^{'} Π_{r})}{θ_{r, \min}^{3} \sqrt{λ_{K} (Π_{r}^{'} Π_{r})}}) \\ = \max_{1 \leq k \leq K} {∥ e_{k}^{'} ({\hat{B}}_{R} {\hat{U}}^{'} - P_{r}^{'} B_{R} U^{'}) \hat{U} ∥}_{F} O (\frac{θ_{r, \max}^{3} K^{1.5} κ (Π_{r}^{'} Π_{r})}{θ_{r, \min}^{3} \sqrt{λ_{K} (Π_{r}^{'} Π_{r})}}) \\ \leq \max_{1 \leq k \leq K} {∥ e_{k}^{'} ({\hat{B}}_{R} {\hat{U}}^{'} - P_{r}^{'} B_{R} U^{'}) ∥}_{F} O (\frac{θ_{r, \max}^{3} K^{1.5} κ (Π_{r}^{'} Π_{r})}{θ_{r, \min}^{3} \sqrt{λ_{K} (Π_{r}^{'} Π_{r})}}) \\ = \max_{1 \leq k \leq K} {∥ e_{k}^{'} ({\hat{B}}_{2 R} - P_{r}^{'} B_{2 R}) ∥}_{F} O (\frac{θ_{r, \max}^{3} K^{1.5} κ (Π_{r}^{'} Π_{r})}{θ_{r, \min}^{3} \sqrt{λ_{K} (Π_{r}^{'} Π_{r})}}) \\ \overset{By Lemma A 8}{=} O (\frac{K^{4.5} θ_{r, \max}^{14} ϖ κ^{4.5} (Π_{r}^{'} Π_{r}) λ_{1} (Π_{r}^{'} Π_{r})}{θ_{r, \min}^{14} π_{r, \min}}) . \end{matrix}

(A3)

Remark A4.Similar as Equation (A2), Equation (A3) supports our statement that building the theoretical framework of DiMSC benefits a lot by introducing the DiMSC-equivalence algorithm since

∥ {\hat{B}}_{2 R} - P_{r}^{'} B_{2 R} ∥_{2 \to \infty}

is obtained from DiMSC-equivalence (i.e., inputing

{\hat{U}}_{*, 2}

in the SVM-cone algorithm obtains

∥ {\hat{B}}_{2 R} - P_{r}^{'} B_{2 R} ∥_{2 \to \infty}

).

Then, we have

\begin{matrix} ∥ e_{i}^{'} ({\hat{Y}}_{*} - Y_{*} P_{r}) ∥_{F} & \leq O (ϖ \frac{θ_{r, \max} \sqrt{K κ (Π_{r}^{'} Π_{r})}}{θ_{r, \min}}) + ∥ e_{i}^{'} U (U^{'} \hat{U}) {({\hat{B}}_{R}^{- 1} - (P_{r}^{'} B_{R} U^{'} \hat{U}))}^{- 1} {) ∥}_{F} \\ \leq O (ϖ \frac{θ_{r, \max} \sqrt{K κ (Π_{r}^{'} Π_{r})}}{θ_{r, \min}}) + O (\frac{K^{4.5} θ_{r, \max}^{14} ϖ κ^{4.5} (Π_{r}^{'} Π_{r}) λ_{1} (Π_{r}^{'} Π_{r})}{θ_{r, \min}^{14} π_{r, \min}}) \\ = O (\frac{K^{4.5} θ_{r, \max}^{14} ϖ κ^{4.5} (Π_{r}^{'} Π_{r}) λ_{1} (Π_{r}^{'} Π_{r})}{θ_{r, \min}^{14} π_{r, \min}}) . \end{matrix}

for $∥ e_{i}^{'} Y_{*} ∥_{F}$ , since $Y_{*} = U U_{*}^{- 1} (I_{r}, :)$ , by Lemmas (A3) and (A4), we have

$\begin{matrix} ∥ e_{i}^{'} Y_{*} ∥_{F} \leq {∥ U (i, :) ∥}_{F} {∥ U_{*}^{- 1} (I_{r}, :) ∥}_{F} \leq \frac{\sqrt{K} {∥ U (i, :) ∥}_{F}}{\sqrt{λ_{K} (U_{*} (I_{r}, :) U_{*}^{'} (I_{r}, :))}} \leq \frac{θ_{r, \max}^{2} \sqrt{K λ_{1} (Π_{r}^{'} Π_{r})}}{θ_{r, \min}^{2} λ_{K} (Π_{r}^{'} Π_{r})} . \end{matrix}$

(A4)
for $∥ {\hat{J}}_{*} ∥_{F}$ , recall that ${\hat{J}}_{*} = diag ({\hat{U}}_{*} ({\hat{I}}_{r}, :) \hat{Λ} {\hat{V}}^{'} ({\hat{I}}_{c}, :))$ , we have

$\begin{matrix} ∥ {\hat{J}}_{*} ∥ = \max_{1 \leq k \leq K} {\hat{J}}_{*} (k, k) = \max_{1 \leq k \leq K} e_{k}^{'} {\hat{U}}_{*} ({\hat{I}}_{r}, :) \hat{Λ} {\hat{V}}^{'} ({\hat{I}}_{c}, :) e_{k} \\ = \max_{1 \leq k \leq K} ∥ e_{k}^{'} {\hat{U}}_{*} ({\hat{I}}_{r}, :) \hat{Λ} {\hat{V}}^{'} ({\hat{I}}_{c}, :) e_{k} ∥ \\ \leq \max_{1 \leq k \leq K} ∥ e_{k}^{'} {\hat{U}}_{*} ({\hat{I}}_{r}, :) ∥ ∥ \hat{Λ} ∥ ∥ {\hat{V}}^{'} ({\hat{I}}_{c}, :) e_{k} ∥ \leq \max_{1 \leq k \leq K} ∥ e_{k}^{'} {\hat{U}}_{*} ({\hat{I}}_{r}, :) ∥_{F} ∥ \hat{Λ} ∥ ∥ {\hat{V}}^{'} ({\hat{I}}_{c}, :) e_{k} ∥ \\ = \max_{1 \leq k \leq K} ∥ A ∥ ∥ {\hat{V}}^{'} ({\hat{I}}_{c}, :) e_{k} ∥ = \max_{1 \leq k \leq K} ∥ A ∥ ∥ {(e_{k}^{'} \hat{V} ({\hat{I}}_{c}, :))}^{'} ∥ = \max_{1 \leq k \leq K} ∥ A ∥ ∥ e_{k}^{'} \hat{V} ({\hat{I}}_{c}, :) ∥ \\ \leq \max_{1 \leq k \leq K} ∥ A ∥ ∥ e_{k}^{'} \hat{V} ({\hat{I}}_{c}, :) ∥_{F} \leq ∥ A ∥ ∥ \hat{V} ∥_{2 \to \infty} = ∥ A ∥ ∥ \hat{V} sgn (H_{\hat{V}}) {- V + V ∥}_{2 \to \infty} \\ \leq ∥ A ∥ (∥ \hat{V} sgn (H_{\hat{V}}) - V ∥_{2 \to \infty} + ∥ V ∥_{2 \to \infty}) . \end{matrix}$

By Lemmas (A6) and (A5), $∥ A ∥ = ∥ A - Ω + Ω ∥ \leq ∥ A - Ω ∥ + σ_{1} (Ω) \leq ∥ A - Ω ∥ + θ_{r, \max} σ_{1} (P) σ_{1} (Π_{r}) σ_{1} (Π_{c}) = O (θ_{r, \max} σ_{1} (Π_{r}) σ_{1} (Π_{c}))$ . By Lemma (A5) and Equation (A1),
$∥ \hat{V} sgn (H_{\hat{V}}) {- V ∥}_{2 \to \infty} \leq C \frac{\sqrt{θ_{r, \max} K \log (n_{r} + n_{c})}}{θ_{r, \min} σ_{K} (P) σ_{K} (Π_{r}) σ_{K} (Π_{c})}$ . By Lemma A3, ${∥ V ∥}_{2 \to \infty} \leq \sqrt{\frac{1}{λ_{K} (Π_{c}^{'} Π_{c})}}$ , which gives $∥ \hat{V} sgn (H_{\hat{V}}) {- V ∥}_{2 \to \infty} + {∥ V ∥}_{2 \to \infty} = O (\sqrt{\frac{1}{λ_{K} (Π_{c}^{'} Π_{c})}})$ (this can be seen as simply using ${∥ V ∥}_{2 \to \infty}$ to estimate $∥ \hat{V} ∥_{2 \to \infty}$ since $\sqrt{\frac{1}{λ_{K} (Π_{c}^{'} Π_{c})}}$ is the same order as $\frac{\sqrt{θ_{r, \max} K \log (n_{r} + n_{c})}}{θ_{r, \min} σ_{K} (P) σ_{K} (Π_{r}) σ_{K} (Π_{c})}$ ). Then, we have $∥ {\hat{J}}_{*} ∥ = O (θ_{r, \max} σ_{1} (Π_{r}) κ (Π_{c}))$ , which gives that $∥ {\hat{J}}_{*} ∥_{F} = O (θ_{r, \max} \sqrt{K} σ_{1} (Π_{r}) κ (Π_{c}))$ .
for $∥ {\hat{J}}_{*} - P_{r}^{'} J_{*} P_{r} ∥_{F}$ , since $J_{*} = N_{U} (I_{r}, I_{r}) Θ_{r} (I_{r}, I_{r})$ , we have $∥ J_{*} ∥ \leq N_{U, \max} θ_{r, \max} \leq \frac{θ_{r, \max}^{2} \sqrt{K λ_{1} (Π_{r}^{'} Π_{r})}}{θ_{r, \min}}$ , which gives that $∥ J_{*} ∥_{F} \leq \frac{θ_{r, \max}^{2} K σ_{1} (Π_{r})}{θ_{r, \min}}$ . Thus, we have $∥ {\hat{J}}_{*} - P_{r}^{'} J_{*} P_{r} ∥_{F} = O (\frac{θ_{r, \max}^{2} K σ_{1} (Π_{r})}{θ_{r, \min}})$ .

Combining the above results, we have

\begin{matrix} ∥ e_{i}^{'} ({\hat{Z}}_{r} - Z_{r} P_{r}) ∥_{F} \leq ∥ e_{i}^{'} ({\hat{Y}}_{*} - Y_{*} P_{r}) ∥_{F} ∥ {\hat{J}}_{*} ∥_{F} + ∥ e_{i}^{'} Y_{*} ∥_{F} {∥ {\hat{J}}_{*} - P_{r}^{'} J_{*} P_{r} ∥}_{F} \\ = O (\frac{K^{4.5} θ_{r, \max}^{14} ϖ κ^{4.5} (Π_{r}^{'} Π_{r}) λ_{1} (Π_{r}^{'} Π_{r})}{θ_{r, \min}^{14} π_{r, \min}}) O (θ_{r, \max} \sqrt{K} σ_{1} (Π_{r}) κ (Π_{c})) \\ + \frac{θ_{r, \max}^{2} \sqrt{K λ_{1} (Π_{r}^{'} Π_{r})}}{θ_{r, \min}^{2} λ_{K} (Π_{r} Π_{r})} O (\frac{θ_{r, \max}^{2} K σ_{1} (Π_{r})}{θ_{r, \min}}) = O (\frac{K^{5} θ_{r, \max}^{15} ϖ κ^{4.5} (Π_{r}^{'} Π_{r}) κ (Π_{c}) λ_{1}^{1.5} (Π_{r}^{'} Π_{r})}{θ_{r, \min}^{14} π_{r, \min}}) . \end{matrix}

□

Appendix E.1. Proof of Theorem 2

Proof.

We bound

∥ e_{j}^{'} ({\hat{Π}}_{c} - Π_{c} P_{c}) ∥_{1}

first. Recall that

Z_{c} = Π_{c}, Π_{c} (j, :) = \frac{Z_{c} (j, :)}{∥ Z_{c} {(j, :) ∥}_{1}}, {\hat{Π}}_{c} (i, :) = \frac{{\hat{Z}}_{c} (j, :)}{∥ {\hat{Z}}_{c} {(j, :) ∥}_{1}}

, for

1 \leq j \leq n_{c}

, since

\begin{matrix} ∥ e_{j}^{'} ({\hat{Π}}_{c} - Π_{c} P_{c}) ∥_{1} & = ∥ \frac{e_{j}^{'} {\hat{Z}}_{c}}{∥ e_{j}^{'} {\hat{Z}}_{c} ∥_{1}} - \frac{e_{j}^{'} Z_{c} P_{c}}{∥ e_{j}^{'} Z_{c} P_{c} ∥_{1}} ∥_{1} = {∥ \frac{e_{j}^{'} {\hat{Z}}_{c} ∥ e_{j}^{'} Z_{c} ∥_{1} - e_{j}^{'} Z_{c} P_{c} {∥ e_{j}^{'} {\hat{Z}}_{c} ∥}_{1}}{∥ e_{j}^{'} {\hat{Z}}_{c} ∥_{1} {∥ e_{j}^{'} Z_{c} ∥}_{1}} ∥}_{1} \\ = ∥ \frac{e_{j}^{'} {\hat{Z}}_{c} ∥ e_{j}^{'} Z_{c} ∥_{1} - e_{j}^{'} {\hat{Z}}_{c} ∥ e_{j}^{'} {\hat{Z}}_{c} ∥_{1} + e_{j}^{'} {\hat{Z}}_{c} ∥ e_{j}^{'} {\hat{Z}}_{c} ∥_{1} - e_{j}^{'} Z_{c} P {∥ e_{j}^{'} {\hat{Z}}_{c} ∥}_{1}}{∥ e_{j}^{'} {\hat{Z}}_{c} ∥_{1} {∥ e_{j}^{'} Z_{c} ∥}_{1}} ∥_{1} \\ \leq \frac{∥ e_{j}^{'} {\hat{Z}}_{c} ∥ e_{j}^{'} Z_{c} ∥_{1} - e_{j}^{'} {\hat{Z}}_{c} ∥ e_{j}^{'} {\hat{Z}}_{c} ∥_{1} ∥_{1} + ∥ e_{j}^{'} {\hat{Z}}_{c} ∥ e_{j}^{'} {\hat{Z}}_{c} ∥_{1} - e_{j}^{'} Z_{c} P_{c} ∥ e_{j}^{'} {\hat{Z}}_{c} {∥_{1} ∥}_{1}}{∥ e_{j}^{'} {\hat{Z}}_{c} ∥_{1} {∥ e_{j}^{'} Z_{c} ∥}_{1}} \\ = \frac{| ∥ e_{j}^{'} Z_{c} ∥_{1} - ∥ e_{j}^{'} {\hat{Z}}_{c} ∥_{1} | + ∥ e_{j}^{'} {\hat{Z}}_{c} - e_{j}^{'} Z_{c} P_{c} ∥_{1}}{∥ e_{j}^{'} Z_{c} ∥_{1}} \leq \frac{2 ∥ e_{j}^{'} ({\hat{Z}}_{c} - Z_{c} P_{c}) ∥_{1}}{∥ e_{j}^{'} Z_{c} ∥_{1}} \\ = \frac{2 ∥ e_{j}^{'} ({\hat{Z}}_{c} - Z_{c} P_{c}) ∥_{1}}{∥ e_{j}^{'} Π_{c} ∥_{1}} = 2 ∥ e_{j}^{'} ({\hat{Z}}_{c} - Z_{c} P_{c}) ∥_{1} \leq 2 \sqrt{K} {∥ e_{j}^{'} ({\hat{Z}}_{c} - Z_{c} P_{c}) ∥}_{F}, \end{matrix}

by Lemma A9, we have

\begin{matrix} ∥ e_{j}^{'} ({\hat{Π}}_{c} - Π_{c} P_{c}) ∥_{1} = O (\sqrt{K} ∥ e_{j}^{'} ({\hat{Z}}_{c} - Z_{c} P_{c}) ∥_{F}) = O (ϖ K κ (Π_{c}^{'} Π_{c}) \sqrt{λ_{1} (Π_{c}^{'} Π_{c})}) . \end{matrix}

For row nodes

1 \leq i \leq n_{r}

, recall that

Z_{r} = Y_{*} J_{*} \equiv N_{U}^{- 1} N_{M} Π_{r}, {\hat{Z}}_{r} = {\hat{Y}}_{*} {\hat{J}}_{*}, Π_{r} (i, :) = \frac{Z_{r} (i, :)}{∥ Z_{r} {(i, :) ∥}_{1}}

and

{\hat{Π}}_{r} (i, :) = \frac{{\hat{Z}}_{r} (i, :)}{∥ {\hat{Z}}_{r} {(i, :) ∥}_{1}}

, where

N_{M}

and M are defined in the proof of Lemma 1 such that

U = Θ_{r} M \equiv Θ_{r} Π_{r} B_{r}

and

N_{M} (i, i) = \frac{1}{{∥ M (i, :) ∥}_{F}}

, similar as the proof for column nodes, we have

\begin{matrix} ∥ e_{i}^{'} ({\hat{Π}}_{r} - Π_{r} P_{r}) ∥_{1} \leq \frac{2 ∥ e_{i}^{'} ({\hat{Z}}_{r} - Z_{r} P_{r}) ∥_{1}}{∥ e_{i}^{'} Z_{r} ∥_{1}} \leq \frac{2 \sqrt{K} {∥ e_{i}^{'} ({\hat{Z}}_{r} - Z_{r} P_{r}) ∥}_{F}}{∥ e_{i}^{'} Z_{r} ∥_{1}} . \end{matrix}

Now, we provide a lower bound of

∥ e_{i}^{'} Z_{r} ∥_{1}

as below

\begin{matrix} ∥ e_{i}^{'} Z_{r} ∥_{1} & = ∥ e_{i}^{'} N_{U}^{- 1} N_{M} Π_{r} ∥_{1} = ∥ N_{U}^{- 1} (i, i) e_{i}^{'} N_{M} Π_{r} ∥_{1} = N_{U}^{- 1} (i, i) {∥ N_{M} (i, i) e_{i}^{'} Π_{r} ∥}_{1} = \frac{N_{M} (i, i)}{N_{U} (i, i)} \\ = {∥ U (i, :) ∥}_{F} N_{M} (i, i) = {∥ U (i, :) ∥}_{F} \frac{1}{{∥ M (i, :) ∥}_{F}} = {∥ U (i, :) ∥}_{F} \frac{1}{∥ e_{i}^{'} {M ∥}_{F}} \\ = {∥ U (i, :) ∥}_{F} \frac{1}{∥ e_{i}^{'} Θ_{r}^{- 1} {U ∥}_{F}} = {∥ U (i, :) ∥}_{F} \frac{1}{∥ Θ_{r}^{- 1} (i, i) e_{i}^{'} {U ∥}_{F}} = θ_{r} (i) \geq θ_{r, \min} . \end{matrix}

Therefore, by Lemma A9, we have

\begin{matrix} ∥ e_{i}^{'} ({\hat{Π}}_{r} - Π_{r} P_{r}) ∥_{1} & \leq \frac{2 \sqrt{K} {∥ e_{i}^{'} ({\hat{Z}}_{r} - Z_{r} P_{r}) ∥}_{F}}{∥ e_{i}^{'} Z_{r} ∥_{1}} \leq \frac{2 \sqrt{K} {∥ e_{i}^{'} ({\hat{Z}}_{r} - Z_{r} P_{r}) ∥}_{F}}{θ_{r, \min}} \\ = O (\frac{K^{5.5} θ_{r, \max}^{15} ϖ κ^{4.5} (Π_{r}^{'} Π_{r}) κ (Π_{c}) λ_{1}^{1.5} (Π_{r}^{'} Π_{r})}{θ_{r, \min}^{15} π_{r, \min}}) . \end{matrix}

□

Appendix E.2. Proof of Corollary 1

Proof.

Under conditions of Corollary 1, we have

\begin{matrix} ∥ e_{i}^{'} ({\hat{Π}}_{r} - Π_{r} P_{r}) ∥_{1} = O (\frac{K^{5.5} θ_{r, \max}^{15} ϖ κ^{4.5} (Π_{r}^{'} Π_{r}) κ (Π_{c}) λ_{1}^{1.5} (Π_{r}^{'} Π_{r})}{θ_{r, \min}^{15} π_{r, \min}}) = O (\frac{θ_{r, \max}^{15} ϖ \sqrt{n_{r}}}{θ_{r, \min}^{15}}), \\ ∥ e_{j}^{'} ({\hat{Π}}_{c} - Π_{c} P_{c}) ∥_{1} = O (ϖ K κ (Π_{c}^{'} Π_{c}) \sqrt{λ_{1} (Π_{c}^{'} Π_{c})}) = O (ϖ \sqrt{n_{c}}) . \end{matrix}

Under conditions of Corollary 1, Lemma A7 gives

ϖ = O (\frac{\sqrt{P_{\max} θ_{r, \max} \log (n_{r} + n_{c})}}{θ_{r, \min} σ_{K} (P) \sqrt{n_{r} n_{c}}})

, which gives that

\begin{matrix} ∥ e_{i}^{'} ({\hat{Π}}_{r} - Π_{r} P_{r}) ∥_{1} = O (\frac{θ_{r, \max}^{15} ϖ \sqrt{n_{r}}}{θ_{r, \min}^{15}}) = O (\frac{θ_{r, \max}^{15.5} \sqrt{P_{\max} \log (n_{r} + n_{c})}}{θ_{r, \min}^{16} σ_{K} (P) \sqrt{n_{c}}}), \\ ∥ e_{j}^{'} ({\hat{Π}}_{c} - Π_{c} P_{c}) ∥_{1} = O (ϖ \sqrt{n_{c}}) = O (\frac{\sqrt{P_{\max} θ_{r, \max} \log (n_{r} + n_{c})}}{θ_{r, \min} σ_{K} (P) \sqrt{n_{r}}}) . \end{matrix}

By basic algebra, this corollary follows. □

Appendix F. SVM-Cone Algorithm

For readers’ convenience, we briefly introduce the SVM-cone algorithm given in [26] and provide another view that the SVM-cone algorithm exactly recovers

Π_{r}

when the input is

U_{*}

(or

U_{*, 2}

). Let S be a matrix whose rows have unit

l_{2}

norm, and S can be written as

S = H S_{C}

, where

H \in R^{n \times K}

with nonnegative entries, no row of H is 0, and

S_{C} \in R^{K \times m}

corresponding to K rows of S (i.e., there exists an index set

I

with K entries such that

S_{C} = S (I, :)

). Inferring H from S is called the ideal cone problem, i.e., Problem 1 in [26]. The ideal cone problem can be solved by applying one-class SVM to the rows of S, and the K rows of

S_{C}

are the support vectors found by one-class SVM:

\begin{matrix} maximize b s . t . w^{'} S (i, :) \geq b (for i = 1, 2, \dots, n) and {∥ w ∥}_{F} \leq 1 . \end{matrix}

(A5)

The solution

(w, b)

for the ideal cone problem when

{(S_{C} S_{C}^{'})}^{- 1} 1 > 0

is given by

\begin{matrix} w = b^{- 1} \cdot S_{C}^{'} \frac{{(S_{C} S_{C}^{'})}^{- 1} 1}{1^{'} {(S_{C} S_{C}^{'})}^{- 1} 1}, b = \frac{1}{\sqrt{1^{'} {(S_{C} S_{C}^{'})}^{- 1} 1}} . \end{matrix}

(A6)

For the empirical case, let

\hat{S} \in R^{n \times m}

be a matrix where all rows have unit

l_{2}

norm, infer H from

\hat{S}

with given K is called the empirical cone problem, i.e., Problem 2 in [26]. For the empirical cone problem, one-class SVM is applied to all rows of

\hat{S}

to obtain

w

and b’s estimations

\hat{w}

and

\hat{b}

. Then, apply the K-means algorithm to rows of

\hat{S}

that are close to the hyperplane into K clusters, and an estimation of the index set

I

can be obtained from the K clusters provided. Algorithm A3 below is the SVM-cone algorithm provided in [26].

Algorithm 3: SVM-cone [26]

Require:: $\hat{S} \in R^{n \times m}$ with rows having unit $l_{2}$ norm, number of corners K, estimated distance corners from hyperplane $γ$ .
Ensure:: The near-corner index set $\hat{I}$ .
1:: Run one-class SVM on $\hat{S} (i, :)$ to obtain $\hat{w}$ and $\hat{b}$ .
2:: Run K-means algorithm to the set ${\hat{S} (i, :) | \hat{S} (i, :) \hat{w} \leq \hat{b} + γ}$ that are close to the hyperplane into K clusters.
3:: Pick one point from each cluster to obtain the near-corner set $\hat{I}$ .

As suggested in [26], we can start

γ = 0

and incrementally increase it until K distinct clusters are found.

Now, turn to our DiMSC algorithm, and focus on estimating

I_{r}

with given

U_{*}, U_{*, 2}

, and K. By Lemmas 1 and A1, we know that

U_{*}

and

U_{*, 2}

enjoy the ideal cone structure, and Lemma 2 guarantees that one-class SVM can be applied to rows of

U_{*}

and

U_{*, 2}

. Set

w_{1} = b_{1}^{- 1} U_{*}^{'} (I_{r}, :) \frac{{(U_{*} (I_{r}, :) U_{*}^{'} (I_{r}, :))}^{- 1} 1}{1^{'} {(U_{*} (I_{r}, :) U_{*}^{'} (I_{r}, :))}^{- 1}}, b_{1} = \frac{1}{\sqrt{1^{'} {(U_{*} (I_{r}, :) U_{*}^{'} (I_{r}, :))}^{- 1} 1}}

, and

w_{2} = b_{2}^{- 1} U_{*, 2}^{'} (I_{r}, :) \frac{{(U_{*, 2} (I_{r}, :) U_{*, 2}^{'} (I_{r}, :))}^{- 1} 1}{1^{'} {(U_{*, 2} (I_{r}, :) U_{*, 2}^{'} (I_{r}, :))}^{- 1}}, b_{2} = \frac{1}{\sqrt{1^{'} {(U_{*, 2} (I_{r}, :) U_{*, 2}^{'} (I_{r}, :))}^{- 1} 1}}

. Now that

w_{1}

and

b_{1}

are solutions of the one-class SVM in Equation (A5) by setting

S = U_{*}

, and

w_{2}

and

b_{2}

are solutions of the one-class SVM in Equation (A5) by setting

S = U_{*, 2}

. Lemma A11 says that if row node i is a pure node, we have

U_{*} (i, :) w_{1} = b_{1}

, and this suggests that in the SVM-cone algorithm, if the input matrix is

U_{*}

, by setting

γ = 0

, we can find all pure row nodes, i.e., the set

{U_{*} (i, :) | U_{*} (i, :) w_{1} = b_{1}}

contains all rows of

U_{*}

respective to pure row nodes while including mixed row nodes. By Lemma 1, these pure row nodes belong to the K distinct row communities such that if row nodes

i, \bar{i}

are in the same row community, then we have

U_{*} (i, :) = U_{*} (\bar{i}, :)

, and this is the reason that we need to apply the K-means algorithm on the set obtained in Step 2 in the SVM-cone algorithm to obtain the K distinct row communities, and this is also the reason that we said the SVM-cone algorithm returns the index set

I

exactly when the input is

U_{*}

. These conclusions also hold when we set the input in the SVM-cone algorithm as

U_{*, 2}

.

Lemma A10.

Under

D i D C M M_{n_{r}, n_{c}} (K, P, Π_{r}, Π_{c}, Θ_{r})

, for

1 \leq i \leq n_{r}

,

U_{*} (i, :)

can be written as

U_{*} (i, :) = r_{1} (i) Φ_{1} (i, :) U_{*} (I_{r}, :)

, where

r_{1} (i) \geq 1

. Meanwhile,

r_{1} (i) = 1

and

Φ_{1} (i, :) = e_{k}^{'}

if i is a pure node such that

Π_{r} (i, k) = 1

;

r_{1} (i) > 1

and

Φ_{1} (i, :) \neq e_{k}^{'}

if

Π_{r} (i, k) < 1

for

1 \leq k \leq K

. Similarly,

U_{*, 2} (i, :)

can be written as

U_{*, 2} (i, :) = r_{2} (i) Φ_{2} (i, :) U_{*, 2} (I_{r}, :)

, where

r_{2} (i) \geq 1

. Meanwhile,

r_{2} (i) = 1

and

Φ_{2} (i, :) = e_{k}^{'}

if

Π_{r} (i, k) = 1

;

r_{2} (i) > 1

and

Φ_{2} (i, :) \neq e_{k}^{'}

if

Π_{r} (i, k) < 1

for

1 \leq k \leq K

.

Proof.

Since

U_{*} = Y U_{*} (I_{r}, :)

by Lemma 1, for

1 \leq i \leq n_{r}

, we have

\begin{matrix} U_{*} (i, :) = Y (i, :) U_{*} (I_{r}, :) = Y (i, :) 1 \frac{Y (i, :)}{Y (i, :) 1} U_{*} (I_{r}, :) = r_{1} (i) Φ_{1} (i, :) U_{*} (I_{r}, :), \end{matrix}

where we set

r_{1} (i) = Y (i, :) 1

,

Φ_{1} (i, :) = \frac{Y (i, :)}{Y (i, :) 1}

, and

1

is a

K \times 1

vector with all entries being ones.

By the proof of Lemma 1,

Y (i, :) = \frac{Π_{r} (i, :)}{{∥ M (i, :) ∥}_{F}} Θ_{r}^{- 1} (I_{r}, I_{r}) N_{U}^{- 1} (I_{r}, I_{r})

, where

M = Π_{r} Θ_{r}^{- 1} (I_{r}, I_{r}) U (I_{r}, :)

. For convenience, set

T = Θ_{r}^{- 1} (I_{r}, I_{r}), Q = N_{U}^{- 1} (I_{r}, I_{r})

, and

R = U (I_{r}, :)

(such setting of

T, Q, R

is only for used for notation convenience for the proof of Lemma A10).

On the one hand, if row node i is pure such that

Π_{r} (i, k) = 1

for certain k among

{1, 2, \dots, K}

(i.e.,

Π_{r} (i, :) = e_{k}

if

Π_{r} (i, k) = 1

), we have

M (i, :) = Π_{r} (i, :) Θ_{r}^{- 1} (I_{r}, I_{r}) U (I_{r}, :) = T (k, k) R (k, :)

, and

Π_{r} (i, :) T Q = T (k, k) Q (k, :)

, which give that

Y (i, :) = \frac{T (k, k) Q (k, :)}{{∥ T (k, k) R (k, :) ∥}_{F}} = \frac{Q (k, :)}{{∥ R (k, :) ∥}_{F}}

. Recall that the k-th diagonal entry of

N_{U}^{- 1} (I_{r}, I_{r})

is

∥ [U (I_{r}, :)] (k, :) ∥_{F}

, i.e.,

Q (k, :) 1 = {∥ R (k, :) ∥}_{F}

, which gives that

r_{1} (i) = Y (i, :) 1 = 1

and

Φ_{1} (i, :) = e_{k}^{'}

when

Π_{r} (i, k) = 1

.

On the other hand, if i is a mixed node, since

{∥ M (i, :) ∥}_{F} = ∥ Π_{r} (i, :) Θ_{r}^{- 1} (I_{r}, I_{r}) U (I_{r}, :) ∥_{F} = ∥ \sum_{k = 1}^{K} Π_{r} (i, k) T (k, k) R (k, :) ∥_{F} < \sum_{k = 1}^{K} Π_{r} (i, k) T (k, k) {∥ R (k, :) ∥}_{F} = \sum_{k = 1}^{K} Π_{r} (i, k) T (k, k) Q (k, k)

, combine it with

Π_{r} (i, :) T Q 1 = \sum_{k = 1}^{K} Π_{r} (i, k) T (k, k) Q (k, k)

, so

r_{1} (i) = Y (i, :) 1 = \frac{Π_{r} (i, :) T Q 1}{{∥ M (i, :) ∥}_{F}} > 1

. The lemma follows by a similar analysis for

U_{*, 2}

. □

Lemma A11.

Under

D i D C M M_{n_{r}, n_{c}} (K, P, Π_{r}, Π_{c}, Θ_{r})

, for

1 \leq i \leq n_{r}

, if row node i is a pure node such that

Π_{r} (i, k) = 1

for certain k, we have

\begin{matrix} U_{*} (i, :) w_{1} = b_{1} and U_{*, 2} (i, :) w_{2} = b_{2}, \end{matrix}

Meanwhile, if row node i is a mixed node, the above equalities do not hold.

Proof.

For the claim that

U_{*} (i, :) w_{1} = b_{1}

holds when i is pure, by Lemma A10, when i is a pure node such that

Π_{r} (i, k) = 1

,

U_{*} (i, :)

can be written as

U_{*} (i, :) = e_{k}^{'} U_{*} (I_{r}, :)

, so

U_{*} (i, :) w_{1} = b_{1}

holds surely. When i is a mixed node, by Lemma A10,

r_{1} (i) > 1

and

Φ_{1} (i, :) \neq e_{k}

for any

k = 1, 2, \dots, K

; hence

U_{*} (i, :) \neq e_{k}^{'} U_{*} (I_{r}, :)

if i is mixed, which gives the result. Follow a similar analysis, we obtain the results associated with

U_{*, 2}

, and the lemma follows. □

References

Fortunato, S. Community detection in graphs. Phys. Rep. 2010, 486, 75–174. [Google Scholar] [CrossRef]
Fortunato, S.; Hric, D. Community detection in networks: A user guide. Phys. Rep. 2016, 659, 1–44. [Google Scholar] [CrossRef]
Holland, P.W.; Laskey, K.B.; Leinhardt, S. Stochastic blockmodels: First steps. Soc. Netw. 1983, 5, 109–137. [Google Scholar] [CrossRef]
Karrer, B.; Newman, M.E.J. Stochastic blockmodels and community structure in networks. Phys. Rev. E 2011, 83, 16107. [Google Scholar] [CrossRef]
Rohe, K.; Chatterjee, S.; Yu, B. Spectral clustering and the high-dimensional stochastic blockmodel. Ann. Stat. 2011, 39, 1878–1915. [Google Scholar] [CrossRef]
Zhao, Y.; Levina, E.; Zhu, J. Consistency of community detection in networks under degree-corrected stochastic block models. Ann. Stat. 2012, 40, 2266–2292. [Google Scholar] [CrossRef]
Qin, T.; Rohe, K. Regularized spectral clustering under the degree-corrected stochastic blockmodel. Adv. Neural Inf. Process. Syst. 2013, 26, 3120–3128. [Google Scholar]
Jin, J. Fast community detection by SCORE. Ann. Stat. 2015, 43, 57–89. [Google Scholar] [CrossRef]
Lei, J.; Rinaldo, A. Consistency of spectral clustering in stochastic block models. Ann. Stat. 2015, 43, 215–237. [Google Scholar] [CrossRef]
Cai, T.T.; Li, X. Robust and computationally feasible community detection in the presence of arbitrary outlier nodes. Ann. Stat. 2015, 43, 1027–1059. [Google Scholar] [CrossRef]
Joseph, A.; Yu, B. Impact of regularization on spectral clustering. Ann. Stat. 2016, 44, 1765–1791. [Google Scholar] [CrossRef]
Chen, Y.; Li, X.; Xu, J. Convexified modularity maximization for degree-corrected stochastic block models. Ann. Stat. 2018, 46, 1573–1602. [Google Scholar] [CrossRef]
Passino, F.S.; Heard, N.A. Bayesian estimation of the latent dimension and communities in stochastic blockmodels. Stat. Comput. 2020, 30, 1291–1307. [Google Scholar] [CrossRef]
Li, X.; Chen, Y.; Xu, J. Convex Relaxation Methods for Community Detection. Stat. Sci. 2021, 36, 2–15. [Google Scholar] [CrossRef]
Jing, B.; Li, T.; Ying, N.; Yu, X. Community detection in sparse networks using the symmetrized Laplacian inverse matrix (SLIM). Stat. Sin. 2022, 32, 1–22. [Google Scholar] [CrossRef]
Abbe, E. Community detection and stochastic block models: Recent developments. J. Mach. Learn. Res. 2017, 18, 6446–6531. [Google Scholar]
Airoldi, E.M.; Blei, D.M.; Fienberg, S.E.; Xing, E.P. Mixed Membership Stochastic Blockmodels. J. Mach. Learn. Res. 2008, 9, 1981–2014. [Google Scholar]
Ball, B.; Karrer, B.; Newman, M.E.J. Efficient and principled method for detecting communities in networks. Phys. Rev. E 2011, 84, 36103. [Google Scholar] [CrossRef]
Wang, F.; Li, T.; Wang, X.; Zhu, S.; Ding, C. Community discovery using nonnegative matrix factorization. Data Min. Knowl. Discov. 2011, 22, 493–521. [Google Scholar] [CrossRef]
Gopalan, P.K.; Blei, D.M. Efficient discovery of overlapping communities in massive networks. Proc. Natl. Acad. Sci. USA 2013, 110, 14534–14539. [Google Scholar] [CrossRef]
Anandkumar, A.; Ge, R.; Hsu, D.; Kakade, S.M. A tensor approach to learning mixed membership community models. J. Mach. Learn. Res. 2014, 15, 2239–2312. [Google Scholar]
Kaufmann, E.; Bonald, T.; Lelarge, M. A spectral algorithm with additive clustering for the recovery of overlapping communities in networks. Theor. Comput. Sci. 2017, 742, 3–26. [Google Scholar] [CrossRef]
Panov, M.; Slavnov, K.; Ushakov, R. Consistent estimation of mixed memberships with successive projections. In Proceedings of the International Conference on Complex Networks and their Applications, Lyon, France, 29 November–1 December 2017; Springer: Cham, Switzerland, 2017; pp. 53–64. [Google Scholar]
Jin, J.; Ke, Z.T.; Luo, S. Mixed membership estimation for social networks. arXiv 2017, arXiv:1708.07852. [Google Scholar] [CrossRef]
Mao, X.; Sarkar, P.; Chakrabarti, D. On Mixed Memberships and Symmetric Nonnegative Matrix Factorizations. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 2324–2333. [Google Scholar]
Mao, X.; Sarkar, P.; Chakrabarti, D. Overlapping Clustering Models, and One (class) SVM to Bind Them All. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 3–8 December 2018; Volume 31, pp. 2126–2136. [Google Scholar]
Mao, X.; Sarkar, P.; Chakrabarti, D. Estimating Mixed Memberships With Sharp Eigenvector Deviations. J. Am. Stat. Assoc. 2020, 116, 1928–1940. [Google Scholar] [CrossRef]
Wang, Y.; Bu, Z.; Yang, H.; Li, H.J.; Cao, J. An effective and scalable overlapping community detection approach: Integrating social identity model and game theory. Appl. Math. Comput. 2021, 390, 125601. [Google Scholar] [CrossRef]
Zhang, Y.; Levina, E.; Zhu, J. Detecting Overlapping Communities in Networks Using Spectral Methods. SIAM J. Math. Data Sci. 2020, 2, 265–283. [Google Scholar] [CrossRef]
Rohe, K.; Qin, T.; Yu, B. Co-clustering directed graphs to discover asymmetries and directional communities. Proc. Natl. Acad. Sci. USA 2016, 113, 12679–12684. [Google Scholar] [CrossRef]
Wang, Z.; Liang, Y.; Ji, P. Spectral Algorithms for Community Detection in Directed Networks. J. Mach. Learn. Res. 2020, 21, 1–45. [Google Scholar]
Ji, P.; Jin, J. Coauthorship and citation networks for statisticians. Ann. Appl. Stat. 2016, 10, 1779–1812. [Google Scholar] [CrossRef]
Zhou, Z.; Amini, A.A. Analysis of spectral clustering algorithms for community detection: The general bipartite setting. J. Mach. Learn. Res. 2019, 20, 1–47. [Google Scholar]
Laenen, S.; Sun, H. Higher-order spectral clustering of directed graphs. Adv. Neural Inf. Process. Syst. 2020, 33, 941–951. [Google Scholar]
Qing, H.; Wang, J. Directed mixed membership stochastic blockmodel. arXiv 2021, arXiv:2101.02307. [Google Scholar]
Wang, Y.J.; Wong, G.Y. Stochastic Blockmodels for Directed Graphs. J. Am. Stat. Assoc. 1987, 82, 8–19. [Google Scholar] [CrossRef]
Fagiolo, G. Clustering in complex directed networks. Phys. Rev. E 2007, 76, 026107. [Google Scholar] [CrossRef] [PubMed]
Leicht, E.A.; Newman, M.E. Community structure in directed networks. Phys. Rev. Lett. 2008, 100, 118703. [Google Scholar] [CrossRef] [PubMed]
Kim, Y.; Son, S.W.; Jeong, H. Finding communities in directed networks. Phys. Rev. E 2010, 81, 016103. [Google Scholar] [CrossRef]
Malliaros, F.D.; Vazirgiannis, M. Clustering and Community Detection in Directed Networks: A Survey. Phys. Rep. 2013, 533, 95–142. [Google Scholar] [CrossRef]
Zhang, X.; Lian, B.; Lewis, F.L.; Wan, Y.; Cheng, D. Directed Graph Clustering Algorithms, Topology, and Weak Links. IEEE Trans. Syst. Man, Cybern. Syst. 2021, 52, 3995–4009. [Google Scholar] [CrossRef]
Zhang, J.; Wang, J. Identifiability and parameter estimation of the overlapped stochastic co-block model. Stat. Comput. 2022, 32, 1–14. [Google Scholar] [CrossRef]
Florescu, L.; Perkins, W. Spectral thresholds in the bipartite stochastic block model. In Proceedings of the Conference on Learning Theory. PMLR, New York, NY, USA, 23–26 June 2016; pp. 943–959. [Google Scholar]
Neumann, S. Bipartite stochastic block models with tiny clusters. In Proceedings of the 32nd Conference on Neural Information Processing Systems (NIPS 2018), Montréal, QC, Canada, 3–8 December 2018. [Google Scholar]
Ndaoud, M.; Sigalla, S.; Tsybakov, A.B. Improved clustering algorithms for the bipartite stochastic block model. IEEE Trans. Inf. Theory 2021, 68, 1960–1975. [Google Scholar] [CrossRef]
Mantzaris, A.V. Uncovering nodes that spread information between communities in social networks. EPJ Data Sci. 2014, 3, 1–17. [Google Scholar] [CrossRef]
McSherry, F. Spectral partitioning of random graphs. In Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science, Newport Beach, CA, USA, 8–11 October 2001; pp. 529–537. [Google Scholar]
Massoulié, L. Community detection thresholds and the weak Ramanujan property. In Proceedings of the Forty-Sixth Annual ACM Symposium on Theory of Computing, New York, NY, USA, 31 May–3 June 2014; pp. 694–703. [Google Scholar]
Mossel, E.; Neeman, J.; Sly, A. Reconstruction and estimation in the planted partition model. Probab. Theory Relat. Fields 2015, 162, 431–461. [Google Scholar] [CrossRef]
Abbe, E.; Bandeira, A.S.; Hall, G. Exact recovery in the stochastic block model. IEEE Trans. Inf. Theory 2015, 62, 471–487. [Google Scholar] [CrossRef]
Hajek, B.; Wu, Y.; Xu, J. Achieving exact cluster recovery threshold via semidefinite programming. IEEE Trans. Inf. Theory 2016, 62, 2788–2797. [Google Scholar] [CrossRef]
Mossel, E.; Neeman, J.; Sly, A. A proof of the block model threshold conjecture. Combinatorica 2018, 38, 665–708. [Google Scholar] [CrossRef]
Qing, H. Studying Asymmetric Structure in Directed Networks by Overlapping and Non-Overlapping Models. Entropy 2022, 24, 1216. [Google Scholar] [CrossRef]
Gillis, N.; Vavasis, S.A. Semidefinite Programming Based Preconditioning for More Robust Near-Separable Nonnegative Matrix Factorization. SIAM J. Optim. 2015, 25, 677–698. [Google Scholar] [CrossRef]
Qing, H. A useful criterion on studying consistent estimation in community detection. Entropy 2022, 24, 1098. [Google Scholar] [CrossRef] [PubMed]
Von Luxburg, U. A tutorial on spectral clustering. Stat. Comput. 2007, 17, 395–416. [Google Scholar] [CrossRef]
Ke, Z.T.; Jin, J. The SCORE normalization, especially for highly heterogeneous network and text data. arXiv 2022, arXiv:2204.11097. [Google Scholar]
Newman, M.E. Modularity and community structure in networks. Proc. Natl. Acad. Sci. USA 2006, 103, 8577–8582. [Google Scholar] [CrossRef] [PubMed]
Chang, C.C.; Lin, C.J. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. (Tist) 2011, 2, 1–27. [Google Scholar] [CrossRef]
Xu, R.; Wunsch, D. Survey of clustering algorithms. IEEE Trans. Neural Netw. 2005, 16, 645–678. [Google Scholar] [CrossRef] [PubMed]
Palmer, W.R.; Zheng, T. Spectral clustering for directed networks. In Proceedings of the International Conference on Complex Networks and Their Applications, Madrid, Spain, 1–3 December 2020; Springer: Cham, Switzerland, 2020; pp. 87–99. [Google Scholar]
Qing, H. Degree-corrected distribution-free model for community detection in weighted networks. Sci. Rep. 2022, 12, 15153. [Google Scholar] [CrossRef] [PubMed]
Erdös, P.; Rényi, A. On the evolution of random graphs. In The Structure and Dynamics of Networks; Princeton University Press: Princeton, NJ, USA, 2011; pp. 38–82. [Google Scholar] [CrossRef]
Chen, Y.; Chi, Y.; Fan, J.; Ma, C. Spectral Methods for Data Science: A Statistical Perspective. Found. Trends Mach. Learn. 2021, 14, 566–806. [Google Scholar] [CrossRef]
Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423. [Google Scholar] [CrossRef]
Žalik, K.R.; Žalik, B. Memetic algorithm using node entropy and partition entropy for community detection in networks. Inf. Sci. 2018, 445, 38–49. [Google Scholar] [CrossRef]
Feutrill, A.; Roughan, M. A review of Shannon and differential entropy rate estimation. Entropy 2021, 23, 1046. [Google Scholar] [CrossRef]
Adamic, L.A.; Glance, N. The political blogosphere and the 2004 US election: Divided they blog. In Proceedings of the 3rd International Workshop on Link Discovery, Chicago, IL, USA, 21–25 August 2005; pp. 36–43. [Google Scholar]
Kunegis, J. Konect: The koblenz network collection. In Proceedings of the 22nd International Conference on World Wide Web, Rio de Janeiro, Brazil, 13–17 May 2013; pp. 1343–1350. [Google Scholar]
Zhang, H.; Guo, X.; Chang, X. Randomized spectral clustering in large-scale stochastic block models. J. Comput. Graph. Stat. 2022, 31, 887–906. [Google Scholar] [CrossRef]
Tropp, J.A. User-Friendly Tail Bounds for Sums of Random Matrices. Found. Comput. Math. 2012, 12, 389–434. [Google Scholar] [CrossRef]

Figure 1. Illustration for directed network and bipartite network. Panel (a): directed network; Panel (b): bipartite network.

Figure 2. Panel (a): plot of

U_{*}

and the hyperplane formed by

U_{*} (I_{r}, :)

. Blue points denote rows respective to mixed row nodes of

U_{*}

, and black points denote the K rows of the corner matrix

U_{*} (I_{r}, :)

. The plane in Panel (a) is the hyperplane formed by the triangle of the 3 rows of

U_{*} (I_{r}, :)

. Panel (b): plot of V and the ideal simplex formed by

V (I_{c}, :)

. Blue points denote rows respective to mixed column nodes of V, and black points denote the K rows of the corner matrix

V (I_{c}, :)

. Since

K = 3

, for visualization, we have projected these points from

R^{3}

to

R^{2}

.

Figure 3. Flowchart of Algorithm 1.

Figure 4. Errors against increasing

n_{0}

. y-axis: MHamm. Panel (a): Experiment 1 (a); Panel (b): Experiment 1 (b); Panel (c): Experiment 1 (c); Panel (d): Experiment 1 (d).

Figure 5. Errors against increasing z. y-axis: MHamm. Panel (a): Experiment 2 (a); Panel (b): Experiment 2 (b); Panel (c): Experiment 2 (c); Panel (d): Experiment 2 (d); Panel (e): Experiment 2 (e); Panel (f): Experiment 2 (f); Panel (g): Experiment 2 (g); Panel (h): Experiment 2 (h).

Figure 6. Errors against increasing

β

. y-axis: MHamm. Panel (a): Experiment 3 (a); Panel (b): Experiment 3 (b); Panel (c): Experiment 3 (c); Panel (d): Experiment 3 (d); Panel (e): Experiment 3 (e); Panel (f): Experiment 3 (f); Panel (g): Experiment 3 (g); Panel (h): Experiment 3 (h).

Figure 7. Errors against increasing

ρ

. y-axis: MHamm: Panel (a): Experiment 4 (a); Panel (b): Experiment 4 (b); Panel (c): Experiment 4 (c); Panel (d): Experiment 4 (d); Panel (e): Experiment 4 (e); Panel (f): Experiment 4 (f); Panel(g): Experiment 4 (g); Panel (h): Experiment 4 (h).

Figure 8. Phase transition for DiMSC: darker pixels represent lower error rates. The red lines represent

\frac{| α_{in} - α_{out} |}{\sqrt{\max (α_{in}, α_{out})}} = 1

. Panel (a): Experiment 5 (a); Panel (b): Experiment 5 (b).

Figure 9. Numerical results for Experiment 6. Panel (a): MHamm; Panel (b): running time.

Figure 10. Illustration for a directed network under Model Setup 1. Panel (a): Adjacency matrix of

N

, where black square denotes 1; Panel (b): directed network

N

, where red (blue) points indicate row (column) nodes. The error rate MHamm defined in Equation (16) of our DiSMC algorithm for this directed network

N

is 0.0377.

Figure 11. Illustration for a directed network under Model Setup 2. Panel (a): Adjacency matrix A; Panel (b): directed network

N

. MHamm of DiMSC for this directed network

N

is 0.0424.

Figure 12. Illustration for a bipartite network under Model Setup 3. Panel (a): Adjacency matrix A; Panel (b): bipartite network

N

. MHamm of DiMSC for this bipartite network

N

is 0.0313.

Figure 13. Illustration for a bipartite network under Model Setup 4. Panel (a): Adjacency matrix A; Panel (b): bipartite network

N

. MHamm of DiMSC for this bipartite network

N

is 0.0320.

Figure 14. Illustration for a directed network under Model Setup 5. Panels (a,b) show the row and column communities, respectively. In these two panels, dots in the same color are pure nodes in the same communities, and a square indicates mixed nodes. MHamm of DiMSC for this directed network

N

is 0.0181.

Figure 15. Illustration for a directed network under Model Setup 6. Panels (a,b) show the row and column communities, respectively. MHamm of DiMSC for this directed network

N

is 0.0185.

Figure 16. Illustration for a directed network under Model Setup 7. Panels (a,b) show the row and column communities, respectively. MHamm of DiMSC for this directed network

N

is 0.0266.

Figure 17. Illustration for a directed network under Model Setup 8. Panels (a,b) show the row and column communities, respectively. MHamm of DiMSC for this directed network

N

is 0.0279.

Figure 18. Leading 20 singular values of real-world directed networks used in this paper. Panel (a): political blogs

A_{1}

; Panel (b): political blogs

A_{3}

; Panel (c): political blogs

A_{6}

; Panel (d): political blogs

A_{9}

; Panel (e): Wikipedia links (gan)

A_{1}

; Panel (f): Wikipedia links (gan)

A_{30}

; Panel (g): Wikipedia links (gan)

A_{60}

; Panel (h): Wikipedia links (gan)

A_{90}

; Panel (i): Wikipedia links (nah)

A_{1}

; Panel (j): Wikipedia links (nah)

A_{20}

; Panel (k): Wikipedia links (nah)

A_{30}

; Panel (l): Wikipedia links (nah)

A_{40}

.

Figure 19. Row and column communities detected by DiMSC for political blogs. Colors indicate clusters, and a green square indicates highly mixed nodes, where the row and column communities are obtained from

{\hat{ℓ}}_{r}

and

{\hat{ℓ}}_{c}

, respectively. Panel (a): political blogs

A_{1}

; Panel (b): political blogs

A_{1}

; Panel (c): political blogs

A_{3}

; Panel (d): political blogs

A_{3}

; Panel (e): political blogs

A_{6}

; Panel (f): political blogs

A_{6}

; Panel (g): political blogs

A_{9}

; Panel (h): political blogs

A_{9}

.

Figure 20. Row and column communities detected by DiMSC for Wikipedia links (gan)

A_{90}

and Wikipedia links (nah)

A_{40}

. Colors indicate clusters, where the row and column communities are obtained from

{\hat{ℓ}}_{r}

and

{\hat{ℓ}}_{c}

, respectively. Panel (a): Wikipedia links (gan)

A_{90}

; Panel (b): Wikipedia links (gan)

A_{90}

; Panel (c): Wikipedia links (nah)

A_{40}

; Panel (d): Wikipedia links (nah)

A_{40}

.

Table 1.

τ_{r}, τ_{c}

, and

{Hamm}_{r c}

obtained from DiMSC for real-world directed networks used in this paper.

Table 1.

τ_{r}, τ_{c}

, and

{Hamm}_{r c}

obtained from DiMSC for real-world directed networks used in this paper.

Data	$τ$ $_{r}$	$τ$ $_{c}$	Hamm $_{rc}$
Political blogs $A_{1}$	0.0455	0.1353	0.0893
Political blogs $A_{3}$	0.0481	0.1570	0.0705
Political blogs $A_{6}$	0.0386	0.1368	0.0662
Political blogs $A_{9}$	0.0443	0.1772	0.0771
Wikipedia links (gan) $A_{1}$	0.1505	0.6051	0.3528
Wikipedia links (gan) $A_{30}$	0.0817	0.1902	0.0547
Wikipedia links (gan) $A_{60}$	0.0054	0.1145	0.0664
Wikipedia links (gan) $A_{90}$	0	0	0.0203
Wikipedia links (nah) $A_{1}$	0.2718	0.3521	0.2065
Wikipedia links (nah) $A_{20}$	0.0937	0.1722	0.0488
Wikipedia links (nah) $A_{30}$	0	0	0.0046
Wikipedia links (nah) $A_{40}$	0	0	0

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Estimating Mixed Memberships in Directed Networks by Spectral Clustering

Abstract

1. Introduction

2. The Directed Degree Corrected Mixed Membership Model

3. Algorithm

3.1. The Ideal Simplex (IS), the Ideal Cone (IC), and the Ideal DiMSC

3.2. Dimsc Algorithm

3.3. Computational Complexity

4. Consistency Results

5. Simulations

6. Application to Real-World Directed Networks

7. Discussion and Conclusions

Funding

Institutional Review Board Statement

Data Availability Statement

Conflicts of Interest

Abbreviations

Appendix A. Proof for Identifiability

Appendix A.1. Proof of Proposition 1

Appendix B. Ideal Simplex, Ideal Cone

Appendix B.1. Proof of Lemma 1

Appendix B.2. Proof of Lemma 2

Appendix B.3. Proof of Theorem 1

Appendix C. Equivalence Algorithm

Appendix D. Basic Properties of Ω

Appendix E. Proof of Consistency for DiMSC

Appendix E.1. Proof of Theorem 2

Appendix E.2. Proof of Corollary 1

Appendix F. SVM-Cone Algorithm

References

Article Metrics

Citations

Article Access Statistics