Article

Structural Entropy of the Stochastic Block Models

Jie Han, Tao Guo, Qiaoqiao Zhou, Wei Han, Bo Bai and Gong Zhang
1 Theory Lab, Central Research Institute, 2012 Labs, Huawei Tech. Co., Ltd., Hong Kong SAR, China
2 Department of Computer Science, School of Computing, National University of Singapore, Singapore 11741, Singapore
* Author to whom correspondence should be addressed.
Entropy 2022, 24(1), 81; https://doi.org/10.3390/e24010081
Submission received: 24 November 2021 / Revised: 28 December 2021 / Accepted: 30 December 2021 / Published: 3 January 2022
(This article belongs to the Collection Graphs and Networks from an Algorithmic Information Perspective)

Abstract

With the rapid expansion of graphs and networks and the growing magnitude of data from all areas of science, effective treatment and compression schemes for context-dependent data are highly desirable. A particularly interesting direction is to compress the data while keeping only the "structural information" and ignoring the concrete labelings. In this direction, Choi and Szpankowski introduced structures (unlabeled graphs), which allowed them to compute the structural entropy of the Erdős–Rényi random graph model. Moreover, they also provided an asymptotically optimal compression algorithm that (asymptotically) achieves this entropy limit and runs in expected linear time. In this paper, we consider stochastic block models with an arbitrary number of parts. We define a partitioned structural entropy for stochastic block models, which generalizes the structural entropy for unlabeled graphs and also encodes the partition information. We then compute the partitioned structural entropy of the stochastic block models and provide a compression scheme that asymptotically achieves this entropy limit.

1. Introduction

Shannon's entropy is a foundational concept of information theory [1,2]. Given a discrete random variable $X$ with support set (that is, the possible outcomes) $x_1, x_2, \ldots, x_n$, which occur with probabilities $p_1, p_2, \ldots, p_n$, the entropy of $X$ is defined as
$$H(X) := -\sum_{i=1}^{n} p_i \log p_i,$$
where the logarithm, here and throughout this paper, is to base 2. Note that the entropy of $X$ is a function of the probability distribution of $X$.
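For a quick numerical check of this definition, the following minimal Python sketch (the function name and the example distributions are ours, not from the paper) computes $H(X)$ in bits:

```python
import math

def shannon_entropy(probs):
    """Entropy in bits of a discrete distribution given as a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin carries 1 bit of entropy; a biased coin carries less.
print(shannon_entropy([0.5, 0.5]))   # 1.0
print(shannon_entropy([0.9, 0.1]))   # ~0.469
```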
Entropy was originally introduced by Shannon in [3] as part of his theory of communication, in which a data communication system consists of a data source $X$, a channel, and a receiver. The fundamental problem of communication is for the receiver to reliably recover the data generated by the source, based on the bits it receives through the channel. Shannon proved that the entropy of the source $X$ plays a central role: his source coding theorem shows that the entropy is the mathematical limit on how well the data can be losslessly compressed.
The question then arises: how do we compress data that has structure, e.g., data in social networks? In his lesser-known 1953 paper [4], Shannon argued for an extension of information theory, in which data is considered as observations of a source, to "non-conventional data" (that is, lattices). Indeed, data nowadays appears in various formats and structures (e.g., sequences, expressions, interactions) and in drastically increasing amounts. In many scenarios, data is highly context-dependent, and the structural information and the context information appear to be two conceptually different aspects. It is therefore desirable to develop novel theory and efficient algorithms for extracting useful information from non-conventional data structures. Roughly speaking, such data consists of structural information, which might be understood as the "shape" of the data, and context information, which should be recognized as data labels.
It is well-known that complex networks (e.g., social networks) admit community structures [5]. That is, users within a group interact with each other more frequently than those outside the group. The stochastic block model (SBM) [6] is a celebrated random graph model that has been widely used to study the community structures in graphs and networks. It provides a good benchmark to evaluate the performance of community detection algorithms and inspires the design of many algorithms for community detection tasks. The theoretical underpinnings of the SBM have been extensively studied and sharp thresholds for exact recovery have been successively established [7,8,9,10,11,12]. We refer readers to [13] for a recent survey, where other interesting and important problems in SBM are also discussed.
In addition to the SBM discussed in [13], there are other angles to study compression of data with graph structures. Asadi et al. [14] investigated data compression on graphs with clusters. Zenil et al. [15] have surveyed information-theoretic methods, in particular Shannon entropy and algorithmic complexity, for characterizing graphs and networks.

1.1. Compression of Graphs

In recent years, graphical data and the network structures supporting them have become increasingly common and important in many branches of engineering and the sciences. To better represent and transmit graphical data, many works consider the problem of compressing a (random) graph up to isomorphism, i.e., compressing the structure of a graph. A graph $G$ consists of a finite set $V$ of vertices and a set $E$ of edges, each of which connects two vertices. A graph can be represented by a binary matrix (the adjacency matrix), which in turn can be viewed as a binary sequence. Thus, encoding a labeled graph (that is, one in which all vertices are distinguished) is equivalent to encoding a $\binom{|V|}{2}$-digit binary sequence, given a probability distribution on all $\binom{|V|}{2}$ possible edges. However, such a string does not reflect the internal symmetries conveyed by graph automorphisms, and sometimes we are only interested in the local or global structures in the graph rather than the exact vertex labelings. The structural entropy is defined when graphs are considered unlabeled, or simply called structures, where the vertices are viewed as indistinguishable. The goal of this natural definition is to capture the information of the structure, and it thus provides a fundamental measure for graph/structure compression schemes.
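To make the correspondence concrete, here is a small sketch (helper names are ours, not from the paper) that flattens the upper triangle of an adjacency matrix into its $\binom{|V|}{2}$-bit sequence and back:

```python
import itertools

def graph_to_bits(adj):
    """Flatten the upper triangle of a symmetric 0/1 adjacency matrix into a bit list."""
    n = len(adj)
    return [adj[i][j] for i, j in itertools.combinations(range(n), 2)]

def bits_to_graph(bits, n):
    """Rebuild the adjacency matrix of a labeled graph from its C(n,2) bits."""
    adj = [[0] * n for _ in range(n)]
    it = iter(bits)
    for i, j in itertools.combinations(range(n), 2):
        adj[i][j] = adj[j][i] = next(it)
    return adj

adj = [[0, 1, 1, 0],
       [1, 0, 0, 1],
       [1, 0, 0, 0],
       [0, 1, 0, 0]]
bits = graph_to_bits(adj)        # 6 bits for n = 4
assert bits_to_graph(bits, 4) == adj
```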
The problem in fact has a strong theoretical background. Back in 1984, Turán [16] raised the question of finding an efficient coding method for general unlabeled graphs on $n$ vertices, for which a lower bound of $\binom{n}{2} - n\log n + O(n)$ bits is suggested. This lower bound can be seen from the number of unlabeled graphs [17]. The question was later answered by Naor [18] in 1990, who proposed a representation that is optimal up to the first two leading terms when all unlabeled graphs are equally likely. In a more recent paper, Kieffer et al. [19] derived the structural complexity of random binary trees. There have also been heuristic methods for real-world graph compression; see [20,21,22,23,24]. More recently, Choi and Szpankowski [25] studied the structural entropy of the Erdős–Rényi random graph $G(n,p)$. They computed the structural entropy given that $p$ is not (very) close to 0 or 1, and also gave a compression scheme that matches their computation. Later, the structural entropy of other randomly generated graphs, e.g., preferential attachment graphs and web graphs, was also studied [26,27,28,29].
However, it is well-known that the Erdős–Rényi model is too simplistic to model real networks, in particular due to its strong homogeneity and absence of community structure. In this paper, we consider the compression of graphical structures of the SBM, which in general model real networks better and circumvent the issues of the ER-model. In summary, our contributions are as follows:
  • We introduce the partitioned structural entropy which generalizes the structural entropy for unlabeled graphs and we show that it reflects the partition information of the SBM.
  • We provide an explicit formula for the partitioned structural entropy of the SBM.
  • We also propose a compression scheme that asymptotically achieves this entropy limit.
Semantic communication is considered a key component of future-generation networks, where a natural problem is how to efficiently extract and transmit the "semantic information". In the case of graph data, one may view the (partitioned) structures as the information that needs to be extracted, while the concrete labeling information is considered redundant. From this point of view, our result is a step toward the study of semantic compression/communication in appropriate contexts.

1.2. Related Works

Finally, we would like to point out that there are other information metrics defined on graphs. The term "graph entropy" has been defined and used in several ways in the literature. For example, the graph entropy introduced by Körner in [30] denotes the number of bits one has to convey to resolve the ambiguity of a vertex in a graph. This notion also turns out to be useful in other areas, including combinatorics. The chromatic entropy introduced in [31] is the lowest entropy of any coloring of a graph; it finds application in zero-error source coding. We remark that the structural entropy we consider is quite different from the Körner graph entropy and the chromatic entropy.
On the other hand, a concept of graph entropy (also called topological information content of a graph) was introduced by Rashevsky [32] and Trucco [33], and later by Mowshowitz [34,35,36,37,38,39], which is defined as a function of (the structure of) a graph and an equivalence relation defined on its vertices or edges. Such a concept is a measure of the graph itself and does not involve any probability distribution.

2. Preliminaries

2.1. Structural Entropy of Unlabeled Graphs

Let us now formally define the structural entropy given a probability distribution on unlabeled graphs. In this subsection, we use notation borrowed from [25].
Given an integer $n$, define $\mathcal{G}_n$ as the collection of all $n$-vertex labeled graphs.
Definition 1
(Entropy of Random Graph). Given an integer $n$ and a random graph $G$ distributed over $\mathcal{G}_n$, the entropy of $G$ is defined as
$$H_{\mathcal{G}} = \mathbb{E}[-\log P(G)] = -\sum_{G \in \mathcal{G}_n} P(G) \log P(G),$$
where $P(G)$ is the probability of the graph $G$ in $\mathcal{G}_n$.
Then the random structure model $\mathcal{S}_n$, associated with the probability distribution on $\mathcal{G}_n$, is defined as the unlabeled version of $\mathcal{G}_n$. For a given $S \in \mathcal{S}_n$, the probability of $S$ can be computed as
$$P(S) = \sum_{G \cong S,\; G \in \mathcal{G}_n} P(G).$$
Here $G \cong S$ means that $G$ and $S$ have the same structure, that is, $S$ is isomorphic to $G$. Clearly, if all isomorphic labeled graphs have the same probability, then for any labeled graph $G \cong S$, one has
$$P(S) = N(S) \cdot P(G),$$
where $N(S)$ stands for the number of different labeled graphs that have the same structure as $S$.
Definition 2
(Structural Entropy). The structural entropy $H_{\mathcal{S}}$ of a random graph $G$ is defined as the entropy of a random structure $S$ associated with $\mathcal{G}_n$, that is,
$$H_{\mathcal{S}} = \mathbb{E}[-\log P(S)] = -\sum_{S \in \mathcal{S}_n} P(S) \log P(S),$$
where the sum is over all distinct structures.
The Erdős–Rényi random graph $G(n,p)$, also called the binomial random graph, is a fundamental random graph model: it has $n$ vertices, and each pair of vertices is connected with probability $p$, independently of the other pairs. In 2012, Choi and Szpankowski [25] proved the following for Erdős–Rényi random graphs.
Theorem 1
(Choi and Szpankowski, [25]). For large $n$ and all $p$ satisfying $n^{-1}\ln n \ll p$ and $1 - p \gg n^{-1}\ln n$, the following holds:
1.
The structural entropy $H_{\mathcal{S}}$ of $G(n,p)$ is
$$H_{\mathcal{S}} = \binom{n}{2} h(p) - \log n! + O\left(\frac{\log n}{n^{\alpha}}\right)$$
for some $\alpha > 0$.
2.
For a structure $S$ of $n$ vertices and $\varepsilon > 0$,
$$P\left( \left| -\frac{1}{\binom{n}{2}} \log P(S) - h(p) + \frac{\log n!}{\binom{n}{2}} \right| < \varepsilon \right) > 1 - 2\varepsilon,$$
where $h(p) = -p\log p - (1-p)\log(1-p)$ is the entropy rate of a binary memoryless source.
Furthermore, they [25] also presented a compression algorithm for unlabeled graphs that asymptotically achieves the structural entropy up to an O ( n ) error term.
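As an illustration of Theorem 1, the following sketch (our own helpers, assuming the stated formula and dropping the $O(\log n/n^{\alpha})$ term) evaluates the leading terms $\binom{n}{2}h(p) - \log n!$ and compares them with the $\binom{n}{2}h(p)$ bits needed for the labeled graph:

```python
import math

def h(p):
    """Binary entropy in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def log2_factorial(m):
    """log2(m!) via lgamma, avoiding the huge integer m!."""
    return math.lgamma(m + 1) / math.log(2)

def structural_entropy_er(n, p):
    """Leading terms of Theorem 1: C(n,2) h(p) - log2(n!)."""
    return math.comb(n, 2) * h(p) - log2_factorial(n)

n, p = 1000, 0.1
print(math.comb(n, 2) * h(p))       # bits needed for the labeled graph
print(structural_entropy_er(n, p))  # roughly n*log2(n) bits fewer for the structure
```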

2.2. Stochastic Block Model–Our Result

As the ER model is not appropriate for modeling real networks, the stochastic block model was introduced under the assumption that vertices in a network connect independently, with probabilities based on their profiles or, equivalently, on their community assignments. For example, in the SBM with two communities and symmetric parameters, also known as the planted bisection model and denoted by $G(n,p,q)$, the vertex set is partitioned into two sets $V_1$ and $V_2$; any pair of vertices inside $V_1$ or inside $V_2$ is connected with probability $p$, any pair of vertices across the two clusters is connected with probability $q$, and all these connections are independent.
As an illuminating example, consider a context $G$ in which there are $n/2$ users and $n/2$ devices; each pair of users and each pair of devices is connected with probability $p$, a user and a device are connected with probability $q$, and each of these connections is independent of all the others. Suppose that we need to compress the information of $G$. In this context, however, it is not appropriate to view $G$ as an unlabeled graph: in addition to the structural information, it is also important to keep the "community" information, that is, the compression also needs to encode who is a user and who is a device.
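For concreteness, here is a small sketch (ours, not part of the paper) that samples a labeled instance of the planted bisection model $G(n,p,q)$ just described:

```python
import random

def sample_planted_bisection(n, p, q, seed=0):
    """Sample the adjacency matrix of G(n, p, q): vertices 0..n/2-1 form V1, the rest V2."""
    rng = random.Random(seed)
    half = n // 2
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            same_block = (i < half) == (j < half)
            prob = p if same_block else q
            if rng.random() < prob:
                adj[i][j] = adj[j][i] = 1
    return adj

adj = sample_planted_bisection(n=8, p=0.8, q=0.1)
```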
Definition 3
(Partition-respecting isomorphism, Partitioned Unlabeled Graphs). Let $r \leq n$ be integers. Suppose $V$ is a set of $n$ vertices and $\mathcal{P} = \{V_1, V_2, \ldots, V_r\}$ is a partition of $V$ into $r$ parts. The partition-respecting isomorphism, denoted by "$\cong_{\mathcal{P}}$", is defined as follows. For any two labeled graphs $G$ and $G'$, we write $G \cong_{\mathcal{P}} G'$ if and only if $G$ and $G'$ are isomorphic via an isomorphism $\phi: V \to V$ such that $\phi(V_i) = V_i$ for $1 \leq i \leq r$. Then $\Gamma_{\mathcal{P}}$ is defined as the collection of $n$-vertex graphs on $V$ in which we ignore the labels of the vertices inside each $V_i$, $1 \leq i \leq r$, namely, the equivalence classes under partition-respecting isomorphism with respect to $\mathcal{P}$.
Note that every labeled graph $G$ corresponds to a unique structure $S \in \Gamma_{\mathcal{P}}$, and we use $G \cong_{\mathcal{P}} S$ to denote this relation. Furthermore, under the above definition, general unlabeled graphs correspond to the case $r = 1$.
Definition 4
(Partitioned Structural Entropy). Let $V$ be a set of $n$ vertices, $n \in \mathbb{N}$. Suppose $\mathcal{P} = \{V_1, V_2, \ldots, V_r\}$ is a partition of $V$ into $r$ parts and $\mathcal{S}_n$ is a probability distribution over all partitioned unlabeled graphs on $n$ vertices. Then the structural entropy $H_{\mathcal{S}}$ associated with $\mathcal{S}_n$ is defined by
$$H_{\mathcal{S}} = \mathbb{E}[-\log P(S)] = -\sum_{S \in \mathcal{S}_n} P(S) \log P(S).$$
In this paper, we extend Theorem 1 to the structural entropy of the stochastic block model with any given number of blocks, and provide a compression algorithm that asymptotically matches this structural entropy. For ease of comprehension, we first give the result for the balanced bipartition case G ( n , p , q ) .
Theorem 2.
Let $n$ be a positive even integer and let $V = V_1 \cup V_2$ be a set of $n$ vertices with $|V_1| = |V_2| = n/2$. Suppose $G(n,p,q)$ is a probability distribution of graphs on $V$ in which every edge inside $V_1$ or $V_2$ is present with probability $p$, every edge between $V_1$ and $V_2$ is present with probability $q$, and these edges are mutually independent. For large even $n$ and all $p, q$ satisfying $n^{-1}\ln n \ll p, q$ and $1 - p \gg n^{-1}\ln n$, the following holds:
(i)
The partitioned structural entropy $H_{\mathcal{S}}$ of $G(n,p,q)$ is
$$H_{\mathcal{S}} = 2\binom{n/2}{2} h(p) + \frac{n^2}{4} h(q) - 2\log\left(\frac{n}{2}\right)! + O\left(\frac{\log n}{n^{\alpha}}\right) \qquad (1)$$
for some $\alpha > 0$.
(ii)
For a balanced bipartitioned structure $S$ and $\varepsilon > 0$,
$$P\left( \left| -\frac{1}{\binom{n}{2}} \log P(S) - \frac{2\binom{n/2}{2}}{\binom{n}{2}}\, h(p) - \frac{n^2}{4\binom{n}{2}}\, h(q) + \frac{2\log(n/2)!}{\binom{n}{2}} \right| < 3\varepsilon \right) > 1 - 4\varepsilon,$$
where $h(p) = -p\log p - (1-p)\log(1-p)$ is the entropy rate of a binary memoryless source.
Note that the structural entropy $H_{\mathcal{S}}$ here is larger than that in Theorem 1 (even if $p = q$), which reflects the fact that the SBM with "a planted (bi-)partition" contains prefixed structures and thus has fewer symmetries than $G(n,p)$, the pure random model. (For $G(n,p)$, when it is asymmetric, Theorem 1 saves a term of $\log n!$ compared with completely labeled graphs; this saving becomes $2\log(n/2)!$ for the planted balanced bipartition case in Theorem 2.)
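The following sketch (our own helpers; lower-order terms dropped) evaluates the leading terms of Theorem 2 and checks the comparison above: even for $p = q$, the partitioned structural entropy exceeds the structural entropy of Theorem 1 by $\log n! - 2\log(n/2)! = \log\binom{n}{n/2}$.

```python
import math

def h(p):
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def log2_factorial(m):
    return math.lgamma(m + 1) / math.log(2)

def partitioned_structural_entropy(n, p, q):
    """Leading terms of Theorem 2: 2*C(n/2,2) h(p) + (n^2/4) h(q) - 2*log2((n/2)!)."""
    half = n // 2
    return 2 * math.comb(half, 2) * h(p) + (n * n / 4) * h(q) - 2 * log2_factorial(half)

def structural_entropy_er(n, p):
    """Leading terms of Theorem 1: C(n,2) h(p) - log2(n!)."""
    return math.comb(n, 2) * h(p) - log2_factorial(n)

n, p = 1000, 0.1
gap = partitioned_structural_entropy(n, p, p) - structural_entropy_er(n, p)
# With p = q the h(p) terms cancel, so the gap equals log2(n!) - 2*log2((n/2)!) = log2(C(n, n/2)).
print(gap, math.log2(math.comb(n, n // 2)))
```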

3. Proof of Theorem 2

One key ingredient in the proof of Theorem 1 in [25] is the following lemma on the symmetry of $G(n,p)$. A graph is called asymmetric if its automorphism group does not contain any permutation other than the identity; otherwise it is called symmetric.
Lemma 1
(Kim, Sudakov and Vu, 2002). For all $p$ satisfying $n^{-1}\ln n \ll p$ and $1 - p \gg n^{-1}\ln n$, a random graph $G \in G(n,p)$ is symmetric with probability $O(n^{-w})$ for any positive constant $w$.
Proof 
(Proof of Theorem 2). Note that every pair of vertices within $V_1$ or within $V_2$ should be considered indistinguishable, but not the pairs of vertices in $V_1 \times V_2$. Recall that we write $G \cong_{\mathcal{P}} S$ for a graph $G$ and a structure $S$ if $S$ represents the structure of $G$ (with respect to the partition $\mathcal{P}$).
Let $\mathcal{G} := G(n,p,q)$. We first compute $H_{\mathcal{G}}$. Note that there are $\binom{n}{2}$ possible edges in $G \in \mathcal{G}$, and we can view $G$ as a binary sequence of length $\binom{n}{2}$, where each digit is a Bernoulli random variable. Moreover, for edges inside $V_1$ or $V_2$ the random variable, denoted by $X_1$, has expectation $p$, and for edges in $V_1 \times V_2$ the random variable, denoted by $X_2$, has expectation $q$. Thus, we have
$$H_{\mathcal{G}} = \mathbb{E}\left[-\log\left(X_1^{2\binom{n/2}{2}} X_2^{n^2/4}\right)\right] = -2\binom{n/2}{2}\,\mathbb{E}[\log X_1] - \frac{n^2}{4}\,\mathbb{E}[\log X_2] = 2\binom{n/2}{2} h(p) + \frac{n^2}{4} h(q).$$
Now write $\mathcal{S}_n$ for the probability distribution over all partitioned unlabeled graphs on $V$ inherited from $\mathcal{G}$, namely, for $S \in \Gamma_{\mathcal{P}}$, $P(S) = \sum_{G \cong_{\mathcal{P}} S} P(G)$. Let $H_{\mathcal{S}}$ be the partitioned structural entropy of $\mathcal{S}_n$. Therefore, compared with our goal, it remains to show that
$$H_{\mathcal{S}} - H_{\mathcal{G}} = -2\log\left(\frac{n}{2}\right)! + O\left(\frac{\log n}{n^{\alpha}}\right). \qquad (2)$$
Note that in $G(n,p,q)$, all labeled graphs $G \in \mathcal{G}$ such that $G \cong_{\mathcal{P}} S$ have the same probability $P(G)$. Thus, given a (labeled) graph $G \in \mathcal{G}$, we have $P(G) = P(S)/N(S)$, where $S \in \mathcal{S}_n$ is such that $G \cong_{\mathcal{P}} S$. So the graph entropy of $\mathcal{G} = G(n,p,q)$ can be written as
$$H_{\mathcal{G}} = -\sum_{G \in \mathcal{G}} P(G)\log P(G) = -\sum_{S \in \mathcal{S}_n}\, \sum_{G \cong_{\mathcal{P}} S,\, G \in \mathcal{G}} P(G)\log P(G) = -\sum_{S \in \mathcal{S}_n}\, \sum_{G \cong_{\mathcal{P}} S,\, G \in \mathcal{G}} \frac{P(S)}{N(S)}\log\frac{P(S)}{N(S)} = -\sum_{S \in \mathcal{S}_n} P(S)\log\frac{P(S)}{N(S)} = H_{\mathcal{S}} + \sum_{S \in \mathcal{S}_n} P(S)\log N(S). \qquad (3)$$
Define $S[W]$ to be $S$ restricted to $W$, for $W \subseteq V$. Now we split $S$ into $S_1$ and $S_2$, i.e., $S_1 = S[V_1]$ and $S_2 = S[V_2]$. Writing $\mathrm{Aut}(S_i)$ for the automorphism group of $S_i$, we naturally have
$$N(S) = \frac{(n/2)! \cdot (n/2)!}{|\mathrm{Aut}(S_1)|\,|\mathrm{Aut}(S_2)|}. \qquad (4)$$
Combining this with (2) and (3), it remains to show that
$$\sum_{S \in \mathcal{S}_n} P(S) \log\bigl(|\mathrm{Aut}(S_1)|\,|\mathrm{Aut}(S_2)|\bigr) = O\left(\frac{\log n}{n^{\alpha}}\right). \qquad (5)$$
In the summation above we only need to focus on $S$ such that either $S_1$ or $S_2$ is symmetric, as otherwise $\log\bigl(|\mathrm{Aut}(S_1)|\,|\mathrm{Aut}(S_2)|\bigr) = \log 1 = 0$. By Lemma 1, the probability that $S$ restricted to $V_1$ or to $V_2$ is symmetric is $O(n^{-1-\alpha})$ for some $\alpha > 0$, and for such $S$ we use the trivial bound $\log\bigl(|\mathrm{Aut}(S_1)|\,|\mathrm{Aut}(S_2)|\bigr) \leq 2\log(n/2)! \leq 2n\log n$. This gives us the desired estimate in (i):
$$\sum_{S \in \mathcal{S}_n} P(S) \log\bigl(|\mathrm{Aut}(S_1)|\,|\mathrm{Aut}(S_2)|\bigr) \leq 2n\log n \cdot O(n^{-1-\alpha}) = O\left(\frac{\log n}{n^{\alpha}}\right).$$
To show (ii), for a set $V$ of $n$ vertices and a balanced bipartition $\mathcal{P} = (V_1, V_2)$ of $V$, we define the typical set $T_{\varepsilon}^{n}$ as the set of structures $S$ on $n$ vertices satisfying:
(a)
$S$ is asymmetric on $V_1$ and on $V_2$, respectively;
(b)
$$2^{-2\binom{n/2}{2} h(p) - \frac{n^2}{4} h(q) - \binom{n}{2}\varepsilon} \leq P(G) \leq 2^{-2\binom{n/2}{2} h(p) - \frac{n^2}{4} h(q) + \binom{n}{2}\varepsilon}, \quad \text{for } G \cong_{\mathcal{P}} S.$$
Denote by $T_1^n$ and $T_2^n$ the sets of structures satisfying properties (a) and (b), respectively, so that $T_{\varepsilon}^{n} = T_1^n \cap T_2^n$. Firstly, by the asymmetry of $G(n,p)$ (Lemma 1), we conclude that $P(T_1^n) > 1 - 2\varepsilon$ for large $n$. Secondly, we use a binary sequence of length $\binom{n}{2}$ to represent a (labeled) instance $G$ of $G(n,p,q)$, where the first $\binom{n/2}{2}$ bits $L_1$ represent the induced subgraph on $V_1$, the next $\binom{n/2}{2}$ bits $L_2$ represent the induced subgraph on $V_2$, and the remaining $n^2/4$ bits $L_{12}$ represent the bipartite graph on $V_1 \times V_2$. Since all edges of $G$ are generated independently, both $L_1$ and $L_2$ have in expectation $\binom{n/2}{2} p$ ones, and the AEP property of binary sequences implies that
$$2^{-\binom{n/2}{2} h(p) - \binom{n}{2}\varepsilon} \leq P(G[V_1]),\, P(G[V_2]) \leq 2^{-\binom{n/2}{2} h(p) + \binom{n}{2}\varepsilon}$$
holds with probability at least $1 - 2\varepsilon$. Similarly, $L_{12}$ has in expectation $(n^2/4)\, q$ ones, and the AEP property of binary sequences gives that, with probability at least $1 - \varepsilon$,
$$2^{-\frac{n^2}{4} h(q) - \binom{n}{2}\varepsilon} \leq P(G[V_1, V_2]) \leq 2^{-\frac{n^2}{4} h(q) + \binom{n}{2}\varepsilon}.$$
Since these edges are independent, we finally conclude that (b) holds with probability at least $1 - 3\varepsilon$. Thus, $P(T_{\varepsilon}^{n}) \geq 1 - 4\varepsilon$. Now we can compute $P(S)$ for $S \in T_{\varepsilon}^{n}$. By (a), $P(S) = (n/2)!\,(n/2)!\,P(G)$ for any $G \cong_{\mathcal{P}} S$. Together with (b) and a straightforward computation, the assertion of (ii) follows. □

4. SBM Compression Algorithm

Given the computation of the structural entropy, a natural next step is to design efficient compression schemes that come close to, or even (asymptotically) achieve, this entropy limit. Choi and Szpankowski [25] presented such an algorithm (which they named Szip) for (unlabeled) random graphs; it uses in expectation at most $\binom{n}{2} h(p) - n\log n + O(n)$ bits and asymptotically achieves the structural entropy given in Theorem 1. Roughly speaking, Szip greedily peels vertices off the graph and (efficiently) stores the neighborhood information. This procedure can simply be reversed, but the labeling of the recovered graph may differ from that of the original graph, which is why a saving in codeword length is achieved. Refinements and analysis are also provided in [25] to achieve the stated performance.
Here we give an algorithm that optimally compresses SBMs; it uses the Szip algorithm as a building block and matches the structural entropy computation in Theorem 2. The algorithm consists of two stages. It first compresses $S[V_1]$ and $S[V_2]$ using Szip and then compresses $S[V_1, V_2]$ using an arithmetic compression algorithm with the help of the Szip decoding outputs.
To give a brief description of the compression algorithm, we again use the balanced bipartition $V_1 \cup V_2$ as an example. The encoding and decoding procedure of the algorithm is illustrated in Figure 1. The algorithm encodes the observed structure $S$ of $G(n,p,q)$ into a binary string as follows. It uses Szip as a subroutine to compress $S[V_1]$ and $S[V_2]$ into binary sequences $L_1$ and $L_2$. Then, as part of the encoder, we run the Szip decoder on $L_1$ and $L_2$ to obtain decoded structures $S'[V_1]$ and $S'[V_2]$, respectively. We then compress $S[V_1, V_2]$, as a labeled bipartite graph under the vertex labeling of $S'[V_1]$ and $S'[V_2]$, into $L_{12}$. This "labeled encoder" can be realized by treating the bipartite graph as a binary sequence of length $n^2/4$ and compressing it with a standard arithmetic encoder [40,41,42]. The concatenation of the Szip algorithms and the arithmetic encoder forms the cascade encoder of our algorithm and yields the codeword $(L_1, L_2, L_{12})$. Upon receiving the codeword, we decode its parts in parallel using the Szip decoder and the arithmetic decoder. This completes our algorithm.
The main challenge in the design of our algorithm is how the decoder can ensure consistency between the bipartite graph $S[V_1, V_2]$ and the decoded versions of $S[V_1]$ and $S[V_2]$. A key observation is that, since Szip is a deterministic algorithm, although it may permute the vertex labelings, its output is invariant given the same input. Therefore, our solution is to first run Szip (both encoding and decoding) at the encoder to obtain the structures $S'[V_1]$ and $S'[V_2]$, and then compress $S[V_1, V_2]$ (as a labeled bipartite graph) under the vertex labeling of $S'[V_1]$ and $S'[V_2]$. This guarantees that the structures $\hat{S}[V_1]$ and $\hat{S}[V_2]$ obtained at the decoder coincide with $S'[V_1]$ and $S'[V_2]$, and that the decoded bipartite graph $\hat{S}[V_1, V_2]$ is expressed under this same vertex labeling; namely, $S$ is recovered.
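As a minimal illustration of this two-stage cascade, the sketch below replaces Szip with a trivial stand-in (it merely serializes the adjacency bits and "decodes" them under a canonical labeling) and replaces the arithmetic coder with a plain bit dump, so it achieves no compression; it only shows how running the decoder at the encoder fixes the labeling under which the bipartite part $S[V_1, V_2]$ is listed. All function names are ours and are not the interface of [25].

```python
import itertools
import random

def szip_like_encode(adj):
    """Stand-in for the Szip encoder: serialize the upper-triangle bits of a block.
    (The real Szip compresses much better and may permute vertex labels.)"""
    n = len(adj)
    bits = [adj[i][j] for i, j in itertools.combinations(range(n), 2)]
    return bits, n

def szip_like_decode(bits, n):
    """Stand-in for the Szip decoder: rebuild the block under a canonical labeling 0..n-1."""
    adj = [[0] * n for _ in range(n)]
    it = iter(bits)
    for i, j in itertools.combinations(range(n), 2):
        adj[i][j] = adj[j][i] = next(it)
    return adj

def encode_sbm_structure(adj, half):
    """Cascade encoder: Szip-like codes for S[V1] and S[V2], then the bipartite part
    S[V1, V2] listed under the labeling of the blocks decoded at the encoder."""
    a1 = [row[:half] for row in adj[:half]]            # S[V1]
    a2 = [row[half:] for row in adj[half:]]            # S[V2]
    L1, L2 = szip_like_encode(a1), szip_like_encode(a2)
    d1, d2 = szip_like_decode(*L1), szip_like_decode(*L2)   # run the decoder at the encoder
    # With this stand-in the decoded labeling is the canonical order 0..half-1, so the
    # bipartite bits are listed in that order; a real arithmetic coder would compress them.
    assert len(d1) == half and len(d2) == half
    L12 = [adj[i][half + j] for i in range(half) for j in range(half)]
    return L1, L2, L12

def decode_sbm_structure(L1, L2, L12, half):
    """Decoder: rebuild S[V1] and S[V2], then reattach the bipartite edges consistently."""
    d1, d2 = szip_like_decode(*L1), szip_like_decode(*L2)
    n = 2 * half
    adj = [[0] * n for _ in range(n)]
    for i in range(half):
        for j in range(half):
            adj[i][j] = d1[i][j]
            adj[half + i][half + j] = d2[i][j]
            adj[i][half + j] = adj[half + j][i] = L12[i * half + j]
    return adj

# Tiny usage example on an 8-vertex graph with V1 = {0,...,3} and V2 = {4,...,7}.
rng = random.Random(1)
n, half = 8, 4
adj = [[0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        prob = 0.8 if (i < half) == (j < half) else 0.1
        if rng.random() < prob:
            adj[i][j] = adj[j][i] = 1
L1, L2, L12 = encode_sbm_structure(adj, half)
assert decode_sbm_structure(L1, L2, L12, half) == adj
```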
Before discussing the performance of the algorithm, we first describe some useful properties of the arithmetic compression algorithm in the following lemma. We omit the proof of the lemma, which follows from the analysis in [40,41,42] and AEP properties in [1,2].
Lemma 2.
Let $L$ be the codeword length of the arithmetic compression algorithm when compressing a binary sequence of length $m$ and entropy rate $h$. For large $m$, the following holds:
(i)
The expected codeword length asymptotically achieves the entropy of the message, i.e.,
$$\mathbb{E}[L] = mh + O(\log m).$$
(ii)
For any $\epsilon > 0$,
$$P\left(|L - \mathbb{E}[L]| \leq \epsilon \log m\right) \geq 1 - o(1).$$
(iii)
The arithmetic algorithm runs in time $O(m)$.
The following theorem characterizes the performance of our algorithm. It follows immediately from Theorem 2 of [25] (the performance of Szip) and Lemma 2; we omit the detailed proof here.
Theorem 3.
Let $V = V_1 \cup V_2$ be a set of $n$ vertices with $|V_1| = |V_2| = n/2$. Given a partitioned unlabeled graph $S$ on $V$, let $L(S)$ be the codeword length produced by our algorithm. For large $n$, our algorithm runs in time $O(n^2)$ and satisfies the following:
(i)
The algorithm asymptotically achieves the structural entropy in (1) (note that $2\log(n/2)! = n\log n + O(n)$), i.e.,
$$\mathbb{E}[L(S)] \leq 2\binom{n/2}{2} h(p) + \frac{n^2}{4} h(q) - n\log n + O(n).$$
(ii)
For any $\epsilon > 0$,
$$P\left(|L(S) - \mathbb{E}[L(S)]| \leq \epsilon\, n\log n\right) \geq 1 - o(1).$$

5. General SBM with $r \geq 2$ Blocks

In the previous sections, we discussed the structural entropy of the SBM and the compression algorithm that asymptotically achieves this structural entropy in the balanced bipartition case ($r = 2$). The corresponding results in Theorems 2 and 3 can easily be generalized to the general $r$-partition case. We briefly describe the generalizations below.

5.1. Structural Entropy

Our approach handles general SBMs similarly. In a general SBM with $r \geq 2$ parts, the transition matrix, an $r \times r$ symmetric matrix $P = (p_{ij})$, is used to describe the connection probabilities between and within the communities: two vertices $u \in V_i$ and $v \in V_j$ are connected by an edge with probability $p_{ij} \in [0,1]$ ($i$ and $j$ are not necessarily distinct). We first give the result on the computation of the partitioned structural entropy of the SBM.
Theorem 4.
Fix $r$ reals $x_1, x_2, \ldots, x_r$ in $(0,1)$ whose sum is 1. Let $V = V_1 \cup V_2 \cup \cdots \cup V_r$ be a set of $n$ vertices with a partition into $r$ parts such that $|V_i| = x_i n$. Let $S$ be a partitioned structure on $V$ with transition matrix $P = (p_{ij})$. For large $n$ and all $1 \leq i \leq r$ satisfying $n^{-1}\ln n \ll p_{i,i}$ and $1 - p_{i,i} \gg n^{-1}\ln n$, the following holds:
(i)
The $r$-partitioned structural entropy $H_{\mathcal{S}}^{r}$ of $S$ is
$$H_{\mathcal{S}}^{r} = \sum_{i=1}^{r} \binom{x_i n}{2} h(p_{i,i}) + \sum_{1 \leq i < j \leq r} x_i x_j n^2\, h(p_{i,j}) - \sum_{i=1}^{r} \log (x_i n)! + O\left(\frac{\log n}{n^{\alpha}}\right) \qquad (6)$$
for some $\alpha > 0$.
(ii)
For $\varepsilon > 0$,
$$P\left( \left| \frac{1}{\binom{n}{2}} \cdot \left( -\log P(S) - \sum_{i=1}^{r} \binom{x_i n}{2} h(p_{i,i}) - \sum_{1 \leq i < j \leq r} x_i x_j n^2\, h(p_{i,j}) + \sum_{i=1}^{r} \log (x_i n)! \right) \right| < 3\varepsilon \right) > 1 - 4\varepsilon.$$
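To make the formula concrete, the sketch below (our own helper; the $O(\log n/n^{\alpha})$ term is dropped and block sizes are rounded) evaluates the leading terms of Theorem 4 for a given block-size vector and transition matrix:

```python
import math

def h(p):
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def log2_factorial(m):
    return math.lgamma(m + 1) / math.log(2)

def r_partitioned_structural_entropy(n, x, P):
    """Leading terms of Theorem 4 for block fractions x[0..r-1] and transition matrix P."""
    r = len(x)
    sizes = [round(xi * n) for xi in x]
    within = sum(math.comb(sizes[i], 2) * h(P[i][i]) for i in range(r))
    across = sum(sizes[i] * sizes[j] * h(P[i][j])
                 for i in range(r) for j in range(i + 1, r))
    relabel = sum(log2_factorial(s) for s in sizes)
    return within + across - relabel

x = [0.5, 0.3, 0.2]                 # block fractions x_1, ..., x_r
P = [[0.6, 0.1, 0.1],               # symmetric transition matrix (p_ij)
     [0.1, 0.5, 0.1],
     [0.1, 0.1, 0.7]]
print(r_partitioned_structural_entropy(1000, x, P))
```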

5.2. Compression Algorithm

The compression algorithm for general $r$ with vertex partition $\{V_1, V_2, \ldots, V_r\}$ can be viewed as a union of the compression algorithms for the $S[V_i]$ and the $S[V_i, V_j]$ ($i < j \in \{1, 2, \ldots, r\}$). More precisely, the algorithm proceeds as follows (see also the sketch below). It first compresses each $S[V_i]$ into $L_i$ using Szip. It then runs the Szip decoder with input $L_i$ to obtain the decoded structure $S'[V_i]$. With the indices of $S'[V_i]$, $i = 1, 2, \ldots, r$, we can compress $S[V_1, V_2, \ldots, V_r]$, as a labeled $r$-partite graph, into $L$ using an arithmetic encoder. This completes the encoding procedure and gives the codewords $L_1, \ldots, L_r, L$, which we concatenate to obtain the final codeword. The decoding simply runs the Szip decoders and the labeled (arithmetic) decoders in parallel. The correctness of the decoding output can be argued as before.
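A minimal sketch of this generalization, reusing the same Szip stand-in as in the Section 4 sketch (all names are ours): encode each within-block structure, then list every cross-block bipartite part pair by pair under the block orderings.

```python
import itertools

def szip_like_encode(adj):
    """Stand-in for Szip, as in the Section 4 sketch: serialize the upper-triangle bits."""
    n = len(adj)
    return [adj[i][j] for i, j in itertools.combinations(range(n), 2)], n

def encode_general_sbm(adj, block_of):
    """Cascade encoder for r blocks; block_of[v] is the block index of vertex v."""
    r = max(block_of) + 1
    blocks = [[v for v, b in enumerate(block_of) if b == i] for i in range(r)]
    # Within-block codewords L_1, ..., L_r.
    L = [szip_like_encode([[adj[u][v] for v in blocks[i]] for u in blocks[i]])
         for i in range(r)]
    # Cross-block bipartite parts, one bit list per pair i < j, listed under the block orders
    # (with the real Szip these orders would come from decoding L_i at the encoder).
    L_cross = {(i, j): [adj[u][v] for u in blocks[i] for v in blocks[j]]
               for i in range(r) for j in range(i + 1, r)}
    return L, L_cross

# Example with r = 3 blocks on 8 vertices (any symmetric 0/1 adjacency matrix works).
block_of = [0, 0, 0, 1, 1, 2, 2, 2]
adj = [[0] * 8 for _ in range(8)]
adj[0][3] = adj[3][0] = 1
adj[5][6] = adj[6][5] = 1
L, L_cross = encode_general_sbm(adj, block_of)
```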
The performance of the algorithm can be obtained similarly to Theorem 3, as follows.
Theorem 5.
Fix $r$ reals $x_1, x_2, \ldots, x_r$ in $(0,1)$ whose sum is 1. Let $V = V_1 \cup V_2 \cup \cdots \cup V_r$ be a set of $n$ vertices with a partition into $r$ parts such that $|V_i| = x_i n$. Given a partitioned unlabeled graph $S$ on $V$ with transition matrix $P = (p_{ij})$, let $L(S)$ be the codeword length produced by our algorithm. For large $n$, our algorithm runs in time $O(n^2)$ and satisfies the following:
(i)
The algorithm asymptotically achieves the structural entropy in (6), i.e.,
$$\mathbb{E}[L(S)] \leq \sum_{i=1}^{r} \binom{x_i n}{2} h(p_{i,i}) + \sum_{1 \leq i < j \leq r} x_i x_j n^2\, h(p_{i,j}) - n\log n + O(n).$$
(ii)
For any $\epsilon > 0$,
$$P\left(|L(S) - \mathbb{E}[L(S)]| \leq \epsilon\, n\log n\right) \geq 1 - o(1).$$

6. Conclusions

In this paper, we defined the partitioned unlabeled graphs and partitioned structural entropy, which generalize the structural entropy for unlabeled graphs introduced by Choi and Szpankowski [25]. We then computed the partitioned structural entropy for stochastic block models and gave a compression algorithm that asymptotically achieves this structural entropy limit. As mentioned earlier, we believe that in appropriate contexts the structural information of a graph or network can be interpreted as a kind of semantic information, in which case, the communication schemes may benefit from structural compressions which considerably reduce the cost.

Author Contributions

Conceptualization, J.H., W.H. and B.B.; methodology, J.H., T.G. and Q.Z.; software, J.H. and T.G.; validation, J.H., T.G., Q.Z. and B.B.; formal analysis, J.H. and T.G.; investigation, J.H., W.H. and B.B.; resources, W.H.; data curation, T.G.; writing—original draft preparation, J.H., T.G. and Q.Z.; writing—review and editing, J.H., T.G., Q.Z., W.H. and B.B.; visualization, J.H., T.G., W.H. and B.B.; supervision, W.H., B.B. and G.Z.; project administration, W.H., B.B. and G.Z.; funding acquisition, W.H., B.B. and G.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
SBM   Stochastic Block Model
Szip  The compression algorithm of [25]
AEP   Asymptotic Equipartition Property

References

  1. Yeung, R.W. Information Theory and Network Coding; Springer: Berlin/Heidelberg, Germany, 2008.
  2. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; Wiley-Interscience: Hoboken, NJ, USA, 2006.
  3. Shannon, C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J. 1948, 27, 379–423.
  4. Shannon, C.E. The lattice theory of information. Trans. IRE Prof. Group Inf. Theory 1953, 1, 105–107.
  5. Palla, G.; Derényi, I.; Farkas, I.; Vicsek, T. Uncovering the overlapping community structure of complex networks in nature and society. Nature 2005, 435, 814–818.
  6. Holland, P.W.; Laskey, K.B.; Leinhardt, S. Stochastic blockmodels: First steps. Social Netw. 1983, 5, 109–137.
  7. Abbe, E.; Bandeira, A.S.; Hall, G. Exact recovery in the stochastic block model. IEEE Trans. Inf. Theory 2015, 62, 471–487.
  8. Mossel, E.; Neeman, J.; Sly, A. Consistency thresholds for the planted bisection model. In Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, Portland, OR, USA, 14–17 June 2015; pp. 69–75.
  9. Abbe, E.; Sandon, C. Community detection in general stochastic block models: Fundamental limits and efficient algorithms for recovery. In Proceedings of the 2015 IEEE 56th Annual Symposium on Foundations of Computer Science, Berkeley, CA, USA, 17–20 October 2015; pp. 670–688.
  10. Hajek, B.; Wu, Y.; Xu, J. Information limits for recovering a hidden community. IEEE Trans. Inf. Theory 2017, 63, 4729–4745.
  11. Bianconi, G. Entropy of network ensembles. Phys. Rev. E 2009, 79, 036114.
  12. Peixoto, T.P. Entropy of stochastic blockmodel ensembles. Phys. Rev. E 2012, 85, 056122.
  13. Abbe, E. Community detection and stochastic block models: Recent developments. J. Mach. Learn. Res. 2017, 18, 6446–6531.
  14. Asadi, A.R.; Abbe, E.; Verdú, S. Compressing data on graphs with clusters. In Proceedings of the 2017 IEEE International Symposium on Information Theory (ISIT), Aachen, Germany, 25–30 June 2017; pp. 1583–1587.
  15. Zenil, H.; Kiani, N.A.; Tegnér, J. A Review of Graph and Network Complexity from an Algorithmic Information Perspective. Entropy 2018, 20, 551.
  16. Turán, G. On the succinct representation of graphs. Discr. Appl. Math. 1984, 8, 289–294.
  17. Harary, F.; Palmer, E.M. Graphical Enumeration; Academic Press: Cambridge, MA, USA, 1973.
  18. Naor, M. Succinct representation of general unlabeled graphs. Discr. Appl. Math. 1990, 28, 303–307.
  19. Kieffer, J.C.; Yang, E.H.; Szpankowski, W. Structural complexity of random binary trees. In Proceedings of the 2009 IEEE International Symposium on Information Theory (ISIT), Seoul, Korea, 28 June–3 July 2009; pp. 635–639.
  20. Adler, M.; Mitzenmacher, M. Towards compressing Web graphs. In Proceedings of the DCC 2001 Data Compression Conference, Snowbird, UT, USA, 27–29 March 2001; pp. 203–212.
  21. Chierichetti, F.; Kumar, R.; Lattanzi, S.; Mitzenmacher, M.; Panconesi, A.; Raghavan, P. On Compressing Social Networks. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 28 June–1 July 2009; pp. 219–228.
  22. Peshkin, L. Structure induction by lossless graph compression. In Proceedings of the 2007 Data Compression Conference (DCC'07), Snowbird, UT, USA, 27–29 March 2007; pp. 53–62.
  23. Savari, S.A. Compression of words over a partially commutative alphabet. IEEE Trans. Inf. Theory 2004, 50, 1425–1441.
  24. Sun, J.; Bollt, E.; Ben-Avraham, D. Graph Compression-Save Information by Exploiting Redundancy. J. Statist. Mechan. Theory Exper. 2008, 2008, 06001.
  25. Choi, Y.; Szpankowski, W. Compression of Graphical Structures: Fundamental Limits, Algorithms, and Experiments. IEEE Trans. Inf. Theory 2012, 58, 620–638.
  26. Łuczak, T.; Magner, A.; Szpankowski, W. Compression of Preferential Attachment Graphs. In Proceedings of the 2019 IEEE International Symposium on Information Theory (ISIT), Paris, France, 7–12 July 2019; pp. 1697–1701.
  27. Łuczak, T.; Magner, A.; Szpankowski, W. Asymmetry and structural information in preferential attachment graphs. Random Struct. Algorithms 2019, 55, 696–718.
  28. Sauerhoff, M. On the Entropy of Models for the Web Graph. 2016. Available online: https://www.researchgate.net/publication/255589046OntheEntropyofModelsfortheWebGraph (accessed on 1 November 2021).
  29. Kontoyiannis, I.; Lim, Y.H.; Papakonstantinopoulou, K.; Szpankowski, W. Symmetry and the Entropy of Small-World Structures and Graphs. In Proceedings of the 2021 IEEE International Symposium on Information Theory (ISIT), Victoria, Australia, 11–16 July 2021; pp. 3026–3031.
  30. Körner, J. Coding of an information source having ambiguous alphabet and the entropy of graphs. In 6th Prague Conference on Information Theory; Walter de Gruyter: Berlin, Germany, 1973; pp. 411–425.
  31. Alon, N.; Orlitsky, A. Source coding and graph entropies. IEEE Trans. Inf. Theory 1996, 42, 1329–1339.
  32. Rashevsky, N. Life, information theory, and topology. Bull. Math. Biophys. 1955, 17, 229–235.
  33. Trucco, E. A note on the information content of graphs. Bull. Math. Biophys. 1956, 18, 129–135.
  34. Mowshowitz, A. Entropy and the complexity of graphs I: An index of the relative complexity of a graph. Bull. Math. Biophys. 1968, 30, 175–204.
  35. Mowshowitz, A. Entropy and the complexity of graphs II: The information content of digraphs and infinite graphs. Bull. Math. Biophys. 1968, 30, 225–240.
  36. Mowshowitz, A. Entropy and the complexity of graphs III: Graphs with prescribed information content. Bull. Math. Biophys. 1968, 30, 387–414.
  37. Mowshowitz, A. Entropy and the complexity of graphs IV: Entropy measures and graphical structure. Bull. Math. Biophys. 1968, 30, 533–546.
  38. Mowshowitz, A.; Dehmer, M. Entropy and the Complexity of Graphs Revisited. Entropy 2012, 14, 559–570.
  39. Dehmer, M.; Mowshowitz, A. A History of Graph Entropy Measures. Inf. Sci. 2011, 181, 57–78.
  40. Pasco, R.C. Source Coding Algorithms for Fast Data Compression. Ph.D. Thesis, Stanford University, Stanford, CA, USA, 1976.
  41. Rissanen, J.J. Generalized Kraft Inequality and Arithmetic Coding. IBM J. Res. Dev. 1976, 20, 198–203.
  42. Willems, F.; Shtarkov, Y.; Tjalkens, T. The context-tree weighting method: Basic properties. IEEE Trans. Inf. Theory 1995, 41, 653–664.
Figure 1. Illustration of the compression algorithm.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
