# Structural Entropy of the Stochastic Block Models

## Abstract

## 1. Introduction

#### 1.1. Compression of Graphs

- We introduce the partitioned structural entropy, which generalizes the structural entropy for unlabeled graphs, and we show that it reflects the partition information of the SBM.
- We provide an explicit formula for the partitioned structural entropy of the SBM.
- We also propose a compression scheme that asymptotically achieves this entropy limit.

#### 1.2. Related Works

## 2. Preliminaries

#### 2.1. Structural Entropy of Unlabeled Graphs

**Theorem**

- The structural entropy $H_{S}$ of $\mathcal{G}(n,p)$ is $$H_{S}=\binom{n}{2}h(p)-\log n!+O\left(\frac{\log n}{n^{\alpha}}\right)$$ for some $\alpha >0$.
- For a structure $S$ on $n$ vertices and $\epsilon >0$, $$P\left(\left|-\frac{1}{\binom{n}{2}}\log P(S)-h(p)+\frac{\log n!}{\binom{n}{2}}\right|<\epsilon \right)>1-2\epsilon.$$
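The leading terms of the theorem can be evaluated numerically. The following is a minimal sketch, assuming logarithms are base 2 (so entropy is in bits) and ignoring the $O(\log n/n^{\alpha})$ remainder; the function names are illustrative and do not come from the paper.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy h(p) in bits; h(0) = h(1) = 0 by convention."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def log2_factorial(m: int) -> float:
    """log2(m!) computed through the log-gamma function to avoid huge integers."""
    return math.lgamma(m + 1) / math.log(2)


def structural_entropy_gnp(n: int, p: float) -> float:
    """Leading terms C(n,2) h(p) - log2(n!) of the structural entropy of G(n, p).

    Assumption: the vanishing remainder term of the theorem is dropped.
    """
    return math.comb(n, 2) * binary_entropy(p) - log2_factorial(n)


if __name__ == "__main__":
    for n in (100, 1000):
        labeled = math.comb(n, 2) * binary_entropy(0.1)
        print(n, labeled, structural_entropy_gnp(n, 0.1))
```

The printed pair shows how much of the labeled-graph entropy $\binom{n}{2}h(p)$ is saved by discarding the vertex labels, namely about $\log n!$ bits.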

#### 2.2. Stochastic Block Model–Our Result

**Definition**

**Theorem**

- The partitioned structural entropy $H_{S}$ of $\mathcal{G}(n,p,q)$ is $$H_{S}=2\binom{n/2}{2}h(p)+\frac{n^{2}}{4}h(q)-2\log\left(\frac{n}{2}\right)!+O\left(\frac{\log n}{n^{\alpha}}\right)$$ for some $\alpha >0$.
- For a balanced bipartitioned structure $S$ and $\epsilon >0$, $$P\left(\left|-\frac{1}{\binom{n}{2}}\log P(S)-\frac{n-2}{2n-2}h(p)-\frac{n}{2n-2}h(q)+\frac{2\log (n/2)!}{\binom{n}{2}}\right|<3\epsilon \right)>1-4\epsilon.$$
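The coefficients $\frac{n-2}{2n-2}$ and $\frac{n}{2n-2}$ in the second statement are the fractions of vertex pairs that lie within a block and across the two blocks, respectively. The sketch below checks this with exact rational arithmetic and evaluates the leading terms of the entropy formula. It assumes base-2 logarithms and even $n$, and the function names are illustrative.

```python
import math
from fractions import Fraction


def binary_entropy(p: float) -> float:
    """Binary entropy h(p) in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def pair_fractions(n: int) -> tuple[Fraction, Fraction]:
    """Fractions of the C(n,2) vertex pairs that are within-block and cross-block."""
    half = n // 2
    total = math.comb(n, 2)
    within = Fraction(2 * math.comb(half, 2), total)
    across = Fraction(half * half, total)
    return within, across


def partitioned_structural_entropy(n: int, p: float, q: float) -> float:
    """Leading terms 2 C(n/2,2) h(p) + (n^2/4) h(q) - 2 log2((n/2)!).

    Assumption: n is even and the vanishing remainder term is dropped.
    """
    if n % 2 != 0:
        raise ValueError("n must be even for a balanced bipartition")
    half = n // 2
    relabel = 2 * math.lgamma(half + 1) / math.log(2)
    return (2 * math.comb(half, 2) * binary_entropy(p)
            + half * half * binary_entropy(q)
            - relabel)


if __name__ == "__main__":
    n = 10
    print(pair_fractions(n), Fraction(n - 2, 2 * n - 2), Fraction(n, 2 * n - 2))
    print(partitioned_structural_entropy(n, 0.3, 0.1))
```

Only relabelings inside each block are discounted, which is why the subtracted term is $2\log(n/2)!$ rather than $\log n!$.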

## 3. Proof of Theorem 2

**Proof**

- $S$ is asymmetric on $V_{1}$ and $V_{2}$, respectively;
- $2^{-2\binom{n/2}{2}h(p)-\frac{n^{2}}{4}h(q)-\binom{n}{2}\epsilon}\le P(G)\le 2^{-2\binom{n/2}{2}h(p)-\frac{n^{2}}{4}h(q)+\binom{n}{2}\epsilon}$ for $G\cong_{\mathcal{P}}S$.
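The second condition is a typicality bound on the probability of a labeled graph. As an illustrative sketch (not part of the proof), one can sample a balanced two-block SBM, compute $-\log_2 P(G)$ exactly, and compare it with the center $2\binom{n/2}{2}h(p)+\frac{n^{2}}{4}h(q)$ of the interval. The sampler, the vertex ordering (first half is $V_1$), and all names below are assumptions made for the example; the asymmetry condition is not checked.

```python
import math
import random


def binary_entropy(p: float) -> float:
    """Binary entropy h(p) in bits, for 0 < p < 1."""
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def edge_probability(u: int, v: int, n: int, p: float, q: float) -> float:
    """p for pairs inside the same half of the vertex set, q for pairs across halves."""
    half = n // 2
    return p if (u < half) == (v < half) else q


def sample_sbm(n: int, p: float, q: float, rng: random.Random) -> set:
    """Sample the edge set of a labeled balanced two-block SBM."""
    return {(u, v)
            for u in range(n) for v in range(u + 1, n)
            if rng.random() < edge_probability(u, v, n, p, q)}


def neg_log2_probability(n: int, p: float, q: float, edges: set) -> float:
    """Exact -log2 P(G) of the labeled graph under the model."""
    total = 0.0
    for u in range(n):
        for v in range(u + 1, n):
            prob = edge_probability(u, v, n, p, q)
            total -= math.log2(prob if (u, v) in edges else 1.0 - prob)
    return total


def typical_center(n: int, p: float, q: float) -> float:
    """Expected value of -log2 P(G): 2 C(n/2,2) h(p) + (n^2/4) h(q)."""
    half = n // 2
    return 2 * math.comb(half, 2) * binary_entropy(p) + half * half * binary_entropy(q)


def is_typical(n: int, p: float, q: float, edges: set, eps: float) -> bool:
    """Check the two-sided bound with slack C(n,2) * eps."""
    deviation = abs(neg_log2_probability(n, p, q, edges) - typical_center(n, p, q))
    return deviation <= math.comb(n, 2) * eps


if __name__ == "__main__":
    rng = random.Random(0)
    graph = sample_sbm(100, 0.3, 0.1, rng)
    print(is_typical(100, 0.3, 0.1, graph, eps=0.1))
```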

## 4. SBM Compression Algorithm

- The expected codeword length asymptotically achieves the entropy of the message, i.e., $$\mathbb{E}[L]=mh+O(\log m).$$
- For any $\epsilon>0$, $$P\left(\left|L-\mathbb{E}[L]\right|\le \epsilon\log m\right)\ge 1-o(1).$$
- The arithmetic algorithm runs in time $O\left(m\right)$.
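To make the first property concrete, the sketch below computes the exact expected length of an idealized code for an i.i.d. binary message of length $m$. It assumes a static source with known bias $p$ and the textbook Shannon-Fano-Elias length $\lceil -\log_2 P(x)\rceil + 1$; the arithmetic coder referred to above may differ (for example, it may be adaptive), so this only illustrates that the expected length sits within a small additive term of $mh$.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy h(p) in bits, for 0 < p < 1."""
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def expected_ideal_length(m: int, p: float) -> float:
    """Exact E[ceil(-log2 P(x)) + 1] for x drawn i.i.d. Bernoulli(p), 0 < p < 1.

    The expectation is taken by grouping sequences by their number of ones k,
    since every sequence with k ones has the same probability.
    """
    expected = 0.0
    for k in range(m + 1):
        sequence_prob = p ** k * (1.0 - p) ** (m - k)
        information = -(k * math.log2(p) + (m - k) * math.log2(1.0 - p))
        length = math.ceil(information) + 1
        expected += math.comb(m, k) * sequence_prob * length
    return expected


if __name__ == "__main__":
    m, p = 200, 0.2
    print(m * binary_entropy(p), expected_ideal_length(m, p))
```

Under these assumptions the overhead is at most two bits, which is well inside the $O(\log m)$ term of the stated bound.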

- The algorithm asymptotically achieves the structural entropy in (1) (note that $2\log (n/2)!=n\log n+O(n)$), i.e., $$\mathbb{E}[L(S)]\le 2\binom{n/2}{2}h(p)+\frac{n^{2}}{4}h(q)-n\log n+O(n).$$
- For any $\epsilon>0$, $$P\left(\left|L(S)-\mathbb{E}[L(S)]\right|\le \epsilon n\log n\right)\ge 1-o(1).$$
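The step from the relabeling term $2\log(n/2)!$ of the partitioned structural entropy to the term $n\log n$ in the bound follows from Stirling's approximation. A short derivation, assuming base-2 logarithms (so that $\log\frac{n}{2}=\log n-1$):

$$\begin{aligned}
\log m! &= m\log m-m\log e+O(\log m),\\
2\log\left(\frac{n}{2}\right)! &= n\log\frac{n}{2}-n\log e+O(\log n)\\
&= n\log n-(1+\log e)\,n+O(\log n)\\
&= n\log n+O(n).
\end{aligned}$$

Hence the upper bound on $\mathbb{E}[L(S)]$ matches the entropy up to an additive $O(n)$ term, which is of lower order than $n\log n$.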

## 5. General SBM with $r\ge 2$ Blocks

#### 5.1. Structural Entropy

- The $r$-partitioned structural entropy $H_{S}^{r}$ for $S$ is $$H_{S}^{r}=\sum_{i=1}^{r}\binom{x_{i}n}{2}h(p_{i,i})+\sum_{1\le i<j\le r}x_{i}x_{j}n^{2}h(p_{i,j})-\sum_{i=1}^{r}\log (x_{i}n)!+O\left(\frac{\log n}{n^{\alpha}}\right)$$ for some $\alpha >0$.
- For $\epsilon >0$,
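The leading terms of the $r$-partitioned structural entropy can be evaluated directly from the block sizes and the connection probabilities. The following is a minimal sketch, assuming base-2 logarithms, integer block sizes $x_i n$ passed as `block_sizes`, and a symmetric matrix `probs` holding the $p_{i,j}$. These names are illustrative, and the vanishing remainder term is dropped.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy h(p) in bits; h(0) = h(1) = 0 by convention."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def log2_factorial(m: int) -> float:
    """log2(m!) through the log-gamma function."""
    return math.lgamma(m + 1) / math.log(2)


def r_partitioned_structural_entropy(block_sizes, probs) -> float:
    """Leading terms of H_S^r.

    block_sizes[i] plays the role of x_i * n (assumed to be an integer);
    probs[i][j] plays the role of p_{i,j}.
    """
    r = len(block_sizes)
    total = 0.0
    for i in range(r):
        # Pairs inside block i.
        total += math.comb(block_sizes[i], 2) * binary_entropy(probs[i][i])
        # Pairs between block i and each later block j.
        for j in range(i + 1, r):
            total += block_sizes[i] * block_sizes[j] * binary_entropy(probs[i][j])
        # Relabelings inside block i do not change the structure.
        total -= log2_factorial(block_sizes[i])
    return total


if __name__ == "__main__":
    sizes = [50, 30, 20]
    probs = [[0.30, 0.05, 0.10],
             [0.05, 0.40, 0.02],
             [0.10, 0.02, 0.25]]
    print(r_partitioned_structural_entropy(sizes, probs))
```

With two equal blocks and $p_{1,1}=p_{2,2}=p$, $p_{1,2}=q$, the function reduces to the two-block expression $2\binom{n/2}{2}h(p)+\frac{n^{2}}{4}h(q)-2\log(n/2)!$.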

#### 5.2. Compression Algorithm

- The algorithm asymptotically achieves the structural entropy in (6), i.e., $$\mathbb{E}[L(S)]\le \sum_{i=1}^{r}\binom{x_{i}n}{2}h(p_{i,i})+\sum_{1\le i<j\le r}x_{i}x_{j}n^{2}h(p_{i,j})-n\log n+O(n).$$
- For any $\epsilon>0$, $$P\left(\left|L(S)-\mathbb{E}[L(S)]\right|\le \epsilon n\log n\right)\ge 1-o(1).$$
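The bound replaces the relabeling term $\sum_{i}\log(x_{i}n)!$ by $n\log n$ at the cost of an $O(n)$ term. By Stirling's approximation the gap per vertex tends to $-(H(x)+\log e)$, where $H(x)=-\sum_{i}x_{i}\log x_{i}$ is the entropy of the block fractions. The sketch below checks this numerically, assuming base-2 logarithms; the function names are illustrative.

```python
import math


def log2_factorial(m: int) -> float:
    """log2(m!) through the log-gamma function."""
    return math.lgamma(m + 1) / math.log(2)


def relabeling_gap_per_vertex(block_sizes) -> float:
    """(sum_i log2((x_i n)!) - n log2 n) / n for integer block sizes x_i n."""
    n = sum(block_sizes)
    relabel = sum(log2_factorial(m) for m in block_sizes)
    return (relabel - n * math.log2(n)) / n


def stirling_limit(fractions) -> float:
    """Limit of the gap per vertex predicted by Stirling: -(H(x) + log2 e)."""
    block_entropy = -sum(x * math.log2(x) for x in fractions if x > 0)
    return -(block_entropy + math.log2(math.e))


if __name__ == "__main__":
    sizes = [500_000, 300_000, 200_000]
    print(relabeling_gap_per_vertex(sizes), stirling_limit([0.5, 0.3, 0.2]))
```

The gap per vertex stays bounded as $n$ grows, which is why the difference between the two terms is absorbed into $O(n)$.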

## 6. Conclusions

## Abbreviations

| Abbreviation | Meaning |
| --- | --- |
| SBM | Stochastic Block Models |
| Szip | Compression algorithm in [25] |
| AEP | Asymptotic Equipartition Property |

