# NISQ-Ready Community Detection Based on Separation-Node Identification


## Abstract


## 1. Introduction

**C**ommunity **D**etection based on **S**eparation-**N**ode identification (CDSN). This approach is tailored to (quantum heuristic) QUBO solving and uses a smaller search space than the state-of-the-art quantum modularity maximization approach [13]. This objective led to the sociologically inspired idea of defining a community by its extreme ends, similar to, e.g., differentiating political parties by their position on the left–right spectrum. For graphs, we translate this idea to the existence of what we later define as a bijective set of separation nodes. Removing the nodes contained in this set yields connected components, which represent the “cores” of the communities. Our subsequent experiments indicate that this essentially solves the computationally hard part of the community detection problem, as the community assignment for the separation nodes can typically be obtained using a greedy optimizer.

## 2. Background

- Randomly split the given graph $G=\left(V,E\right)$ into two equally sized partitions $A\dot{\cup}B=V$ and delete all edges inside the partitions to yield a bipartite graph.
- Find subsets $X\subseteq A$ and $Y\subseteq B$ such that $X=\left\{{v}_{i}\in A\mid {s}_{i}=1\right\}$ and $Y=\left\{{v}_{j}\in B\mid {s}_{j}=1\right\}$, where $s=\left({s}_{1},\dots ,{s}_{\left|V\right|}\right)$ is the solution to the quadratic program given by $$\underset{s\in {\left\{0,1\right\}}^{\left|V\right|}}{\operatorname{arg\,min}}\sum _{\begin{array}{c}{v}_{i}\in A\\ {v}_{j}\in B\end{array}}\left(d(A,B)-{a}_{ij}\right){s}_{i}{s}_{j}.$$
- Identify $C:=X\cup Y$ as a community and repeat Steps 1 and 2 for the subgraph of $G$ induced by $V\setminus C:=\left\{v\in V\mid v\notin C\right\}$.
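The quadratic program in Step 2 can be illustrated with a minimal sketch. It assumes that $d(A,B)$ denotes the edge density between the two partitions, i.e., the number of edges between $A$ and $B$ divided by $|A|\cdot|B|$; this definition is not stated in the steps above and is an assumption made here. The function names `bipartite_qubo` and `solve_by_enumeration` are hypothetical, and the exhaustive enumeration merely stands in for a (quantum heuristic) QUBO solver on a toy instance.

```python
from itertools import product


def bipartite_qubo(adjacency, part_a, part_b):
    """Build the coefficients (d(A,B) - a_ij) for all pairs v_i in A, v_j in B."""
    cross_edges = sum(adjacency[i][j] for i in part_a for j in part_b)
    # ASSUMPTION: d(A, B) is the edge density between the two partitions.
    density = cross_edges / (len(part_a) * len(part_b))
    return {(i, j): density - adjacency[i][j] for i in part_a for j in part_b}


def solve_by_enumeration(coefficients, nodes):
    """Minimize sum_ij c_ij * s_i * s_j over all binary vectors s (toy sizes only)."""
    best_energy, best_selection = 0.0, set()
    for bits in product((0, 1), repeat=len(nodes)):
        selected = {v for v, bit in zip(nodes, bits) if bit}
        energy = sum(c for (i, j), c in coefficients.items()
                     if i in selected and j in selected)
        if energy < best_energy - 1e-12:
            best_energy, best_selection = energy, selected
    return best_selection, best_energy


# Toy bipartite graph: nodes 0, 1 are densely linked to 3, 4; nodes 2 and 5 share one edge.
edges = [(0, 3), (0, 4), (1, 3), (1, 4), (2, 5)]
n = 6
adjacency = [[0] * n for _ in range(n)]
for u, v in edges:
    adjacency[u][v] = adjacency[v][u] = 1

part_a, part_b = [0, 1, 2], [3, 4, 5]
coefficients = bipartite_qubo(adjacency, part_a, part_b)
community, energy = solve_by_enumeration(coefficients, part_a + part_b)
print(sorted(community), round(energy, 3))
```

In this toy instance, the minimizer selects the densely connected block, since every selected pair with an edge lowers the objective while every selected pair without an edge raises it.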

## 3. Proposed Model

#### 3.1. Separation-Node Sets

- (1) Identifying a set of nodes separating communities and thus revealing the fundamental community structure (see Section 3.4 and Section 3.5).
- (2) Classifying the community of each separation node to finalize community detection (see Section 3.6).
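As a minimal sketch of how a separation-node set reveals the community structure, the following code removes a given set of separation nodes and returns the connected components of the remaining graph, which play the role of the community “cores”. The function name `community_cores`, the adjacency-dictionary representation, and the toy graph are illustrative assumptions; the separation-node set is simply taken as given here rather than computed.

```python
from collections import deque


def community_cores(adjacency, separation_nodes):
    """Remove the separation nodes and return the remaining connected components."""
    removed = set(separation_nodes)
    seen, cores = set(), []
    for start in adjacency:
        if start in removed or start in seen:
            continue
        # Breadth-first search restricted to nodes that were not removed.
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in removed and neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        cores.append(component)
    return cores


# Toy graph: two triangles {0, 1, 2} and {4, 5, 6} connected only through node 3.
adjacency = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4],
    4: [3, 5, 6], 5: [4, 6], 6: [4, 5],
}
cores = community_cores(adjacency, separation_nodes=[3])
print(cores)
```

This step runs in time linear in the number of nodes and edges, which is why only the identification of the separation nodes themselves needs to be treated as the hard part.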

**Definition 1.**

**Theorem 1.**

**Proof.**

#### 3.2. Proving Theorem 1

**Lemma 1.**

**Proof.**

**Lemma 2.**

**Proof.**

**Lemma 3.**

**Proof.**

**Corollary 1.**

**Proof.**

**Proof.**

- $P\left(\tilde{x}\right)=0$ and the separation-node set $\tilde{S}$ is smaller than S;
- $P\left(\tilde{x}\right)>0$ and the separation-node set $\tilde{S}$ is much smaller than S.

- $\left|{S}^{*}\right|<\left|S\right|$;
- $\left|{S}^{*}\right|>\left|S\right|$.

#### 3.3. Constructing Penalty Terms for the In- and Surjectivity Constraints

**Lemma 4.**

**Proof.**

**Lemma 5.**

**Proof.**

**Lemma 6.**

**Proof.**

#### 3.4. Modularity-Based Separation Edge Estimation

- ${m}_{ij}>0$, if less connectivity between ${v}_{i}$ and ${v}_{j}$ was to be expected, indicating that ${v}_{i}$ and ${v}_{j}$ likely belong to the same community;
- ${m}_{ij}<0$, if more connectivity between ${v}_{i}$ and ${v}_{j}$ was to be expected, indicating that ${v}_{i}$ and ${v}_{j}$ likely belong to different communities.
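To make the sign interpretation concrete, the following sketch computes the entries $m_{ij}$ under the assumption that they are those of the standard modularity matrix, $m_{ij}=a_{ij}-\frac{k_i k_j}{2|E|}$, where $k_i$ is the degree of $v_i$. This formula is not spelled out in the list above, so it should be read as an assumption; the function name `modularity_matrix` and the toy graph are likewise illustrative.

```python
def modularity_matrix(adjacency):
    """Return m_ij = a_ij - k_i * k_j / (2|E|) for a symmetric 0/1 adjacency matrix."""
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    two_m = sum(degrees)  # equals 2|E| for an undirected graph
    return [[adjacency[i][j] - degrees[i] * degrees[j] / two_m for j in range(n)]
            for i in range(n)]


# Toy graph: two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
n = 6
adjacency = [[0] * n for _ in range(n)]
for u, v in edges:
    adjacency[u][v] = adjacency[v][u] = 1

m = modularity_matrix(adjacency)
print(round(m[0][1], 3))  # pair inside one triangle
print(round(m[0][5], 3))  # pair in different triangles
```

Here, the pair $(v_0,v_1)$ inside one triangle receives a positive entry, while the non-adjacent pair $(v_0,v_5)$ from different triangles receives a negative one.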

#### 3.5. Separation Edge Estimation Based on Edge Neighborhood Connectivity

- (1) Consider connections between r-neighborhoods with radius $r\ge 0$;
- (2) Consider paths of length 2.

#### 3.6. Assigning the Separation Nodes to Communities

- (1) Count the number of edges to every identified community for each separation node.
- (2) Assign the node with the most edges to a single community to that community. In case of a tie, the community that reached the highest number of edges first during the iteration over all adjacent nodes is selected.
- (3) Update the counts for every neighboring separation node.
- (4) Repeat Steps (2) and (3) until every separation node is properly assigned to a community.
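The greedy procedure above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the reference code: the names `leading_community` and `greedy_assign` are hypothetical, counts are recomputed in every round instead of being updated incrementally (which yields the same counts), ties between separation nodes are broken by list order, and separation nodes without any assigned neighbor are left unassigned.

```python
def leading_community(node, adjacency, assignment):
    """Return the community with the most edges to `node` and that edge count.

    Ties go to the community that reached the maximum first while iterating
    over the adjacent nodes, as the leader only changes on a strict increase.
    """
    counts, leader, leader_count = {}, None, 0
    for neighbor in adjacency[node]:
        community = assignment.get(neighbor)
        if community is None:  # neighbor is a not-yet-assigned separation node
            continue
        counts[community] = counts.get(community, 0) + 1
        if counts[community] > leader_count:
            leader, leader_count = community, counts[community]
    return leader, leader_count


def greedy_assign(adjacency, core_assignment, separation_nodes):
    """Greedily assign separation nodes to the communities of the cores."""
    assignment = dict(core_assignment)
    unassigned = list(separation_nodes)
    while unassigned:
        scored = [(leading_community(v, adjacency, assignment), v) for v in unassigned]
        # max() returns the first maximal element, so earlier nodes win ties.
        (community, _count), node = max(scored, key=lambda item: item[0][1])
        if community is None:  # ASSUMPTION: isolated separation nodes stay unassigned
            break
        assignment[node] = community
        unassigned.remove(node)
    return assignment


# Toy graph: cores A = {0, 1, 2} and B = {5, 6, 7}; nodes 3 and 4 are separation nodes.
adjacency = {
    0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3],
    3: [1, 2, 4], 4: [3, 5, 6],
    5: [4, 6, 7], 6: [4, 5, 7], 7: [5, 6],
}
core_assignment = {0: "A", 1: "A", 2: "A", 5: "B", 6: "B", 7: "B"}
result = greedy_assign(adjacency, core_assignment, separation_nodes=[3, 4])
print(result[3], result[4])
```

In the toy example, node 3 is assigned first, after which node 4 additionally sees one edge to community A but still has two edges to community B.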

## 4. Evaluation

- (1) The assignment of separation nodes to their communities is computationally easy given a good enough estimator, i.e., it can be executed accurately in linear runtime with respect to the number of communities for each separation node.
- (2) Neighborhood connectivity constitutes a suitable estimator for separation edges, i.e., it can be employed to identify an adequate separation-node set in the approach proposed here, enabling community detection in practice.

- Modularity. For the comparability between different datasets, we use the approximation ratio based on the best known solution. This yields values between 0 (bad) and 1 (good).
- NMI score. The NMI score is used to compare the community assignments with known ground truth. It yields values between 0 (bad) and 1 (good).
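- ${R}^{2}$ score. This score is used to estimate the predictive performance of the separation edge classification. It typically yields values between 0 (bad) and 1 (good), although negative values are possible for estimators that perform worse than always predicting the mean.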
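As an illustration of the NMI score, the following sketch computes the mutual information between two community assignments and normalizes it by the arithmetic mean of the two entropies, $\mathrm{NMI}=\frac{2\,I(X;Y)}{H(X)+H(Y)}$. The choice of this particular normalization is an assumption made here, and the function name `nmi` is illustrative.

```python
from collections import Counter
from math import log


def nmi(labels_a, labels_b):
    """Normalized mutual information, 2 * I(X;Y) / (H(X) + H(Y)), of two labelings."""
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    entropy_a = -sum(c / n * log(c / n) for c in count_a.values())
    entropy_b = -sum(c / n * log(c / n) for c in count_b.values())
    mutual_info = sum(c / n * log(c * n / (count_a[a] * count_b[b]))
                      for (a, b), c in joint.items())
    if entropy_a + entropy_b == 0:  # both labelings consist of a single community
        return 1.0
    return 2 * mutual_info / (entropy_a + entropy_b)


ground_truth = [0, 0, 1, 1]
same_up_to_renaming = nmi(ground_truth, [1, 1, 0, 0])
unrelated = nmi(ground_truth, [0, 1, 0, 1])
print(same_up_to_renaming, unrelated)
```

The score is invariant under renaming of the communities, which is why it is suitable for comparing a detected assignment against a ground truth whose labels carry no meaning of their own.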

#### 4.1. Evidence That Separation-Node Assignment Is Computationally Cheap

#### 4.2. Neighborhood Connectivity Constitutes a Suitable Estimator for Separation Edges

#### 4.3. Evaluating the Performance of Edge Neighborhood Connectivity

**Figure 5.** This box plot displays the achieved modularity score as a fraction of that of the best-known solution for selected standard benchmark datasets: (1) the social network of a karate club [46], (2) the social interactions between dolphins [47], (3) the jointly appearing characters in the book “Les Misérables” [48], (4) protein–protein interactions [49] and (5) jointly bought political books [50]. Each graph was analyzed 10 times using simulated annealing. Our approach clearly does not work well for the karate club network. Closer inspection shows that the connected components resulting from the found separation-node sets often consist of only single nodes, indicating that neighborhood connectivity is a suboptimal estimator for this relatively small dataset.

#### 4.4. Evaluating the Separation-Node Assignment

**Figure 7.** ${R}^{2}$ score of the edge-neighborhood-connectivity-based separation edge estimator. In practice, an ${R}^{2}$ score of 30% implies that merely 30% of the variability of the ground truth has been accounted for. A strict trend towards worse results for harder datasets is clearly visible. This shows that, as expected, the performance of the estimator decreases for harder problem instances while still yielding somewhat accurate results.

## 5. Conclusions


## References

- Bondy, J.A.; Murty, U.S.R. Graph Theory with Applications; Elsevier: New York, NY, USA, 1976.
- Mashaghi, A.R.; Ramezanpour, A.; Karimipour, V. Investigation of a protein complex network. Eur. Phys. J. B Condens. Matter Complex Syst. **2004**, 41, 113–121.
- Shah, P.; Ashourvan, A.; Mikhail, F.; Pines, A.; Kini, L.; Oechsel, K.; Das, S.R.; Stein, J.M.; Shinohara, R.T.; Bassett, D.S.; et al. Characterizing the role of the structural connectome in seizure dynamics. Brain **2019**, 142, 1955–1972.
- Fortunato, S. Community detection in graphs. Phys. Rep. **2010**, 486, 75–174.
- Girvan, M.; Newman, M.E.J. Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA **2002**, 99, 7821–7826.
- Fani, H.; Bagheri, E. Community detection in social networks. Encycl. Semant. Comput. Robot. Intell. **2017**, 1, 1630001.
- Vilenchik, D. Simple Statistics Are Sometime Too Simple: A Case Study in Social Media Data. IEEE Trans. Knowl. Data Eng. **2020**, 32, 402–408.
- Nadakuditi, R.R.; Newman, M.E.J. Graph Spectra and the Detectability of Community Structure in Networks. Phys. Rev. Lett. **2012**, 108, 188701.
- Brandes, U.; Delling, D.; Gaertler, M.; Goerke, R.; Hoefer, M.; Nikoloski, Z.; Wagner, D. Maximizing Modularity is hard. arXiv **2006**, arXiv:physics/0608255.
- Decelle, A.; Krzakala, F.; Moore, C.; Zdeborová, L. Inference and Phase Transitions in the Detection of Modules in Sparse Networks. Phys. Rev. Lett. **2011**, 107, 065701.
- Newman, M.E.J. Equivalence between modularity optimization and maximum likelihood methods for community detection. Phys. Rev. E **2016**, 94, 052315.
- Arute, F.; Arya, K.; Babbush, R.; Bacon, D.; Bardin, J.C.; Barends, R.; Biswas, R.; Boixo, S.; Brandao, F.G.S.L.; Buell, D.A.; et al. Quantum supremacy using a programmable superconducting processor. Nature **2019**, 574, 505–510.
- Shaydulin, R.; Ushijima-Mwesigwa, H.; Safro, I.; Mniszewski, S.; Alexeev, Y. Network Community Detection on Small Quantum Computers. Adv. Quantum Technol. **2019**, 2, 1900029.
- Denchev, V.S.; Boixo, S.; Isakov, S.V.; Ding, N.; Babbush, R.; Smelyanskiy, V.; Martinis, J.; Neven, H. What is the Computational Value of Finite-Range Tunneling? Phys. Rev. X **2016**, 6, 031015.
- Albash, T.; Lidar, D.A. Demonstration of a Scaling Advantage for a Quantum Annealer over Simulated Annealing. Phys. Rev. X **2018**, 8, 031016.
- Grover, L.K. A fast quantum mechanical algorithm for database search. In Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing, STOC’96, Association for Computing Machinery, Philadelphia, PA, USA, 22–24 May 1996; pp. 212–219.
- Shor, P.W. Polynomial-Time Algorithms for Prime Factorization and Discrete Logarithms on a Quantum Computer. SIAM J. Comput. **1997**, 26, 1484–1509.
- Lloyd, S. Universal Quantum Simulators. Science **1996**, 273, 1073–1078.
- Ushijima-Mwesigwa, H.; Negre, C.F.A.; Mniszewski, S.M. Graph Partitioning Using Quantum Annealing on the D-Wave System. In Proceedings of the Second International Workshop on Post Moores Era Supercomputing, Denver, CO, USA, 12–17 November 2017; PMES’17, Association for Computing Machinery: New York, NY, USA, 2017; pp. 22–29.
- Newman, M.E.J.; Girvan, M. Finding and evaluating community structure in networks. Phys. Rev. E **2004**, 69, 026113.
- Kadowaki, T.; Nishimori, H. Quantum annealing in the transverse Ising model. Phys. Rev. E **1998**, 58, 5355–5363.
- Preskill, J. Quantum Computing in the NISQ era and beyond. Quantum **2018**, 2, 79.
- Dalyac, C.; Henriet, L.; Jeandel, E.; Lechner, W.; Perdrix, S.; Porcheron, M.; Veshchezerova, M. Qualifying quantum approaches for hard industrial optimization problems. A case study in the field of smart-charging of electric vehicles. EPJ Quantum Technol. **2021**, 8, 12.
- Akbar, S.; Saritha, S.K. Towards quantum computing based community detection. Comput. Sci. Rev. **2020**, 38, 100313.
- Zahedinejad, E.; Crawford, D.; Adolphs, C.; Oberoi, J.S. Multiple Global Community Detection in Signed Graphs. In Proceedings of the Future Technologies Conference (FTC) 2019; Arai, K., Bhatia, R., Kapoor, S., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 688–707.
- Sedghpour, A.S.; Nikanjam, A. Overlapping Community Detection in Social Networks Using a Quantum-Based Genetic Algorithm. In Proceedings of the Genetic and Evolutionary Computation Conference Companion, Denver, CO, USA, 12–17 November 2017; GECCO ’17, Association for Computing Machinery: New York, NY, USA, 2017; pp. 197–198.
- Mukai, K.; Hatano, N. Discrete-time quantum walk on complex networks for community detection. Phys. Rev. Res. **2020**, 2, 023378.
- Reittu, H.; Kotovirta, V.; Leskelä, L.; Rummukainen, H.; Räty, T. Towards analyzing large graphs with quantum annealing. In Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 9–12 December 2019; pp. 2457–2464.
- Chan, E.Y.K.; Yeung, D.Y. A Convex Formulation of Modularity Maximization for Community Detection. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, IJCAI’11, Barcelona, Spain, 16–22 July 2011; AAAI Press: Menlo Park, CA, USA, 2011; Volume 3, pp. 2218–2225.
- Chen, Y.; Li, X.; Xu, J. Convexified Modularity Maximization for Degree-Corrected Stochastic Block Models. Ann. Stat. **2018**, 46, 1573–1602.
- Abdalla, P.; Bandeira, A.S. Community detection with a subsampled semidefinite program. Sampl. Theory Signal Process. Data Anal. **2022**, 20, 6.
- Li, W. Visualizing network communities with a semi-definite programming method. Security and privacy information technologies and applications for wireless pervasive computing environments. Inf. Sci. **2015**, 321, 1–13.
- Brandes, U. A faster algorithm for betweenness centrality. J. Math. Sociol. **2001**, 25, 163–177.
- Negre, C.F.; Ushijima-Mwesigwa, H.; Mniszewski, S.M. Detecting multiple communities using quantum annealing on the D-Wave system. PLoS ONE **2020**, 15, 227538.
- Chapuis, G.; Djidjev, H.; Hahn, G.; Rizk, G. Finding Maximum Cliques on the D-Wave Quantum Annealer. J. Signal Process. Syst. **2019**, 91, 363–377.
- Rosenberg, I.G. Reduction of bivalent maximization to the quadratic case. Cah. Cent. D’Etudes Rech. Oper. **1975**, 17, 71–74.
- Stein, J.; Chamanian, F.; Zorn, M.; Nüßlein, J.; Zielinski, S.; Kölle, M.; Linnhoff-Popien, C. Evidence that PUBO outperforms QUBO when solving continuous optimization problems with the QAOA. arXiv **2023**, arXiv:2305.03390.
- Thorndike, R.L. Who belongs in the family? Psychometrika **1953**, 18, 267–276.
- Sedgewick, R. Algorithms in C, Part 5: Graph Algorithms, 3rd ed.; Addison-Wesley Professional: Hoboken, NJ, USA, 2001.
- Van Der Hofstad, R. Random Graphs and Complex Networks; Cambridge University Press: Cambridge, UK, 2009; Volume 11, p. 60. Available online: https://www.win.tue.nl/~rhofstad/NotesRGCN.pdf (accessed on 23 July 2023).
- Amin, M.H.; Andriyash, E.; Rolfe, J.; Kulchytskyy, B.; Melko, R. Quantum Boltzmann Machine. Phys. Rev. X **2018**, 8, 021050.
- Holland, P.W.; Laskey, K.B.; Leinhardt, S. Stochastic blockmodels: First steps. Soc. Netw. **1983**, 5, 109–137.
- Fred, A.L.N.; Jain, A.K. Robust data clustering. In Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Madison, WI, USA, 27 June–2 July 2004; Volume 2.
- Kuncheva, L.; Hadjitodorov, S. Using diversity in cluster ensembles. In Proceedings of the 2004 IEEE International Conference on Systems, Man and Cybernetics (IEEE Cat. No.04CH37583), Hague, The Netherlands, 10–13 October 2004; Volume 2, pp. 1214–1219.
- Danon, L.; Díaz-Guilera, A.; Duch, J.; Arenas, A. Comparing community structure identification. J. Stat. Mech. Theory Exp. **2005**, 2005, P09008.
- Zachary, W.W. An Information Flow Model for Conflict and Fission in Small Groups. J. Anthropol. Res. **1977**, 33, 452–473.
- Lusseau, D.; Schneider, K.; Boisseau, O.J.; Haase, P.; Slooten, E.; Dawson, S.M. The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations. Behav. Ecol. Sociobiol. **2003**, 54, 396–405.
- Knuth, D.E. The Stanford GraphBase: A Platform for Combinatorial Algorithms. In Proceedings of the Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, Society for Industrial and Applied Mathematics, SODA ’93, Austin, TX, USA, 25–27 January 1993; pp. 41–43.
- Palla, G.; Derényi, I.; Farkas, I.; Vicsek, T. Uncovering the overlapping community structure of complex networks in nature and society. Nature **2005**, 435, 814–818.
- Newman, M.E.J. Modularity and community structure in networks. Proc. Natl. Acad. Sci. USA **2006**, 103, 8577–8582.
- Li, Y.; Li, W.; Tan, Y.; Liu, F.; Cao, Y.; Lee, K.Y. Hierarchical Decomposition for Betweenness Centrality Measure of Complex Networks. Sci. Rep. **2017**, 7, 46491.

**Figure 1.** Outline of the workflow for the proposed approach of community detection via separation-node identification. The computationally expensive tasks of identifying a set of separation nodes (**b**) and classifying the communities for these nodes (**d**) are performed using quantum computing, while the computationally cheap tasks of removing the classified separation nodes and identifying the resulting connected components (**c**) are performed classically.

**Figure 2.** Counterexample proving no-free-lunch when using Theorem 1 to find surjective separation-node sets.

**Figure 3.** Counterexample indicating no-free-lunch when using Theorem 1 to find injective separation-node sets.

**Figure 4.** This figure shows the Normalized Mutual Information (NMI) score of the presented approach for 50 different graphs each, based on the ground truth and a perfect separation edge estimator coupled with the greedy separation-node assignment. The NMI score as defined in [43,44] was used, as it is a well-established measure of the accuracy of a community assignment given the ground truth [45]. The different probabilities for intra-community edges in the chosen SBM model correspond to different difficulties according to the phase transition known for this model. The lower the stated probability, the harder the problem. The probabilities were chosen to range from graphs that barely differ from a null model without measurable structure up to the hardest graphs that still allow perfect NMI scores. For this dataset, the phase transition can be calculated to be at a probability of 0.2865 for the intra-community edges. As modularity maximization has been shown to perform very well up until the sharp phase transition (which is not reached here), the consistently good results for the SA-based approach appear reasonable. Additional tests show a sharp performance drop-off to NMI values of around 0.5 for smaller intra-community probabilities such as 0.23.

**Figure 6.** The y-axis depicts the factor by which the size of the identified separation-node set deviates from that of the best-known separation-node set. Notably, the absolute sizes of the identified separation-node sets are typically similar across the different difficulties, while they rise slightly for larger graphs.

**Figure 8.** This figure depicts the normalized mutual information score of the selected SBM benchmark graphs using the greedy assignment of separation nodes to communities. A substantial drop-off in performance can be observed for the harder datasets. Meanwhile, as all problem instances are significantly above the phase transition for modularity maximization in these datasets (an intra-community probability of 0.2865), our classical baseline easily identifies close-to-optimal solutions. Notably, however, our approach slightly outperforms it on the easiest dataset, which is a promising result.

**Figure 9.** This figure depicts the normalized mutual information score of the selected SBM benchmark graphs using a simulated-annealing-based approach of assigning the separation nodes to communities. The worse performance for the easy dataset clearly indicates that the chosen simulated annealing approach based on the QUBO as described in Section 3.6 is suboptimal in general.

**Table 1.** Summary of all employed datasets. Note that five different SBM graphs were utilized, each with the same number of nodes and communities, but with varying probabilities of edges being inside communities. For details on all datasets, see the following sections.

Dataset | No. of Nodes | No. of Communities | Intra Prob |
---|---|---|---|
SBM graphs | 250 | 7 | $[0.75,0.625,0.5,0.4,0.3]$ |
Karate Club | 24 | 2 | cannot be specified |
Dolphins | 62 | 4 | cannot be specified |
Miserables | 77 | 5 | cannot be specified |
Protein | 83 | 9 | cannot be specified |
Books | 105 | 3 | cannot be specified |


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Stein, J.; Ott, D.; Nüßlein, J.; Bucher, D.; Schönfeld, M.; Feld, S.
NISQ-Ready Community Detection Based on Separation-Node Identification. *Mathematics* **2023**, *11*, 3323.
https://doi.org/10.3390/math11153323
