NISQ-ready community detection based on separation-node identification

The analysis of network structure is essential to many scientific areas, ranging from biology to sociology. As the computational task of clustering these networks into partitions, i.e., solving the community detection problem, is generally NP-hard, heuristic solutions are indispensable. The exploration of expedient heuristics has led to the development of particularly promising approaches in the emerging technology of quantum computing. Motivated by the substantial hardware demands for all established quantum community detection approaches, we introduce a novel QUBO based approach that only needs number-of-nodes many qubits and is represented by a QUBO-matrix as sparse as the input graph's adjacency matrix. The substantial improvement on the sparsity of the QUBO-matrix, which is typically very dense in related work, is achieved through the novel concept of separation-nodes. Instead of assigning every node to a community directly, this approach relies on the identification of a separation-node set, which -- upon its removal from the graph -- yields a set of connected components, representing the core components of the communities. Employing a greedy heuristic to assign the nodes from the separation-node sets to the identified community cores, subsequent experimental results yield a proof of concept. This work hence displays a promising approach to NISQ ready quantum community detection, catalyzing the application of quantum computers for the network structure analysis of large scale, real world problem instances.


I. INTRODUCTION
In the era of digitization, the amount of collected data is rising rapidly. This poses substantial problems in data analysis, as the algorithms employed there typically have superlinear and thus prohibitive runtime for many relevant datasets. In this work, we investigate a new approach to cope with this problem in the domain of graph structure analysis. Graphs are one of the central data structures used in information theory and find application in a vast range of scientific disciplines [1], [2], [3]. The task of identifying the inherent structure of a graph is known as community detection [4]. In practice, the use of corresponding clustering methods allows for the discovery of structural information from real world networks in domains ranging from social science to biology [5], [6].
Although no exact definition has been agreed upon, a graph is typically said to exhibit a community structure if it can be partitioned such that the number of edges within the partitions is higher than the number of edges between the partitions [5]. While some approaches exist that can provably find existing community structures, all of them are NP-hard [7], [8], [9], [10]. This indicates a general NP-hardness of community detection and hence creates a demand for efficient heuristics to acquire solutions in reasonable time. Motivated by recent advancements and promising results in solving NP-hard problems in the field of quantum computing (QC) [11], [12], [13], [14], we investigate possible advantages in building such heuristics by utilizing the more powerful algorithmic toolset available in QC.
In general, quantum computers allow for the application of quantum mechanical effects to do computation. Based on the concepts of superposition and entanglement, quantum computers can solve many computational problems provably faster than classical computers [15], [16], [17]. In the case of community detection, related work has shown promising results using the popular modularity maximization approach [12], [18]. Modularity is a measure for the quality of a given partitioning based on comparing the edge distribution of the given graph to the edge distribution of a graph with the same node degrees but exhibiting no community structure [19]. The more these distributions differ, the higher the modularity, indicating a clearer community structure. While this approach is provably optimal in the sense that no other approach could detect a community structure when modularity maximization cannot [7], its implementation on a quantum computer is cumbersome, especially for current quantum computers.
Present implementations of modularity maximization on quantum computers make use of the intrinsic quadratic nature of the modularity [12], [18]. By simulating the time evolution of a specific quantum physical system, typically the transverse field Ising model under adiabatic time evolution, a quantum heuristic solver for quadratic unconstrained binary optimization (QUBO) problems (e.g., modularity maximization) can be implemented on a quantum computer [12], [20]. Even though no quantum speedups have been proven yet for solving NP-hard optimization problems with this approach, many cases of potential scaling advantages have been identified, with modularity maximization being one of them [12], [13], [14].
A critical limitation of the established quantum modularity maximization approach, hindering its execution on near term quantum hardware, is the size of the search space in optimization. Scaling linearly in the number of nodes and the number of communities, the number of quantum bits (qubits) needed for representing a specific solution quickly exceeds the number of qubits available in present noisy intermediate scale quantum (NISQ) hardware [21].
Motivated by these results, we develop a novel approach to community detection, specialized for (quantum heuristic) QUBO solving, that uses a smaller search space than the state-of-the-art quantum modularity maximization approach. This objective led to the sociologically inspired approach of defining a community by its extreme ends, similar to, e.g., differentiating political parties by their position on the left-right spectrum. For graphs, we translate this idea to the existence of what we will later define as a bijective set of separation nodes. The removal of the nodes contained in this set then yields connected components, which represent the "cores" of the communities. We subsequently conduct experiments that indicate that this essentially solves the hard part of the community detection problem, as the community assignment for the separation nodes can typically be obtained using a greedy optimizer.
This idea allows for a quantum-classical hybrid detection of communities while merely using one qubit for every node in the graph with a single call to a QUBO solver. We show empirically that such a set of separation nodes can be found for graphs exhibiting community structure and introduce a quantum heuristic approach to find them, constituting a proof of concept.
This paper is structured as follows: section II describes the current state of the art of quantum community detection, section III introduces the separation-node set approach to (quantum) community detection, section IV evaluates it, and section V concludes the findings.

II. RELATED WORK
With the advent of quantum optimization heuristics like quantum annealing, possible quantum advantages have been explored for many optimization problems [22]. Easily allowing for a binary encoding of solutions and showing promising performance, community detection quickly became a popular problem in quantum optimization [23].
Representing community detection natively as a QUBO problem in the basic case of partitioning into k = 2 communities, modularity maximization was the first approach used in quantum community detection [18]. For a given graph G = (V, E), the modularity of such a partitioning is defined in terms of the node degrees d = d_1, ..., d_|V| and the entries a_ij of the adjacency matrix A of G. Straightforward calculations yield the resulting QUBO matrix, which is sufficient to apply practically all currently available quantum optimization heuristics. This approach can be generalized to k > 2 communities by introducing one-hot encoding. Here, the community assignment of a node v_i ∈ V is encoded by a k-dimensional bit string whose l-th entry equals 1 iff the node v_i is assigned to community l. In order to formulate the resulting optimization term as a QUBO problem, we have to add a suitably weighted penalty term P(x) (for details, see [24]) to indirectly enforce the one-hot encoding by P(x) = 0 if every node is assigned to exactly one community and P(x) > 0 otherwise.

Apart from capitalizing on the inherent QUBO form of modularity maximization, many other quantum computing based approaches to community detection, like Quantum Genetic Algorithms and Quantum Walks, have been proposed in recent literature [25], [26]. A particularly promising approach for near term application on large graphs is based on exploiting the quadratic nature of regularity checking related to Szemeredi's Regularity Lemma (SRL) [27]. While similar to our approach, as the QUBO problems solved only involve |V| qubits, it works fundamentally differently, as communities are identified iteratively. In essence, the algorithm proposed in [27] executes the following steps:
1) Randomly split the given graph G = (V, E) into two equally sized partitions A ∪ B = V and delete all edges inside the partitions to yield a bipartite graph.
2) Find subsets X ⊆ A and Y ⊆ B such that X = {v_i ∈ A | s_i = 1} and Y = {v_j ∈ B | s_j = 1}, where s = s_1, ..., s_|V| is the solution to the quadratic program of [27]. Here, the density of a pair of node sets is given by e(V_1, V_2) / (|V_1||V_2|), where e(V_1, V_2) represents the number of edges connecting V_1 and V_2.
3) Identify C := X ∪ Y to be a community and repeat steps 1) and 2) for the subgraph induced on G by V \ C.
While this approach has a solid graph theoretic foundation, the high number of needed solver calls and the dense QUBO matrix still pose nontrivial hardware execution challenges in the NISQ era.
Aiming to minimize the demands on the QUBO solver, we propose a radically different approach that only needs a single QPU call and whose QUBO matrix is topologically identical to the adjacency matrix of the given graph and hence equally sparse. The approach presented in this work essentially purifies a solution of a relaxed community detection problem, i.e., the final community structure is represented by the solution of a QUBO problem which is based on classically computed, probabilistic community assignments for each node. While we introduce a particularly efficient approach to calculate the needed input for the QUBO problem, many other approaches to relaxed community detection, like semidefinite programming or convexification, have been proposed in related work [28], [29], [30], [31].
As derived in detail in the next section, our approach requires a solution for a novel relaxation of the community detection problem as input to the QUBO problem formulation. In essence, our approach requires an estimate for each edge, specifying whether it connects nodes belonging to different communities or to the same community. While such estimates could in principle be computed based on the output of solvers for the relaxed community detection problem by using, e.g., the KL divergence of the community affiliations of neighboring nodes, we introduce a specialized estimation method tailored to this task. Notably, metrics like the edge betweenness centrality [32] do not yield satisfactory results for our approach either, as the difference in values between separation- and non-separation-edges is seemingly too small.

III. CONCEPT
In the following, we explore, in a rigorous mathematical manner, the idea of performing community detection based on finding a suitable set of nodes separating the communities as defined in III.1. Meeting the demand of the derived QUBO formulation for a separation-edge estimator, we subsequently introduce a promising heuristic approach based on the concept of modularity.

A. Separation-node sets
The approach presented in this paper consists of two steps:
(1) identifying a set of nodes separating communities and thus revealing the fundamental community structure (sec. III-B and III-C)
(2) classifying the community of each separation-node to finalize the community detection (sec. III-D)
Step (2) can be performed either with a trivial greedy approach introduced in III-D or with a slight adaptation of the well-known QUBO formulation of modularity maximization [33]; the main objective of this paper is the development of a QUBO approach realizing (1). To provide a more formal definition of (1), we now introduce the concept of separation-node sets.
In the following, we use S to denote the set of all separation-node sets.
Definition III.1. For a graph G = (V, E) and a ground truth community structure C partitioning V, we call S ⊆ V a set of separation-nodes iff the connected components S_i partitioning the graph induced by V \ S are distributed such that {S_i}_i is a refinement of C.
Equivalent to this definition, one could also demand the existence of a refinement map ϕ : {S_i}_i → C with S_i ⊆ ϕ(S_i) for every i. Utilizing the notion of separation-node sets, (1) can be formulated as finding a smallest set of separation-nodes whose associated refinement map ϕ is ideally bijective. An example of a set of separation-nodes satisfying these conditions is depicted in subfigure 1b, which is part of figure 1 displaying the proposed approach. As will become apparent in the evaluation, such well-behaved separation-node sets can also be found in graphs with topologies close to real applications.

Fig. 1: Outline of the workflow for the proposed approach of community detection via separation-node identification. The computationally expensive tasks of identifying a set of separation-nodes (subfigure 1b) and classifying the communities for these nodes (subfigure 1d) are performed using quantum computing, while the computationally cheap tasks of removing the classified separation-nodes and identifying the resulting connected components (subfigure 1c) are done classically.
The surjectivity of ϕ ensures that each community gets detected, and its injectivity ensures that no communities get split. In the following, we will call separation-node sets injective, surjective or bijective iff the respective refinement function satisfies these conditions. In order to formulate a QUBO problem where the optimal solution represents the minimal separation-node set, we start by stating an alternative, more convenient definition of minimal separation-node sets.
Theorem III.1. For an adequate penalty term P ensuring the separation-node set properties, the following equation states an equivalent definition of the set containing all minimal separation-node sets S_min.
Here, we used x ∈ {0, 1}^|V| as a 0-flag for separation-nodes (i.e., x_i = 0 marks v_i as a separation-node), a_ij to denote the entries of the adjacency matrix, c : V → C as a mapping of nodes to their ground truth community, and the Kronecker delta δ_xy. The penalty term P ensures the validity of the separation-node set definition by penalizing incident node pairs from strictly different communities where neither node is an element of the sought-after separation-node set.

Therefore, the task of finding a smallest set of separation-nodes for any given graph is native to the concept of QUBO, and its formulation can be reduced to approximating δ_{c(v_i)c(v_j)} for incident node pairs v_i, v_j ∈ V. This can be understood as calculating the probability of an edge being an interconnection of adjacent nodes belonging to different communities, or more formally, a separation-edge.
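The description above can be turned into a small sketch. The following is only one plausible reading, not the authors' exact formulation: it minimizes the number of separation-nodes, sum_i (1 - x_i), plus a penalty on every edge whose estimated same-community value is low and whose endpoints are both kept (x_i = x_j = 1). The function names, the estimator callback `same_community_estimate`, and the penalty weight of 2 are assumptions made for illustration; the brute-force solver merely stands in for a QUBO heuristic on toy instances.

```python
from itertools import product


def separation_qubo(n, edges, same_community_estimate, penalty=2.0):
    """Build an upper-triangular QUBO {(i, j): weight} to be minimized.

    x_i = 0 flags node i as a separation-node. The constant term |V| of
    sum_i (1 - x_i) is dropped because it does not affect the minimizer.
    """
    qubo = {(i, i): -1.0 for i in range(n)}  # reward for keeping node i
    for i, j in edges:
        i, j = min(i, j), max(i, j)
        # Penalize keeping both endpoints of a likely separation-edge.
        weight = penalty * (1.0 - same_community_estimate(i, j))
        qubo[(i, j)] = qubo.get((i, j), 0.0) + weight
    return qubo


def energy(qubo, x):
    """Evaluate x^T Q x for a binary assignment x."""
    return sum(w * x[i] * x[j] for (i, j), w in qubo.items())


def brute_force(qubo, n):
    """Exhaustive minimization; only feasible for toy instances."""
    return min(product((0, 1), repeat=n), key=lambda x: energy(qubo, x))


# Toy graph: two triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3).
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]


def perfect_estimator(i, j):
    """Ground-truth Kronecker delta for the toy graph (hypothetical helper)."""
    return 1.0 if (i < 3) == (j < 3) else 0.0


solution = brute_force(separation_qubo(6, EDGES, perfect_estimator), 6)
separation_nodes = {i for i, bit in enumerate(solution) if bit == 0}
```

The off-diagonal entries exist only for edges of the input graph, which illustrates why the QUBO matrix is as sparse as the adjacency matrix. With a penalty weight above 1, removing one endpoint of a violated edge is always cheaper than paying the penalty, so the minimizer contains no violated edge.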
Most interestingly, we can show that solving the QUBO problem stated in equation 5 is NP-hard for a specific estimator. To see this, we start by observing a substantial similarity of our QUBO formulation with the QUBO formulation of the Max-Clique problem as stated in [34] for a given graph G = (V, E) and its corresponding adjacency matrix A with entries a_ij. Choosing the estimator s : V × V → {0, 1} by s((v_i, v_j)) := a_ij, it becomes apparent that the QUBO formulations are identical if we use a complete graph of size |V| as an input to our QUBO formulation. Leaving an extensive mathematical analysis of the NP-hardness for more realistic estimators to future work, this shows that the problem of finding a minimal separation-node set is NP-hard when treating the estimator as a variable. This result supports the pursuit of the proposed approach of using quantum computing in order to find a minimal separation-node set.
Returning to the initial goal of finding bijective separation-node sets, we now explore their surjectivity. A significant discovery regarding surjectivity is illustrated in figure 2, showing no-free-lunch when using theorem III.1 to find surjective separation-node sets. This necessitates the addition of a penalty term to the QUBO formulation in order to ensure surjectivity when building upon theorem III.1. For the formulation of a suitable penalty term, see appendix V-B.

Fig. 2: Counterexample proving no-free-lunch when using theorem III.1 to find surjective separation-node sets. Panel (c): minimal surjective separation-node set consisting of all nodes marked in red.
As our formulation results in a PUBO (polynomial unconstrained binary optimization) problem of degree O(log_2 |V|), we conjecture that this constraint cannot be realized in QUBO form without the addition of ancillary variables. Using the standard quadratization approach with the Rosenberg polynomial [35], a QUBO formulation of this term demands superpolynomially many ancillary variables, i.e., O(|V|^(2 log_2 log_2 |V|)). In the context of quantum annealing, this scaling beyond a quadratic number of qubits makes the surjective separation-node approach overly complex compared to the standard modularity maximization. In the gate model, the QAOA can be used to solve PUBO problems in principle, but as current hardware limitations prohibit adequate evaluation, we leave the exploration of the surjectivity constraint to future work. As a consequence of not enforcing surjectivity, there exists a possibility that the number of communities is incorrect after step (1) of detecting the fundamental community structure by separation-node set identification. Modifying step (2) slightly, this could in principle be compensated by iteratively increasing the number of possible communities until no further improvement of the modularity can be achieved. A clever way to do this could be the elbow method as known in clustering [36]. For the alternative greedy approach to step (2), the possibility of merging communities could be allowed.
Fortunately, the conducted experiments show that topological structures precluding free lunch are scarce in practice. Therefore, we will omit the explicit demand for surjective separation-node sets in the following.
Analogous to the surjectivity, there exist graph topologies like the one displayed in figure 3 showing no free lunch when using theorem III.1 to find injective separation-node sets. Hence, in principle, it appears necessary to ensure injectivity explicitly using a penalty term when building upon theorem III.1 as well. The formulation of such a penalty term also turns out to be rather tedious, as can be seen in appendix V.6. In this case, we end up with an even higher dimensional PUBO problem for the injectivity than for the surjectivity. Luckily, compared to the surjectivity, the injectivity of a separation-node set is of less importance, as step (2) could easily be adapted to cope with this. Analogous to the case of surjectivity, we observe such topological structures preventing free lunch quite rarely in the conducted experiments, resulting in the analogous dismissal of an explicit demand for the separation-node sets to be injective in practice.

Fig. 3: Counterexample indicating no-free-lunch when using theorem III.1 to find injective separation-node sets.
In summary, the apparent infrequency of topological structures preventing free lunch regarding bijectivity renders the QUBO formulation stated in theorem III.1 a well-founded starting point for the proposed QUBO based community detection via separation-node sets.
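When ground truth is available, the properties discussed in this subsection can be checked classically. The following minimal sketch, assuming an adjacency-list dictionary and a node-to-community dictionary (both hypothetical data layouts, as are the function names), removes a candidate set, computes the connected components, and reports whether the set is a valid separation-node set in the sense of Definition III.1 and whether the induced refinement map is injective and surjective.

```python
from collections import deque


def components_without(adj, removed):
    """Connected components of the graph induced by all nodes not in `removed`."""
    seen = set(removed)
    components = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        component, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    component.add(w)
                    queue.append(w)
        components.append(component)
    return components


def classify_separation_set(adj, community, candidate):
    """Check validity, injectivity and surjectivity of a candidate set."""
    label_sets = [{community[v] for v in comp}
                  for comp in components_without(adj, candidate)]
    # Valid iff every component lies inside exactly one ground truth community.
    if any(len(labels) != 1 for labels in label_sets):
        return {"valid": False, "injective": False, "surjective": False}
    image = [next(iter(labels)) for labels in label_sets]
    return {
        "valid": True,
        "injective": len(image) == len(set(image)),  # no community is split
        "surjective": set(image) == set(community.values()),  # none is lost
    }


# Toy graph: two triangles joined by the edge (2, 3).
ADJ = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
COMMUNITY = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
```

For this toy graph, the candidate `{2}` is valid and bijective, the empty set is invalid because the single remaining component spans both communities, and `{0, 1, 2}` is valid but not surjective because community "a" disappears entirely.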
While this approach provides exact results for a perfect classification of separation-edges, it fully relies on a suitable estimation heuristic. Although many known measures for a variety of edge properties exist (as described in II), none proved to be entirely suitable for detecting separation-edges according to the pretests conducted for this paper. Consequently, we now motivate a novel approach tailored for exactly this task, based on the concept of modularity.

B. Modularity based separation-edge estimation
Motivated by the proven optimality of modularity and by the fact that, at its core, modularity rests on estimating whether each node pair is likely to belong to the same or different communities, we start by showing how this idea can be used to estimate δ_{c(v_i)c(v_j)}. For this, recall the definition of the entries m_ij of the modularity matrix. As before, a_ij are the entries of the respective adjacency matrix, while E[e_ij] denotes the expected value of the number of edges e_ij between v_i and v_j. Upon closer inspection, we observe two main cases:
• m_ij > 0, iff less connectivity between v_i and v_j was to be expected, indicating that v_i and v_j likely belong to the same community
• m_ij < 0, iff more connectivity between v_i and v_j was to be expected, indicating that v_i and v_j likely belong to different communities
As the matrix entries m_ij are normalized to the interval [−1, 1] by the division by |E|, we can see that a proper rescaling to the interval [0, 1], i.e., via (m_ij + 1) / 2, allows for an estimation of the term δ_{c(v_i)c(v_j)} in principle.
In practice, however, this approach yields extremely poor estimates, as only those entries m_ij of the modularity matrix are relevant that correspond to a given edge (v_i, v_j) ∈ E. For these, it quickly becomes apparent that m_ij is typically larger than 0, making this exact idea infeasible in practice. These considerations motivate an adaptation of modularity for the estimation of separation-edges as proposed in the following.
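A small numerical sketch makes this failure mode concrete. It assumes the configuration-model expectation E[e_ij] = d_i d_j / (2|E|) and the normalization by |E| mentioned above; the exact constants of the original definition may differ, and the function names are illustrative.

```python
def modularity_entry_fn(n, edges):
    """Return a function computing m_ij = (a_ij - d_i * d_j / (2|E|)) / |E|.

    Assumptions: configuration-model expectation and division by |E|.
    """
    num_edges = len(edges)
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    edge_set = {frozenset(e) for e in edges}

    def entry(i, j):
        a_ij = 1.0 if frozenset((i, j)) in edge_set else 0.0
        expected = degree[i] * degree[j] / (2.0 * num_edges)
        return (a_ij - expected) / num_edges

    return entry


def rescale(m_ij):
    """Map a value from [-1, 1] to [0, 1]."""
    return (m_ij + 1.0) / 2.0


# Toy graph: two triangles joined by the separation-edge (2, 3).
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
m = modularity_entry_fn(6, EDGES)
estimates = {e: rescale(m(*e)) for e in EDGES}
```

In this toy graph every edge, including the separation-edge (2, 3), receives a positive m_ij and therefore an estimate above 0.5. Restricting the modularity matrix to existing edges thus barely discriminates between separation-edges and intra-community edges.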

C. Edge neighborhood connectivity based separation-edge estimation
Exploiting the mathematical structure of modularity for a straightforward separation-edge estimation, we now introduce a promising generalization of the previous approach, which we call the neighborhood connectivity of an edge. Instead of merely taking the direct connection between two nodes into account (i.e., an edge), the neighborhood connectivity of an edge considers connections between the neighborhoods of the nodes. In this context, the neighborhood N_r(v) of a node v ∈ V is defined to be the set of nodes with a shortest path of length r to v.
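As an illustration of this definition, the r-neighborhood can be obtained with a breadth first search that stops expanding at depth r. This is a minimal sketch assuming an adjacency-list dictionary; the function name is illustrative and not taken from the authors' implementation.

```python
from collections import deque


def neighborhood(adj, v, r):
    """Return N_r(v): all nodes whose shortest-path distance to v is exactly r."""
    distance = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if distance[u] == r:
            continue  # nodes beyond depth r are not needed
        for w in adj[u]:
            if w not in distance:
                distance[w] = distance[u] + 1
                queue.append(w)
    return {u for u, d in distance.items() if d == r}


# Path graph 0 - 1 - 2 - 3 with an extra edge 1 - 3.
ADJ = {0: [1], 1: [0, 2, 3], 2: [1, 3], 3: [1, 2]}
```

Note that N_0(v) contains only v itself, so the direct edge between two nodes corresponds to a path of length 1 between their 0-neighborhoods.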
Based on this idea, we can rephrase the basic case of our generalization, i.e., modularity, as merely counting the number of unique edges on paths of length 1 between the 0-neighborhoods N_0(v_i) and N_0(v_j) of the respective nodes v_i and v_j. The generalization proposed here introduces the following two new notions:
(1) Consider connections between r-neighborhoods with radius r ≥ 0
(2) Also consider paths of length 2
Stating this more precisely in mathematical form, we define the neighborhood connectivity ν_r^(l) of an edge given a path length l and a neighborhood size r. In this definition, a_r denotes the number of unique edges contained in paths of length l connecting the r-neighborhoods of the given nodes which do not involve nodes or edges contained in the (r − 1)-neighborhoods (as this would result in possible double counting of edges). Analogously to the definition of modularity, E[a_r] denotes the expected value over the values a_r can assume. These values can be calculated based on a simple breadth first search with depth r iterating over the neighborhood layers while choosing v_i and v_j as starting nodes. As for the expected value calculation, the configuration model has proven to be an adequate choice (which is in line with modularity). For details on this, we refer to our implementation, which can be made available upon request to the authors.
Our preferred method of combining the results into the neighborhood connectivity ν of a given edge based on all ν_r^(l) is the dot product with a weight vector w with entries w_r^(l) ∈ ℝ_0^+ such that their sum equals 1. As we know that the standard modularity value is of little use, we chose w_0^(1) = 0. We consider the remaining weights as hyperparameters, for which suitable values were identified in the conducted experiments.

D. Assigning the separation-nodes to communities
As stated in sec. III-A, we propose two different approaches to assigning the separation-nodes to communities, i.e., (1) a greedy strategy and (2) modularity maximization. The greedy strategy was employed for most experiments conducted in this paper. It works as follows:
(1) count the number of edges to every known community for each separation-node
(2) assign the node with the most edges to a single community to that community
(3) update the counts for every neighboring separation-node
(4) repeat steps two and three until every separation-node is assigned to a community
This algorithm has a runtime of the number of separation nodes S times the number of communities |C|, i.e., O(S · |C|), and hence runs very efficiently.
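The four steps above can be sketched as follows. This is a simple, unoptimized illustration under assumed data layouts (an adjacency-list dictionary and a dictionary of already known core labels), not the authors' implementation; it selects the next node by a linear scan and therefore does not attain the stated runtime bound, and ties are broken by node order.

```python
def greedy_assign(adj, core_label, separation_nodes):
    """Greedily assign separation-nodes to the communities of their neighbors."""
    label = dict(core_label)
    unassigned = set(separation_nodes)
    # Step (1): count edges from each separation-node to every known community.
    counts = {s: {} for s in unassigned}
    for s in unassigned:
        for w in adj[s]:
            if w in label:
                counts[s][label[w]] = counts[s].get(label[w], 0) + 1
    while unassigned:
        # Step (2): pick the node with the most edges into a single community.
        best = max(sorted(unassigned),
                   key=lambda s: max(counts[s].values(), default=0))
        if not counts[best]:
            break  # remaining nodes have no assigned neighbors at all
        community = max(counts[best], key=counts[best].get)
        label[best] = community
        unassigned.remove(best)
        # Step (3): update the counts of neighboring separation-nodes.
        for w in adj[best]:
            if w in unassigned:
                counts[w][community] = counts[w].get(community, 0) + 1
    return label  # Step (4) is the loop above


# Toy graph: two triangles joined by the edge (2, 3); node 2 was removed.
ADJ = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
CORES = {0: "a", 1: "a", 3: "b", 4: "b", 5: "b"}
labels = greedy_assign(ADJ, CORES, [2])
```

In the toy example, node 2 has two edges into community "a" and one into community "b", so it is assigned to "a".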
As the results calculated based on the edge neighborhood connectivity were not always of sufficient quality to sensibly use this greedy optimizer, we chose to use standard modularity maximization for these special cases. Fortunately, the well known QUBO approach to this [18] can be easily adapted to our situation, i.e., by clamping the values of the known community assignments, where clamping is to be understood in the same way as it is used in quantum Boltzmann machines [37]. This yields a QUBO problem of size O(S · |C|), which can often be solved a lot quicker than the original problem, as S < |V| in

IV. EVALUATION
The evaluation aims at examining the validity of the following two claims:
(1) the assignment of separation-nodes to their communities is computationally easy, given a good enough estimator
(2) neighborhood connectivity allows for proof of concept results
As we will show in the following, both claims appear to be valid according to the conducted experiments.
For investigating claim (1), we propose to check whether the greedy separation-node assignment as described in sec. III-D is sufficient to assign the nodes of a well-behaved separation-node set to the correct communities. If this approach is indeed sufficient to obtain (nearly) perfect solutions, we reason that the claim is most likely valid.
In order to eliminate the possibility of an insufficient separation-node set, we use a synthetic dataset with known community structure, allowing for the use of a perfect estimator for the separation-edges. In order to find a very good separation-node set, we utilize a simple simulated annealing approach to solve the associated QUBO as defined in equation 5. Regarding the synthetic dataset, we choose the stochastic block model (SBM) [38], which is a widely used tool for benchmarking in the realm of community detection. Aiming to achieve realistic results, we use a graph of size |V| = 250, structured into seven equally sized communities with varying intra- and interconnections between the communities, representing three different difficulties according to the phase transition of community detection on SBMs (for details on the phase transition, see [9]). As becomes apparent in the corresponding figure 4, the greedy separation-node assignment indeed yields optimal or at least close to optimal results, indicating the validity of claim (1).

Fig. 4: This figure shows the NMI score of the presented approach for 50 different graphs each, based on ground truth and a perfect separation-edge estimator coupled with the greedy separation-node assignment. The NMI score as defined in [39], [40] was used, as it represents a well proven measure for the accuracy of a community assignment given ground truth [41]. The different probabilities for intra-community edges in the chosen SBM model represent different difficulties according to the phase transition known for this model. The lower the stated probability, the harder the problem. The probabilities were chosen to range from graphs that barely differ from a null model exhibiting no measurable structure to the hardest graphs that still allow perfect NMI scores.
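For readers unfamiliar with the metric, the normalized mutual information between a detected and a ground truth assignment can be sketched as below. The normalization by the arithmetic mean of the two entropies is an assumption made here for illustration; the definition in [39], [40] may use a different normalization, and the function name is illustrative.

```python
from collections import Counter
from math import log


def nmi(labels_a, labels_b):
    """Normalized mutual information of two labelings of the same nodes.

    Assumption: normalization by the arithmetic mean of both entropies.
    """
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    entropy_a = -sum(c / n * log(c / n) for c in count_a.values())
    entropy_b = -sum(c / n * log(c / n) for c in count_b.values())
    mutual_information = sum(
        c / n * log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    if entropy_a + entropy_b == 0:
        return 1.0  # both labelings consist of a single community
    return 2.0 * mutual_information / (entropy_a + entropy_b)
```

The score is invariant under renaming of the communities, so a detected partition that matches the ground truth up to label permutation obtains the maximal value, while statistically independent labelings obtain 0.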
Having seen solid results for the optimal estimator, we now want to investigate the performance of the "neighborhood connectivity" approach presented here for real world data and hence explore claim (2). For this, we choose the greedy separation-node assignment, so that the separation-node identification is the only non-trivial task involved in solving the problem instances. Choosing standard real world benchmark graphs of varying size, we can observe stable results for most datasets in figure 5, often achieving 90 to 95% of the optimal result.

Fig. 5: [...] (2) the social interactions between dolphins [43], (3) the collectively appearing characters in the book "Les Miserables" [44], (4) protein-protein interactions [45] and (5) jointly bought political books [46]. Each graph was analyzed 10 times using simulated annealing. Our approach clearly does not work well for the karate club network. Closer inspection reveals that the connected components resulting from the found separation-node sets often only consist of single nodes, indicating that neighborhood connectivity is a suboptimal choice for this dataset.
Motivated by these proof of concept results, we now investigate the performance of the proposed estimator (edge neighborhood connectivity) in order to explore the scenarios for which it is best suited. For this, we again resort to equally structured SBM benchmark graphs with slightly higher intra-community connection probabilities, as they allow for a comparison with ground truth information. Concretely, we chose these probabilities to be 0.75 for the easy case, 0.625 for the medium case and 0.5 for the hard case, which was the easy case for the experiments conducted with the perfect estimator (and was picked previously as the hardest case to still yield perfect results).
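Benchmark graphs of this kind can be sampled with a few lines of code. The sketch below assumes a plain SBM with one intra-community probability `p_in` and one inter-community probability `p_out`; the inter-community probability used in the experiments is not stated here, so any value passed for `p_out` is a hypothetical choice, as are the function name and the concrete community sizes.

```python
import random


def sample_sbm(sizes, p_in, p_out, seed=None):
    """Sample an undirected SBM graph.

    Returns (community, edges), where community[i] is the block of node i.
    """
    rng = random.Random(seed)
    community = [block for block, size in enumerate(sizes) for _ in range(size)]
    n = len(community)
    edges = [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if rng.random() < (p_in if community[i] == community[j] else p_out)
    ]
    return community, edges


# 250 nodes in seven almost equally sized communities (250 is not divisible by 7).
SIZES = [36] * 6 + [34]
community, edges = sample_sbm(SIZES, p_in=0.75, p_out=0.05, seed=0)
```

Because the ground truth `community` list is returned alongside the edges, such graphs allow both the perfect estimator and the NMI based evaluation to be reproduced.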
Analogously to the perfect estimator, the identified separation-node sets were all valid and bijective in a small test run on 10 graphs. Switching the main optimization goal, we now examine the size of the identified separation-node sets for graphs of different difficulty, as displayed in figure 6. Here, we can see that the sizes of the separation-node sets found are substantially larger than the best known solution, which becomes especially apparent for easy problem instances. Interestingly, the relative performance improves for harder problems, showing promising scaling behavior.
Although the separation-node sets found are well behaved, the combination with the greedy separation-node assignment to communities does yield substantially worse results than the perfect estimator, as shown in figure 7.
Subsequent experiments show that the performance for the medium and hard datasets can be improved significantly by exchanging the greedy approach for a simulated annealing based one, as shown in figure 8.
As described in the caption of figure 8, simulated annealing based on the QUBO as described in III-D seems to be a suboptimal choice to assign separation-nodes to communities. We suspect that the reason for this lies in the large size of the search space for the given problem instances due to the employed one-hot encoding. As identified separation-node sets typically contain up to 200 nodes (compared to roughly 120 nodes for the perfect estimator) and 7 different communities exist, the problem instances involve roughly 200 · 7 = 1400 binary variables, so that the search space contains roughly 2^1400 candidate bit strings.
In order to put the results of the developed separation-edge estimator based on edge neighborhood connectivity into perspective with an optimal estimator, we now investigate its R² score in figure 9. Interestingly, the worse performance for larger datasets has no impact on the validity and bijectivity of the subsequently identified separation-node set, which is very promising with regard to scaling.

Fig. 8: This figure depicts the normalized mutual information score of the selected SBM benchmark graph using a simulated annealing based approach of assigning the separation-nodes to communities. The worse performance for the easy dataset clearly indicates that the chosen simulated annealing approach based on the QUBO as described in III-D is suboptimal in general.

Fig. 9: R² score of the edge neighborhood connectivity based separation-edge estimator. In practice, an R² score of 30% implies that merely 30% of the variability of the ground truth has been accounted for. A strict trend towards worse results for harder datasets is clearly visible. This shows that the performance of the estimator decreases for harder problem instances, as is to be expected, while still yielding somewhat accurate results.
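To make the interpretation of the R² score concrete, the following minimal sketch computes it under the assumption that the standard coefficient of determination, 1 - SS_res / SS_tot, is meant. The function name `r2_score` and the example values are hypothetical and are not taken from the experiments; in particular, encoding separation-edges as 1.0 and all other edges as 0.0 is an assumption made only for illustration.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: variability left unexplained by the estimate.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: variability of the ground truth around its mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined for a constant ground truth")
    return 1.0 - ss_res / ss_tot


# Hypothetical per-edge values (assumption): 1.0 marks a separation-edge.
truth = [1.0, 0.0, 0.0, 1.0, 0.0]
estimate = [0.8, 0.1, 0.3, 0.6, 0.2]
score = r2_score(truth, estimate)
```

A perfect estimate gives a score of 1, while always predicting the mean of the ground truth gives a score of 0, which matches the reading of the score as the fraction of explained variability.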

V. CONCLUSION
Having set out with the goal of developing a quantum community detection approach that allows for the analysis of large graphs in the NISQ era, we presented the idea of identifying communities via their borders. The derived separation-node set based approach was shown to yield (close to) optimal results, depending on the accuracy of the classical separation-edge estimator. The heuristic proposed for this purpose, based on the introduced concept of "edge neighborhood connectivity", enabled proof-of-concept results on real world data. In particular, as our approach merely requires |V| qubits and as the corresponding QUBO is as sparse as the input graph G = (V, E), separation-node based community detection is, to the best of our knowledge, the least hardware demanding quantum computing approach to community detection. The trade-off underlying this accomplishment clearly is the more demanding classical part of this hybrid approach (i.e., the separation-edge estimation). We firmly encourage future work on this heuristic and conjecture that incorporating solutions of the relaxed community detection problem would be highly beneficial. Furthermore, the exploration of adaptations of similar known metrics like edge betweenness centrality [47] also seems very interesting. Overall, we conclude that our approach is highly promising for accelerating the possibility of solving real world community detection problems using quantum computers, thus opening up a path towards network structure analysis in big data.
(a) Exemplary graph consisting of three connected cliques. (b) Identification of a set of separation-nodes (marked in red). (c) Removal of the set of identified separation-nodes and identification of the resulting connected components. (d) Classification of the community of all identified separation-nodes.
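The four steps (a) to (d) can be sketched classically as follows. This is only an illustration under stated assumptions: the graph (three 4-cliques joined by two bridging edges), the hand-picked separation-node set {3, 7}, all function names, and the majority-vote assignment rule are hypothetical. In the approach itself, the separation-node set is obtained by solving the QUBO, and the greedy rule used there may differ from the one assumed here.

```python
from collections import deque


def three_cliques():
    """(a) Three 4-cliques, chained by one bridging edge each (assumed example)."""
    adj = {v: set() for v in range(12)}
    for clique in (range(0, 4), range(4, 8), range(8, 12)):
        for u in clique:
            for v in clique:
                if u != v:
                    adj[u].add(v)
    for u, v in [(3, 4), (7, 8)]:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def connected_components(adj, removed):
    """(c) Label the components that remain after deleting `removed`."""
    label = {}
    next_label = 0
    for start in adj:
        if start in removed or start in label:
            continue
        label[start] = next_label
        queue = deque([start])
        while queue:  # breadth-first search over the remaining nodes
            u = queue.popleft()
            for v in adj[u]:
                if v not in removed and v not in label:
                    label[v] = next_label
                    queue.append(v)
        next_label += 1
    return label


def assign_separation_nodes(adj, cores, separation_nodes):
    """(d) Assumed greedy rule: join the community holding most neighbours."""
    label = dict(cores)
    for node in sorted(separation_nodes):
        votes = {}
        for v in adj[node]:
            if v in label:
                votes[label[v]] = votes.get(label[v], 0) + 1
        if votes:
            label[node] = max(votes, key=votes.get)
        else:
            # No labelled neighbour: open a new community for this node.
            label[node] = max(label.values(), default=-1) + 1
    return label


adj = three_cliques()
separation_nodes = {3, 7}  # (b) hand-picked here instead of solving the QUBO
cores = connected_components(adj, separation_nodes)
communities = assign_separation_nodes(adj, cores, separation_nodes)
```

In this toy instance, removing nodes 3 and 7 leaves three community cores, and each separation-node is then placed in the clique that contains three of its four neighbours.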

r
acts as a normalization factor denoting the highest possible number a (l)

Fig. 5: This box plot displays the achieved modularity score as a fraction of that of the best known solution for selected standard benchmark datasets: (1) the social network of a karate club [42], (2) the social interactions between dolphins [43], (3) the collectively appearing characters in the book "Les Miserables" [44], (4) protein-protein interactions [45] and (5) jointly bought political books [46]. Each graph was analyzed 10 times using simulated annealing. Our approach clearly does not work well for the karate club network. Closer inspection reveals that the connected components resulting from the found separation-node sets often consist of only single nodes, indicating that neighborhood connectivity is a suboptimal choice for this dataset.

Fig. 6: The y-axis depicts the factor by which the size of the identified separation-node set deviates from that of the best known separation-node set. Notably, the absolute sizes of the identified separation-node sets are typically similar across the different difficulties, while they rise slightly for larger graphs.