Stochastic Local Community Detection in Networks

Papei, Hadi; Li, Yang

doi:10.3390/a16010022

by Hadi Papei ¹ and Yang Li ²,*

¹ Department of Physics and Astronomy, The University of Western Ontario, London, ON N6A 3K7, Canada

² Department of Mathematical Sciences, Florida Atlantic University, Boca Raton, FL 33431, USA

* Author to whom correspondence should be addressed.

Algorithms 2023, 16(1), 22; https://doi.org/10.3390/a16010022

Submission received: 11 November 2022 / Revised: 24 December 2022 / Accepted: 28 December 2022 / Published: 1 January 2023

(This article belongs to the Special Issue Recent Advances in Community Detection Algorithms and Applications)

Abstract:

We propose a stochastic agglomerative algorithm to detect the local community of some given seed vertex/vertices in a network. Instead of giving a deterministic binary local community in the output, our method assigns every vertex a value that is the probability that this particular vertex would be in the local community of the seed. The proposed procedure has several advantages over the existing deterministic algorithms, including avoiding random tie-breaking, evaluating uncertainties, detecting hierarchical community structure, etc. Synthetic and real data examples are included for illustration.

Keywords:

community; stochastic algorithm; network science; simulated annealing

1. Introduction

The structure of complex networks, such as social, metabolic, economic, and informational networks, has been an active research area of network science in the past decades [1,2,3,4]. One topic of particular interest is detecting the community structure of complex networks. Loosely speaking, a community is a group of vertices whose internal edges are much denser than the edges connecting the group to the rest of the network. Communities exist in all types of networks and play an important role in the underlying structure, dynamics, and evolution of the network. Vertices belonging to a tightly bounded group tend to have some common properties. Details on community detection algorithms can be found in [5,6].

While partitioning the whole network requires information about its global structure, it is sometimes more desirable to find the local community starting from a seed, either a single vertex or a group of vertices. For instance, it would be impractical to divide more than two billion Facebook users into separate communities. In another example, the World Wide Web (WWW) is a dynamic network whose global structure is too large to be fully known at any given time point. A more reasonable question is to find the users who are closely related to a given person in his/her social network or to find all topically related websites starting from a seed page. Bagrow and Bollt proposed an agglomerative L-shell method where vertices are added to the local community based on their geodesic distance from the origin [7]. Papadopoulos et al. introduced a similar method, bridge bounding, which identifies edges that act as boundaries to the local community [8,9]. Clauset proposed a greedy maximization algorithm for finding a local community iteratively through a local modularity measure [10]. While Clauset’s algorithm is intuitive in concept and straightforward to implement [10,11], it does have some drawbacks. At an intermediate iteration, if two vertices lead to the same increase in the local modularity, the tie is broken randomly. As a result, two completely different local communities can be found. Moreover, the output is only qualitative: any vertex is either included in or excluded from the local community. A numerical measure that quantifies the likelihood of being in the local community would be more informative. Further, a deterministic algorithm may become stuck in a local optimum, i.e., a sub-optimal solution from which any further addition of vertices leads to a worse solution. These drawbacks are common for deterministic algorithms.

In this paper, we propose a stochastic algorithm to search for local communities that can remedy the aforementioned limitations and provide more information on the underlying structure of the network. The paper is organized as follows. In Section 2, a stochastic agglomerative search algorithm is introduced. Its advantages over existing methods are illustrated. In Section 3, the proposed algorithm is applied to synthetic datasets to justify its validity. It is further utilized on the well-known Zachary’s karate club data and Lusseau’s network of bottlenose dolphins. In Section 4, some discussions and conclusions are presented.

2. Materials and Methods

Consider a connected network $G = (V, E)$, where $V$ and $E$ are the sets of vertices and edges of $G$, respectively. Suppose $v_{0}$ is a seed vertex or a group of seed vertices, and we intend to find the local community originating from $v_{0}$. For example, a web crawler starts from a list of seed web addresses, searches for hyperlinks in the current pages, and visits the newly discovered pages recursively. In a recommendation network, a single item can lead to a group of other topically related items through similar browsing and purchasing history. Several methods have been proposed to identify local communities when the global network structure is unavailable [7,8,10,12]. Among these works, Clauset proposed a very intuitive way of finding the local community in an iterative bottom-up procedure [10]. Let us denote the local community as $C$. Its neighbor set $U$ is defined to be those vertices that are not in $C$ but adjacent to at least one vertex in $C$. The boundary set $B$ of $C$ is defined to be those vertices in $C$ that have at least one neighbor in $U$. See Figure 1 for an illustration. A new vertex is added to the current local community one at a time such that the local modularity,

$$R(C) = \frac{\text{number of edges with both ends in } C}{\text{number of edges with one or more ends in } C}, \tag{1}$$

is increased the most. $R(C)$ quantifies the proportion of internal edges of $C$, and thus can be considered an indicator of the bounding strength in $C$. A higher value of $R(C)$ represents a more tightly bounded group of vertices, which has a higher chance of being a community. It is possible to use other local criteria, e.g., clustering coefficient [13], to define tightly connected local groups.
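As a minimal sketch of these definitions (not the authors' implementation), the sets $U$ and $B$ and the local modularity of Equation (1) can be computed directly from an adjacency structure. The representation of the network as a dictionary `adj` mapping each vertex to the set of its neighbors, the function names, and the toy graph are assumptions made for illustration.

```python
def neighbor_set(adj, community):
    """U: vertices outside the community adjacent to at least one member."""
    return {u for v in community for u in adj[v] if u not in community}


def boundary_set(adj, community):
    """B: members with at least one neighbor outside the community."""
    return {v for v in community if any(u not in community for u in adj[v])}


def local_modularity(adj, community):
    """R(C): edges with both ends in C over edges with one or more ends in C."""
    internal_twice = 0  # each internal edge is seen from both of its ends
    outgoing = 0        # edges with exactly one end in C
    for v in community:
        for u in adj[v]:
            if u in community:
                internal_twice += 1
            else:
                outgoing += 1
    internal = internal_twice // 2
    touching = internal + outgoing
    return internal / touching if touching else 0.0


# Hypothetical toy graph: a triangle 1-2-3 with a tail 3-4-5.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
C = {1, 2, 3}
print(neighbor_set(adj, C))      # {4}
print(boundary_set(adj, C))      # {3}
print(local_modularity(adj, C))  # 0.75
```

In this toy graph the triangle has three internal edges and one outgoing edge, so $R(C) = 3/4$. A single-vertex community has no internal edges and therefore $R(C) = 0$.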

This algorithm is deterministic in the sense that exactly the same result (local community) will be obtained if we rerun the algorithm starting from the same seed. Moreover, the result is qualitative since a given vertex will be either included in or excluded from the detected local community. In fact, two different vertices in the local community might differ in a quantitative way such that one vertex is closely connected to the seed while the other one barely makes it. Therefore, a more natural and informative way is to gauge the likelihood that a given vertex should be in the local community. Every vertex will be marked by a numerical value such that a higher value implies a larger chance of being in the local community than a lower value.

2.1. A Stochastic Algorithm

We propose an iterative stochastic agglomerative procedure to search for the local community of seed $v_{0}$. As mentioned before, $v_{0}$ could be a single vertex or a group of vertices. The local community is still denoted by $C$, whose element vertices are given by the index set $S_{C}$. The index sets for $U$ and $B$ are represented by $S_{U}$ and $S_{B}$, respectively. See Figure 1. At each iteration, a new vertex in $U$ could be added to $C$, aiming to increase $R(C)$. Meanwhile, a vertex currently in $C$ is also subject to removal. Specifically, for each member $v_{j}$ in $U$ $(j \in S_{U})$, we compute the change in local modularity $\Delta R_{j}^{A} = R(C + v_{j}) - R(C)$ if $v_{j}$ is added to $C$. We also compute the change $\Delta R_{k}^{R} = R(C - v_{k}) - R(C)$ if a vertex $v_{k}$ in $C$ is removed from $C$, excluding those vertices that are in the seed or whose removal would make $C$ disconnected. Let $\Delta R = \{\Delta R^{A}, \Delta R^{R}\}$ be the changes of all candidate vertices. Instead of choosing the one with the largest increase $\Delta R$, we select a vertex in a probabilistic way.

We introduce a tolerance parameter $\epsilon$ that specifies the allowed reduction in $R(C)$ between iterations. All candidate vertices with $\Delta R < -\epsilon$ will not be considered since the reduction in $R(C)$ is more than what we can tolerate (points below the dotted line in Figure 2). We randomly select one vertex $v_{j}$ from the remaining candidates with probability $(\Delta R_{j} + \epsilon) / \sum_{k} (\Delta R_{k} + \epsilon)$, where the sum runs over the remaining candidates. In this way, the vertex with the largest increase will have the largest chance to be selected, but other vertices are not completely eliminated, even though their chances are lower.
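The selection rule can be illustrated with a short sketch. The function names and the dictionary-based interface are assumptions; the numbers are the $R(C)$ values quoted for the first step of the toy example in Figure 2 (old $R(C) = 0$, new values 1/5, 1/5, and 1/6 for vertices 3, 4, and 6, with tolerance 0.05). Exact fractions are used only to make the arithmetic easy to check by hand.

```python
import random
from fractions import Fraction


def selection_probabilities(delta_r, eps):
    """Return {candidate: probability} for candidates with delta_r >= -eps."""
    weights = {v: d + eps for v, d in delta_r.items() if d >= -eps}
    total = sum(weights.values())
    if total <= 0:
        return {}  # no admissible candidate carries positive weight
    return {v: w / total for v, w in weights.items()}


def draw_candidate(delta_r, eps, rng=random):
    """Draw one candidate at random, or None if nothing is admissible."""
    probs = selection_probabilities(delta_r, eps)
    if not probs:
        return None
    candidates = list(probs)
    weights = [float(probs[v]) for v in candidates]
    return rng.choices(candidates, weights=weights)[0]


# First step of the toy example: old R(C) = 0, new values 1/5, 1/5, 1/6.
delta_r = {3: Fraction(1, 5), 4: Fraction(1, 5), 6: Fraction(1, 6)}
probs = selection_probabilities(delta_r, Fraction(1, 20))  # eps = 0.05
print(probs)  # vertices 3 and 4 get 15/43 each, vertex 6 gets 13/43
```

Under these assumptions the three candidates are almost equally likely (about 0.35, 0.35, and 0.30), which shows how the rule keeps weaker candidates in play instead of always taking the maximum.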

Figure 2 shows a toy example of the algorithm on a small network with nine vertices. The seed is vertex 1. The tolerance parameter $\epsilon$ is chosen to be 0.05. Three steps are shown where red squares are vertices in the current local community $C$, and blue circles are the neighbor set $U$. In the first step (Figure 2a), there are three candidate vertices (3, 4, and 6) to add. The bottom panel of Figure 2a shows the three new $R(C)$ values (1/5, 1/5, and 1/6) if each of them is added. The old $R(C)$ is 0 (the solid line) since there is only one vertex in $C$. One of these three vertices is randomly selected with probability proportional to $\Delta R + \epsilon$ (the distance from the point to the dotted line). It turns out that vertex 3 is selected, highlighted by the vertical arrow on the plot. Note that any vertex with new $R(C)$ below the dotted line will not be considered because it causes a reduction in $R(C)$ larger than our tolerance level $\epsilon$. The dashed line represents the new $R(C)$ after the current step with vertices 1 and 3 in $C$. The second step is shown in Figure 2b, where vertex 9 is added to the local community. Note that we also consider the possibility of removing an existing vertex (vertex 3) from $C$. In the third step (Figure 2c), vertex 6 is added to $C$. Removal of vertex 3 is not considered in Figure 2c because it would make the local community disconnected. The algorithm keeps running until no vertices can be added to or removed from $C$; that is, $R(C)$ cannot be improved anymore. The subset $C$ at termination is one realization of the local community of seed $v_{0}$. The pseudocode for this process is given in Algorithm 1.

Algorithm 1 The algorithm for the stochastic maximization of local modularity.

add initial seed $v_{0}$ to $C$
add all neighbors of $v_{0}$ to $U$
set $B$ to be the boundary of $C$
compute $R(C)$, the local modularity of $C$
while stopping rules are not satisfied do
    choose an adaptive value for $\epsilon$
    set $S_{Add}$ and $S_{Remove}$ to be empty sets
    for each $v_{j} \in U$ do
        compute $\Delta R_{j}^{A} = R(C + v_{j}) - R(C)$
        insert $\Delta R_{j}^{A}$ into $S_{Add}$
    end for
    for each $v_{k} \in C - v_{0}$ do
        if $C - v_{k}$ is connected then
            compute $\Delta R_{k}^{R} = R(C - v_{k}) - R(C)$
            insert $\Delta R_{k}^{R}$ into $S_{Remove}$
        end if
    end for
    merge $S_{Add}$ and $S_{Remove}$ to get a new set $S_{All} = S_{Add} \cup S_{Remove}$
    select one $v_{m}$ from $S_{All}$ with probability $\frac{\max\{0, \Delta R_{m} + \epsilon\}}{\sum_{k} \max\{0, \Delta R_{k} + \epsilon\}}$
    if $v_{m} \in U$ then
        add $v_{m}$ to $C$
    else
        remove $v_{m}$ from $C$
    end if
    update $U$, $B$, and $R(C)$
    if stopping rules are satisfied then
        break out of the while loop
    end if
end while
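One possible Python rendering of Algorithm 1 is sketched below. It is an illustration under stated assumptions, not the authors' code: the network is a dictionary `adj` of neighbor sets, $\epsilon$ is held constant rather than adapted, and the stopping rule (no admissible move, a step limit `max_steps`, or $R(C)$ varying by less than `tol` over the last `window` steps) is a hypothetical choice of parameters.

```python
import random


def local_modularity(adj, community):
    """R(C): edges with both ends in C over edges with one or more ends in C."""
    internal = sum(1 for v in community for u in adj[v] if u in community) // 2
    outgoing = sum(1 for v in community for u in adj[v] if u not in community)
    touching = internal + outgoing
    return internal / touching if touching else 0.0


def is_connected(adj, community):
    """Depth-first search restricted to the vertices of the community."""
    if not community:
        return False
    start = next(iter(community))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for u in adj[v]:
            if u in community and u not in seen:
                seen.add(u)
                stack.append(u)
    return len(seen) == len(community)


def one_realization(adj, seeds, eps=0.02, max_steps=200, window=10, tol=1e-3,
                    rng=random):
    """Run one stochastic search; return the community and the R(C) history."""
    seeds = set(seeds)
    community = set(seeds)
    r = local_modularity(adj, community)
    history = [r]
    for _ in range(max_steps):
        moves = {}  # candidate vertex -> community after the move
        for v in {u for w in community for u in adj[w] if u not in community}:
            moves[v] = community | {v}      # addition of a vertex from U
        for v in community - seeds:
            smaller = community - {v}
            if is_connected(adj, smaller):
                moves[v] = smaller          # removal that keeps C connected
        weights = {v: max(0.0, local_modularity(adj, c) - r + eps)
                   for v, c in moves.items()}
        if sum(weights.values()) <= 0:
            break                           # no admissible move is left
        candidates = list(weights)
        chosen = rng.choices(candidates,
                             weights=[weights[v] for v in candidates])[0]
        community = moves[chosen]
        r = local_modularity(adj, community)
        history.append(r)
        recent = history[-window:]
        if len(history) >= window and max(recent) - min(recent) < tol:
            break                           # R(C) has stabilized
    return community, history


# Hypothetical toy graph: two triangles joined by the bridge 3-4.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
       4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
community, history = one_realization(adj, seeds=[1], eps=0.0,
                                     rng=random.Random(0))
print(sorted(community), history)
```

With `eps=0.0` only moves that strictly increase $R(C)$ carry positive weight, so the recorded history is increasing and the run stops at a local optimum. Different random seeds can end in different communities, which is exactly the variability that the repeated runs are meant to capture.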

2.2. Tolerance Parameter and Stopping Criteria

Parameter $\epsilon$ controls the tightness of the local community. A large value of $\epsilon$ allows a large reduction in $R(C)$ between iterations. It leads to a larger selection of candidate vertices, thus allowing the algorithm to probe a bigger region of the network at the cost of a longer running time. A small value of $\epsilon$ allows only a small reduction in $R(C)$. Therefore, the algorithm will converge faster but might miss some important structures due to the inclusion of fewer candidate vertices. Setting $\epsilon = 0$ is similar to Clauset’s original method, where $R(C)$ increases monotonically [10]. In a sense, the algorithm is similar to simulated annealing, a probabilistic technique for approximating the global optimum of a given function [14]. In practice, a single constant optimal value of $\epsilon$ that fits all scenarios does not exist. It depends on the structure of the network, including the clustering, degree distribution, etc. In the initial stage of the algorithm, where there are big fluctuations between consecutive steps, $\epsilon$ can be chosen to be large in order to probe a larger area of the network. Later, when $R(C)$ stabilizes, $\epsilon$ can take smaller values to fine-tune the results.

Certain criteria must be met in order to terminate the algorithm. It is possible that all vertices are included in the local community. More generally, if the local modularity $R(C)$ varies little for a certain number of consecutive steps, we then conclude that running the algorithm for more steps will not improve the performance, and it should be terminated.
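The text does not prescribe a specific schedule for $\epsilon$ or specific thresholds for the stopping rule. The sketch below shows one way both ideas could be realized; the geometric decay, the parameter names (`eps_start`, `eps_end`, `decay`, `window`, `tol`), and their default values are all assumptions chosen for illustration.

```python
def epsilon_schedule(step, eps_start=0.05, eps_end=0.0, decay=0.9):
    """Tolerance that starts large and shrinks geometrically toward eps_end."""
    return eps_end + (eps_start - eps_end) * decay ** step


def has_stabilized(history, window=10, tol=1e-3):
    """True if R(C) varied by less than tol over the last `window` steps."""
    if len(history) < window:
        return False
    recent = history[-window:]
    return max(recent) - min(recent) < tol


print([round(epsilon_schedule(t), 4) for t in (0, 5, 20)])
print(has_stabilized([0.60] * 10))      # True: no change over ten steps
print(has_stabilized([0.10, 0.50] * 5)) # False: still fluctuating
```

A slower decay keeps the search exploratory for longer, in the spirit of a cooling schedule in simulated annealing, at the cost of more steps before the stopping rule triggers.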

2.3. Evaluation of Inclusion Significance

The procedure outlined in Algorithm 1 needs to be repeated a number of times to obtain different realizations of $v_{0}$’s local community. For each vertex $v_{j}$ in the network, we compute the proportion of realizations in which it is included, denoted by $P_{v_{0} v_{j}}$. This is the inclusion probability of $v_{j}$ in the local community of $v_{0}$, and its value reveals the community structure around $v_{0}$. A large value of $P_{v_{0} v_{j}}$ indicates that vertex $v_{j}$ has a high likelihood of being in the local community of $v_{0}$, and vice versa.
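Computing the inclusion probabilities from a collection of realizations is a simple counting exercise, sketched below. The function name and the representation of each realization as a Python set are assumptions, and the three realizations are made up for illustration.

```python
from collections import Counter


def inclusion_probabilities(realizations, vertices):
    """Proportion of realizations that contain each vertex."""
    if not realizations:
        raise ValueError("at least one realization is required")
    counts = Counter(v for community in realizations for v in community)
    s = len(realizations)
    return {v: counts[v] / s for v in vertices}


# Three hypothetical realizations of the local community of seed vertex 1.
realizations = [{1, 2, 3}, {1, 2}, {1, 2, 3, 4}]
probs = inclusion_probabilities(realizations, vertices=range(1, 6))
print(probs)  # vertex 1 and 2: 1.0, vertex 3: 2/3, vertex 4: 1/3, vertex 5: 0.0
```

The seed always has probability 1 because it is never removed, and vertices that no realization reaches get probability 0.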

A natural question regarding this procedure is that since we potentially could end up with various realizations of the local community, can we simply choose the one with the highest local modularity and use it as the unique “best” result? Actually, this is not the goal of the proposed method. Statistically, the observed network is, in fact, a sample of the actual underlying relationship among network vertices. The output of a deterministic algorithm is a point estimate of the true local community, which is treated as an unknown parameter of the network. It does not convey much information due to the randomness in the sampling procedure. A more informative measure is to assess the significance of the obtained local community, which can be decoded by the inclusion probability.

2.4. Computational Complexity

As discussed in [10], the running time of a single run of the algorithm (one realization) is approximately $O(k^{2} d)$, where $k$ is the number of vertices in the explored portion of the network or the typical size of the local community, and $d$ is the mean degree of the graph. In the proposed algorithm, we need to repeat the run $s$ times to obtain the inclusion probability. Since the standard error of the inclusion probability is $\sqrt{P(1 - P)/s}$ [15], a value of a few hundred for $s$ is sufficient to assure an accuracy of 0.05. In our experiment, $s$ is taken to be 1000.
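The choice of $s$ can be checked with a few lines of arithmetic. The sketch below evaluates the standard error at $P = 0.5$, which is the worst case because $P(1 - P)$ is largest there; the function name is an assumption.

```python
import math


def standard_error(p, s):
    """Standard error of a proportion p estimated from s realizations."""
    return math.sqrt(p * (1.0 - p) / s)


for s in (100, 400, 1000):
    print(s, round(standard_error(0.5, s), 4))
# 100 0.05
# 400 0.025
# 1000 0.0158
```

Reading "accuracy" as one standard error, $s = 100$ already reaches 0.05; reading it as two standard errors, $s = 400$ is needed. Either interpretation is consistent with "a few hundred", and $s = 1000$ leaves a comfortable margin.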

3. Results

In this section, we will apply the proposed algorithm in both synthetic and real networks for illustration. The data analysis is carried out using the Python package networkx [16].

3.1. A Simulation Study

In the simulation study, a synthetic graph is generated from the stochastic block model (SBM) with probability matrix $M = \begin{pmatrix} 0.5 & 0.05 \\ 0.05 & 0.4 \end{pmatrix}$ [17]. Each block has 20 vertices. The network is shown in Figure 3.
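A network of this kind can be sampled with a few lines of standard-library Python, as sketched below. The function name, the vertex labels 1 to 40 (with vertex 1 in the first block), and the random seed are assumptions; networkx also offers a `stochastic_block_model` generator that could be used instead. Note that a sampled graph is not guaranteed to be connected, so a connectivity check may be needed before running the search.

```python
import random
from itertools import combinations


def sample_sbm(sizes, probs, seed=None):
    """Sample an undirected SBM; return (adjacency dict, block labels)."""
    rng = random.Random(seed)
    block = {}
    label = 1
    for b, size in enumerate(sizes):
        for _ in range(size):
            block[label] = b
            label += 1
    adj = {v: set() for v in block}
    for u, w in combinations(block, 2):
        # Each pair is linked independently with the block-pair probability.
        if rng.random() < probs[block[u]][block[w]]:
            adj[u].add(w)
            adj[w].add(u)
    return adj, block


M = [[0.5, 0.05], [0.05, 0.4]]
adj, block = sample_sbm([20, 20], M, seed=1)
n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
print(len(adj), n_edges)
```

With these probabilities the expected number of edges is 95 within the first block, 76 within the second, and 20 between the blocks, so the two communities are joined by relatively few bridges.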

Since the true community structure is known, we can test the algorithm at different parameters. Figure 4 shows the inclusion probabilities of the local community of vertex 1 by using three constant tolerance parameters, $\epsilon = 0$, 0.02, and 0.04. It is apparent that the algorithm can detect the underlying hierarchical structure. Meanwhile, as $\epsilon$ increases, the inclusion probability also increases for vertices in the other block. This is due to the fact that a broader region in the network is being probed because there is a higher chance of crossing the bridges between the blocks.

As discussed in Section 2.4, the running time scales quadratically with the size of the explored portion of the network. It is thus feasible to apply the algorithm to a larger network. Moreover, in order to assess the quality of the detected local community, we computed the average inclusion probability $\bar{P}_{v_{0}}$ of all vertices in the same block as the seed $v_{0}$. A higher value indicates a better quality of the detected community. The same probability matrix $M$ is used on networks with different sizes, and the result is shown in Table 1. Since all values are higher than 90%, the proposed algorithm performs well in finding the underlying local community structure.
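The quality measure $\bar{P}_{v_{0}}$ is an average over the seed's block, which can be sketched as follows. The function name and the dictionaries `inclusion_prob` and `block` are assumptions, and the numbers are made up. Whether the seed itself (whose inclusion probability is always 1) enters the average is not stated in the text, so it is exposed as a hypothetical `include_seed` flag.

```python
def average_inclusion(inclusion_prob, block, seed, include_seed=False):
    """Mean inclusion probability over the vertices in the seed's block."""
    members = [v for v in block
               if block[v] == block[seed] and (include_seed or v != seed)]
    if not members:
        raise ValueError("the seed's block has no other vertices")
    return sum(inclusion_prob[v] for v in members) / len(members)


# Hypothetical values: vertices 1-3 form the seed's block, 4-5 the other one.
block = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1}
inclusion_prob = {1: 1.0, 2: 0.75, 3: 0.5, 4: 0.25, 5: 0.0}
print(average_inclusion(inclusion_prob, block, seed=1))                     # 0.625
print(average_inclusion(inclusion_prob, block, seed=1, include_seed=True))  # 0.75
```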

3.2. Zachary’s Karate Club Data

Zachary’s karate club data is a well-known dataset in network science, which contains 34 vertices and 78 edges representing friendship outside the club activities [18]. During the study, a disagreement emerged between the two most central vertices, the instructor (vertex 1) and the administrator (vertex 34), which caused the fission of the club into two smaller groups. Since the actual partition is known, the karate club data set is often used as a benchmark for testing different community detection algorithms. Figure 5 shows the network with the community structure detected by [19]. There are four communities distinguished by different colors and shapes.

We applied the proposed algorithm starting from different seed vertices. For each seed, 1000 realizations are obtained, and the resulting inclusion probabilities are shown in Figure 6 and Figure 7. Algorithms starting from the two central vertices (1 and 34) can recover the actual splitting observed by Zachary, as shown in Figure 6a,b. Additionally, some subdivisions discovered by [19] can also be revealed by the algorithm. Figure 6c and Figure 7c show that the local community of vertex 5 is tightly bounded among the five green vertices in Figure 5. The inclusion probabilities from seed vertex 25 illustrate a similar situation as shown in Figure 6d and Figure 7d, except that the five vertices closely bounded to vertex 25 are at different levels. Vertices 26, 29, and 32 have inclusion probabilities close to 1, which is almost double that of vertex 24 or 28. It is clear that the proposed algorithm can provide a quantitative assessment of the likelihood of being in the local community. A deterministic algorithm cannot do that. Vertices 3, 9, and 10 lie between the two main structures and are easily misclassified by the existing community detection algorithms. Figure 6e and Figure 7e show that the inclusion probabilities from vertex 9 are somewhat homogeneous but lean toward the portion centered around the administrator (vertex 34).

Figure 6e and Figure 7e show an interesting observation. The inclusion probability of vertex 10 is high (∼1) in the local community of vertex 9. However, these two vertices are not even connected. A connection between them would be highly recommended in link predictions [20].

3.3. Lusseau’s Network of Bottlenose Dolphins

The network of social interactions of bottlenose dolphins living in Doubtful Sound, New Zealand, is another popular benchmark data set for community detection [21]. A total of 62 dolphins were observed over a period of 7 years, and edges represent mutual associations occurring more often than by chance. The network was found to have two main communities, as shown in Figure 8a [22]. Using the proposed algorithm and starting from the dolphin named Grin (the one with the highest degree), the inclusion probabilities in its local community are shown in Figure 9. Three different levels are apparent in Figure 8b, with values clustered at (i) close to 1 (blue); (ii) between 0.4 and 0.7 (green); and (iii) close to 0 (red). A cutoff value of 0.4 will lead to exactly the same partition as Figure 8a. The takeaway from the output is that even though both blue and green vertices are in the local community of dolphin Grin, there is a difference between the two groups. Blue vertices are tightly connected, while the green vertices have significantly lower inclusion probabilities. The quantification of the communities and discovery of the hierarchical structure are the main advantages of the proposed algorithm.
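Turning inclusion probabilities into a binary community amounts to applying a cutoff, as in the sketch below. The function name, the use of "greater than or equal to" at the cutoff, and the probability values are assumptions; the values are invented to mimic three levels and are not the dolphin data.

```python
def community_at_cutoff(inclusion_prob, cutoff):
    """Vertices whose inclusion probability is at least the cutoff."""
    return {v for v, p in inclusion_prob.items() if p >= cutoff}


# Hypothetical probabilities at three levels (near 1, middle, near 0).
inclusion_prob = {"a": 0.98, "b": 0.95, "c": 0.55, "d": 0.45, "e": 0.02}
print(sorted(community_at_cutoff(inclusion_prob, 0.4)))  # ['a', 'b', 'c', 'd']
print(sorted(community_at_cutoff(inclusion_prob, 0.9)))  # ['a', 'b']
```

Raising the cutoff keeps only the core of the community, while lowering it admits the intermediate level as well, which is how a single set of probabilities exposes a hierarchy.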

4. Discussion

In this paper, we propose a stochastic agglomerative algorithm and use a simulated annealing approach to evaluate the local community $C$ starting from the seed. By repeating the procedure, we obtain the inclusion probabilities of being in the local community for all vertices in the network. The computational complexity has a polynomial scaling with the size of the explored portion of the network and, therefore, is manageable for large networks. The biggest advantage of the proposed method compared to the existing deterministic algorithms is the ability to numerically quantify the output. One can control the tightness of $C$ by varying the threshold probability. A high threshold will result in a tightly bounded local community in which all vertices are closely related to the seed. Lowering the threshold tends to include more vertices, but some of them may not have a significant connection. This is a tradeoff between diversity and accuracy [23]. Comparisons with existing algorithms on simulated data and two real benchmark data sets show that the proposed method can reveal the actual underlying community structure. It can shed light on the hierarchical structure of the network, predict new connections in link prediction [20], and provide reliable recommendations in recommender systems [24]. For instance, the inclusion probabilities can be ranked in decreasing order, and the links between the seed and the top-ranked vertices that are not yet connected to it can then be recommended. Large-scale applications in link predictions and recommender systems are to be addressed in future research.

Author Contributions

Conceptualization, H.P. and Y.L.; methodology, H.P. and Y.L.; software, H.P. and Y.L.; validation, H.P. and Y.L.; formal analysis, H.P. and Y.L.; writing—original draft preparation, H.P. and Y.L.; writing—review and editing, H.P. and Y.L.; supervision, Y.L.; project administration, Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors would like to thank the two anonymous reviewers whose comments and suggestions greatly improved the quality and readability of the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Newman, M. Networks: An Introduction; Oxford University Press: Oxford, UK, 2010.
2. Newman, M.E. The structure and function of complex networks. SIAM Rev. 2003, 45, 167–256.
3. Strogatz, S.H. Exploring complex networks. Nature 2001, 410, 268.
4. Jackson, M.O. Social and Economic Networks; Princeton University Press: Princeton, NJ, USA, 2010.
5. Fortunato, S. Community detection in graphs. Phys. Rep. 2010, 486, 75–174.
6. Fortunato, S.; Hric, D. Community detection in networks: A user guide. Phys. Rep. 2016, 659, 1–44.
7. Bagrow, J.P.; Bollt, E.M. Local method for detecting communities. Phys. Rev. E 2005, 72, 046108.
8. Papadopoulos, S.; Skusa, A.; Vakali, A.; Kompatsiaris, Y.; Wagner, N. Bridge bounding: A local approach for efficient community discovery in complex networks. arXiv 2009, arXiv:0902.0871.
9. Rodrigues, F.A.; Travieso, G.; Costa, L.d.F. Fast community identification by hierarchical growth. Int. J. Mod. Phys. C 2007, 18, 937–947.
10. Clauset, A. Finding local community structure in networks. Phys. Rev. E 2005, 72, 026132.
11. Hui, P.; Yoneki, E.; Chan, S.Y.; Crowcroft, J. Distributed community detection in delay tolerant networks. In Proceedings of the 2nd ACM/IEEE International Workshop on Mobility in the Evolving Internet Architecture, Kyoto, Japan, 27–30 August 2007; p. 7.
12. Bagrow, J.P. Evaluating local community methods in networks. J. Stat. Mech. Theory Exp. 2008, 2008, P05001.
13. Eckmann, J.P.; Moses, E. Curvature of co-links uncovers hidden thematic layers in the world wide web. Proc. Natl. Acad. Sci. USA 2002, 99, 5825–5829.
14. Kirkpatrick, S.; Gelatt, C.D.; Vecchi, M.P. Optimization by simulated annealing. Science 1983, 220, 671–680.
15. Ott, R.L.; Longnecker, M.T. An Introduction to Statistical Methods and Data Analysis, 7th ed.; Cengage Learning: Boston, MA, USA, 2015.
16. Hagberg, A.A.; Schult, D.A.; Swart, P.J. Exploring Network Structure, Dynamics, and Function using NetworkX. In Proceedings of the 7th Python in Science Conference, Pasadena, CA, USA, 19–24 August 2008; pp. 11–15.
17. Holland, P.W.; Laskey, K.B.; Leinhardt, S. Stochastic blockmodels: First steps. Soc. Netw. 1983, 5, 109–137.
18. Zachary, W.W. An information flow model for conflict and fission in small groups. J. Anthropol. Res. 1977, 33, 452–473.
19. Donetti, L.; Munoz, M.A. Detecting network communities: A new systematic and efficient algorithm. J. Stat. Mech. Theory Exp. 2004, 2004, P10012.
20. Lü, L.; Zhou, T. Link prediction in complex networks: A survey. Phys. A Stat. Mech. Its Appl. 2011, 390, 1150–1170.
21. Lusseau, D.; Schneider, K.; Boisseau, O.J.; Haase, P.; Slooten, E.; Dawson, S.M. The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations. Behav. Ecol. Sociobiol. 2003, 54, 396–405.
22. Arenas, A.; Fernandez, A.; Gomez, S. Analysis of the structure of complex networks at different resolution levels. New J. Phys. 2008, 10, 053039.
23. Isufi, E.; Pocchiari, M.; Hanjalic, A. Accuracy-diversity trade-off in recommender systems via graph convolutions. Inf. Process. Manag. 2021, 58, 102459.
24. Vargas, S.; Castells, P. Rank and relevance in novelty and diversity metrics for recommender systems. In Proceedings of the Fifth ACM Conference on Recommender Systems, Chicago, IL, USA, 23–27 October 2011; pp. 109–116.

Figure 1. In an intermediate step, the current local community $C$ consists of vertices 1 through 7, in which vertices 1 through 5 (red color) are the boundary $B$. Vertices 6 and 7 (gray color) are the interior of $C$. The neighbor set $U$ includes vertices 8 through 14 (blue color). The remaining vertices in the network are in white.

Figure 2. An illustration of the search algorithm. It shows the first three steps in the search for the local community of vertex 1.

Figure 3. A synthetic network generated from the stochastic block model. Red and blue colors represent two underlying blocks, i.e., communities.

Figure 4. Inclusion probabilities in the local community of vertex 1 in the simulated network in Figure 3. Three tolerance parameters ($\epsilon = 0$, 0.02, and 0.04) are used. As $\epsilon$ increases, more of the network is probed.

Figure 5. Zachary’s karate club data set and its community structure detected by Donetti and Munoz [19]. Each color represents one community. After the splitting of the network, the square vertices are in the subgroup of vertex 1 (instructor), while the round vertices are with vertex 34 (administrator).

Figure 6. Inclusion probabilities in the local communities starting from five different seeds, which are represented by red dots. (a): vertex 1 (instructor). (b): vertex 34 (administrator). (c): vertex 5. (d): vertex 25. (e): vertex 9. The error bars depict twice the standard error.

Figure 7. Inclusion probabilities in the local communities starting from five different seeds. Darker color indicates a higher inclusion probability. (a): vertex 1 (instructor); (b): vertex 34 (administrator); (c): vertex 5; (d): vertex 25; (e): vertex 9. Seed vertices are in red.

Figure 8. Community structure of the dolphin social network. (a) Two communities detected by [22]. (b) Inclusion probabilities of dolphin Grin’s local community. Colors correspond to the values in Figure 9.

Figure 9. Inclusion probabilities in the local community of dolphin Grin. Colors represent three clusters of magnitude.

Table 1. The average inclusion probabilities of vertices in the same block as the seed for different network sizes. $N$ is the number of vertices in the same block as the seed. The total number of vertices in the network is thus $2N$.

N	20	50	100	250	500
${\bar{P}}_{v_{0}}$	0.913	0.934	0.918	0.925	0.932

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
