# Significant Geo-Social Group Discovery over Location-Based Social Network


## Abstract


## 1. Introduction

**Event recommendation.** Online location-based services, such as Meetup (https://www.meetup.com/, accessed on 4 June 2021), Eventbrite (https://www.eventbrite.com/, accessed on 4 June 2021), and Meetin (https://www.meetin.org/, accessed on 4 June 2021), allow social network users to meet each other physically for entertainment or work purposes (e.g., business forum, dinner, and dating). Suppose that Meetup wishes to recommend different events to users. We can first detect potential groups of users that are spatially and socially close and then recommend events in their vicinity. Intuitively, people who have relatively tight social relations are more likely to participate in a nearby event as a group.

**Geo-social data analysis.** A common data analysis task is to study features about geographic regions. As discussed in [12], these features are often related to the people located there and their interactions. Another important task is to classify users based on their social connections, tags, geographic locations, and time stamps. GSGD can be employed to provide concrete geo-social context, e.g., by detecting socially dense communities in a geographic area.

- First, the GSG detection problem intends to discover a user group within a circle with a radius not larger than $\gamma $. How to extract the smallest enclosing circle of a group efficiently is the first challenge.
- Second, it is unclear how to effectively quantify the significance of a set of GSGs, considering that (1) each GSG has a set of vertices as well as an MCC with radius and center, and (2) GSGs may significantly overlap with each other.
- Third, it is also challenging to efficiently compute the set of top-k GSGs, considering the large size of real networks as well as the wide variation in MCC radii.

- We define a geo-social group model, named GSG, and an effective distance measure among users including both social and physical aspects.
- We formulate the problem of significant GSG discovery in large geo-social networks and propose effective techniques to generate the candidate set.
- We propose effective pruning techniques to efficiently enumerate the set of all GSGs in a network.
- We extend GSG detection to the significant top-k geo-social group (TkGSG) discovery problem. Instead of extracting all qualified GSGs in a network, TkGSG discovers k groups that guarantee the diversity of the result while achieving an approximation ratio of $1-1/e$.

## 2. Related Work

#### 2.1. Cohesive Subgraph Extraction

#### 2.2. Top-k Densest Subgraphs Search

## 3. Problem Definition

#### 3.1. Geo-Social Group (GSG)

**Structure cohesiveness.** As discussed in the related work in Section 2, several structure cohesiveness measures have been proposed. Among them, structural graph clustering [30] can effectively discover hidden structures in a graph. Therefore, we adopt it for our community structure.

**Example 1.**

**Spatial cohesiveness.** To ensure that the members of a discovered community are within an accessible physical distance of one another, our community criterion adopts a minimum covering circle (MCC) with a bounded radius: the vertices of a structurally connected cluster are required to lie within an MCC whose radius is no greater than the user-defined distance threshold $\gamma$. Note that the notion of MCC has been broadly adopted to achieve strong spatial closeness for a group of members [6,31,32] in a two-dimensional space. It is defined as follows.

**Definition 1** (**Minimum Covering Circle** [6])**.** Given a set of vertices C, the Minimum Covering Circle of C is the spatial circle that covers all of the vertices in C with the smallest diameter, denoted by $\mathit{MCC}(C)$.

**Definition 2.**

- **Connectivity:** $G[C]$ is connected;
- **Structure cohesiveness:** $G[C]$ is an induced connected subgraph from structural graph clustering, w.r.t. $\epsilon$ and $\mu$;
- **Spatial cohesiveness:** The MCC of the vertices in $G[C]$ has a radius no larger than $\gamma$.

**Example 2.**

#### 3.2. Problem Statement

**Problem Statement.** Given a network $G=(V,E)$ and three parameters $0<\epsilon\le 1$, $\mu \ge 2$, and $\gamma >0$, the problem of GSG detection is to find the set of GSGs in G.

## 4. GSG Detection Algorithm

#### 4.1. Three-Step Paradigm

**Lemma 1.**

**Lemma 2.**

**Theorem 1.**

**Proof of Theorem 1.**

**Lemma 3.**

#### 4.2. Our GSGD Approach

**Computing MCC.** The pseudocode for computing the MCC of each cluster derived from Algorithm 1 is given in Algorithms 2 and 3. A naive algorithm (see Algorithm 2) with three nested for-loops takes polynomial time $\Theta(n^{4})$. We instead adopt a more efficient randomized incremental method [34] (see Algorithm 3), which runs in expected linear time $\Theta(n)$.
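To make the randomized incremental idea concrete, the following is a minimal Python sketch of a Welzl-style smallest enclosing circle computation, as commonly presented in the computational geometry literature. It is an illustration under our own assumptions and not the authors' Algorithm 3: the function names (`mcc_random`, `circle_from_two`, `circle_from_three`, `contains`), the fixed shuffle seed, and the floating-point tolerance are all hypothetical choices.

```python
import math
import random

def circle_from_two(a, b):
    """Smallest circle with points a and b on its boundary (a, b are a diameter)."""
    return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0, math.dist(a, b) / 2.0)

def circle_from_three(a, b, c):
    """Circumcircle of a, b, c; returns None if the points are (nearly) collinear."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    sa, sb, sc = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (sa * (by - cy) + sb * (cy - ay) + sc * (ay - by)) / d
    uy = (sa * (cx - bx) + sb * (ax - cx) + sc * (bx - ax)) / d
    return (ux, uy, math.dist((ux, uy), a))

def contains(circle, p, eps=1e-9):
    """True if point p lies inside or on the circle (with a small tolerance)."""
    return math.dist((circle[0], circle[1]), p) <= circle[2] + eps

def mcc_random(points, seed=0):
    """Return (center_x, center_y, radius) of the minimum covering circle."""
    pts = list(points)
    if not pts:
        return None
    random.Random(seed).shuffle(pts)  # random order gives expected linear time
    c = (pts[0][0], pts[0][1], 0.0)
    for i, p in enumerate(pts):
        if contains(c, p):
            continue
        # p must lie on the boundary of the MCC of pts[0..i]
        c = (p[0], p[1], 0.0)
        for j in range(i):
            q = pts[j]
            if contains(c, q):
                continue
            # both p and q must lie on the boundary
            c = circle_from_two(p, q)
            for k in range(j):
                r = pts[k]
                if contains(c, r):
                    continue
                c3 = circle_from_three(p, q, r)
                if c3 is not None:
                    c = c3
    return c
```

With such a routine, the spatial cohesiveness test for a candidate cluster reduces to comparing the returned radius with $\gamma$, e.g., `mcc_random(coords)[2] <= gamma`.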

**Definition 3.**

**Definition 4.**

Algorithm 1: GSGD

**Example 3.**

Algorithm 2: MCCNaive(C)

#### 4.3. Optimization Techniques

**(1) Efficiently Compute Structural Similarity between Two Vertices.** GSGD (i.e., Algorithm 1) computes $\sigma(u,v)$ when exploring u and $\sigma(v,u)$ when exploring v. Because both values describe the same pair of vertices, the second computation is redundant and significantly increases the computational burden. Therefore, we adopt the cross link technique, which links edge $(u,v)$ with edge $(v,u)$, so that the structural similarity between u and v only needs to be calculated once. Overall, the number of structural similarity computations in a graph is expected to be halved. Concerning time complexity, we assume that the adjacency list of each vertex is ordered by vertex ID; then, we can use a binary search (with time complexity $\mathcal{O}(\log d[v])$) on $N[v]$ to locate edge $(v,u)$ when $(u,v)$ is processed. In total, the time complexity of the cross link is $\mathcal{O}(\sum_{(u,v)\in E}\log \min\{d[u],d[v]\})$.
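The cross link can be sketched in a few lines of Python. This is only an illustration under stated assumptions: we assume the SCAN-style similarity $\sigma(u,v)=|N[u]\cap N[v]|/\sqrt{d[u]\,d[v]}$ over closed neighborhoods, as used in structural graph clustering [23,30], and a graph stored as sorted adjacency lists. The names `structural_similarity` and `cross_linked_similarities` are hypothetical, and for brevity the sketch always searches the list of v rather than the shorter of the two lists.

```python
from bisect import bisect_left
from math import sqrt

def structural_similarity(adj, u, v):
    """SCAN-style similarity over closed neighborhoods N[u] and N[v] (assumed)."""
    nu = set(adj[u]) | {u}
    nv = set(adj[v]) | {v}
    return len(nu & nv) / sqrt(len(nu) * len(nv))

def cross_linked_similarities(adj):
    """adj maps each vertex to a sorted list of neighbor IDs (undirected graph).

    Returns (sim, computed), where sim[u][i] is the similarity of the edge
    (u, adj[u][i]) and computed counts the similarity evaluations performed.
    """
    sim = {u: [None] * len(adj[u]) for u in adj}
    computed = 0
    for u in adj:
        for i, v in enumerate(adj[u]):
            if sim[u][i] is not None:
                continue  # already filled in through the cross link from v
            s = structural_similarity(adj, u, v)
            computed += 1
            sim[u][i] = s
            # cross link: locate the reverse edge (v, u) by binary search
            j = bisect_left(adj[v], u)
            sim[v][j] = s
    return sim, computed
```

On a graph with m undirected edges, `computed` equals m instead of 2m, which is the expected halving.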

**(2) Spatial-Structural Neighborhood Pruning Rules.** Due to the spatial cohesiveness constraint, we consider the physical distance between vertices when computing the clustering in Phase-I. Thus, rather than focusing on the pure structural neighborhood, we redefine the neighborhood of a vertex u in G by considering the physical distance between u and the vertices in $N[u]$, as follows.

**Definition 5.** The **spatial-structural neighborhood** of a vertex u, denoted by ${N}_{\gamma}[u]$, is the set of vertices in the structural neighborhood of u whose distance from u is not greater than $\gamma$; that is, ${N}_{\gamma}[u]=\{v\in V\mid (u,v)\in E\wedge dist(u,v)\le \gamma \}\cup \{u\}$, where $dist(u,v)$ is the spatial distance between u and v.
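As a small illustration of Definition 5, the sketch below filters the neighbors of a vertex by Euclidean distance. The function name, the dictionary-based graph layout, and the use of planar Euclidean distance for $dist(u,v)$ are assumptions made for the example.

```python
import math

def spatial_structural_neighborhood(adj, pos, u, gamma):
    """Return N_gamma[u]: u itself plus every neighbor of u within distance gamma.

    adj maps a vertex to an iterable of its neighbors; pos maps a vertex to
    its (x, y) location. Euclidean distance is assumed for dist(u, v).
    """
    return {u} | {v for v in adj[u] if math.dist(pos[u], pos[v]) <= gamma}

# Example: vertex "a" has three neighbors, one of which is too far away.
adj = {"a": ["b", "c", "d"], "b": ["a"], "c": ["a"], "d": ["a"]}
pos = {"a": (0, 0), "b": (30, 40), "c": (60, 80), "d": (0, 10)}
near = spatial_structural_neighborhood(adj, pos, "a", gamma=50)
```

Here `near` contains `"a"`, `"b"` (distance 50), and `"d"` (distance 10), while `"c"` (distance 100) is pruned.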

Algorithm 3: MCCRandom(C)

## 5. Top-k Densest GSGs

**Density of a GSG.** Intuitively, the larger the size $|C|$ of a GSG and the smaller the radius r of its MCC, the more important the GSG. Thus, we define the density of a GSG as $\rho(C)=\frac{|C|}{r}$. For example, consider Example 3. The GSG ${C}_{1}=\{{v}_{1},...,{v}_{5}\}$ has five vertices, and the radius is 20; thus, $\rho({C}_{1})=\frac{5}{20}=0.25$. Formally, we define the density as follows.

**Definition 6.**

**Definition 7.**

**Diversified Density of a Set of GSGs.** Given a set $\mathbb{C}$ of GSGs, a naive approach to quantifying the score of $\mathbb{C}$ is to sum up $\rho(C)$ over every GSG C in $\mathbb{C}$. However, this may significantly overestimate the score of $\mathbb{C}$ because the GSGs may overlap: as observed in [30], the clusters produced by structural graph clustering are not necessarily disjoint.

**Definition 8.**

#### 5.1. Problem Statement

**Problem Statement.** Given a geo-social network $G=(V,E)$ and parameters $0<\epsilon\le 1$, $\mu \ge 2$, and k, the problem of the top-k densest GSG selection is to uncover a set of k GSGs ${\mathbb{C}}_{k}^{*}=\{{C}_{1}^{*},{C}_{2}^{*},...,{C}_{k}^{*}\}$ for which the diversified union density is the largest among all sets of k GSGs in G.

**Hardness Analysis.** Assuming that the set of all GSGs in G has been generated and stored in ${\mathbb{C}}^{A}$, it is still a hard problem to extract the set of top-k GSGs from ${\mathbb{C}}^{A}$. Intuitively, given ${\mathbb{C}}^{A}$, our problem becomes computing the set of k GSGs from ${\mathbb{C}}^{A}$ that covers the most vertices with the smallest MCC radius, which is an instance of the NP-hard k-set-coverage problem [35]. As a result, to exactly compute the set of top-k GSGs from ${\mathbb{C}}^{A}$, we may need to enumerate all sets of k GSGs of ${\mathbb{C}}^{A}$, compute their density, and finally return the set of k GSGs with the maximum union density. Moreover, it is worth noting that ${\mathbb{C}}^{A}$ can be of exponential size with respect to the size of G in the worst case.

#### 5.2. A Greedy Approach

**Phase-I: Generate All GSGs ${\mathbb{C}}^{A}$.** Section 4 showed how to enumerate the GSGs whose MCC radius is bounded by a threshold. Here, we drop this radius condition and regard each cluster as a valid GSG whose MCC covers all of its vertices.

**Phase-II: Greedily Select Top-k GSGs.** As discussed in Section 5.1, given ${\mathbb{C}}^{A}$, it is still a hard problem to select the top-k GSGs from ${\mathbb{C}}^{A}$. The good news is that our density of a set of GSGs (see Definition 8) is monotone and submodular, as proven in the lemma below.

**Lemma 4.** For any two sets of GSGs ${\mathbb{C}}_{1}\subseteq {\mathbb{C}}_{2}$, we have $\rho({\mathbb{C}}_{1})\le \rho({\mathbb{C}}_{2})$ (**Monotone**). Moreover, for any GSG $C\notin {\mathbb{C}}_{2}$, we have $\rho({\mathbb{C}}_{1}\cup \{C\})-\rho({\mathbb{C}}_{1})\ge \rho({\mathbb{C}}_{2}\cup \{C\})-\rho({\mathbb{C}}_{2})$ (**Submodular**).

**Proof of Lemma 4.**

Algorithm 4: Greedy
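The greedy selection for a monotone submodular score can be sketched as follows. This is not a transcription of Algorithm 4: the exact diversified density of Definition 8 is not reproduced here, so the sketch uses a stand-in score (an assumption of ours) in which every covered vertex counts once, weighted by $1/r$ of the tightest selected GSG containing it. For a single GSG this stand-in reduces to $\rho(C)=|C|/r$, and it is monotone and submodular. The names `union_density` and `greedy_top_k` are hypothetical.

```python
def union_density(groups):
    """Stand-in diversified score (assumption, not Definition 8).

    groups is a list of (members, radius) pairs. Each covered vertex counts
    once, weighted by the largest 1/radius among the groups that contain it.
    """
    best = {}
    for members, radius in groups:
        weight = 1.0 / radius
        for v in members:
            if weight > best.get(v, 0.0):
                best[v] = weight
    return sum(best.values())

def greedy_top_k(candidates, k):
    """Repeatedly add the candidate with the largest marginal gain."""
    selected = []
    remaining = list(candidates)
    for _ in range(min(k, len(remaining))):
        base = union_density(selected)
        gain, idx = max(
            (union_density(selected + [c]) - base, i)
            for i, c in enumerate(remaining)
        )
        selected.append(remaining.pop(idx))
    return selected

# Example: B is fully contained in A, so it adds nothing once A is chosen.
A = (frozenset({1, 2, 3, 4, 5}), 20.0)
B = (frozenset({1, 2, 3, 4}), 20.0)
C = (frozenset({6, 7}), 10.0)
picked = greedy_top_k([A, B, C], k=2)
```

In this toy input, the greedy choice is A followed by C: B has the same individual density as C, but its marginal gain after A is zero because of the overlap.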

**Theorem 2.**

#### 5.3. A Swap-Based Approach

Algorithm 5: Swap

#### 5.4. Optimization Techniques

**Efficiently Select Top-k GSGs.** In Algorithm 4, given a subset $\mathbb{C}$ of ${\mathbb{C}}^{A}$, we need to select the next GSG C such that $\rho(\mathbb{C}\cup \{C\})$ is maximized. A naive approach is to compute $\rho(\mathbb{C}\cup \{{C}^{\prime}\})$ for every GSG ${C}^{\prime}\in {\mathbb{C}}^{A}\setminus \mathbb{C}$ and then select the best one. However, this is time-consuming when $|{\mathbb{C}}^{A}|$ is large. Thus, we develop an upper bound-based pruning technique.
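One standard way to realize upper-bound pruning for a submodular objective is lazy evaluation: by submodularity (Lemma 4), a marginal gain computed in an earlier round is an upper bound on the gain in any later round, so a candidate whose stale bound is already beaten need not be re-evaluated. The sketch below illustrates this generic idea and may differ from the bound actually used by the authors. It reuses the same stand-in score as an assumption (each covered vertex counts once, weighted by the largest $1/r$), and the names `score` and `lazy_greedy_top_k` are hypothetical.

```python
import heapq

def score(groups):
    """Stand-in diversified score (assumption): best 1/radius per covered vertex."""
    best = {}
    for members, radius in groups:
        weight = 1.0 / radius
        for v in members:
            if weight > best.get(v, 0.0):
                best[v] = weight
    return sum(best.values())

def lazy_greedy_top_k(candidates, k):
    """Greedy selection that re-evaluates a candidate only when its bound is on top.

    Heap entries are (-upper_bound, index, round), where round is the number
    of selected groups at the time the bound was computed.
    """
    selected = []
    evaluations = 0
    heap = [(-score([c]), i, 0) for i, c in enumerate(candidates)]
    heapq.heapify(heap)
    while heap and len(selected) < k:
        neg_bound, i, stamp = heapq.heappop(heap)
        if stamp == len(selected):
            # The bound is exact for this round and no other bound exceeds it,
            # so this candidate has the largest marginal gain.
            selected.append(candidates[i])
            continue
        # Stale bound: recompute the true marginal gain and push it back.
        gain = score(selected + [candidates[i]]) - score(selected)
        evaluations += 1
        heapq.heappush(heap, (-gain, i, len(selected)))
    return selected, evaluations

A = (frozenset({1, 2, 3, 4, 5}), 20.0)
B = (frozenset({1, 2, 3, 4}), 20.0)
C = (frozenset({6, 7}), 10.0)
picked, evaluations = lazy_greedy_top_k([A, B, C], k=2)
```

The result matches plain greedy selection (A, then C), while candidates whose stale bounds are already dominated are never rescored.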

## 6. Experiments

#### 6.1. Setup

- Naive: the geo-social group detection algorithm with the naive approach for computing the MCC in Section 4.
- Random: the geo-social group detection algorithm with the randomized incremental construction for computing the MCC in Section 4.
- DGCD: the state-of-the-art density-based geo-community detection algorithm in [8].

- Greedy: our greedy approach for top-k densest geo-social group mining in Section 5.
- Swap: our swap approach for top-k densest geo-social group mining in Section 5.

**Datasets.** We evaluate the performance of all algorithms on two real networks and three synthetic networks, as summarized in Table 2; the two real networks, Brightkite and Gowalla, have real location labels. Note that we only consider the first check-in position as a user's geographic location. We also evaluate the proposed methods on LFR benchmark networks [37], increasing the number of vertices from ${10}^{4}$ to ${10}^{6}$; the default average clustering coefficient is set to $0.2$, a value derived from the Brightkite and Gowalla datasets. Furthermore, akin to [30], we fix the average and maximum degree of the whole network at 20 and 50, respectively. For each synthetic graph, we generate node positions in a similar manner to [6,8], as follows. First, we randomly pick a node v and assign it a random position in the space $[0,1000]\times [0,1000]$. Then, following the normal distribution with mean 300 and standard deviation 600, we put v's neighbors at random positions. Starting from v's neighbors, we repeat the above two steps until every node in the graph has a location.
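The position-generation procedure above can be sketched as a breadth-first traversal. Several details are not specified in the text and are assumptions of this sketch: the normal distribution is taken to govern the distance between a node and its already placed neighbor, the direction is drawn uniformly at random, negative draws are folded with an absolute value, and positions are clamped to the space. The function name `assign_positions` and its parameters are hypothetical.

```python
import math
import random
from collections import deque

def assign_positions(adj, seed=0, size=1000.0, mean=300.0, std=600.0):
    """Assign an (x, y) location to every vertex of an undirected graph.

    adj maps a vertex to an iterable of its neighbors. A random start vertex
    gets a uniform position in [0, size] x [0, size]; each unplaced neighbor
    is put at a normally distributed distance (assumption) from the vertex
    that discovers it. The traversal restarts for disconnected components.
    """
    rng = random.Random(seed)
    order = list(adj)
    rng.shuffle(order)  # random choice of the start vertex
    pos = {}
    for start in order:
        if start in pos:
            continue
        pos[start] = (rng.uniform(0.0, size), rng.uniform(0.0, size))
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v in pos:
                    continue
                dist = abs(rng.gauss(mean, std))
                angle = rng.uniform(0.0, 2.0 * math.pi)
                x = min(max(pos[u][0] + dist * math.cos(angle), 0.0), size)
                y = min(max(pos[u][1] + dist * math.sin(angle), 0.0), size)
                pos[v] = (x, y)
                queue.append(v)
    return pos

# Example: a 6-cycle plus one isolated vertex.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
ring[6] = []
positions = assign_positions(ring, seed=42)
```

Fixing the seed makes the generated layout reproducible across runs.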

**Parameters.** For each experimental network, we vary the three parameters in Section 6.2 and Section 6.3: $0<\epsilon\le 1$, $\mu \ge 2$, and $\gamma >0$. Concerning $\epsilon$, we select 0.4, 0.5, 0.6, and 0.7, with $\epsilon=0.6$ as the default. For $\mu$, we choose 5, 10, 15, and 20, with $\mu =10$ as the default. For $\gamma$, we choose 25, 50, 75, and 100, with $\gamma =50$ as the default.

**Metrics.** We evaluate the algorithms from two aspects: effectiveness and efficiency. Regarding effectiveness, we evaluate the total number of GSGs and our three-step paradigm. Regarding efficiency, we evaluate the total processing time by running an algorithm three times and reporting the average CPU time.

#### 6.2. Performance of GSG Detection

**Eval-I: Total Number of GSGs.** As both of our algorithms, GSGD with NaiveMCC (Naive for short) and GSGD with RandomMCC (Random for short), need to generate and keep all candidate GSGs and then calculate the MCC of each candidate, Eval-I first evaluates the total number of GSGs in a network. Given $\mu =5$ and $\gamma =50$, both Naive and Random generate the same ${\mathbb{C}}^{A}$, so we only need to run one of them. The results under different $\epsilon$ values are shown in Table 3. As expected, the number of GSGs decreases as $\epsilon$ becomes larger, because a larger $\epsilon$ admits fewer possible communities. Nevertheless, the number is still manageable even for $\epsilon=0.4$. Thus, our strategy of storing all candidate GSGs works in practice.

**Eval-II: Evaluating Our Three-step Paradigm.** We evaluate the efficiency of the designed three-step paradigm by comparing the time of clustering and MCC computation. Recall from Section 4.1 and Section 4.2 that our proposed paradigm consists of three steps: (Step-I) clustering core nodes, (Step-II) clustering non-core nodes, and (Step-III) computing the MCC and discarding invalid subgraphs. We pack Step-I and Step-II together as CLUSTER and report Step-III individually as MCCNaive and MCCRandom, respectively. The processing times of CLUSTER, MCCNaive, and MCCRandom are presented in Figure 6 by varying $\gamma$. The processing time of CLUSTER remains almost the same when $\gamma$ increases from 25 to 100 because Step-I, which is independent of $\gamma$, is the dominating cost of GSG detection. On the other hand, the run time of MCCNaive and MCCRandom increases slightly when $\gamma$ increases; this is due to the search range of Step-II, which increases with $\gamma$. Nevertheless, MCCRandom consistently performs better than MCCNaive; this is due to its expected linear time complexity.

**Eval-III: Vary $\epsilon$.** The run time of GSGD with NaiveMCC (Naive for short), GSGD with RandomMCC (Random for short), and DGCD by varying $\epsilon$ is illustrated in Figure 7. The processing time of Random stays steady for different $\epsilon$ because of its linear time complexity. When $\epsilon$ increases, it is more likely that two adjacent vertices are not structurally similar to each other and, thus, can be pruned early. Note that, for a larger $\epsilon$, DGCD runs in less time. The reason for this is that a vertex is less likely to be a core vertex for a larger $\epsilon$. In summary, GSGD is faster than DGCD by more than two orders of magnitude for all $\epsilon$.

**Eval-IV: Vary $\mu$.** Figure 8 presents the performance of GSGD (with NaiveMCC and RandomMCC) and DGCD by varying $\mu$. Basically, the processing times of both GSGD and DGCD hold steady for different $\mu$ values. GSGD takes slightly less time when $\mu$ increases. That is because, as $\mu$ increases, more vertices can be ruled out as core vertices (see Section 4.1) in our paradigm; thus, fewer structural similarity computations are performed between core vertices, which lowers the time cost. Furthermore, GSGD consistently outperforms DGCD regarding parameter $\mu$.

**Eval-V: Vary $\gamma$.** The experimental results of DGCD and our approaches by varying parameter $\gamma$ are shown in Figure 9. The processing time of DGCD stays steady because the structural similarity computations are not likely to be reduced under different physical distance thresholds $\gamma$. Regarding GSGD, as the value of $\gamma$ changes, the run time is volatile. The reason for this is that, when $\gamma$ increases, it is more likely that two vertices can be pruned by threshold $\gamma$; however, as the candidate communities become larger, computing the MCC will take more time.

#### 6.3. Performance of Top-k Densest GSG Mining

**Eval-VI: Total Number of Revisited GSGs.** As our algorithm Greedy needs to enumerate and keep all revisited GSGs ${\mathbb{C}}^{A}$, we first evaluate the total number of revisited GSGs (i.e., $|{\mathbb{C}}^{A}|$) to verify the feasibility of our strategy. Given $\mu =5$, the results for different $\epsilon$ values are illustrated in Table 4. We can conclude that the size becomes smaller when $\epsilon$ increases. Nevertheless, it is still manageable even for $\epsilon=0.4$, the setting with the most revisited GSGs. Thus, our strategy of storing all revisited GSGs works in practice.

**Eval-VII: Vary k.** Figure 10 presents the processing time of Greedy, Swap, and TopK when varying k, where TopK is a naive algorithm that we also implement, which directly chooses from ${\mathbb{C}}^{A}$ the k GSGs with the largest individual densities. Recall from Section 5 that all three algorithms consist of two phases: (Phase-I) generating all revisited GSGs ${\mathbb{C}}^{A}$ and (Phase-II) selecting diversified top-k GSGs from ${\mathbb{C}}^{A}$. We denote the second phase of Greedy, Swap, and TopK as Greedy, Swap, and TopK. The processing times of Greedy and TopK remain almost the same when k increases from 10 to 50 because Phase-I, which is independent of k, is the dominating cost of Greedy and TopK. On the other hand, the run time of Swap increases significantly when k increases; this is because the time of Phase-II (streaming-like selection) in Algorithm 5, which increases with k, also becomes significant due to the improved Phase-I (see Figure 11). Nevertheless, given the approximation ratio of $1-1/e$ proven in Section 5, the performance of Swap is acceptable; note that Swap uses all of the optimization techniques of Greedy. Moreover, Greedy runs faster than Swap when k becomes larger; this is due to the overhead of checking the swap condition for each newly generated GSG.

**Eval-VIII: Vary $\epsilon$.** The processing time of Greedy, Swap, and TopK by varying $\epsilon$ is illustrated in Figure 12. In general, all three algorithms run faster for a larger $\epsilon$; that is because the total number of revisited GSGs becomes smaller under a higher cohesiveness threshold $\epsilon$, as shown in Table 4. Swap performs slightly worse when $\epsilon$ is small, and Greedy and TopK are similar; however, the latter does not consider the overlap issue.

**Eval-IX: Scalability Testing.** In this experiment, we evaluate the scalability of our approaches on Syn3. For the graph, we randomly generate induced subgraphs with $20\%$, $40\%$, $60\%$, $80\%$, and $100\%$ of the vertices of the original one. Given $\epsilon=0.4$ and $\mu =5$, the results are shown in Figure 13, where the x-axis shows the number of vertices in the subgraph. Generally, the run time of all algorithms increases with the number of vertices $|V|$ because the graph to be processed becomes larger. Nevertheless, Greedy consistently outperforms Swap, and the improvement is up to two orders of magnitude. Thus, Greedy scales to large graphs as a result of our optimization techniques.

## 7. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

1. Fortunato, S. Community detection in graphs. Phys. Rep. **2010**, 486, 75–174.
2. Huang, X.; Lakshmanan, L.V.; Xu, J. Community search over big graphs: Models, algorithms, and opportunities. In Proceedings of the 2017 IEEE 33rd International Conference on Data Engineering (ICDE), San Diego, CA, USA, 19–22 April 2017; IEEE: New York, NY, USA, 2017; pp. 1451–1454.
3. Hlaoui, A.; Wang, S. A direct approach to graph clustering. Neural Netw. Comput. Intell. **2004**, 4, 158–163.
4. Rattigan, M.J.; Maier, M.; Jensen, D. Graph clustering with network structure indices. In Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR, USA, 20–24 June 2007; ACM: New York, NY, USA, 2007; pp. 783–790.
5. Ogawa, K.; Verbree, E.; Zlatanova, S.; Kohtake, N.; Ohkami, Y. Toward seamless indoor-outdoor applications: Developing stakeholder-oriented location-based services. Geo-Spat. Inf. Sci. **2011**, 14, 109–118.
6. Fang, Y.; Cheng, R.; Li, X.; Luo, S.; Hu, J. Effective community search over large spatial graphs. Proc. VLDB Endow. **2017**, 10, 709–720.
7. Shi, J.; Mamoulis, N.; Wu, D.; Cheung, D.W. Density-based place clustering in geo-social networks. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, Snowbird, UT, USA, 22–27 June 2014; ACM: New York, NY, USA, 2014; pp. 99–110.
8. Yao, K.; Papadias, D.; Bakiras, S. Density-based Community Detection in Geo-Social Networks. In Proceedings of the 16th International Symposium on Spatial and Temporal Databases, Vienna, Austria, 19–21 August 2019; ACM: New York, NY, USA, 2019; pp. 110–119.
9. Colace, F.; De Santo, M.; Lombardi, M.; Mosca, R.; Santaniello, D. A Multilayer Approach for Recommending Contextual Learning Paths. J. Internet Serv. Inf. Secur. **2020**, 10, 91–102.
10. Newman, M.E.; Girvan, M. Finding and evaluating community structure in networks. Phys. Rev. E **2004**, 69, 026113.
11. Barthélemy, M. Spatial Networks; Springer: Berlin/Heidelberg, Germany, 2014.
12. Chon, Y.; Lane, N.D.; Li, F.; Cha, H.; Zhao, F. Automatically characterizing places with opportunistic crowdsensing using smartphones. In Proceedings of the 2012 ACM Conference on Ubiquitous Computing, Pittsburgh, PA, USA, 5–8 September 2012; ACM: New York, NY, USA, 2012; pp. 481–490.
13. Li, W.; Zlatanova, S. Effective Geo-Social Group Detection in Location-Based Social Networks. In Proceedings of the 2019 IEEE International Symposium on Multimedia (ISM), San Diego, CA, USA, 9–11 December 2019; IEEE: New York, NY, USA, 2019; pp. 247–2477.
14. Lim, S.; Ryu, S.; Kwon, S.; Jung, K.; Lee, J.G. LinkSCAN*: Overlapping community detection using the link-space transformation. In Proceedings of the 2014 IEEE 30th International Conference on Data Engineering, Chicago, IL, USA, 31 March–4 April 2014; IEEE: New York, NY, USA, 2014; pp. 292–303.
15. Cheng, J.; Ke, Y.; Chu, S.; Özsu, M.T. Efficient core decomposition in massive networks. In Proceedings of the 2011 IEEE 27th International Conference on Data Engineering, Hannover, Germany, 11–16 April 2011; IEEE: New York, NY, USA, 2011; pp. 51–62.
16. Cui, W.; Xiao, Y.; Wang, H.; Wang, W. Local search of communities in large graphs. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, Snowbird, UT, USA, 22–27 June 2014; ACM: New York, NY, USA, 2014; pp. 991–1002.
17. Charikar, M. Greedy approximation algorithms for finding dense components in a graph. In International Workshop on Approximation Algorithms for Combinatorial Optimization; Springer: Berlin/Heidelberg, Germany, 2000; pp. 84–95.
18. Cohen, J. Trusses: Cohesive subgraphs for social network analysis. Natl. Secur. Agency Tech. Rep. **2008**, 16, 3–29.
19. Huang, X.; Cheng, H.; Qin, L.; Tian, W.; Yu, J.X. Querying k-truss community in large and dynamic graphs. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, Snowbird, UT, USA, 22–27 June 2014; ACM: New York, NY, USA, 2014; pp. 1311–1322.
20. Chang, L.; Yu, J.X.; Qin, L.; Lin, X.; Liu, C.; Liang, W. Efficiently computing k-edge connected components via graph decomposition. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, New York, NY, USA, 22–27 June 2013; ACM: New York, NY, USA, 2013; pp. 205–216.
21. Chen, Y.; Xu, J.; Xu, M. Finding community structure in spatially constrained complex networks. Int. J. Geogr. Inf. Sci. **2015**, 29, 889–911.
22. Expert, P.; Evans, T.S.; Blondel, V.D.; Lambiotte, R. Uncovering space-independent communities in spatial networks. Proc. Natl. Acad. Sci. USA **2011**, 108, 7663–7668.
23. Xu, X.; Yuruk, N.; Feng, Z.; Schweiger, T.A. Scan: A structural clustering algorithm for networks. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, CA, USA, 12–15 August 2007; ACM: New York, NY, USA, 2007; pp. 824–833.
24. Shiokawa, H.; Fujiwara, Y.; Onizuka, M. SCAN++: Efficient algorithm for finding clusters, hubs and outliers on large-scale graphs. Proc. VLDB Endow. **2015**, 8, 1178–1189.
25. Galbrun, E.; Gionis, A.; Tatti, N. Top-k overlapping densest subgraphs. Data Min. Knowl. Discov. **2016**, 30, 1134–1165.
26. Yuan, L.; Qin, L.; Lin, X.; Chang, L.; Zhang, W. Diversified top-k clique search. VLDB J. **2016**, 25, 171–196.
27. Yang, Z.; Fu, A.W.C.; Liu, R. Diversified top-k subgraph querying in a large graph. In Proceedings of the 2016 International Conference on Management of Data, San Francisco, CA, USA, 26 June–1 July 2016.
28. Yang, Y.; Yan, D.; Wu, H.; Cheng, J.; Zhou, S.; Lui, J. Diversified Temporal Subgraph Pattern Mining. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016.
29. Gibbons, A. Algorithmic Graph Theory; Cambridge University Press: Cambridge, UK, 1985.
30. Chang, L.; Li, W.; Qin, L.; Zhang, W.; Yang, S. pSCAN: Fast and Exact Structural Graph Clustering. IEEE Trans. Knowl. Data Eng. **2017**, 29, 387–401.
31. Elzinga, D.J.; Hearn, D.W. The minimum covering sphere problem. Manag. Sci. **1972**, 19, 96–104.
32. Elzinga, J.; Hearn, D.W. Geometrical solutions for some minimax location problems. Transp. Sci. **1972**, 6, 379–394.
33. Cormen, T.H.; Leiserson, C.E.; Rivest, R.L.; Stein, C. Introduction to Algorithms; MIT Press: Cambridge, MA, USA, 2009.
34. Welzl, E. Smallest enclosing disks (balls and ellipsoids). In New Results and New Trends in Computer Science; Springer: Berlin/Heidelberg, Germany, 1991; pp. 359–370.
35. Ausiello, G.; Boria, N.; Giannakos, A.; Lucarelli, G.; Paschos, V.T. Online maximum k-coverage. Discret. Appl. Math. **2012**, 160, 1901–1913.
36. Nemhauser, G.L.; Wolsey, L.A.; Fisher, M.L. An analysis of approximations for maximizing submodular set functions—I. Math. Program. **1978**, 14, 265–294.
37. Lancichinetti, A.; Fortunato, S. Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities. Phys. Rev. E **2009**, 80, 016118.

**Figure 2.** Illustration of the comparison between our model definition and that of Yao et al. [8].

Notation | Definition |
---|---|
$G(V,E)$ | a graph with vertex set V and edge set E |
$n,m$ | the sizes of vertex set V and edge set E, respectively |
$G[C]$ | a subgraph of G induced by vertex set C |
$N[u]$ | the closed neighborhood [29] of vertex u |
$d[u]$ | the cardinality of $N[u]$ |
${G}^{\prime}\subseteq G$ | ${G}^{\prime}$ is a subgraph of G |
${N}_{\epsilon}[u]$ | the $\epsilon$-neighborhood of vertex u |
$\sigma (u,v)$ | the structural similarity between vertices u and v |
$\mathit{MCC}(C)$ | the minimum covering circle of vertex set C |
TkGSG | Top-k Geo-Social Group |

**Table 2.** Datasets used in our experiments (the last two columns are the average degree and the average clustering coefficient).

Type | Name | Vertices | Edges | $\widehat{d}$ | $c$ |
---|---|---|---|---|---|
Real | Brightkite | 58,228 | 214,078 | 3.68 | 0.17 |
Real | Gowalla | 196,591 | 950,327 | 9.67 | 0.24 |
Synthetic | Syn1 | 10,000 | 97,750 | 19.55 | 0.2 |
Synthetic | Syn2 | 100,000 | 980,295 | 19.61 | 0.2 |
Synthetic | Syn3 | 1,000,000 | 9,778,000 | 19.56 | 0.2 |

**Table 3.** Total number of GSGs under different $\epsilon$ values.

Dataset | $\epsilon=0.4$ | $\epsilon=0.5$ | $\epsilon=0.6$ | $\epsilon=0.7$ |
---|---|---|---|---|
Brightkite | 467 | 261 | 105 | 49 |
Gowalla | 1617 | 1064 | 537 | 234 |
Syn1 | 149 | 98 | 27 | 8 |
Syn2 | 1509 | 826 | 230 | 61 |

**Table 4.** Total number of revisited GSGs under different $\epsilon$ values.

Dataset | $\epsilon=0.4$ | $\epsilon=0.5$ | $\epsilon=0.6$ | $\epsilon=0.7$ |
---|---|---|---|---|
Brightkite | 636 | 330 | 133 | 60 |
Gowalla | 1856 | 1166 | 575 | 244 |
Syn1 | 312 | 154 | 42 | 11 |
Syn2 | 3302 | 1433 | 372 | 84 |


© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Li, W.; Zlatanova, S.
Significant Geo-Social Group Discovery over Location-Based Social Network. *Sensors* **2021**, *21*, 4551.
https://doi.org/10.3390/s21134551
