Query Processing of Geosocial Data in Location-Based Social Networks

Abstract: The increasing use of social media and the recent advances in geo-positioning technologies have produced a great amount of geosocial data, consisting of spatial, textual, and social information, to be managed and queried. In this paper, we focus on the issue of query processing by providing a systematic literature review of geosocial data representations, query processing methods, and evaluation approaches published over the last two decades (2000–2020). The result of our analysis shows the categories of geosocial queries proposed by the surveyed studies, the query primitives and the kind of access method used to retrieve the result of the queries, the common evaluation metrics and datasets used to evaluate the performance of the query processing methods, and the main open challenges that should be faced in the near future. Due to the ongoing interest in this research topic, the results of this survey can help researchers and practitioners gain an in-depth understanding of the geosocial querying process, its applications, and possible future perspectives.


Introduction
The increasing use of social media, which reached over 3.8 billion people worldwide in 2020 [1], along with the recent advances in geo-positioning technologies, has produced a great amount of geosocial data, consisting of spatial, textual, and social information, to be managed and queried. Geosocial networks, also known as location-based social networks, have attracted considerable interest in the last decade, both from users and from the scientific community. Two examples of the most popular geosocial networks are Foursquare (www.foursquare.com, accessed on 22 December 2021) and Flickr (www.flickr.com, accessed on 22 December 2021), which couple social network functionalities with geographical information. To quantify the scientific interest in the topic, we searched for "geosocial networking" OR "geosocial networks" OR "location-based social networks" in the titles of the scientific articles indexed in the Web of Science (WoS) search engine. The results in Figure 1 show a growing trend that peaked in 2018, demonstrating that the scientific community has been interested in geosocial networking throughout the period 2010–2020 (no results were returned from 2000 to 2009).
Specifically, the scientific interest in geosocial networking has mainly addressed the following research topics, as analysed by Armenatzoglou and Papadias [2]: social and spatial data management, query processing, link prediction, recommendations, metrics and properties, and privacy issues. To quantify the interest in each research topic, we searched the scientific articles indexed in WoS again, restricting the previous search with a further keyword corresponding to each of Armenatzoglou and Papadias' research topics, logically joined to the previous three keywords ("geosocial networking" OR "geosocial networks" OR "location-based social networks") using the AND operator. The details of each search are provided in Table 1.

Table 1. Scientific articles published in WoS dealing with the geosocial networking topics surveyed by Armenatzoglou and Papadias [2]. The asterisk (*) in the query allows finding all words that start with the same letters (e.g., network* finds network, networks, networking, etc.).

Armenatzoglou and Papadias' Geosocial Networking Topics | Search Keywords | Number of Published Articles Retrieved from WoS
Social and spatial data management | (("geosocial networking" OR "geosocial network*" OR "location-based social network*") AND "data management") | 1
Query processing | (("geosocial networking" OR "geosocial network*" OR "location-based social network*") AND "quer*") | 11
Link prediction | (("geosocial networking" OR "geosocial network*" OR "location-based social network*") AND "predict*") | 7
Recommendations | (("geosocial networking" OR "geosocial network*" OR "location-based social network*") AND "recommend*") | 71
Metrics | (("geosocial networking" OR "geosocial network*" OR "location-based social network*") AND "metric*") | 2
Privacy | (("geosocial networking" OR "geosocial network*" OR "location-based social network*") AND "privacy") | 33

Therefore, the trend provided in Figure 1 and Table 1 shows that geosocial networking is a popular topic that attracts the interest of the scientific community. The main research issues addressed within this topic are the recommendation of geosocial data that helps users find relevant places and friends, the privacy of the users' sensitive geosocial data, and the query processing that allows extracting meaningful data from geosocial databases.
In this paper, we focus on the issue of query processing by providing a survey of the geosocial data representations, querying methods, applications, and evaluation methods. To this end, we provide a systematic literature review of 57 scientific articles published over the last two decades (2000–2020) in major journals, conferences, and workshops and indexed by three major scientific search engines (WoS, Scopus, and Google Scholar).
Although several surveys have been proposed in the last few years, dealing with the various geosocial networking topics surveyed by Armenatzoglou and Papadias (recommendations [3], privacy issues [4,5], social and spatial data management [6]), to the best of our knowledge, none of these surveys focuses on the query processing topic. Due to the ongoing interest in this research topic, the results of this survey can help researchers and practitioners gain an in-depth understanding of the geosocial querying process, its applications, and possible future perspectives.
Aiming to identify the trends and opportunities of the research about geosocial query processing, the main research objectives of this article can be detailed as follows:
1. To study how query processing methods are applied to geosocial data by researchers and practitioners, categorising them according to the kinds of geosocial queries, the kind of method(s) used to retrieve the result of the query, the kind of access method, and the opportunity to provide an approximate solution;
2. To summarise the metrics and datasets used to evaluate geosocial queries in location-based social networks;
3. To point out the primary research challenges in this field that emerged from analysing the literature.
The remainder of the paper is organised as follows. A brief overview of the existing definitions of LBSN or geosocial networks and an overview of the process of querying geosocial data is provided in Section 2. Section 3 introduces the research methodology adopted to conduct the literature search and the analyses performed. The results of the quantitative analysis are presented in Section 4. In Section 5, we discuss the study results according to the four review questions defined in the study. Finally, in Section 6, we provide some concluding remarks.

Definitions of LBSN or Geosocial Networks
There are several definitions for "geosocial network" or "location-based social network". The first formal definition was given in 2010 by Quercia et al. [7], who defined it as "a type of social networking in which geographic services and capabilities such as geocoding and geotagging are used to enable additional social dynamics". One year later, Zheng [8] refined this definition by stating that "a location-based social network (LBSN) does not only mean adding a location to an existing social network so that people in the social structure can share location embedded information but also consists of the new social structure made up of individuals connected by the interdependency derived from their locations in the physical world as well as their location-tagged media content, such as photos, video, and texts". In 2013, Roick and Heuser [9] defined LBSNs simply as "social network sites that include location information into shared contents". Finally, a more recent definition is given by Armenatzoglou and Papadias [10]: "geosocial network (GeoSN) is an online social network augmented by geographical information".
From the above definitions, it is evident that the peculiarity of LBSNs is the coupling of geographical information/services with social network sites. This coupling allows LBSN users to benefit from the communication and sharing functionalities provided by social networks, enhanced with the geographic positions of users to locate contents, people, and activities in a physical space.
To model both the social and geographical relationships in it, an LBSN is often represented through a multilevel geosocial model, with a geosocial graph G(V, E), i.e., an undirected graph with vertex set V and edge set E. Each vertex v ∈ V represents a user and has one or more spatial locations (v.x_i, v.y_i), with 1 ≤ i ≤ n, in the two-dimensional space, associated with the n locations visited by the corresponding user, and one or more geo-located media contents m_j(v.x_i, v.y_i), with 1 ≤ j ≤ p, associated with the i-th location visited by the corresponding user. Each edge e = (u, v) ∈ E denotes a relationship (e.g., friendship, common interest, shared knowledge, etc.) between two users u, v ∈ V. A graphical representation of a geosocial graph G(V, E) representing an LBSN is given in Figure 2. Three layers can be differentiated, as also suggested by Gao and Liu [11]. The first layer, named social layer, contains the users of the LBSN and the relationships among them. The second layer, named location or geographical layer, consists of the geographical information in the two-dimensional space associated with the locations visited by the users. The third layer, named media content layer, contains information about the media contents produced/shared by the users when visiting the locations.
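The three-layer model described above can be illustrated with a minimal Python sketch. The class and attribute names (`GeoSocialGraph`, `CheckIn`, `friends`, `checkins`) and the sample data are assumptions introduced here for illustration only; they do not come from the surveyed studies.

```python
from dataclasses import dataclass, field


@dataclass
class CheckIn:
    """One visited location (location layer) with its media contents (media content layer)."""
    x: float
    y: float
    media: list[str] = field(default_factory=list)


@dataclass
class GeoSocialGraph:
    """Undirected geosocial graph G(V, E); a hypothetical in-memory representation."""
    friends: dict[str, set[str]] = field(default_factory=dict)        # social layer: vertex -> neighbours
    checkins: dict[str, list[CheckIn]] = field(default_factory=dict)  # vertex -> visited locations

    def add_user(self, user: str) -> None:
        self.friends.setdefault(user, set())
        self.checkins.setdefault(user, [])

    def add_edge(self, u: str, v: str) -> None:
        """Add the undirected edge e = (u, v) to the social layer."""
        self.add_user(u)
        self.add_user(v)
        self.friends[u].add(v)
        self.friends[v].add(u)

    def add_checkin(self, user: str, x: float, y: float, media=()) -> None:
        """Attach location (x, y) and its media contents to a user."""
        self.add_user(user)
        self.checkins[user].append(CheckIn(x, y, list(media)))


# Illustrative data only.
g = GeoSocialGraph()
g.add_edge("alice", "bob")
g.add_checkin("alice", 12.49, 41.89, media=["photo_001.jpg"])
g.add_checkin("alice", 12.50, 41.90)
```

Keeping the social adjacency and the per-user check-ins in separate mappings mirrors the layer separation of the model, so a query primitive can touch only the layer it needs.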

The Process of Querying Geosocial Data
To process the geosocial queries, different kinds of query primitives are defined in the literature as fundamental operations that can be further combined to answer a wide range of general-purpose geosocial queries. As suggested in [12], these kinds of query primitives can be grouped in three categories according to the layer of the geosocial graph that is exploited by the query primitive: social query primitives that exploit the data over the social graph, spatial query primitives that exploit the data over the spatial graph, and activity query primitives that exploit the data over the media content graph. A brief description of the query primitives used in geosocial query processing literature is provided in Table 2.
In addition to the query primitives, several basic heuristics or algorithms are applied to retrieve the geosocial data. Some examples found in the literature on geosocial querying are:
• Best-first search algorithm: it explores paths in the geosocial graph by using an evaluation function to decide which of the available nodes is the most promising to explore [13];
• Depth-first search algorithm: it explores paths in the geosocial graph by starting at a given node and exploring as far as possible along each branch before backtracking [14];
• Dijkstra search algorithm: it finds, for a given source node in the geosocial graph, the shortest path between that node and every other node [15];
• Branch and bound algorithm: it explores branches of the geosocial graph, which represent subsets of the solution set, by checking them against upper and lower estimated bounds on the optimal solution and enumerating only the candidate solutions of a branch that can produce a better solution [16];
• Measure and conquer algorithm: it explores branches of the geosocial graph by using a (standard) measure of the size of the subsets of the solution set (e.g., number of vertices or edges of graphs, etc.) to lower bound the progress made by the algorithm at each branching step [17].

Table 2. Query primitives.

Query primitive | Description
Filter | Removes from the graph the vertices or edges that do not satisfy a selection condition.
Partitioning | Computes a partition of the vertex set into n parts of size c.
Scoring/Ranking | Ranks the vertices based on a scoring function to predict the values associated with each vertex.
Sorting | Rearranges the vertices of the graph according to one or more keys.
Join | Computes the join between two vertex sets if a condition defined on their features is satisfied.
Clustering | Partitions the vertex set into a certain number of clusters so that vertices in the same cluster are similar to each other.
Pruning | Simplifies a graph by reducing the number of edges while preserving the maximum path quality metric for any pair of vertices in the graph.
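To make the composition of primitives and search algorithms concrete, the following is a minimal sketch that combines a Dijkstra search over the social layer with the Filter and Scoring/Sorting primitives. The function names, the edge weights, and the toy data are assumptions made for illustration; the surveyed studies define their own operators and cost models.

```python
import heapq


def social_distances(adj, source):
    """Dijkstra search: shortest social distance from `source` to every reachable user."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for v, w in adj.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def filter_vertices(vertices, predicate):
    """Filter primitive: keep only the vertices that satisfy the selection condition."""
    return [v for v in vertices if predicate(v)]


def rank(vertices, score):
    """Scoring + Sorting primitives: order the vertices by ascending score."""
    return sorted(vertices, key=score)


# Hypothetical weighted social graph (lower weight = closer relationship).
adj = {
    "q": {"a": 1.0, "b": 4.0},
    "a": {"q": 1.0, "b": 1.0, "c": 5.0},
    "b": {"q": 4.0, "a": 1.0},
    "c": {"a": 5.0},
}
# Hypothetical user positions in the two-dimensional space.
position = {"q": (0, 0), "a": (1, 1), "b": (2, 0), "c": (9, 9)}

dist = social_distances(adj, "q")
# Spatial filter: users inside the square [-3, 3] x [-3, 3] centred on q.
in_range = filter_vertices(
    [v for v in adj if v != "q"],
    lambda v: abs(position[v][0]) <= 3 and abs(position[v][1]) <= 3,
)
# Rank the surviving users by social distance from q.
result = rank(in_range, lambda v: dist[v])
print(result)
```

Here the spatial primitive narrows the candidate set and the social primitive orders it; swapping the two steps would give a social-first evaluation of the same query.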
Several query indexing approaches have also been developed in the literature to optimise the processing of geosocial queries and quickly retrieve all of the data that a query requires. Existing indexing methods can be roughly categorised into three classes: the spatial-first, the social-first, and the hybrid indexing methods. The spatial-first indexing methods prioritise the spatial factor for the index construction and then improve it with the social factor. For example, MR-Tree [18], GIM-tree [19], TaR-tree [20], and SIL-Quadtree [21] employ a spatial index (e.g., R-tree, Quad-tree, G-tree) and integrate it with the textual and social information of objects. The social-first indexing methods prioritise social relationships among objects for the index construction and then improve it with the spatial information of objects. Representatives of these methods are the Social R-tree [22], B-tree [23], and 3D Friends Check-Ins R-tree [24], which index each user along with their social relationships and then integrate the spatial information. Finally, hybrid indices are developed to store both the spatial and social information of objects giving them the same priority. For example, NETR-tree [25], CD-tree [26], and SaR-tree [27,28] encode both social information and spatial information into two major pieces of information that are used to prune the search space during the query time.
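As a rough illustration of the spatial-first idea, the sketch below uses a uniform grid as a simple stand-in for the spatial index and applies the social information afterwards as a filter. It is not an implementation of any of the cited index structures; `GridIndex`, `friends_in_range`, and the sample data are hypothetical names and values chosen for this example.

```python
from collections import defaultdict


class GridIndex:
    """Uniform grid over 2D points; a simplified stand-in for an R-tree or quadtree."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)  # (col, row) -> [(user, x, y)]

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, user, x, y):
        self.cells[self._cell(x, y)].append((user, x, y))

    def range_query(self, x_min, y_min, x_max, y_max):
        """Return the users whose position lies inside the query rectangle."""
        c0, r0 = self._cell(x_min, y_min)
        c1, r1 = self._cell(x_max, y_max)
        hits = []
        for c in range(c0, c1 + 1):
            for r in range(r0, r1 + 1):
                # Only cells overlapping the rectangle are visited.
                for user, x, y in self.cells.get((c, r), []):
                    if x_min <= x <= x_max and y_min <= y <= y_max:
                        hits.append(user)
        return hits


def friends_in_range(index, friends, user, box):
    """Spatial-first evaluation: spatial lookup first, social filter second."""
    return sorted(u for u in index.range_query(*box) if u in friends[user])


# Illustrative data only.
index = GridIndex(cell_size=10)
for name, x, y in [("a", 1, 1), ("b", 12, 3), ("c", 25, 25), ("d", 4, 8)]:
    index.insert(name, x, y)
friends = {"q": {"a", "b", "c"}}

print(friends_in_range(index, friends, "q", (0, 0, 15, 10)))
```

A social-first method would invert the order, starting from the friend list of the query user and checking positions afterwards, while a hybrid index would store both kinds of information in the same nodes to prune on both at once.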

Research Methodology
This section illustrates the methodology used to conduct an objective and replicable literature search to systematically analyse the published research knowledge and answer our research questions. To this end, we have chosen the scientific method called systematic literature review (SLR). Specifically, we have followed the SLR process described in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) recommendations [29]. The steps of the SLR process, as adapted to this study, can be summarised as follows: (1) identifying the review focus; (2) specifying the review question(s); (3) identifying studies to include in the review; (4) data extraction and study quality appraisal; (5) synthesising the findings; and (6) reporting the results.

Identifying the Review Focus
Considering the first step, the review focuses on analysing and systematising scientific knowledge related to geosocial query processing in location-based social networks. Specifically, we aim to study the query processing methods, the evaluation methodologies, and the open challenges envisaged by researchers and practitioners in their scientific works.

Specifying the Review Questions
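This research objective is addressed by defining four review questions (RQs), as required by Step 2 of the SLR protocol. They concern the kinds of geosocial queries proposed in the literature (RQ1), the query processing methods applied to geosocial network data (RQ2), the metrics and datasets used to evaluate geosocial queries in LBSNs (RQ3), and the open challenges in geosocial querying (RQ4).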

Identifying Studies to Include in the Review
Once we identified the review focus and review questions of the study, the next step of the SLR process is identifying studies to include in the review. This step includes the following four phases recommended by the PRISMA statement [29], as shown in the flow diagram of Figure 3: (1) identify records through database searching and other sources (identification phase); (2) screen and exclude records (screening phase); (3) assess full-text articles for eligibility (eligibility phase); and (4) include studies for qualitative analysis (included phase).
To identify the initial set of scientific papers, we defined the following search string: ("location-based social network*" OR "geosocial network*" OR "geographic social network*" OR "LBSN*" OR "geosocial networking" OR "location-based social networking") AND "quer*".
These terms were chosen from the research questions to represent the scientific knowledge we want to search for. Moreover, we included the synonyms and related terms found in the scientific literature. For instance, "location-based social network" is also referred to as "geographic social network" or "geosocial network". Moreover, related terms to "location-based social network" are "location-based social networking" and "geosocial networking". Therefore, we included all these terms in the search strings.
The sources we used in our search for identifying the scientific works are twofold: (i) indexed scientific databases containing formally published literature (e.g., published journal papers, conference proceedings, books); and (ii) non-indexed databases containing grey literature (e.g., theses and dissertations, research and committee reports, government reports, preprints, etc.). We chose to also include grey literature in our systematic review because several studies highlighted the importance of considering it to avoid missing significant evidence [30,31].
Considering the first kind of source, Scopus and the Web of Science (WoS) core collection were identified as the most comprehensive sources of published scientific research. The choice of using them was motivated by their multidisciplinarity, which allows a wider domain coverage of the retrieved literature compared with more domain-oriented databases. Moreover, Scopus is among the largest databases, containing over 76 million publication records, and WoS provides a greater depth of coverage, containing published literature of over 15 years. Therefore, they complement each other. As for the second kind of source, Google Scholar was used in this review for retrieving the grey literature since several studies have proved its effectiveness in searches for grey literature in systematic reviews [32,33].
The search results during the screening phase were filtered according to the inclusion and exclusion criteria described in Table 3. Specifically, the duplication (e1) and understandability (e3) exclusion criteria and the temporal (i2) and relevance (i1) inclusion criteria based on the studies' titles were applied. The understandability criterion was formulated because of the difficulty of examining the content of articles that are not written in English.

ID | Criterion | Description
e3 | Understandability criterion | • articles that are not written in English.
i1 | Relevance criterion | • studies that are relevant to the review focus, i.e., they describe geosocial query processing in location-based social networks; • studies that are relevant to answer our research questions, i.e., they describe: (i) the query processing methods applied to geosocial data, or (ii) the evaluation process of geosocial query processing, or (iii) the open challenges in geosocial querying.
i2 | Temporal criterion | • articles published in the period 2000–2020.
In the eligibility phase, the availability (e2) exclusion criterion and the relevance (i1) inclusion criterion were applied based on the studies' abstracts. The availability criterion was formulated because the content of articles that are not accessible in full text cannot be analysed. Applying these criteria identifies the eligible publications to establish evidence on the different geosocial query processing methods and data representation schemes.

Data Extraction and Study Quality Appraisal
The full text of the eligible articles was then analysed by two reviewers that assessed them according to a quality evaluation checklist composed of four questions, as shown in Table 4. The possible answers (with their related scores) for each quality assessment question are defined, as shown in the second column of Table 4. In case of disagreement, the "disagreed" articles were examined by a moderator that evaluated them again and provided the final scores.
Studies that scored less than "2" were excluded from the qualitative analysis, while articles that scored "2" or more were included in the systematic review.
Finally, the full texts of the included articles were analysed, and the relevant information was extracted from them (if any). The last two phases of the SLR process, i.e., synthesising the findings and reporting the results, will be detailed in the following sections.

Table 4. Quality assessment questions and scores formulated for the study.

ID | Quality assessment question | Possible answers and scores
QA1 | Does the article describe a geosocial query processing method? | 1: yes, the geosocial query processing method is fully described. 0.5: partially, the geosocial query processing method is only summarised without describing in detail some steps. 0: no, the geosocial query processing method is only cited, without describing it.
QA2 | Does the article describe the geosocial data representation schema? | 1: yes, the geosocial data representation schema is fully described. 0.5: partially, the geosocial data representation schema is only summarised without describing it in detail. 0: no, the geosocial data representation schema is not described.
QA3 | Does the article provide an evaluation of the geosocial query processing method? | 1: yes, the geosocial query processing method is evaluated. 0: no, the geosocial query processing method is not evaluated.
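The appraisal rule described in this section (sum the per-question scores and include a study when the total is at least 2) can be sketched as follows. The function names and the example answers are hypothetical, and the sketch accepts the generic score set {0, 0.5, 1} for every question, whereas some questions in the checklist admit only 0 or 1.

```python
ALLOWED_SCORES = {0.0, 0.5, 1.0}   # assumption: generic score set for all questions
INCLUSION_THRESHOLD = 2.0          # studies scoring less than 2 are excluded


def quality_score(answers):
    """Sum the scores given to one study, keyed by question ID."""
    for question, score in answers.items():
        if score not in ALLOWED_SCORES:
            raise ValueError(f"unexpected score {score!r} for {question}")
    return sum(answers.values())


def is_included(answers, threshold=INCLUSION_THRESHOLD):
    """A study enters the qualitative analysis if its total score reaches the threshold."""
    return quality_score(answers) >= threshold


def needs_moderator(reviewer_a, reviewer_b):
    """Disagreement between the two reviewers sends the article to a moderator."""
    return reviewer_a != reviewer_b


# Hypothetical assessments of two articles.
study_x = {"QA1": 1.0, "QA2": 0.5, "QA3": 1.0}
study_y = {"QA1": 0.5, "QA2": 0.5, "QA3": 0.0}
print(quality_score(study_x), is_included(study_x))
print(quality_score(study_y), is_included(study_y))
```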

Results of the SLR and Quantitative Analysis
During the identification phase, described in Section 3.3 and depicted in Figure 3, a total of 4312 articles were returned by the three search engines (retrieved in March 2021): 4054 from Google Scholar, 172 from Scopus, and 86 from Web of Science.
As required by the duplication criterion, removing duplicate records resulted in 4075 papers. Excluding also the articles that are not written in English (understandability criterion), a total of 3943 articles was screened for the inclusion criteria. Applying the temporal criterion resulted in no articles being excluded because all retrieved papers were published in the period 2000-2020. The relevance criterion was applied by searching for the term "quer*" in the articles' titles, resulting in 208 articles at the end of the screening phase.
Removing the articles that are not accessible in full text (11 studies for the availability criterion) and the articles that are not relevant (130 studies for the relevance criterion) by applying the relevance criterion to the articles' abstract, a total of 67 articles were retained for a full evaluation of eligibility. Specifically, the articles that do not talk about geosocial queries in the abstract were excluded.
Two reviewers assessed these 67 studies according to the quality evaluation checklist shown in Table 4. Ten studies that scored less than "2" were excluded, while the remaining 57 studies were included in the qualitative synthesis, and the information listed in Section 3.4 was extracted from their full texts. Table 5 provides an overview of the selected studies, where the reference, publication type, publication year, publisher, and citation count (from Google Scholar) for each study are provided. The selected studies have been published mainly in journals (50.88%, 29 studies), followed by conference proceedings (43.86%, 25 studies), theses (3.51%, 2 studies), and only 1 preprint (1.75%). Therefore, the majority of the studies (94.74%) are formally published studies (journal and conference papers), while only 5.26% are grey literature (theses and a preprint).
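The counts and percentages reported in this section can be re-derived with a few lines of arithmetic. The sketch below only restates figures given in the text; the variable names are illustrative.

```python
# Identification phase: records returned by each search engine.
identified = 4054 + 172 + 86          # Google Scholar + Scopus + Web of Science

# Eligibility phase: 208 screened articles minus the two exclusions.
eligible = 208 - 11 - 130             # not available in full text, not relevant

# Publication types of the 57 studies included in the review.
by_type = {"journal": 29, "conference": 25, "thesis": 2, "preprint": 1}
total = sum(by_type.values())
share = {kind: round(100 * n / total, 2) for kind, n in by_type.items()}

formal_share = round(100 * (by_type["journal"] + by_type["conference"]) / total, 2)
grey_share = round(100 * (by_type["thesis"] + by_type["preprint"]) / total, 2)

print(identified, eligible, total)
print(share, formal_share, grey_share)
```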
The temporal distribution of the selected publications, shown in Figure 4, underscores the increasing interest of the scientific community in the topic of geosocial querying, which started growing in 2010 and continues to grow in 2020.

Findings and Discussion
This section analyses how the 57 selected studies answered our four review questions introduced in Section 3.2. Specifically, to deal with RQ1, we start by analysing and classifying the kinds of geosocial queries. With respect to RQ2, the query processing methods applied to the geosocial network data are extracted and classified. Addressing RQ3, the metrics and datasets used to evaluate the geosocial queries in LBSN are analysed. Finally, as part of RQ4, the open challenges in geosocial querying proposed in these studies are analysed.

RQ 1: What Kinds of Geosocial Queries Are Proposed in the Literature?
To answer the first RQ, we look first at the kinds of queries proposed by the selected studies, and then at the constraints (social, spatial, temporal) considered.
Based on our analysis, we identified seven categories of geosocial queries (as presented in Figure 5) that consider both social and spatial relations: geosocial group queries, geosocial keyword queries, geosocial top-k queries, geosocial skyline queries, geosocial moving queries, geosocial fuzzy queries, and geosocial nearest neighbor queries. Moreover, among the selected studies, there were three frameworks providing a collection of query primitives essential for geosocial queries.
In the following paragraphs, we briefly discuss each category of the geosocial queries defined above.

Geosocial Group Queries
The most numerous category of geosocial queries is the group query, with 25 studies (43.86%), which finds a group of users close to each other both socially and geographically. Generally, the studies addressing this kind of query start from spatial queries (e.g., range, k nearest neighbour, spatial join) to find geographically close users and extend them with grouping concepts to also find socially close users. That results in several kinds of queries (see Table 6) that we have grouped here in the class of geosocial group queries. An example of a geosocial group query, inspired by the work in [74], is depicted in Figure 6.

S2
Range Friends (RF): returns the friends of a user within a given range.
Nearest Friends (NF): returns the nearest friends of a user to a given location.
Nearest Star Group (NSG): returns a user group that (i) forms a star subgraph of the social network and (ii) minimises the aggregate (Euclidean) distance of its members to a given location.

S3 S18
Minimum user spatial-aware interest group query (MUSIGQ): returns a group of users that have common interests and stay in nearby spots.

S16
Reverse nearest neighborhood (RNH): discovers the neighborhoods that find a query facility as their nearest facility among other facilities in the dataset.

S17 S30
Spatial Group Preference (SGP): returns the top-k POIs that are most likely to satisfy the group's preferences for POI categories.

S22
Geosocial group query: retrieves k users that satisfy the minimum acquaintance constraint and have the minimum spatial distance to the query issuer.

S23
Geo-Social K-Cover Group (GSKCG): retrieves a minimum user group in which each user is socially related to at least k other users and the users' associated regions can jointly cover all the query points.

The main types of spatial constraints that have been applied in these studies are the following:

• Distance: typical distance functions are the Euclidean distance, for items that are located in a small area; the network distance, which is the length of the shortest path between the items on the road network of the search area; and the Haversine distance, which is the great-circle distance between the items on the surface of a sphere [43].
• Range: the locations of the retrieved items (users/objects/PoIs) are within the query region.
• Coverage: the coverage of a set of query points is the minimum rectangle containing all query points.
• Travel cost: the expected cost of a direct travel from one item to the other.
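As an illustration of the spatial constraints listed above, the following minimal Python sketch computes the Euclidean and Haversine distances, a rectangular range test, and the coverage rectangle of a set of query points. The function names, the (x, y) and (latitude, longitude) tuple conventions, and the mean Earth radius of 6371 km are assumptions made for this example and are not taken from the surveyed studies. Network distance and travel cost are omitted because they require a road network.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def euclidean(p, q):
    """Straight-line distance between two planar points given as (x, y)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_range(p, lower_left, upper_right):
    """Range constraint: is point p inside the rectangular query region?"""
    return (lower_left[0] <= p[0] <= upper_right[0]
            and lower_left[1] <= p[1] <= upper_right[1])


def coverage(points):
    """Coverage: minimum rectangle (lower-left, upper-right) containing all points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys)), (max(xs), max(ys))
```

The Euclidean function is adequate only when the items lie in a small area, as noted above; for items spread over a large region the Haversine distance is the safer choice.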
More than half of the studies use distance (mainly Euclidean) to measure the spatial distance between two points in space. Eight studies apply the travel cost constraint, only 3 works use the range constraint, and 2 studies use the coverage constraint (see Table 7).

Table 7. Main types of spatial, social, and temporal constraints applied in geosocial group queries.

The main types of social constraints that have been applied in these studies are the following:

1. Friendship: in a geosocial network, friendship relations correspond to the edges between two nodes representing users.

2. Interest/preference score: it considers the interests/preferences of a user or a group of users in spatial objects annotated by one or more keywords and can be computed from their check-ins on these spatial objects.

3. Closeness: it restricts the users in a social group considering the proximity of candidate attendees to corresponding locations in the physical world, and sometimes even the ratings of assembly points as additional references [38].

4. Acquaintance: it imposes a minimum degree on the familiarity of group members (which may include the query issuer q); i.e., every user in the group should be familiar with at least k other users [52]. It is a measure of group cohesiveness. The value of k can be defined according to a minimum social distance that should be less than or equal to an acceptable social boundary.
The most applied social constraint is acquaintance (10 studies), followed by interests/preferences (9 studies), friendship (5 studies), and closeness (3 studies) (see Table 7). The acquaintance constraint avoids returning a group with mutually unfamiliar members by retrieving a cohesive subgroup of the geosocial network.
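The acquaintance constraint can be sketched as a simple check over the friendship graph. The code below is an illustrative assumption rather than the method of any surveyed study: the social network is represented as a dictionary mapping each user to the set of users they are familiar with, and the function name `satisfies_acquaintance` is hypothetical.

```python
def satisfies_acquaintance(group, friends, k):
    """Return True if every member of `group` is familiar with at least
    k other members of the same group.

    friends: dict mapping a user to the set of users they are familiar with.
    """
    members = set(group)
    return all(
        len(friends.get(user, set()) & (members - {user})) >= k
        for user in members
    )
```

A group query would typically apply such a check to candidate groups that already satisfy the spatial constraint, discarding the groups that contain mutually unfamiliar members.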
Finally, only one study [39] proposing geosocial group queries incorporates temporal constraints, in addition to spatial and social ones, to retrieve a cohesive ridesharing group.

Geosocial Keyword Queries
Generally, the studies addressing this kind of query start from conventional spatial keyword queries to find objects that are spatially and textually relevant to the user-supplied keywords, and integrate them by considering also collective and social criteria to find these objects. The number of surveyed studies that belong to this class of geosocial query is 15 (26.31%) (see Table 8). An example of a geosocial keyword query, inspired by the work in [40], is depicted in Figure 7.
Figure 7. An example of a geosocial keyword query that considers a set of objects {u1, u2, …, u4} located in the places depicted by circles and associated with keywords shown in the table on the right. Query q requests a location (red circle) and a set of keywords. The query returns the set of objects {u2, u3} that minimizes the distance and contains the required keywords.

Table 8. Geosocial keyword queries.

Name of the Query | Description

… returns the top-k objects by taking geo-spatial score, keywords similarity, visiting time score, and social relationship into consideration.

S40
Diversified top-k geosocial keyword (DkGSK) query: returns the top-k objects based on their spatial and textual proximity to q as well as the check-in counts of u's friends at such objects.

S44
Popularity-aware collective keyword (PAC-K) query: finds a group of popular POIs that cover the query's keywords and satisfy the distance requirements from each node to the query node and between each pair of nodes, such that the sum of rating scores over these nodes for the query keywords is maximized.

S50
Social space Keyword Query: returns the top-k semantic trajectories that have higher social relevance and shorter distance while satisfying spatial and keyword constraints.

S56
Why-not top-k geosocial keyword (WNGSK) query: returns the top-k objects based on their spatial and textual proximity to the query location as well as the check-in counts of the user's friends at such objects.
The type of spatial constraints that has been applied in these studies is twofold: (i) the distance, already defined in the previous sub-section on "Geosocial group queries"; and (ii) the cost, which is calculated according to two kinds of cost functions, the maximum sum cost and the diameter cost. The maximum sum cost is defined as the linear combination of the maximum distance between the query and a node in the POI set [40], while the diameter cost is defined as the maximum distance between any pair of nodes in the POI set [64]. Similarly to the geosocial group query, the majority of the studies (9 studies) use the distance to measure the spatial distance, while 6 studies use the cost.
Considering the social constraints, besides the friendship relationships among the nodes of the network, further social constraints that have been applied in these studies are the following:

• Relevance: it is obtained from the number of fans and the relationship between these fans and the query user, where a fan is a user who exhibits positive behavior towards an object (e.g., check-in, like, share, etc.) [23];
• Relationship effect: it can be measured by the similarity of embedding vectors between users and their neighbors with all users' check-in records [25].
The most applied social constraint in these studies is relevance (4 studies), followed by friendship (2 studies) and the relationship effect (1 study) (see Table 9).

Table 9. Main types of spatial, social, and collective constraints applied in geosocial keyword queries.

In addition to these social constraints, several geosocial keyword queries (8 studies) apply a collective constraint, meaning that the keywords of the retrieved group of objects collectively cover the query keywords.
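The collective constraint and the diameter cost described in this sub-section can be illustrated with a short Python sketch. The POI representation (a dictionary with a `loc` coordinate pair and a `keywords` set) and the function names are assumptions introduced for this example. The maximum distance between the query and the POI set is shown as a separate helper because it is one ingredient of the maximum sum cost; the full linear combination is not reproduced here.

```python
import math
from itertools import combinations


def covers_keywords(pois, query_keywords):
    """Collective constraint: the POIs' keywords jointly cover the query keywords."""
    covered = set().union(*(p["keywords"] for p in pois))
    return set(query_keywords) <= covered


def diameter_cost(pois):
    """Maximum Euclidean distance between any pair of POIs in the set."""
    return max(
        (math.dist(a["loc"], b["loc"]) for a, b in combinations(pois, 2)),
        default=0.0,
    )


def max_query_distance(q_loc, pois):
    """Maximum Euclidean distance between the query location and a POI in the set."""
    return max((math.dist(q_loc, p["loc"]) for p in pois), default=0.0)
```

A collective keyword query would search for a POI set for which `covers_keywords` holds while keeping the chosen cost function as small as possible.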

Geosocial Top-k Queries
The third most numerous class of geosocial queries is the geosocial top-k query with 11 studies (19.3%) (see Table 10). Generally, the studies addressing this kind of query rely on the conventional top-k queries that retrieve the top-k objects based on a user-defined scoring function, and enrich the top-k query semantics by considering both spatial and social relevance components to compute the scoring function. An example of a geosocial top-k query, inspired by the work of [71], is shown in Figure 8.
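A scoring function that combines spatial and social relevance, as described above, can be sketched as follows. This is a minimal illustration under stated assumptions, not the scoring function of any surveyed study: each object carries a precomputed social relevance in [0, 1] under the hypothetical key `social`, the spatial component is a linearly normalised Euclidean proximity, and the weight `alpha` and the normalisation distance `max_dist` are free parameters introduced for the example.

```python
import heapq
import math


def geosocial_top_k(objects, q_loc, k, alpha=0.5, max_dist=1.0):
    """Return the k objects with the highest combined score.

    score = alpha * spatial proximity + (1 - alpha) * social relevance,
    where spatial proximity is 1 at the query location and 0 at max_dist or beyond.
    """
    def score(obj):
        d = min(math.dist(q_loc, obj["loc"]), max_dist)
        spatial = 1.0 - d / max_dist
        return alpha * spatial + (1.0 - alpha) * obj["social"]

    return heapq.nlargest(k, objects, key=score)
```

Setting `alpha` to 1 reduces the query to a purely spatial top-k query, while lower values give more weight to the social component.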

S10
Top-k join queries: compute the k combinations of several query search results over geospatial and social data sources with the highest score.

All the studies apply the distance, defined in the previous sub-section on "Geosocial group queries", as a spatial constraint of the query.
Considering the social constraints, besides friendship, relevance, and relationship effect, already described for the previous classes of queries, further social constraints that have been applied in these studies are the following:

• Popularity: it is obtained by quantifying how many users have the location in their k nearest neighbours results [42];
• Social connectivity: the social connectivity of a geosocial graph can be defined as the graph density and can be measured by the formula provided in [78].

The majority of these studies (7 studies) apply the relevance constraint, while 4 studies apply the friendship constraint, and the relationship effect, popularity, and connectivity constraints are each used by only 1 work (see Table 11).

Table 11. Main types of spatial, social, and temporal constraints applied in geosocial top-k queries.

Geosocial Skyline Queries
The skyline operator was introduced by Börzsönyi et al. [79] for retrieving the set of data objects that are not dominated by any other object, where an object dominates another if it is at least as good on all the attributes of the query and strictly better on at least one. The category of geosocial skyline queries enriches the semantics of the skyline operator by also considering the social relationships of the query owner when retrieving this set of data objects. Six of the surveyed studies (10.5%) belong to this class of geosocial query (see Table 12). An example of a geosocial skyline query, inspired by the work in [55], is shown in Figure 9.

Table 12. Geosocial skyline queries.

S20
LBSNs friend recommendation skyline query (LFRSQ): returns the friend recommendation list by considering three factors: (a) common friends, (b) distance influence, and (c) similarity score, which is calculated from the location similarity and friend influence between the user and candidate friends.

S26
Geosocial skyline query: reports, for a given user and a given location, the Pareto-optimal set of persons who are close to the location and closely connected to the user.

S24
Geo-Social Keyword Skyline Query (GSKSQ): returns the skyline of a set of PoIs based on a query point, the social relationships of the query owner, and query keywords.

S28
Geosocial skyline keyword (GSSK): returns every object within range that is not dominated by any other object in terms of distance to the query location and aggregated score of social and keyword relevance.

S49
Skyline cohesive group query: finds a group of users that are strongly connected and closely co-located.

S51
Socio-Spatial Skyline Query (SSSQ): returns every place for which there does not exist any other place that has a better social score and a better spatial score.

Similarly to the category of geosocial top-k queries, all the studies proposing geosocial skyline queries apply the distance as a spatial constraint of the query.

Regarding the social constraints, in addition to friendship, relevance, and acquaintance, already described for the previous categories of queries, further social constraints that have been applied in these studies are the following:

• Social influence: it is applied to retrieve friends who have closer social ties and it is computed based on both the social connections and the similarity of the check-in activities [50].
• Social similarity: it measures how socially close people are. Several methods for measuring this proximity have been proposed in the literature, and the most adopted are the Random Walks with Restart method and the Bookmark Coloring Algorithm, which considers all walks between two users [55].
In terms of numbers, the most applied social constraint in this category is the friendship constraint (2 studies), followed by social influence, social similarity, relevance, and acquaintance constraints with one study each (see Table 13).
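The dominance test underlying the skyline queries of this sub-section can be sketched in a few lines of Python. The sketch assumes that each candidate is summarised by a tuple of scores in which lower values are better, for example (spatial distance, social distance); this encoding and the quadratic-time scan are simplifications chosen for readability, not the algorithms proposed in the surveyed studies.

```python
def dominates(a, b):
    """a dominates b if a is no worse on every attribute and strictly
    better on at least one (lower values are better)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def skyline(points):
    """Return the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

For instance, with candidates described by (spatial distance, social distance), a person who is both farther from the query location and less closely connected to the user than another candidate is excluded from the result.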

Geosocial Nearest Neighbor Queries
Chen and Lu [80] define a nearest neighbour (NN) query as a query that finds the set of nearest items (users/objects/PoIs) to the query point in terms of spatial distance. The most popular variant of the NN query is the k-nearest neighbor (k-NN) query, which retrieves the k nearest points to the query point. An example of a k-NN query, extracted from [46], is provided in Figure 10. The geosocial NN query extends the computation of the nearest items by considering not only the spatial distance but also social criteria to find these objects. Ten of the surveyed studies (17.5%) belong to this class of geosocial query (see Table 14).

Figure 10. An example of a geosocial nearest neighbor query that considers a set of users {u1, u2, ..., u8} and the query location q. The query returns C1 with radius constraint ρ = 3, which is the nearest neighborhood to q.

S2
Nearest Friends (NF): returns the nearest friends of a user to a given location.

S7
Cohesive group nearest neighbor (CGNN): returns a group of attendees such that the travel cost of each attendee is within a range, and the total travel cost of all attendees is minimised.
Cohesive group nearest neighbor queries under multi-criteria (MCGNN): return a group of attendees and a set of locations such that the travel cost of each attendee is within a range, and the overall scores of locations are maximised under multi-criteria.

S15
k-Relevant nearest neighbor (k-RNN): retrieves close-by and relevant (as judged by the crowd) POIs.

S16
Reverse nearest neighborhood (RNH): discovers the neighborhoods that find a query facility as their nearest facility among other facilities in the dataset.

S19
kNN and range queries: discover the hot zones (highly populated areas) based on users' spatial movement patterns and incorporate them into the construction of watchtowers.

S22
Geosocial group queries: retrieve k users that satisfy the minimum acquaintance constraint and have the minimum spatial distance to the query issuer.

S23
Geo-Social K-Cover Group (GSKCG): retrieves a minimum user group in which each user is socially related to at least k other users, and the users' associated regions can jointly cover all the query points.

S34
k-nearest neighbor temporal aggregate (kNNTA) query: returns the top-k locations that have the smallest weighted sums of (i) the spatial distance to the query point and (ii) a temporal aggregate on a certain attribute over the time interval.

S46
Reverse Nearest Social Group (RNSG): finds all social groups that satisfy the k-core constraint and have their farthest member (the individual with maximum Euclidean distance to the query point) as a reverse nearest neighbor of the query point.

S53
Consensus query: finds a meeting place that minimises the travel distance for at least a specified number of group members.

The spatial constraints that have been applied in these studies are the distance and travel costs, already defined in the sub-section on "Geosocial group queries". Specifically, 8 studies apply the distance, while only 2 studies apply the travel cost (see Table 15).

Table 15. Main types of spatial, social, and temporal constraints applied in geosocial nearest neighbor queries.

Constraints | Paper ID | Total
Spatial: Distance | S2, S15, S16, S19, S22, S23, S34, S46 | 8

Regarding the social constraints, five different kinds have been applied in these studies: the friendship constraint, which is the most applied in this category with 3 studies, followed by the popularity, closeness, and acquaintance constraints with 2 studies each, and the relevance constraint with 1 study.
Finally, one study [20] proposing geosocial NN queries also incorporates temporal constraints, in addition to spatial and social ones.
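The simplest geosocial NN query of Table 14, Nearest Friends, can be sketched as a k-NN search restricted to the friends of a user. The data layout (a friendship dictionary and a dictionary of user locations), the Euclidean distance, and the function name are assumptions made for this illustration; the surveyed studies rely on index structures rather than on the linear scan shown here.

```python
import heapq
import math


def nearest_friends(user, q_loc, friends, locations, k):
    """Return the k friends of `user` closest to the location q_loc.

    friends: dict mapping a user to the set of their friends.
    locations: dict mapping a user to an (x, y) coordinate pair.
    """
    candidates = [f for f in friends.get(user, ()) if f in locations]
    return heapq.nsmallest(
        k, candidates, key=lambda f: math.dist(q_loc, locations[f])
    )
```

The social criterion acts here as a filter on the candidate set, so a user who is spatially closest to the query point but is not a friend of the query user is never returned.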

Geosocial Moving Queries
Moving queries are an important type of query over moving objects, asking for the set of objects that satisfy the spatial query constraints in a given time interval. Geosocial moving queries extend these requests to the variation in social relationships, in addition to the movements with spatial and temporal characteristics [63]. Three of the surveyed studies (5.26%) belong to this category of geosocial query (see Table 16).
Similarly to the category of geosocial top-k queries, all the studies proposing geosocial moving queries consider distance as a spatial constraint of the query.
Considering the spatio-temporal constraints, the surveyed studies apply two different kinds of movement constraints: trajectory and route constraints. The former defines constructs for retrieving the trajectories of the moving object, while the latter allows searching for the optimal route that passes through the locations specified in the query.
Attending to the social constraints, in addition to the friendship and social similarity, already mentioned and described in the previous categories of queries, a further social constraint that has been applied in these studies is social trust. It measures the credibility between two persons and can be computed considering features that exploit social information and user behavioural patterns, including user profiles, social structure, and user behaviors in the geosocial network [75].

Geosocial Fuzzy Queries
Fuzzy queries have been defined by Hassine et al. [81] as queries in which the preferences about the desired items are imprecise and are usually expressed using fuzzy conditions. Therefore, the terms in the query do not have to match the retrieved terms exactly; they only need to fall within the maximum distance specified by the fuzziness.
Only one surveyed work [51] proposes fuzzy queries for geosocial networks. Specifically, in the work of Chen et al. [51], fuzzy queries are defined over a social relational network model called the intuitionistic fuzzy social relational network (IFSRN) model. The model represents and reasons with negative, positive, and neutral relationships between actors, and yields the degrees of truth and falsity of the fuzzy queries.

Frameworks Supporting Geosocial Query Processing
In addition to the 54 studies proposing the geosocial queries classified in the seven categories described above, 3 of the surveyed studies propose the following frameworks providing a collection of query primitives essential for geosocial queries:

1. J-CO framework [34], which provides a data model, an execution model, and a pool of operators (basic and spatial) that constitute the query language for querying heterogeneous collections of geo-referenced data and social network information.

3. Socio-Spatial Network Algebra [77], which is composed of a set of seven operators that serve as the building blocks of a socio-spatial query language over a joined socio-spatial graph.

RQ 2: What Are the Query Processing Methods Applied to Geosocial Data by Selected Studies?
We addressed the second research question by analysing the kind of method(s) used to retrieve the result of the query, the kind of access method (if index-based or not), and whether or not they provide an approximate solution [82,83].
Considering the kind of query processing method, we checked the query processing algorithms proposed in the selected studies and searched for the query primitives or algorithms described in Section 2.2. Based on our analysis, the most applied primitive in geosocial queries is pruning with 31 studies (57.4%), followed by sorting (15 studies, 27.8%), scoring (14 studies, 25.9%), clustering (8 studies, 14.8%), filtering (6 studies, 11.1%), and join and partitioning (1 study each, 1.8%). Considering the query algorithms, the most applied are the best-first search algorithm and branch and bound with 6 studies each (11.1%), followed by measure and conquer (2 studies, 3.7%), and Dijkstra search and depth-first search (1 study each, 1.8%).
Considering the kind of access method, the majority of the selected studies use an index-based approach (47 studies, 87%) and only 7 studies (13%) do not use an index. Among the index-based studies, the most applied class of indexing method is the spatial-first one with 30 studies (63.8%), followed by the hybrid approach with 14 studies (29.8%) and the social-first one with 3 studies (6.4%).
Finally, the majority of the selected studies do not provide an approximate solution (37 studies, 68.5%). Table 17 summarises the selected studies with respect to the kind of query primitives/algorithms, access method, and indexing method they utilised.
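To make the notion of a spatial-first access method more concrete, the following Python sketch indexes user locations in a uniform grid, prunes the cells that cannot intersect the query range, and only afterwards applies the social filter. The uniform grid, the class and function names, and the Euclidean range test are assumptions chosen for brevity; the surveyed studies typically adopt more elaborate structures, and this sketch does not reproduce any of them.

```python
import math
from collections import defaultdict


class GridIndex:
    """Uniform grid over the plane; each cell stores the users located in it."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def _cell(self, loc):
        return (math.floor(loc[0] / self.cell_size),
                math.floor(loc[1] / self.cell_size))

    def insert(self, user, loc):
        self.cells[self._cell(loc)].append((user, loc))

    def range_query(self, center, radius):
        """Users within `radius` of `center`; cells outside the bounding
        square of the query circle are pruned without being inspected."""
        cx0, cy0 = self._cell((center[0] - radius, center[1] - radius))
        cx1, cy1 = self._cell((center[0] + radius, center[1] + radius))
        hits = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for user, loc in self.cells.get((cx, cy), ()):
                    if math.dist(center, loc) <= radius:
                        hits.append(user)
        return hits


def range_friends(index, friends_of_user, center, radius):
    """Spatial-first evaluation: spatial range search, then social filtering."""
    return sorted(u for u in index.range_query(center, radius)
                  if u in friends_of_user)
```

A social-first method would invert the two steps, enumerating the friends of the user first and testing their locations afterwards, while a hybrid index would store both kinds of information in the same structure.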

RQ 3: How Are Geosocial Query Processing Methods Evaluated?
To answer this RQ, we identified 55 (96.5%) studies out of the selected studies that evaluated the proposed geosocial query processing methods, while two studies [34,51] do not provide any evaluation.
In the following sub-sections, we analyse both some important evaluation metrics used to assess the performance of geosocial query processing methods and the evaluation datasets.

Metrics
From the selected studies, we identified the following measures used to evaluate the performance of the query processing methods:

• Query response time, also named the query elapsed time or query processing time, which measures the time elapsed from the instant a query is issued to its result retrieval;
• Running time, also called the computation time, which is the length of time required to perform the query computational process;
• CPU time, which is the amount of time for which a central processing unit (CPU) is used for processing query instructions. According to what exactly the CPU is processing, this metric can be divided into client CPU time, which is the amount of time the CPU is busy executing client instructions, and server CPU time, which is the amount of time the CPU is busy executing server instructions;
• Communication overhead, which is defined as the number of encrypted records sent as the result of an issued query [84];
• Correctness, which is the ratio between the number of correct answers and the total number of queries;
• Accuracy, which is computed as the ratio between the cost functions of the result set obtained by the proposed query and the baseline solution [60];
• Index construction time, which can be defined as the time elapsed to construct the index structures [85];
• Approximation ratio, which is the usual way of measuring the performance of the query processing methods that provide approximate solutions and is computed as the ratio of the radius of the approximate solution returned over that of the exact solution;
• I/O cost, which corresponds to the number of pages/blocks accessed (I/O) to retrieve the data from the disk for each query;
• Pruning rate, which is computed as the ratio of the pruned PoIs to all the PoIs in the query range;
• Memory space, which is the total amount of memory used by the algorithm for query processing.
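Several of the measures listed above are simple ratios or elapsed times, and they can be computed as in the following Python sketch. The helper names and the use of `time.perf_counter` to measure the running time are assumptions made for this illustration and are not prescribed by the surveyed studies.

```python
import time


def timed(query_fn, *args, **kwargs):
    """Run a query function and return (result, running time in seconds)."""
    start = time.perf_counter()
    result = query_fn(*args, **kwargs)
    return result, time.perf_counter() - start


def correctness(correct_answers, total_queries):
    """Ratio between the number of correct answers and the total number of queries."""
    return correct_answers / total_queries


def approximation_ratio(approx_radius, exact_radius):
    """Radius of the approximate solution over the radius of the exact solution."""
    return approx_radius / exact_radius


def pruning_rate(pruned_pois, pois_in_range):
    """Ratio of the pruned PoIs to all the PoIs in the query range."""
    return pruned_pois / pois_in_range
```

An approximation ratio of 1 means that the approximate method returned a solution as good as the exact one, while a pruning rate close to 1 indicates that most PoIs in the query range were discarded without a full evaluation.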
The most applied metric is the running time (43.9%), followed by I/O cost (26.3%), query response time (24.5%), and server CPU time (19.3%), as shown in Table 18.

Evaluation Datasets
As discussed by Brinkhoff [86], the preparation and use of well-defined evaluation datasets are fundamental for enabling a systematic evaluation of the performance of query processing algorithms and data structures. To achieve that, real-world and synthetic datasets have been used in the literature. The former are collected from real applications. The latter are generated by constructing a model that learns the statistical properties of the real data and using the model to produce the synthetic data, as explained by Dankar and Ibrahim [87].
The selected studies predominantly used real-world datasets (32 studies, 56.1%) to perform the evaluation of the geosocial query process, while 19 studies (33.3%) used both real-world and synthetic datasets and 2 studies (3.5%) used synthetic datasets only. The two remaining studies (S4 and S19) do not specify the datasets used for the evaluation. The predominant use of real-world datasets is probably due to the fact that they provide more realistic benchmarking results, even if the effort to record them can be very high compared to synthetic datasets. Table 19 provides a summary of the real-world datasets used by the selected studies, along with their main characteristics; i.e., the size, which is the number of items (users, locations, vertices, objects, PoIs, etc.) collected in the dataset, and the sources, which are the location-based social network or the road network used to acquire the data. The most popular real-world dataset (with 23 studies or 41%) is the Gowalla dataset.

The definition of the query processing methods applied to geosocial data brings many opportunities for research; however, there are also several open challenges that should be faced in the near future. Table 20 provides a summary of these issues, which we have extracted from the surveyed studies and divided into three main categories: technological challenges, privacy-related challenges, and social challenges.

Open Challenges | ID

Technological
• use of the shortest route, the interest of riders, obstacles on the road, and location uncertainty to enhance the query ridesharing system (S8)
• use of the historical information of each user in the group to automatically set the group preference and its weight (S17)
• to allow each user to specify the minimum number of attendees with each attribute value required to be selected (S42)
• empirical "relevance" assessment of the query results involving real-world data collected from the Web (S15)
• to adopt deep learning technologies to train knowledge graphs of users, so as to intelligently perceive the preference information of a user community and choose the best POI (S35)
• development of a corresponding index structure and various query algorithms, and the distributed implementation of a data model using a large-scale graph (S37)
• to incorporate more sophisticated spatial queries such as skyline and distance-based joins (S22)
• integration of methods to favor users whose friends are concentrated near the query and to investigate the adaptation of these methods to related application domains, such as spatial-keyword search (S25)
• to study geo-social top-k collective keyword queries (S28)

Privacy-related
• to protect the location privacy of users while evaluating GTP queries (S31)
• group planning over privacy-preserved or inconsistent spatial-social networks (S14)
• to consider a user location as a region instead of a point, which is desirable from the standpoint of privacy (S53)

Social
• to investigate the issue of social trust and how to integrate social trust into geo-social group query (S43)
• to incorporate social relationships as an important criterion in group formation and develop novel query processing techniques (S54)
• to study the evaluation of social trust in location-based social networks and to seek other approximate algorithms for solving this new problem (S55)
• to investigate how other social information, such as social relationships between mobile users, can be utilized to speed up spatial query processing (S19)

With respect to the technological challenges, the results of the SLR reveal a need to explore new kinds of social and spatial data to include in the query processing for refining the results of the geosocial queries. For instance, Shim et al. [39] suggested the use of the shortest route or the interest of riders to enhance the query ridesharing processing and to apply this kind of query also to environments with obstacles on the road and location uncertainty. Zhang et al. [47] proposed the use of the historical information of each user in the group to automatically set the group preference and its weight in the social graph. Furthermore, several works suggested focusing future research on the development of new approaches for (i) assessing the relevance of the query results, for instance, by using real-world data collected from the Web [45]; and (ii) training knowledge graphs, for instance, by using deep learning technologies to intelligently perceive the user community preference information and choose the best POI to retrieve [61]. In addition, the surveyed works also suggest looking at new kinds of geosocial queries. In particular, more sophisticated spatial queries, such as skyline and distance-based joins [52] and geosocial top-k collective keyword queries [23], are proposed.
Regarding the privacy-related challenges, some surveyed works highlighted the need for solutions to protect the users' location privacy. Hashem et al. [58], for example, suggested studying scenarios where the users of a group do not reveal their locations to each other, and Ali et al. [73] proposed to consider a user location as a region instead of a point to avoid disclosing the precise location.
Finally, attending to the social challenges, future research needs to focus on the concept of social trust by investigating how social trust can be evaluated in location-based social networks [75] and how it can be integrated into geosocial query processing [66]. Moreover, future studies may even investigate how to incorporate other social information, such as the social relationships between mobile users, to develop novel query processing methods and speed up spatial query processing [49,74].

Conclusions
This study has examined the geosocial query processing in location-based social networks through a systematic literature review of the scientific knowledge extracted from indexed scientific databases, containing formally published literature, and from nonindexed databases, containing grey literature. Out of the 4312 papers returned from the initial search on these databases, 67 studies were retained after the application of the inclusion and exclusion criteria defined in the methodology, of which 57 were selected for the qualitative synthesis according to the scores obtained in the quality evaluation checklist.
We have found that the scientific community's interest in the topic of geosocial querying started growing in 2012 and continued to grow until 2020. Furthermore, the result of our analysis shows that seven categories of geosocial queries can be identified: geosocial group queries proposed by 43.85% of the selected studies, followed by geosocial keyword queries (26.31%), geosocial top-k queries (19.3%), geosocial nearest neighbor queries (17.5%), geosocial skyline queries (10.5%), geosocial moving queries (5.26%), and geosocial fuzzy queries (1.75%). Moreover, three of the surveyed studies (5.26%) propose frameworks supporting a collection of query primitives essential for geosocial queries.
Regarding the query processing methods, we have observed that the kind of query primitive predominantly applied in the geosocial query process is pruning (57.4%), followed by sorting (27.8%), scoring (25.9%), clustering (14.8%), filtering (11.1%), join (1.8%), and partitioning (1.8%), while the most frequently used query algorithms are the best-first search algorithm (11.1%) and branch and bound (11.1%), followed by measure and conquer (3.7%), Dijkstra search (1.8%), and depth-first search (1.8%). Moreover, we found out that the majority of the selected studies used an index-based approach to optimize the retrieval of the geosocial data, and the spatial-first indexing method is the most common class of indexing methods (63.8%). Another key finding is that most of the selected studies (68.5%) do not provide an approximate solution, probably because it is preferable to have a completely accurate answer, even if through a more time-consuming process, instead of faster but not accurate approximate results.
Concerning the evaluation methodologies, we found that the most common measure used to evaluate the performance of the query processing methods is the running time (43.9%), followed by the I/O cost (26.3%), the query response time (24.5%), and the server CPU time (19.3%). Moreover, to perform the evaluation of the geosocial query process, real-world datasets are mainly used (56.1%), followed by both real-world and synthetic datasets (33.3%). The Gowalla dataset is the most popular real-world dataset, applied by 41% of the selected studies.
Finally, the findings of the study highlight the need to explore (i) new kinds of social and spatial data to include in the query processing for refining the results of the geosocial queries; (ii) solutions to protect the location privacy of users; and (iii) methods for evaluating and integrating social trust into geosocial query processing.