Investigating “Locality” of Intra-Urban Spatial Interactions in New York City Using Foursquare Data

Thanks to the increasing popularity of location-based social networks, a large amount of user-generated geo-referenced check-in data is now available, and such check-in data is becoming a new data source in the study of mobility and travel. Conventionally, spatial interactions between places were measured based on the trips made between them. This paper empirically investigates the use of social media data (i.e., Foursquare data) to study the “locality” of such intra-urban spatial interactions in New York City, and specifically: (i) the level of “locality” of spatial interactions; (ii) the impacts of personal characteristics on “locality” of spatial interaction and finally; (iii) the heterogeneity in spatial distribution of “local” interactions. The results of this study indicate that: (1) spatial interactions show a high degree of locality; (2) gender does not have a considerable impact on the locality of spatial interactions and finally; (3) “local” interactions likely cluster in some places within the research city.


Introduction
Location-based social network (LBSN) products such as Foursquare, Gowalla, Google Latitude, and Facebook Places are becoming an important source of volunteered geographic information (VGI). As the most popular LBSN, Foursquare has nine venue categories, including food, travel, and transport, and these cover hundreds of sub-categories, including cafes, bus stations, and train stations [1]. Foursquare has over 50 million users worldwide, and over 6 billion check-ins had been made using the website by May 2014, with millions of new check-ins taking place every day [2]. Although check-in data has some limitations in terms of how it represents human mobility (for example, it shows age group and place category bias), such data can be used to identify human mobility patterns in accordance with certain mechanisms [3,4]. Compared to some other data sources (e.g., survey data and mobile phone data), LBSN check-in data has some advantages as an indicator of human activity categories such as dining, working, and shopping, as it provides a fine-grained resolution and is readily available. Therefore, user-generated geo-referenced check-in data has excellent potential for the study of human mobility, as some researchers have already demonstrated [3,5–11]. Since spatial interactions are measured through human mobility patterns, check-in data also has significant potential for the study of spatial interaction. In this study, we therefore use LBSN check-in data to study intra-urban spatial interactions. Before introducing how the paper is structured, we review the previous research on intra-urban mobility and spatial interactions.

Brief Overview of the Previous Research on Intra-Urban Mobility and Spatial Interactions
Over the last two decades, intra-urban mobility has become a popular research topic across the research community, including among geographers, urban planners, computer scientists, and physicists.
ISPRS Int. J. Geo-Inf. 2016, 5, 43
Geographers are interested in the spatial distribution of intra-urban mobility, and urban planners want to improve transport efficiency by investigating spatial and temporal variations of travel time and travel flows [12–15]. To quantitatively understand mobility, computer scientists and physicists are involved in modelling the distribution of travel distance in a mathematical way [16–18]. A few studies have been able to identify human mobility patterns using mobile phone data, automobile GPS traces, and social media data [3,5,15–20]. One of the most common findings among such studies is that the distribution of trip distances tends to follow a power law [12,16] or an exponential law [4,21,22]. Another common finding is that humans follow simple, reproducible patterns of mobility [15,16]. Some other studies have attempted to interpret intra-urban mobility distributions using models [12,23] such as the gravity model [23] and its modified forms, which are mainly based on the distance-decay effect [12], and the population distribution model, which is the most popular simulation model used [12,22].
Some researchers [17,18] have used empirical studies to demonstrate high potential predictability levels with respect to user mobility [17], while others have attempted to uncover how urban form characteristics (e.g., land area, land use mix, road and population densities) and personal or household characteristics (e.g., age, gender, education levels, employment, income, and car ownership levels) impact upon intra-urban human mobility [11,15,20,24–27]. Furthermore, from a network perspective, some studies have focused on intra-urban travel networks [28,29], revealing that, like some typical connection networks such as the World Wide Web, the internet, friendship networks, and scientific collaboration networks, intra-urban travel networks are "small world" networks [30]. Finally, some studies have revealed underlying patterns of intra-urban mobility to understand urban structure and the functions of sub-regions [31,32], and to better measure the popularity of places (e.g., retail outlets) for site selection [33,34].

Motivation for This Study
In some studies [35,36], trips between geographic units, such as cities and neighborhoods, are used to measure the interaction between two geographic units. Similarly, trips between locations such as restaurants, apartments, and bus stops have been used to measure the interactions between locations in a city. The number and length of such trips can be used to represent the strength and length of interactions between locations. Spatial interactions have also been analyzed by researchers to better understand human mobility patterns and human-mediated dynamic phenomena, such as the spread of infectious disease, the results of which may be beneficial to urban planners and decision makers [28,37,38]. As one sub-field of urban planning, transportation planning can be improved by a better understanding of the spatial distribution of human mobility in cities. In addition, "Human behaviour plays an important role in the spread of infectious diseases, and understanding the influence of behaviour on the spread of diseases can be key to improving control efforts" [39], and decision makers are concerned with how to improve control over the spread of infectious diseases. Investigating spatial interaction patterns may also be beneficial to businesses, e.g., by helping to identify a good location based on personalized user preferences, or by selecting a good site for a new shop. The former has been explored by studies through the use of social media data [4,6], while the latter is a new field within mobility studies using social media data. Investigating spatial interaction distances may help when selecting the site of a new shop, restaurant, or other service facility. For instance, if a new restaurant is established to serve customers of fitness and sports centers, it should be located in close proximity to them. However, it may not always be the most profitable decision to open a restaurant extremely close to a gym, so when selecting a site, one needs to examine to what extent a restaurant close to a gym is able to attract a large number of gym users. In addition to business location decisions, management of the spread of infectious diseases can also benefit from spatial interaction or human mobility analysis [37,40,41]. For instance, if a healthy person visits a venue over the same time period as a sick person, such as over the course of a day, he or she is more likely to become infected than those who do not visit the venue. Therefore, mobility trajectories may be closely associated with the spread of a disease, with a strong interaction implying a higher possibility of disease spread and long-distance interactions implying a greater potential for a larger spatial range of spread. The strength and length of interactions between venues can be taken into account within disease spread simulations.
This study was explicitly motivated by the following. The distance decay effect [42] seemingly implies that human mobility or spatial interactions have a large degree of "locality" associated with them, so this study attempts to investigate the "locality" of spatial interactions in a quantitative way. Moreover, since the existing literature reveals that intra-urban mobility is influenced by personal characteristics, it will be interesting to examine if and how personal characteristics impact upon the "locality" of interaction. Furthermore, since the spatial heterogeneity of population and physical activities likely results in spatial heterogeneity of human mobility, this study attempts to investigate the heterogeneity in the spatial distribution of "local" interactions by identifying clustering of "local" interactions. Additionally, this study attempts to analyze noticeable clusters of "local" interactions to understand the travel behavior and activities of residents. On this basis, the functions or land use patterns of the sub-regions covered by the clusters are briefly discussed.
There is some research that leverages LBSN data or other mobility data to analyze intra-urban mobility, spatial interactions, and activity transitions [4,6]. On the one hand, the majority of this research has not accounted for the impacts of personal characteristics on spatial interactions. In contrast, this study attempts to incorporate personal characteristics into the analysis of intra-urban spatial interactions. On the other hand, some other research utilizes LBSN data or other mobility data to divide a city or identify the functions of regions by analyzing spatial patterns of human activities [43–45]. The majority of these studies consider activities (points) separately, but ignore the relationships between different activities undertaken by individual people. This study takes account of the relationships between different activities in terms of trips (activity transitions). Accordingly, complex network approaches are used to identify clusters of trips. Afterward, we briefly discuss the functions or land use patterns of the sub-regions covered by the noticeable clusters. This study does not aim to discuss the functions or land use patterns of all sub-regions in a city.
The remainder of this paper is organized as follows. Section 2 introduces how we analyze the "locality" of spatial interaction, while Section 3 presents the empirical analysis and the relevant results. Lastly, we present the conclusion and make suggestions for future research.

Methodology
In this section, we introduce the methodology used for this study. First, the study investigates the level of "locality" of spatial interactions. Second, the study explores whether gender, as a personal characteristic, has a considerable impact on the "locality" of spatial interactions. Finally, the study investigates the heterogeneity in the spatial distribution of "local" interactions. In particular, it identifies clustering of "local" interactions (links), i.e., areas of strong intra-interactions. Additionally, the functions or land use patterns of the sub-regions covered by the clusters are briefly discussed, and land use data are used as a partial check on that discussion.

Depiction of Spatial Interactions
In this section we will introduce how to use check-in information to depict human mobility and spatial interaction patterns. Within location-based social networks such as Foursquare, each venue corresponds to a physical location (see Figure 1). Common types of venues include restaurants, offices, apartments, hotels, bus stops, shops, and gyms. Imagine a user checks in at venue A (house) and venue B (office) consecutively. In this situation, the "trip" is from venue A to venue B, irrespective of the specific route taken by the user travelling between these two venues. In the context of this study, a "trip" is a single journey represented by a line in geometrical terms. Sometimes, a user might travel twice between two venues in opposite directions, e.g., going from an office to a restaurant for lunch and then returning after lunch. If a user travels between two venues, these two venues are considered to be "spatially interacted"; in other words, there is an "interaction" between these two venues.

Length and Strength of Spatial Interactions
The length of an interaction is equal to the distance between the pairwise venues (i.e., the length of trip), while the strength of an interaction is measured by the number of trips taken between interacted pairwise venues. The more trips there are between pairwise venues, the stronger the interaction between them.
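As an illustration of these two measures, the following minimal Python sketch counts trips per venue pair (strength) and computes the distance between the paired venues (length). The venue names, coordinates, and trips are hypothetical, and two choices are assumptions of the sketch rather than details stated in the text: distance is taken as great-circle (haversine) distance, and trips in opposite directions are counted toward the same interaction.

```python
import math
from collections import Counter


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres (assumed distance measure)."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def interaction_strengths(trips):
    """Number of trips per unordered venue pair (direction ignored by assumption)."""
    return Counter(frozenset(t) for t in trips if t[0] != t[1])


# Hypothetical venues (lat, lon) and trips (origin, destination)
venues = {
    "home": (40.7000, -73.9900),
    "office": (40.7500, -73.9900),
    "cafe": (40.7510, -73.9890),
}
trips = [("home", "office"), ("office", "cafe"), ("cafe", "office"), ("office", "home")]

strength = interaction_strengths(trips)

# Length of each interaction: distance between its two venues
lengths = {}
for pair in strength:
    a, b = tuple(pair)
    lengths[pair] = haversine_km(*venues[a], *venues[b])

for pair in strength:
    print(sorted(pair), strength[pair], round(lengths[pair], 2))
```

Using an unordered pair as the key keeps an out-and-back journey (office to cafe and back) as one interaction of strength 2, which matches the description of interactions as links between venues.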

"Locality" of Spatial Interactions
Since this study investigates the "locality" of spatial interactions, two indicators are introduced to characterize the locality of interaction at the venue scale. To eliminate the effect of spatial heterogeneity among venues, relative distance is used instead of real distance. Supposing venue j is the kth nearest neighbor (kth NN) of venue i, then k is used to measure the relative distance between venue i and its neighboring venue j. The lower the value of k, the shorter the relative distance. The K nearest neighbors (KNNs) of a venue are the set composed of its kth nearest neighbors (k = 1, 2, …, K). For instance, 200NNs refers to the set composed of the 200 nearest neighbors. The two indicators are defined as follows.
(1) The percentage of links a venue has with KNNs is used to measure the relative possibility of a venue being linked (interacted) with its neighbors. This value is used to measure the "locality" range of the interactions that take place. The higher the value is, the more likely the venue is to be interacted with neighboring venues than with distant venues. This value can be defined as:
$$\text{percentage of links a venue has with KNNs}\,(P_i) = \frac{\text{link count of venue with KNNs}\,(P_i)}{\text{link count of venue}\,(P_i)}$$
where link count of venue with KNNs $(P_i)$ is the number of links venue $P_i$ has with its KNNs, and link count of venue $(P_i)$ is the number of links venue $P_i$ has.
(2) The kth NN linked with a venue value records that a venue is interacted with the kth nearest venue among its KNNs. This value is used to measure the "locality" strength of interactions between venues.
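The two indicators can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the venue names, planar coordinates, and links are hypothetical, distances are Euclidean on those toy coordinates, and links are assumed to join distinct venues.

```python
import math


def neighbour_ranks(coords, venue):
    """Map every other venue to its rank k (1 = nearest) relative to `venue`."""
    x0, y0 = coords[venue]
    others = sorted(
        (v for v in coords if v != venue),
        key=lambda v: math.hypot(coords[v][0] - x0, coords[v][1] - y0),
    )
    return {v: k for k, v in enumerate(others, start=1)}


def locality_indicators(coords, links, venue, K):
    """Return (share of the venue's links that fall within its KNNs,
    sorted ranks k of the linked venues that lie within the KNNs)."""
    ranks = neighbour_ranks(coords, venue)
    # Venues at the other end of every link that touches `venue`
    linked = [b if a == venue else a for a, b in links if venue in (a, b)]
    if not linked:
        return None, []
    local_ranks = sorted(ranks[v] for v in linked if ranks[v] <= K)
    return len(local_ranks) / len(linked), local_ranks


# Hypothetical venues on a line and hypothetical links
coords = {"A": (0, 0), "B": (1, 0), "C": (2, 0), "D": (5, 0), "E": (9, 0)}
links = [("A", "B"), ("A", "C"), ("A", "E"), ("C", "D")]

share, ranks = locality_indicators(coords, links, "A", K=2)
print(share, ranks)
```

Here venue A has three links, two of which reach venues inside its 2NNs (B at k = 1 and C at k = 2), so indicator (1) is 2/3 and indicator (2) takes the values 1 and 2.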

Interaction Network
To investigate the heterogeneity in the spatial distribution of "local" interactions, this study uses a complex network analysis method to identify clustering of "local" interactions (links). First, this sub-section introduces how to build a "local" interaction network on the basis of nodes and edges. After that, it presents how to identify clustering of "local" interactions in a "local" interaction network.

Nodes and Edges
Trips within a city can be used to build an "interaction network". In this network, a node represents a venue and an edge represents an "interaction". The weight of an edge is measured by the number of trips taken between interacted pairwise nodes.

Network Structure and Community Analysis
In complex networks, links are not evenly distributed among nodes. The nodes of a network can be divided into groups of nodes with dense connections internally and sparser connections between groups [46]. Such a group is considered a community, which is in effect a sub-network. The community-based approach is widely used to analyze the structure of complex networks [46,47].
A good partition of a network into communities must comprise many intra-community links and few inter-community links. Various community detection algorithms have been proposed and used to divide complex networks [48–51]; however, few empirical studies have established which algorithm is best. For this reason, four widely used community detection algorithms (FastGreedy [48], Spinglass [49], Walktrap [50], and Infomap [51]) are all used to divide the network in this study. Among the four partitions of the network, the partition with the smallest number of communities detected is then selected as the best partition for the analysis of clustering of "local" interactions. A small number of communities detected means the network is divided into a small number of parts. Accordingly, the communities detected likely have a relatively large number of nodes and intra-community links (trips).
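The selection rule described above can be sketched as follows. The sketch assumes that each algorithm has already been run (for example with a graph library) and has produced a membership mapping from venue to community label; the mappings below are made-up toy data, and the function names are hypothetical.

```python
def count_communities(membership):
    """Number of distinct community labels in a venue -> label mapping."""
    return len(set(membership.values()))


def select_partition(partitions):
    """Return (algorithm name, membership) for the partition with the
    fewest communities; ties go to the first algorithm listed."""
    name = min(partitions, key=lambda n: count_communities(partitions[n]))
    return name, partitions[name]


# Toy memberships for six venues (hypothetical, not results from the study)
partitions = {
    "fastgreedy": {"v1": 0, "v2": 0, "v3": 1, "v4": 1, "v5": 2, "v6": 2},
    "spinglass": {"v1": 0, "v2": 1, "v3": 1, "v4": 2, "v5": 3, "v6": 3},
    "walktrap": {"v1": 0, "v2": 0, "v3": 0, "v4": 1, "v5": 1, "v6": 1},
    "infomap": {"v1": 0, "v2": 0, "v3": 1, "v4": 2, "v5": 2, "v6": 2},
}

name, membership = select_partition(partitions)
print(name, count_communities(membership))
```

Choosing by community count alone is the criterion stated in the text; a quality score such as modularity would be a different criterion and is not used here.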

Empirical Analysis
The methods for analyzing spatial interaction described in Section 2 are applied to Foursquare check-in data in New York City. In this section, the test data set is briefly described first, and then the "locality" of spatial interactions is empirically investigated.

Study Case and Empirical Data
This study uses New York City (NYC) in the United States as the research city. There are a large number of active social media users in NYC. New York City is composed of five boroughs: Brooklyn, Queens, Manhattan, the Bronx, and Staten Island.
In this paper, Foursquare check-in data is used to generate the empirical mobility data. Since Foursquare has a strict privacy policy, the check-ins were collected from Twitter, with which some Foursquare users share their check-ins. Within NYC's municipal boundary, 148,169 check-ins were acquired over the period 3 March 2014 to 27 April 2014 (12 continuous weeks). To make sure the trip generation process is reasonable, any noise (i.e., a check-in that could not reasonably be used to constitute a trip) found in the four situations detailed below is filtered out:
Situation (1): Among the consecutive daily check-ins, more than one check-in is generated in the same position. In this situation, only the first generated check-in is retained; the others are discarded.
Situation (2): The time difference between consecutive check-ins for an identical user is greater than 8 h.
Situation (3): The speed of travel between two consecutive daily check-ins for an identical user is implausibly high (e.g., above 250 km/h).
Situation (4): The distance between two consecutive daily check-ins for an identical user is short (e.g., 100 m or less). This might suggest that when visiting a venue a user will probably check in not only at this venue, but also at other venues nearby. Check-ins made at venues that were not actually visited by the user are considered fake. Therefore, such fake short trips, with lengths of 100 m or less, should be discarded. In this situation, the two consecutive check-ins do not constitute a trip.
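The four filtering situations can be sketched as a single pass over one user's time-ordered check-ins. This is a minimal sketch under stated assumptions, not the authors' code: the check-in tuples are hypothetical, distance is assumed to be great-circle distance, the restriction to check-ins made on the same day is not modelled, and a check-in rejected under Situations (2) to (4) is still used as the origin of the next candidate trip.

```python
import math
from datetime import datetime, timedelta

MAX_GAP = timedelta(hours=8)   # Situation (2)
MAX_SPEED_KMH = 250.0          # Situation (3), example threshold from the text
MIN_DIST_KM = 0.1              # Situation (4), 100 m


def haversine_km(p, q):
    """Great-circle distance in km between (lat, lon) pairs (assumed measure)."""
    r = 6371.0
    phi1, phi2 = math.radians(p[0]), math.radians(q[0])
    dphi = phi2 - phi1
    dlmb = math.radians(q[1] - p[1])
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def trips_for_user(checkins):
    """checkins: list of (time, venue_id, (lat, lon)) for one user, sorted by time."""
    # Situation (1): keep only the first of consecutive check-ins at one position
    deduped = []
    for c in checkins:
        if deduped and deduped[-1][2] == c[2]:
            continue
        deduped.append(c)

    trips = []
    for (t0, v0, p0), (t1, v1, p1) in zip(deduped, deduped[1:]):
        gap = t1 - t0
        if gap > MAX_GAP:                      # Situation (2)
            continue
        dist = haversine_km(p0, p1)
        if dist <= MIN_DIST_KM:                # Situation (4)
            continue
        hours = gap.total_seconds() / 3600
        if hours == 0 or dist / hours > MAX_SPEED_KMH:  # Situation (3)
            continue
        trips.append((v0, v1))
    return trips


# Hypothetical check-ins for one user on one day
day = datetime(2014, 3, 3)
checkins = [
    (day.replace(hour=8, minute=0), "home", (40.7000, -73.9900)),
    (day.replace(hour=8, minute=5), "home", (40.7000, -73.9900)),     # same position
    (day.replace(hour=9, minute=0), "office", (40.7500, -73.9900)),
    (day.replace(hour=9, minute=10), "cafe", (40.7505, -73.9900)),    # about 56 m away
    (day.replace(hour=19, minute=30), "gym", (40.7600, -73.9800)),    # gap over 8 h
    (day.replace(hour=19, minute=31), "airport", (40.6400, -73.7800)),  # too fast
]
trips = trips_for_user(checkins)
print(trips)
```

In this toy sequence only the home to office pair survives all four checks.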
Furthermore, active users who are likely local residents are selected. In this study, active local users are regarded as users who: (1) are likely to check in at locations as much as possible whenever they actually visit these locations; (2) are likely to have check-ins for a sufficient number of days (at least 28 days); and (3) have a certain number of trips (at least 21 trips) between different venues.
As a result, 50,758 travel flows (trips) covering 18,333 venues and 40,111 links (interactions) are included.These trips are generated by 843 active sampled users (443 male users and 400 female users). The

Locality of Spatial Interactions
This section presents an empirical investigation of the locality of spatial interactions, and specifically, to what degree a venue is likely to be interacted with its neighbors. First of all, this sub-section presents the empirical distribution of two locality characteristics: the percentage of links a venue has with KNNs, and the kth NN linked with a venue. It should be noted that "K-nearest neighbors" (KNNs) means the K-nearest neighbors of the relevant venue. In this empirical investigation, a specific number has had to be set as the value of K. The value 200 is assigned to K, since the 200NNs of a venue represent a considerable number of neighbors for a venue. A total of 23,390 trips (46% of the total trips) connect the start venues to end venues which are among the 200NNs of the start venues. This indicates that 200NNs can be used to represent an appropriate neighborhood range.
In this sub-section, we empirically investigate how venues interact with their 200NNs. First, the distribution of the percentage of links a venue has with KNNs is analyzed, with the results shown in Figure 2. Approximately 50% of the venues interact with their 200NNs. Figure 2 also shows that when the value is more than 0, the percentage of links a venue has with 200NNs seems to exhibit a uniform distribution. This suggests that the relative possibility of a location being linked (interacted) with its neighbors seems to follow a uniform law.

Second, the distribution of the kth NN linked to each venue is analyzed. In Figure 3, using a Kolmogorov-Smirnov (KS) test (for more detail see [53]), the distribution of the kth NN linked to each venue follows an exponential law. This suggests that the distribution of relative interaction lengths follows an exponential law. In addition, among the venues' links with their 200NNs, approximately 80% are connected to their 100NNs. Therefore, the analytical results reveal that spatial interactions have a high degree of locality. Furthermore, although the relative possibility of a location being linked (interacted) with its neighbors seems to follow a uniform law, locations are more likely to be interacted with nearer neighbors than those further away.
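To illustrate how a KS statistic measures agreement with an exponential law, the sketch below fits the rate by maximum likelihood and computes the largest gap between the empirical and fitted cumulative distributions. It is a generic illustration, not the procedure of [53]: the input values are synthetic, the function name is hypothetical, no p-value is computed (standard KS critical values do not strictly apply when the rate is estimated from the same data), and the fact that k is an integer rank is ignored.

```python
import math


def ks_statistic_exponential(samples):
    """Fit an exponential rate by maximum likelihood (1 / mean) and return
    (rate, D), where D is the KS distance to the fitted exponential CDF."""
    xs = sorted(samples)
    n = len(xs)
    rate = n / sum(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = 1.0 - math.exp(-rate * x)
        # Compare the fitted CDF with the empirical CDF just after and just before x
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return rate, d


# Synthetic data: evenly spaced quantiles of a unit-rate exponential ...
n = 200
exp_like = [-math.log(1.0 - (i + 0.5) / n) for i in range(n)]
# ... and evenly spread values, which an exponential law fits poorly
uniform_like = list(range(1, n + 1))

rate_exp, d_exp = ks_statistic_exponential(exp_like)
rate_uni, d_uni = ks_statistic_exponential(uniform_like)
print(round(d_exp, 3), round(d_uni, 3))
```

A small D, as for the first sample, indicates that the exponential law describes the data closely, while the evenly spread sample yields a clearly larger D.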

Impact of Gender on "Locality"
This section presents the empirical results on the impact of gender on the two "locality" characteristics (i.e., the percentage of links a venue has with 200NNs, and the kth NN linked with a venue), with Table 1 giving the average values found. As an alternative to the t-test when it is not guaranteed that samples are normally distributed, the Wilcoxon test is used to test whether one sample set has a statistically significantly higher average value than the other.
In the results from the Wilcoxon test, the p-values corresponding to the two characteristics are both much greater than 0.05 (see Table 1). This means that, at the 0.05 level, the average percentage of links a venue has with 200NNs for male users is not statistically different from that for female users, and the average value of the kth NN linked with a venue for male users is not statistically different from that for female users. This indicates that gender does not have a considerable impact on the locality of spatial interactions.
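A comparison of this kind can be sketched with a rank-based test for two independent samples. The sketch assumes that the Wilcoxon test referred to here is the rank-sum form for independent groups, which the text does not state explicitly; it uses the normal approximation without tie or continuity corrections, and the sample values are hypothetical.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation). Returns (U, p)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                 # rank sum of the first sample
    u = w - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, p


# Hypothetical per-user indicator values for two groups
group_a = [0.42, 0.55, 0.31, 0.60, 0.48, 0.52]
group_b = [0.45, 0.50, 0.35, 0.58, 0.47, 0.53]
u, p = rank_sum_test(group_a, group_b)
print(u, round(p, 3), "different at 0.05" if p < 0.05 else "not different at 0.05")
```

For small samples or many ties, an exact or tie-corrected version from a statistics library would be preferable to this approximation.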

Clustering of Local Interactions
This section presents the empirical results on identifying clustering of local interactions. In this study, local links connect the start venues to end venues which are among the 200NNs of the start venues. Accordingly, the local interaction network is composed of 13,112 nodes (venues) and 15,944 edges (local links). The weight of an edge equals the number of trips connecting the pairwise venues of the edge.
The four community detection algorithms were all used to identify clustering of local interactions (areas of dense interactions). Among the four partitions of the network, the partition made by using the Walktrap algorithm was selected as the best partition for this study, since the number of communities detected is the smallest. The spatial interaction network in NYC was divided into 1515 communities by using the Walktrap algorithm. The majority of the communities have a small number of intra-community trips and nodes. Among the 1515 communities detected, only four communities have more than 500 intra-community trips and more than 100 nodes. For this reason, typical communities were further selected for the analysis of clustering of local interactions. Typical communities have a large number of intra-community trips (more than 500 trips), and a ratio of intra-community trips to inter-community trips of more than 20. The four most typical communities, which in total have 790 venues and 3778 intra-community trips, were selected since they contained only 6% of the total venues but 16% of the total trips. This indicates the heterogeneity in the spatial distribution of "local" interactions.
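The two thresholds for typical communities can be applied to a weighted edge list and a community membership as sketched below. The edge list and membership are toy data; counting an inter-community edge toward both of the communities it touches is an assumption of this sketch, since the text does not specify how inter-community trips are attributed.

```python
import math
from collections import defaultdict


def community_trip_counts(edges, membership):
    """edges: iterable of (venue_a, venue_b, trips).
    Returns {community: [intra_trips, inter_trips]}."""
    counts = defaultdict(lambda: [0, 0])
    for a, b, w in edges:
        ca, cb = membership[a], membership[b]
        if ca == cb:
            counts[ca][0] += w
        else:
            # Assumption: an inter-community edge counts for both communities
            counts[ca][1] += w
            counts[cb][1] += w
    return counts


def typical_communities(counts, min_intra=500, min_ratio=20):
    """Communities with more than `min_intra` intra-community trips and an
    intra/inter trip ratio above `min_ratio`."""
    typical = []
    for community, (intra, inter) in counts.items():
        ratio = math.inf if inter == 0 else intra / inter
        if intra > min_intra and ratio > min_ratio:
            typical.append(community)
    return sorted(typical)


# Toy network (hypothetical venues, trip counts, and community labels)
membership = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2}
edges = [("a", "b", 600), ("c", "d", 700), ("b", "c", 10), ("d", "e", 50)]

counts = community_trip_counts(edges, membership)
typical = typical_communities(counts)
print(dict(counts), typical)
```

In the toy network, community 0 has 600 intra-community trips against 10 inter-community trips (ratio 60) and qualifies, whereas community 1 has more intra-community trips but a ratio below 20.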
Furthermore, we focused on the four typical communities to analyze significant clustering of local interactions.Figure 4 displays the four most typical communities (significant clusters of local interactions).In Figure 4, convex hulls of communities were used to measure the spatial sizes of communities.Table 2 shows number of nodes and intra-community trips, and the most predominant link categories in the typical communities selected.Table 3 shows the most predominant land use categories within convex hulls of typical communities selected and percentages of the most predominant land use categories.
Community 1 is located within Brooklyn.The most predominant link categories in Community 1 are Eating -> Eating, Shopping -> Eating, and Eating -> Shopping (see Table 2).This indicates that there is clustering of links (trips) between Shopping venues and Eating venues within Community 1.This implies that some residents cluster in this area where they live, and are more involved in eating and shopping activities than working activities or entertainment activities.It seems that this area is likely a residential area.One & Two Family Buildings and Multi-Family Elevator Buildings are the 1st and 2nd most predominant land use categories within the convex hull of Community 1 (see Table 3), indicating this area is a residential area.
Community 2 is located within Queens, and around the Queens College and St. John's University.The most predominant link category in Community 2 is University -> University (see Table 2).This indicates that there is clustering of links (trips) from University venues to University venues within Community 2. This further implies that some residents cluster in this area, and are likely students in the Queens College.It seems that this area is likely an educational area.Public Facilities & Institutions is one of the most predominant land use categories within the convex hull of Community 2 (see Table 3), indicating this area is an educational area.
Community 3 is located within Manhattan and covers SoHo, Little Italy, Chinatown, and New York University. There are no extremely predominant link categories in Community 3 (see Table 2). This implies that some residents cluster in this area for a highly mixed range of activities. The area therefore appears to be a mixed area. Mixed Residential & Commercial Buildings and Public Facilities & Institutions are the first and second most predominant land use categories within the convex hull of Community 3 (see Table 3), which supports the interpretation that this is a mixed area.
Community 4 is located within Manhattan, around the Empire State Building. The most predominant link categories in Community 4 are Office -> Office and Office -> Shopping (see Table 2). This indicates a clustering of links (trips) from Office venues to other Office venues or to Shopping venues within Community 4. It further implies that some residents cluster in this area to take part in commercial work. The area is therefore likely a commercial area. Commercial & Office Buildings is the most predominant land use category within the convex hull of Community 4 (see Table 3), which supports the interpretation that this is a commercial area.

Note: Ratio * represents the ratio of the number of intra-community trips to the number of inter-community trips; Per (%) * represents the percentages of link categories.
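The link-category shares discussed for the four communities can be tallied along the following lines. This is only a sketch: it assumes each venue carries a single activity category, that each trip is an origin-destination pair, and that percentages are taken relative to the intra-community trips of the community in question. The function and argument names are hypothetical, and the exact definition behind the Per (%) column of Table 2 may differ.

```python
from collections import Counter


def link_category_shares(trips, membership, category, community):
    """Percentage of each 'origin -> destination' category pair within one community.

    trips: iterable of (origin_venue, destination_venue) pairs.
    membership: dict mapping venue -> community id.
    category: dict mapping venue -> activity category (e.g. "Eating").
    community: id of the community to summarize.
    Returns a list of (label, percentage) sorted from most to least frequent.
    """
    counts = Counter()
    for origin, dest in trips:
        # Only trips that start and end inside the community are counted.
        if membership[origin] == community and membership[dest] == community:
            counts[(category[origin], category[dest])] += 1

    total = sum(counts.values())
    if total == 0:
        return []
    return [(f"{a} -> {b}", 100.0 * n / total) for (a, b), n in counts.most_common()]
```

A community in which no single pair has a clearly larger share than the others would correspond to the mixed pattern described for Community 3.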

Conclusions and Future Work
In this study, the "locality" of intra-urban spatial interactions was empirically investigated using LBSN data. The empirical results indicate that: (1) spatial interactions have a high degree of locality; (2) gender, as one personal characteristic, does not have a considerable impact on the locality of spatial interactions; and (3) "local" interactions likely cluster in some places within NYC.
Compared to other data sources, check-in data have both advantages and limitations. On the positive side, when compared to census travel data, LBSN check-in data are low cost (they can be downloaded for free) and cover a large spatial scale. Geo-referenced check-ins are also available at the street level, whereas census travel data are usually publicly available at the census tract level only. However, check-in data also have some limitations when used to study mobility. First, compared to some other mobility data (e.g., mobile phone and taxi trace data), geo-referenced check-ins are relatively sparse in spatial terms because of their relatively low record frequency. For instance, a taxi trace record set normally contains one record per minute for an individual user, while a historic check-in record set contains fewer than 10 check-ins per day for an individual user. A taxi trace is therefore able to represent a trajectory between distinct locations in much more detail than check-in records. Second, geo-referenced check-ins are heterogeneously distributed; for instance, user-generated check-in data are abundant in urban areas but sparse in rural regions. As a result, most existing studies of mobility using check-in data only take place in large cities. Also, the more effective application of information and communications technology (ICT) in urban areas is likely to result in a higher number of user-generated check-ins. Added to this, users tend to carry out more check-ins at certain venues, such as airports, restaurants, shops, and railway stations [54], than at home, partly because Foursquare does not offer many home-based venues. Compared to check-ins made at restaurants, shops, or at work, check-ins made at home are relatively rare, even though the majority of people travel to and from home several times a day. Despite their fine-grained resolution, check-in data can therefore be used only in a limited way to characterize home-to-work travel behavior, one of the most important research fields within transportation studies, since check-ins that indicate with much certainty whether users are at home are relatively rare. The third issue is data representativeness. The mobility of a young person is better represented by check-in data than that of an elderly person, since most elderly people do not check in frequently, or do not use social media at all [54]. This is a potential shortcoming when using check-in data to represent the mobility of all people in a city. Finally, traveler profiles are unavailable or incomplete in check-in data; therefore, some mobility patterns related to individual-level characteristics (e.g., age, profession, and so forth) cannot easily be extracted from check-in data sets. The limited availability of user profiles is also an obstacle to improving data representativeness.
In the future, some further aspects should be taken into account when analyzing intra-urban spatial interactions. First, social relationships are a vital aspect of social media, and so need to be considered. However, the use of such data raises a question: to what extent can social relationships affect spatial interactions? Second, it will be interesting to examine if and how urban form characteristics, or socio-economic characteristics, affect the "locality" of interactions. Finally, a combination of check-in data and other data (e.g., mobile phone data) appears to have potential in the study of spatial interactions. When undertaking such an empirical study, however, some obstacles, such as inconsistencies in positional accuracy and recording frequency, will need to be removed or reduced.

Figure 1. Trips taken by Foursquare users between venues.

Land use data used in this empirical study were collected in 2010. The land use data are open data, downloaded from the Department of City Planning (DCP), NYC [52]. The land use dataset contains 11 urban land use categories: One & Two Family Buildings, Multi-Family Walk-Up Buildings, Multi-Family Elevator Buildings, Mixed Residential & Commercial Buildings, Commercial & Office Buildings, Industrial & Manufacturing Buildings, Transportation & Utility, Public Facilities & Institutions, Open Space & Outdoor Recreation, Parking Facilities, and Vacant Land.

Figure 2. Complementary cumulative distribution function (CCDF) of percentage of links a venue has with KNNs.


Figure 4. Typical communities detected by using the Walktrap algorithm.


Table 1. Average values of the two locality characteristics for male and female users.


Table 2. Number of nodes and intra-community trips, and the most predominant link categories in the typical communities selected.

Table 3. Names and percentages of the most predominant land use categories within convex hulls of typical communities selected.
