Language Modeling on Location-Based Social Networks

Abstract: The popularity of mobile devices with GPS capabilities, along with the worldwide adoption of social media, has created a rich source of text data combined with spatio-temporal information. Text data collected from location-based social networks can be used to gain space-time insights into human behavior and provide a view of time and space through the social media lens. From a data modeling perspective, text, time, and space have different scales and representation approaches; hence it is not trivial to jointly represent them in a unified model. Existing approaches do not capture the sequential structure present in texts or the patterns that drive how text is generated considering the spatio-temporal context at different levels of granularity. In this work we present a neural language model architecture that allows us to represent time and space as context for text generation at different granularities. We define the task of modeling text, timestamps, and geo-coordinates as a spatio-temporal conditioned language modeling task. This task definition allows us to employ the same evaluation methodology used in language modeling, a traditional natural language processing task that considers the sequential structure of texts. We conduct experiments over two datasets collected from the location-based social networks Twitter and Foursquare. Our experimental results show that each dataset has particular patterns for language generation under spatio-temporal conditions at different granularities. We also present qualitative analyses to show how the proposed model can be used to characterize urban places.


Introduction
Social networks play a crucial role in modern societies; from interests and reviews to preferences and political opinions, they are imprinted in our everyday life. Social networks such as Instagram, Facebook, Twitter, and Foursquare allow users to share text data with spatio-temporal information (a timestamp and geo-coordinates). We refer to these social networks as location-based social networks (LBSN). Text data generated on location-based social networks is a set of records representing ⟨where, when, what⟩, in which the where is a location's latitude-longitude geo-coordinates, the when is a timestamp, and the what is the textual content.

Understanding patterns of spatio-temporal textual data generated on LBSN can help us understand human mobility patterns [1,2] or when and where popular social activities take place [3-5] in urban environments. In addition, spatio-temporal textual data from LBSN has been successfully used to detect real-world events such as earthquakes [6,7] or to predict events like civil unrest [8]. A better understanding of this type of data could be beneficial in a wide range of scenarios. For instance, the STAPLES Center is a multi-purpose arena in Los Angeles, California, which hosts different human activities such as sporting events and concerts. Using "STAPLES Center" to annotate this location could fail to reveal the complete purpose of the place, while using data from a LBSN could reveal spatio-temporal nuances of the human activities that take place at points of interest like this one.

One challenge related to modeling this kind of data is its multi-modality. Timestamps, geo-coordinates, and textual data exhibit different magnitudes and representation schemes, which makes it difficult to combine them effectively. Previous works have typically combined these variables through topic models or probabilistic models. Spatio-temporal patterns for text data generation should capture patterns at different granularities, such as hours, weeks, months, and years for time, or blocks, neighborhoods, and cities for space. When considering the textual data, previous works have modeled the text following a bag-of-words approach (see Section 2), ignoring the sequential structure of texts.

The research question that guides this work is whether modeling time and space at different granularities, along with the sequential structure of texts, can improve the modeling of spatio-temporal conditioned text data. The main contributions of our current work are to:
1. propose a spatio-temporal conditioned neural language model architecture that allows representing time and space as context for text generation at different levels of granularity;
2. define the task of modeling text, timestamps, and geo-coordinates as a spatio-temporal conditioned language modeling task, which allows employing the same evaluation methodology used in language modeling; and
3. conduct experiments over two datasets collected from location-based social networks (Twitter and Foursquare), including qualitative analyses that show how the proposed model can be used to characterize urban places.

This document is organized as follows. In Section 2 we provide a background of the literature relevant to this work: in the first part of the section, we describe applications that leverage spatio-temporal textual data from LBSN; after that, we delve into models that jointly represent the three variables and highlight drawbacks in previous approaches that need to be addressed. In Section 3 we first provide a background on language modeling before presenting our problem formulation as a spatio-temporal conditioned language modeling task; we then provide a background on neural networks for language modeling and finally describe the proposed neural language model architecture.

Related work

Rather than detecting events when they are occurring, event forecasting works predict the incidence of events in the future. The common approach is to use data from LBSN in conjunction with external sources to build prediction models.

For some events, like criminal incidents [17-19] or civil unrest [8,19], predicting the exact location as far in advance as possible is paramount. A common approach is to define features as indicators and train prediction models for spatial regions [17].

Wang et al. [22] proposed a model based on Latent Dirichlet Allocation (LDA) [28], capable of learning the relationships between locations and words.

In the model, each word has an associated location. For generating words, the model produces the word and also the location, in both cases with a multinomial distribution depending on a topic that is generated by a Dirichlet distribution. Additionally, Sizov [23] developed a model similar to the work of Wang et al. [22]; rather than using a multinomial distribution over locations, it represents latitude and longitude with continuous distributions.

Embedding methods have been successful in areas such as natural language processing [29,30] and graph node representation [31]. For spatio-temporal textual data, Crossmap learns joint embeddings of time, location, and text with two strategies: Recon and Graph. In Recon, the problem is modeled as a relation reconstruction task between the elements of the tuple ⟨time, location, text⟩, while in Graph the goal is to learn representations such that the structure of a graph built from the tuples ⟨time, location, text⟩ is preserved. In [5], Crossmap is extended to learn the embedded representations in a stream.

The authors propose two strategies, based on life-decay learning and constrained learning, to update the representations as new data arrives.

In Table 1, we present a summary of the works discussed in this section. Existing approaches are based on topic modeling or embedding methods. Works following the topic modeling approach are based on topic models such as Probabilistic Latent Semantic Analysis [33] or Latent Dirichlet Allocation [28] and extend them by incorporating spatial and temporal variables. Moreover, no existing work models the sequential structure of texts.

An additional problem in modeling spatio-temporal text data, which is important to mention, is the evaluation framework. Building a reference dataset in this field is complex. First, there is a temporal variable involved, which means that data should be collected over a long time. Second, data is related to a specific region, which means that using models in a new region would require collecting data from that region. We can observe (see the Dataset column in Table 1) that each work relies on its own dataset, which differs from the others in the time span and region covered by the timestamps and geo-coordinates, as well as in the parameter selection.

Spatio-temporal conditioned language modeling

Language modeling is defined as the task of assigning a probability to a sequence of words w: p(w) = p(w_0, w_1, ..., w_{j-1}, w_j). State-of-the-art models for language modeling are based on neural networks. Typically, neural network language models are constructed and trained as discriminative predictive models that learn to predict a probability distribution p(w_j | w_0, w_1, ..., w_{j-1}) for a given word conditioned on the previous words in the sequence. These models are trained on a given corpus of documents. The probability of a sequence of words p(w_0, ..., w_{j-1}, w_j) can then be estimated with the chain rule:

p(w_0, ..., w_j) = ∏_{i=0}^{j} p(w_i | w_0, ..., w_{i-1}).

Conditioned language modeling is defined as the task of assigning a probability to a sequence of words given a context c; the probability of each word in the sequence is computed as p(w_j | c, w_0, w_1, ..., w_{j-1}).
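To make the chain-rule factorization concrete, the following minimal sketch accumulates per-token log-probabilities; the toy uniform next-word distribution is purely illustrative and stands in for a trained neural model (conditioning on a context c simply adds c to every conditioning set):

```python
import math

# Toy vocabulary and next-word distribution; a trained neural network
# would replace next_word_probs. All names here are illustrative only.
VOCAB = ["venice", "beach", "tonight", "concert", "</s>"]

def next_word_probs(prefix):
    """Return p(w | prefix) for every word in VOCAB (here: uniform)."""
    return {w: 1.0 / len(VOCAB) for w in VOCAB}

def sequence_log_prob(words):
    """log p(w_0, ..., w_j) = sum_i log p(w_i | w_0, ..., w_{i-1})."""
    return sum(math.log(next_word_probs(words[:i])[w])
               for i, w in enumerate(words))

print(sequence_log_prob(["venice", "beach", "tonight"]))  # 3 * log(1/5)
```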

Conditioned language models have applications in multiple natural language processing tasks in which generation depends on an additional input, such as machine translation or summarization. In our case, the context is a tuple of a timestamp and coordinates: we require the resulting model to assign a probability to a text given the timestamp and coordinates associated with that text.

More formally, let H = {r_1, ..., r_n} be a set of spatio-temporal annotated text records (e.g., tweets). Each r_i is a tuple ⟨t_i, l_i, e_i⟩, where t_i is the timestamp associated with r_i, l_i is a two-dimensional vector representing the location corresponding to r_i, and e_i denotes the text in r_i. Given that e_i is a sequence of words w_0 ... w_n, assigning a probability to w_0 ... w_n given ⟨t_i, l_i⟩ can be written as p(w_0, w_1, ..., w_n | ⟨t_i, l_i⟩), which is an instance of the conditioned language modeling task presented in Section 3.1. Neural language models are commonly built on recurrent neural networks (RNNs) or Transformer-based self-attention models. In an RNN, a hidden state h_t summarizes the sequence read up to step t; h_t is used as input to a feed-forward network that predicts the next token x_{t+1}.
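As a concrete rendering of this formalization, a record r_i can be represented as follows; the field names and the timestamp encoding are our choices for illustration, not the paper's:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Record:
    """One spatio-temporal annotated text record r_i = <t_i, l_i, e_i>."""
    t: float                 # t_i: timestamp (e.g., Unix epoch seconds)
    l: Tuple[float, float]   # l_i: (latitude, longitude)
    e: List[str]             # e_i: tokenized text w_0 ... w_n

H = [Record(t=1474590000.0, l=(33.985, -118.469),
            e=["sunset", "at", "venice", "beach"])]
```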

The most popular RNN architectures are the Long Short-Term Memory (LSTM) [44] and the Gated Recurrent Unit (GRU) [45]. Both variants introduce mechanisms that control the information flow between the hidden states representing the sequence.

Figure 1. Model's architecture: a spatio-temporal context representation component (encoder) discretizes the timestamps and coordinates of spatio-temporal records ⟨timestamp, latitude-longitude, text⟩.
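The sketch below shows one plausible reading of this architecture in PyTorch: the encoder embeds the discretized time and space cells and their combination initializes the hidden state of a two-layer GRU decoder. Layer sizes, the way the context is injected, and all names are our assumptions, not the authors' exact implementation:

```python
import torch
import torch.nn as nn

class SpatioTemporalLM(nn.Module):
    """Sketch: GRU language model conditioned on discretized time/space cells."""
    def __init__(self, vocab_size, n_time_cells, n_space_cells,
                 emb_dim=128, hidden_dim=256, n_layers=2):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, emb_dim)
        self.time_emb = nn.Embedding(n_time_cells, hidden_dim)
        self.space_emb = nn.Embedding(n_space_cells, hidden_dim)
        self.n_layers = n_layers
        self.gru = nn.GRU(emb_dim, hidden_dim, num_layers=n_layers,
                          batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, words, time_cell, space_cell):
        # Encoder: combine the discretized spatio-temporal context into
        # the initial hidden state of every GRU layer.
        ctx = self.time_emb(time_cell) + self.space_emb(space_cell)  # (B, H)
        h0 = ctx.unsqueeze(0).repeat(self.n_layers, 1, 1)            # (L, B, H)
        # Decoder: a distribution over the next word at each step.
        hidden, _ = self.gru(self.word_emb(words), h0)               # (B, T, H)
        return self.out(hidden)                                      # (B, T, V)

model = SpatioTemporalLM(vocab_size=10000, n_time_cells=24, n_space_cells=400)
logits = model(torch.randint(0, 10000, (8, 20)),
               torch.randint(0, 24, (8,)), torch.randint(0, 400, (8,)))
```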

Timestamps and geo-coordinates discretization
To discretize timestamps, we use temporal units at different granularities (e.g., hours, days, weeks); Figure 2 shows a hierarchy describing these discretizations. For spatial discretization, we use equal-size squared cells, using the spatial coordinates as metric space; Figure 3 shows a hierarchy describing the squared-cell discretizations.

It is important to remark that our approach represents contexts as discrete units at the chosen granularities (a minimal discretization sketch is shown below).

Figure 2. Hierarchy of timestamps discretization.
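The following sketch illustrates such discretizations, assuming hour-of-day/day-of-week temporal cells and latitude-longitude degrees as the metric space; the exact units and hierarchy levels used in the paper are assumptions here:

```python
import math
from datetime import datetime, timezone

def time_cell(ts: float, granularity: str = "hour_of_day") -> int:
    """Map a Unix timestamp to a discrete temporal cell id."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    if granularity == "hour_of_day":
        return dt.hour        # 24 cells
    if granularity == "day_of_week":
        return dt.weekday()   # 7 cells
    raise ValueError(f"unknown granularity: {granularity}")

def space_cell(lat: float, lon: float, cell_size: float = 0.008):
    """Map (lat, lon) to an equal-size squared cell; cell_size in degrees."""
    return (math.floor(lat / cell_size), math.floor(lon / cell_size))

print(time_cell(1474590000.0), space_cell(33.985, -118.469))
```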

Experiments

In this section, we describe our experimental framework. The goal is to get a better understanding of the patterns that guide language generation in spatio-temporal contexts. In particular, looking at the data defined from tuples ⟨time, location, text⟩, the model is evaluated in a traditional language modeling task (i.e., using the Perplexity metric). First, we describe the datasets; after that, we present the evaluation methodology; then we show the experimental results; and finally, we showcase studies of real-world applications of the studied models. Statistics of both datasets are presented in Table 2.

Evaluation of language modeling is usually done using Perplexity [52]. Perplexity measures how well a probability model predicts a held-out sample: it is the inverse probability of the test set normalized by the number of words, PPL(w) = p(w_0, ..., w_N)^{-1/N}, so lower values indicate better models. As language model architectures we test: (1) a two-layer GRU recurrent neural network [45] and (2) a two-layer Transformer-based decoder representation proposed in [51].
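For reference, Perplexity can be computed from per-token log-probabilities as in this minimal sketch (natural logs are used here; the choice of log base does not change the ranking of models):

```python
import math

def perplexity(token_log_probs):
    """PPL = exp(-(1/N) * sum_i log p(w_i | context)); lower is better."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# Three tokens, each predicted with probability 1/5:
print(perplexity([math.log(1 / 5)] * 3))  # 5.0
```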

In Table 3 we show the results for Foursquare, and in Table 4 the results for Twitter.

Spatio-temporal granularities analysis
In this section, we study how modeling time and space at different granularities influences the spatio-temporal conditioned language models. Table 5 shows the results for increasing spatial-cell sizes: the larger the spatial cell, the better the results.

As a complement to the results in Table 5, in Table 6 we show the results with bigger spatial cells. We can see that, instead of getting better results, Perplexity gets worse, which indicates that the sweet spot is with spatial cells between 0.008 and 0.016.
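Schematically, finding this sweet spot amounts to a grid search over cell sizes. The sketch below is purely illustrative: train and held_out_perplexity are stubs standing in for fitting and evaluating the conditioned language model, and the candidate sizes echo those in Tables 5 and 6:

```python
import random

def train(records, cell_size):
    """Stub: fit the spatio-temporal conditioned LM at this granularity."""
    return {"cell_size": cell_size}

def held_out_perplexity(model):
    """Stub: evaluate on a held-out split; returns a placeholder score."""
    return random.uniform(50.0, 100.0)

def sweep_cell_sizes(records, cell_sizes=(0.004, 0.008, 0.016, 0.032)):
    """Pick the spatial cell size with the lowest held-out Perplexity."""
    results = {s: held_out_perplexity(train(records, s)) for s in cell_sizes}
    return min(results, key=results.get), results

best_size, table = sweep_cell_sizes(records=[])
print(best_size, table)
```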

In Table 7 we can see that Perplexity values are lower for Foursquare (New York) than for Twitter (Los Angeles). This is because most of the Foursquare posts are generic texts suggested by the application; in most cases these texts differ only in the place that is checked in, while the Twitter dataset consists mostly of free text. Regarding the spatio-temporal granularities, the finer ones produced better results than the wider ones. We consider that this is because texts are correlated with places of interest where people report activities in Foursquare (restaurants and small businesses) at a fine granularity.

As a complement to the results in Table 7, in Table 8 we show further results. We also analyze the associations between individual words and different granularities of representation.

In Table 9 we show examples of these associations. This type of analysis shows the utility of spatio-temporal conditioned language models trained over LBSN datasets to characterize human activities in urban areas. For the temporal context, more specific words are associated with finer granularities, while the word tonight, a more general term, is associated with the coarsest granularity.

In Figure 8 we show an example with the geo-coordinates of Venice Beach as spatial context. We can observe how the word venice is associated with the finest level of spatial discretization, while the word beach is associated with the second finest granularity: beach is a more general term than venice, but it is still only associated with coastal regions of a city. We expect that the results presented here will increase interest in the use of this mechanism in spatio-temporal domains.
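If the association between words and granularity levels is read off attention weights over the context hierarchy, as the tonight/venice/beach examples suggest, the dominant level per word can be extracted as in this hypothetical sketch; the shapes and names are our assumptions, not the paper's implementation:

```python
import torch

def dominant_granularity(attn_weights: torch.Tensor, words):
    """attn_weights: (T, G) attention over G granularity levels per word;
    returns the argmax level for each of the T generated words."""
    levels = attn_weights.argmax(dim=1)
    return {w: int(lvl) for w, lvl in zip(words, levels)}

attn = torch.tensor([[0.7, 0.2, 0.1],   # "venice": finest level (index 0)
                     [0.2, 0.6, 0.2]])  # "beach": second finest (index 1)
print(dominant_granularity(attn, ["venice", "beach"]))
```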

Conclusions

In this work, we studied the problem of modeling spatio-temporal annotated textual data. We studied how different granularities of time and space influence spatio-temporal conditioned language generation on location-based social networks. We proposed a neural language model architecture adaptable to different granularities of time and space.

A remarkable result of our experiments over two datasets from the social networks Twitter (Los Angeles) and Foursquare (New York) is that each dataset has its own optimal granularity setting for spatio-temporal language generation. Since our proposed architecture is adaptable to modeling time and space at different granularities, it is capable of capturing patterns according to each dataset. These results directly answer our research question by empirically demonstrating that an appropriate adjustment of temporal and spatial granularities can benefit spatio-temporal language modeling and generation.

The datasets used in this work are those reported in [10]. We downloaded the datasets from the link provided by the authors in (download) and created our pre-processed versions, which can be found in (download).
