# A Statistical Approach for Studying the Spatio-Temporal Distribution of Geolocated Tweets in Urban Environments


## Abstract


## 1. Introduction

## 2. Background and Related Work

## 3. Data

#### 3.1. Collection

`twitter4j` of `Java`, `tweety` of MATLAB, `streamR` of `R`, and `tweepy` of `Python`, among others, allow researchers to perform this task. We used `R` [51], the language and environment for statistical computing, and its package `tweet2r` [52] to download geolocated tweets.

`tweet2r` requires the definition of two parameters for the query: (1) a bounding box to establish the spatial scope and (2) a temporal window to set the period when `R` connects to the API. The downloading process builds files in GeoJSON format, and each file stores up to 3000 tweets. Since streaming collects approximately 1% of the overall activity [44,53,54,55], the amount of data gathered depends on how heavily the social network is used in the city.
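The two query parameters and the 3000-tweet file limit can be illustrated with a minimal Python sketch. It is not the `tweet2r` implementation: the tweet dictionaries with `lon`, `lat`, and `created_at` keys and the function names are hypothetical, and the Lisbon bounding box is taken from the bounding-box table of this article.

```python
import json

# Lisbon bounding box as (left, bottom, right, top), in degrees of longitude/latitude.
LISBON_BBOX = (-9.503, 38.35, -8.4925, 39.0)
MAX_PER_FILE = 3000  # each output file stores up to 3000 tweets


def in_bbox(lon, lat, bbox):
    """Return True when the point lies inside the bounding box (edges included)."""
    left, bottom, right, top = bbox
    return left <= lon <= right and bottom <= lat <= top


def to_feature(tweet):
    """Convert a hypothetical tweet dict into a GeoJSON Point feature."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [tweet["lon"], tweet["lat"]]},
        "properties": {"created_at": tweet["created_at"]},
    }


def chunk_to_geojson(tweets, bbox, size=MAX_PER_FILE):
    """Keep tweets inside the bounding box and split them into GeoJSON strings
    holding at most `size` features each."""
    kept = [t for t in tweets if in_bbox(t["lon"], t["lat"], bbox)]
    files = []
    for start in range(0, len(kept), size):
        collection = {
            "type": "FeatureCollection",
            "features": [to_feature(t) for t in kept[start:start + size]],
        }
        files.append(json.dumps(collection))
    return files
```

With 7000 tweets inside the box, this sketch would produce three GeoJSON strings holding 3000, 3000, and 1000 features.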

#### 3.2. Preprocessing

#### Human-Generated Tweets

#### 3.3. Datasets Construction for Statistical Analysis

- We obtain the hour of the day when people created those tweets, labelling each row with corresponding numbers $0,1,\cdots ,23$.
- We set, inside the temporal window of data gathering, a study period, i.e., a start point $t_{0}$ and an endpoint $t_{T+1}$. The start point must be at least 30 h after the lower boundary of the collecting window so that past information about the process is available. In addition, let $t_{i}$ denote the timestamp of the i-th tweet, $i=1,2,\dots,N$, where N is the total number of collected tweets. By subtracting $t_{0}$ from $t_{i}$, we obtain the number of hours elapsed from the start point until a user shared the i-th tweet. That process defines another timestamp, represented by $t_{N}$, by applying the floor function: $t_{N_{i}}=\lfloor t_{i}-t_{0}\rfloor$. For instance, if $t_{0}=$ ‘`2017-07-30 00:00:00`’, $t_{T+1}=$ ‘`2017-08-13 00:00:00`’, and the timestamp of a particular tweet is $t_{i}=$ ‘`2017-08-05 15:18:32`’, then the $t_{N}$ values associated with that study period are between 0 and 335 h; the elapsed time for that tweet is $159.31$ h, and $t_{N_{i}}=159$.
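The two steps above can be sketched in Python using the timestamps of the worked example. This is an illustrative sketch rather than the authors' `R` code; the function names are hypothetical, and timestamps are treated as naive local times (no time-zone handling).

```python
from datetime import datetime
from math import floor

FMT = "%Y-%m-%d %H:%M:%S"


def hour_of_day(t_i):
    """Hour-of-the-day label (0..23) of a timestamp string."""
    return datetime.strptime(t_i, FMT).hour


def elapsed_hours(t_i, t_0):
    """Hours elapsed between the start point t_0 and the timestamp t_i."""
    delta = datetime.strptime(t_i, FMT) - datetime.strptime(t_0, FMT)
    return delta.total_seconds() / 3600.0


def hour_index(t_i, t_0):
    """Floor of the elapsed hours; negative for tweets shared before t_0."""
    return floor(elapsed_hours(t_i, t_0))


t_0 = "2017-07-30 00:00:00"
t_end = "2017-08-13 00:00:00"
t_i = "2017-08-05 15:18:32"

print(hour_of_day(t_i))                   # 15
print(round(elapsed_hours(t_i, t_0), 2))  # 159.31
print(hour_index(t_i, t_0))               # 159
print(hour_index(t_end, t_0))             # 336, so valid indices run from 0 to 335
```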

#### 3.3.1. Temporal Dataset

`00:00` hour. Finally, the table includes variables related to the count of tweets in previous hours for identifying autoregressive and seasonal autoregressive schemas: the last five hours ($n_{-1},n_{-2},\dots,n_{-5}$) and the same hours of the day before ($n_{-24},n_{-25},\dots,n_{-29}$). Following our previous example, where $t_{0}=$ ‘`2017-07-30 00:00:00`’ and $t_{T+1}=$ ‘`2017-08-13 00:00:00`’, Table 1 shows a schema of a possible temporal dataset.
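As a rough illustration of how the lagged count columns of such a table could be assembled, the following Python sketch counts tweets per hour index and looks up the autoregressive and seasonal autoregressive lags. It assumes the hour indices come from the floor of the elapsed hours, so tweets shared before the start point carry negative indices; the function and column names are hypothetical, and the day-of-the-week and hour-of-the-day indicators are omitted.

```python
from collections import Counter

AR_LAGS = (1, 2, 3, 4, 5)                # n_{-1}, ..., n_{-5}
SEASONAL_LAGS = (24, 25, 26, 27, 28, 29)  # n_{-24}, ..., n_{-29}


def hourly_counts(hour_indices):
    """Count tweets per hour index; a Counter returns 0 for hours without tweets."""
    return Counter(hour_indices)


def temporal_rows(counts, n_hours):
    """Build one row per study hour h = 0..n_hours-1 with the count n_h
    and its lagged values n_{h-lag}."""
    rows = []
    for h in range(n_hours):
        row = {"t_N": h, "n": counts[h]}
        for lag in AR_LAGS + SEASONAL_LAGS:
            row[f"n_-{lag}"] = counts[h - lag]
        rows.append(row)
    return rows
```

The largest lag is 29 h, which is why the collecting window has to open at least 30 h before the start of the study period.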

#### 3.3.2. Spatio-Temporal Dataset

`R` package `sp` [58]. Table 2 shows a schema of a possible spatio-temporal dataset, where $(x_{j_{h}},y_{j_{h}},h)$ denotes the location of the j-th tweet shared at the hour of the day h.

#### 3.4. Dataset Biases

## 4. Statistical Framework and Methods

#### 4.1. Regression Models for Count Data

#### 4.2. Spatio-Temporal Analysis

## 5. Case Study

`2017-07-30 00:00:00`’ and $t_{T+1}=$ ‘`2017-08-13 00:00:00`’. This step provided a study period of 336 h, indexed from 0 to 335. We then processed 3626, 64,404, and 59,472 tweets in each urban scenario. We finally transformed the coordinates to the local CRS: EPSG:3763 for Lisbon, EPSG:27700 for London, and EPSG:2263 for Manhattan.

`04:00`, whose curve decreases rapidly and reaches negative values after $1.75$ km. In addition, the functional representations for hours from midnight to early morning (light deep-sky-blue curves) are more irregular than those associated with later hours. The first two principal components retain $86.06$% and $7.84$% of the variability. Because a functional principal component represents variation around the average curve, its interpretation rests on that property. The first component takes negative values for distances up to approximately 500 m, so the variation it describes around the mean of the hourly second-order summary statistics is strongest for distances longer than this value, while the second component primarily captures variations in the hourly summaries up to $1.5$ km. Panel (c) of Figure 8 reveals that the spatial distribution of the events shared at `04:00` is quite dissimilar to the distributions for the other hours of the day. There are approximately three groups of hours for human activities: (1) between `00:00` and `01:00`, (2) from `02:00` to `07:00`, and (3) the rest of the hours.

`00:00` to `02:00`, another for `03:00` to `05:00`, and the other two for later hours.

## 6. Discussion

`19:00` to the prematch tweets of the Portuguese soccer league match between Benfica and Braga. Our approach also involved the estimation of parameters associated with autoregressive trends. The findings highlight that those temporal effects are also significant in explaining the number of tweets and can serve as a measure to anticipate the pressures of increasing human activity.

`08:00` to midnight and highly unlikely between midnight and the early hours of the morning. We also found that the measures of spatial correlation over time tend to be more homogeneous at short distances, at 500 m, 3 km, and $2.8$ km for Lisbon, London, and Manhattan, respectively. These values differ considerably from the travel distance of 1.5 km reported in human mobility studies [5,6,7]. The smoothed second-order summary statistics showed more uniform curves in London and more erratic curves in Lisbon, which might be an effect of the number of tweets gathered in each city over the two-week period. The analysis also revealed that people share content on Twitter in the same areas at the same hours, which is a common feature of human social conduct. The irregular shape of the curves for dawn hours retained most of the variability of Besag's L-functions; as a consequence, it masked other relevant spatial effects that occur in different periods of the day.

## 7. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## Abbreviations

Abbreviation | Definition |
---|---|
FPCA | Functional principal component analysis |
API | Application programming interface |
LBSN | Location-based social networks |
SOM | Self-organizing maps |
ICA | Independent component analysis |
DBSCAN | Density-based spatial clustering of applications with noise |
OLS | Ordinary least-squares |
GAM | Generalized additive models |
LISA | Local indicators of spatial association |
STSS | Space-time scan statistics |
GMM | Gaussian mixture models |
KDE | Kernel density estimation |
LDA | Latent Dirichlet allocation |
MAUP | Modifiable areal unit problem |
CRS | Coordinate reference system |
FDA | Functional data analysis |
GLM | Generalized linear models |
IWLS | Iteratively weighted least-squares |
BIC | Bayesian information criterion |
CSR | Complete spatial randomness |
PCA | Principal components analysis |
RMSE | Root mean squared error |
MAE | Mean absolute error |
MAPE | Mean absolute percentage error |
sMAPE | Symmetric mean absolute percentage error |
INLA | Integrated nested Laplace approximations |

## Appendix A

## References

1. França, U.; Sayama, H.; Mcswiggen, C.; Daneshvar, R.; Bar-Yam, Y. Visualizing the “heartbeat” of a city with tweets. Complexity **2015**, 21, 280–287.
2. Celikten, E.; Falher, G.L.; Mathioudakis, M. Modeling Urban Behavior by Mining Geotagged Social Data. IEEE Trans. Big Data **2017**, 3, 220–233.
3. Jiang, S.; Ferreira, J.; González, M.C. Clustering daily patterns of human activities in the city. Data Min. Knowl. Discov. **2012**, 25, 478–510.
4. Tasse, D.; Hong, J.I. Using social media data to understand cities. In Proceedings of the NSF Workshop on Big Data and Urban Informatics, Chicago, IL, USA, 11–12 August 2014; pp. 64–79.
5. Simini, F.; González, M.; Maritan, A.; Barabási, A.L. A universal model for mobility and migration patterns. Nature **2012**, 484, 96–100.
6. Song, C.; Qu, Z.; Blumm, N.; Barabási, A.L. Limits of predictability in human mobility. Science **2010**, 327, 1018–1021.
7. González, M.; Hidalgo, C.; Barabási, A.L. Understanding individual human mobility patterns. Nature **2008**, 453, 779–782.
8. Brockmann, D.; Hufnagel, L.; Geisel, T. The scaling laws of human travel. Nature **2006**, 439, 462–465.
9. Batty, M. Cities as Complex Systems: Scaling, Interaction, Networks, Dynamics and Urban Morphologies. In Encyclopedia of Complexity and Systems Science; Meyers, R.A., Ed.; Springer: New York, NY, USA, 2009; pp. 1041–1071.
10. Jackson, M.C. Social systems theory and practice: The need for a critical approach. Int. J. Gen. Syst. **1985**, 10, 135–151.
11. United Nations. World Urbanization Prospects: The 2014 Revision, Highlights; Technical Report ST/ESA/SER.A/352; Department of Economic and Social Affairs, Population Division: New York, NY, USA, 2014.
12. Vespignani, A. Predicting the behavior of techno-social systems. Science **2009**, 325, 425–428.
13. Thériault, M.; Des Rosiers, F. Modeling Urban Dynamics; John Wiley & Sons: Hoboken, NJ, USA, 2013.
14. Silva, T.H.; de Melo, P.O.S.V.; Almeida, J.M.; Loureiro, A.A.F. Social Media as a Source of Sensing to Study City Dynamics and Urban Social Behavior: Approaches, Models, and Opportunities. In Ubiquitous Social Media Analysis; Springer: Berlin/Heidelberg, Germany, 2013; pp. 63–87.
15. Huang, Q. Mining online footprints to predict user’s next location. Int. J. Geogr. Inf. Sci. **2016**, 31, 523–541.
16. Gao, H.; Liu, H. Data Analysis on Location-Based Social Networks. In Mobile Social Networking; Springer: New York, NY, USA, 2013; pp. 165–194.
17. Ferrari, L.; Rosi, A.; Mamei, M.; Zambonelli, F. Extracting urban patterns from location-based social networks. In Proceedings of the 3rd ACM SIGSPATIAL International Workshop on Location-Based Social Networks, Chicago, IL, USA, 1 November 2011; ACM: New York, NY, USA, 2011; pp. 9–16.
18. Toole, J.; de Montjoye, Y.A.; González, M.; Pentland, A.S. Modeling and Understanding Intrinsic Characteristics of Human Mobility. In Social Phenomena, Computational Social Sciences; Gonçalves, B., Perra, N., Eds.; Springer: Cham, Switzerland, 2015; pp. 15–35.
19. Frias-Martinez, V.; Soto, V.; Hohwald, H.; Frias-Martinez, E. Characterizing Urban Landscapes Using Geolocated Tweets. In Proceedings of the 2012 ASE/IEEE International Conference on Social Computing and 2012 ASE/IEEE International Conference on Privacy, Security, Risk and Trust, Amsterdam, The Netherlands, 3–5 September 2012; IEEE Computer Society: Washington, DC, USA, 2012. SOCIALCOM-PASSAT ’12. pp. 239–248.
20. Wakamiya, S.; Lee, R.; Sumiya, K. Crowd-based urban characterization: Extracting crowd behavioral patterns in urban areas from twitter. In Proceedings of the 3rd ACM SIGSPATIAL International Workshop on Location-Based Social Networks, Chicago, IL, USA, 1 November 2011; ACM: New York, NY, USA, 2011; pp. 77–84.
21. Stimmel, C.L. Building Smart Cities: Analytics, ICT, and Design Thinking; CRC Press: Boca Raton, FL, USA, 2015.
22. Zheng, Y.; Capra, L.; Wolfson, O.; Yang, H. Urban Computing: Concepts, Methodologies, and Applications. ACM Trans. Intell. Syst. Technol. **2014**, 5, 38.
23. Steiger, E.; de Albuquerque, J.P.; Zipf, A. An Advanced Systematic Literature Review on Spatiotemporal Analyses of Twitter Data. Trans. GIS **2015**, 19, 809–834.
24. Steiger, E.; Westerholt, R.; Resch, B.; Zipf, A. Twitter as an indicator for whereabouts of people? Correlating Twitter with UK census data. Comput. Environ. Urban Syst. **2015**, 54, 255–265.
25. Kaplan, A.M.; Haenlein, M. Users of the world, unite! The challenges and opportunities of Social Media. Bus. Horiz. **2010**, 53, 59–68.
26. Nummi, P. Social Media Data Analysis in Urban e-Planning. Int. J. E-Plan. Res. **2017**, 6, 18–31.
27. Gandomi, A.; Haider, M. Beyond the hype: Big data concepts, methods, and analytics. Int. J. Inf. Manag. **2015**, 35, 137–144.
28. Thakur, G.; Sims, K.; Mao, H.; Piburn, J.; Sparks, K.; Urban, M.; Stewart, R.; Weber, E.; Bhaduri, B. Utilizing Geo-located Sensors and Social Media for Studying Population Dynamics and Land Classification. In Human Dynamics Research in Smart and Connected Communities; Springer International Publishing: Cham, Switzerland, 2018; pp. 13–40.
29. Huang, Y.; Li, Y.; Shan, J. Spatial-Temporal Event Detection from Geo-Tagged Tweets. ISPRS Int. J. Geo-Inf. **2018**, 7, 150.
30. García-Palomares, J.C.; Salas-Olmedo, M.H.; Moya-Gómez, B.; Condeço-Melhorado, A.; Gutiérrez, J. City dynamics through Twitter: Relationships between land use and spatiotemporal demographics. Cities **2018**, 72, 310–319.
31. Resch, B.; Usländer, F.; Havas, C. Combining machine-learning topic models and spatiotemporal analysis of social media data for disaster footprint and damage assessment. Cartogr. Geogr. Inf. Sci. **2017**, 45, 362–376.
32. De Albuquerque, J.P.; Herfort, B.; Brenning, A.; Zipf, A. A geographic approach for combining social media and authoritative data towards identifying useful information for disaster management. Int. J. Geogr. Inf. Sci. **2015**, 29, 667–689.
33. Kim, E.; Helal, S.; Cook, D. Human activity recognition and pattern discovery. IEEE Pervasive Comput./IEEE Comput. Soc. IEEE Commun. Soc. **2010**, 9, 48–53.
34. Goodchild, M.F. Citizens as sensors: The world of volunteered geography. GeoJournal **2007**, 69, 211–221.
35. Frias-Martinez, V.; Frias-Martinez, E. Spectral clustering for sensing urban land use using Twitter activity. Eng. Appl. Artif. Intell. **2014**, 35, 237–245.
36. Soliman, A.; Soltani, K.; Yin, J.; Padmanabhan, A.; Wang, S. Social sensing of urban land use based on analysis of Twitter users’ mobility patterns. PLoS ONE **2017**, 12, e0181657.
37. Resch, B.; Summa, A.; Zeile, P.; Strube, M. Citizen-Centric Urban Planning through Extracting Emotion Information from Twitter in an Interdisciplinary Space-Time-Linguistics Algorithm. Urban Plan. **2016**, 1, 114.
38. Hasan, S.; Zhan, X.; Ukkusuri, S.V. Understanding urban human activity and mobility patterns using large-scale location-based data from online social media. In Proceedings of the 2nd ACM SIGKDD International Workshop on Urban Computing, Chicago, IL, USA, 11–14 August 2013; ACM: New York, NY, USA, 2013; p. 6.
39. Huang, W.; Li, S. Understanding human activity patterns based on space-time-semantics. ISPRS J. Photogramm. Remote Sens. **2016**, 121, 1–10.
40. Patel, N.N.; Stevens, F.R.; Huang, Z.; Gaughan, A.E.; Elyazar, I.; Tatem, A.J. Improving Large Area Population Mapping Using Geotweet Densities. Trans. GIS **2016**, 21, 317–331.
41. Huang, Q.; Wong, D.W.S. Activity patterns, socioeconomic status and urban spatial structure: What can social media data tell us? Int. J. Geogr. Inf. Sci. **2016**, 30, 1873–1898.
42. Cheng, T.; Wicks, T. Event detection using Twitter: A spatio-temporal approach. PLoS ONE **2014**, 9, e97807.
43. Shi, Y.; Deng, M.; Yang, X.; Liu, Q.; Zhao, L.; Lu, C.T. A Framework for Discovering Evolving Domain Related Spatio-Temporal Patterns in Twitter. ISPRS Int. J. Geo-Inf. **2016**, 5, 193.
44. Steiger, E.; Resch, B.; Zipf, A. Exploration of spatiotemporal and semantic clusters of Twitter data using unsupervised neural networks. Int. J. Geogr. Inf. Sci. **2015**, 30, 1694–1716.
45. Bakerman, J.; Pazdernik, K.; Wilson, A.; Fairchild, G.; Bahran, R. Twitter Geolocation. ACM Trans. Knowl. Discov. Data **2018**, 20, 1–17.
46. Diggle, P.J. Statistical Analysis of Spatial and Spatio-Temporal Point Patterns; CRC Press: Boca Raton, FL, USA, 2013.
47. Liboschik, T.; Fokianos, K.; Fried, R. tscount: An R Package for Analysis of Count Time Series Following Generalized Linear Models. J. Stat. Softw. **2017**, 82.
48. Baddeley, A.; Rubak, E.; Turner, R. Spatial Point Patterns: Methodology and Applications with R; CRC Press: Boca Raton, FL, USA, 2015.
49. Illian, J.; Penttinen, A.; Stoyan, H.; Stoyan, D. Statistical Analysis and Modelling of Spatial Point Patterns; John Wiley & Sons: Hoboken, NJ, USA, 2008; Volume 70.
50. Lee, D.J.; Zhu, Z.; Toscas, P. Spatio-temporal functional data analysis for wireless sensor networks data. Environmetrics **2015**, 26, 354–362.
51. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2018.
52. Aragó, P.; Juan, P.; Staab, J. tweet2r: Twitter Collector for R and Export to ’SQLite’, ’postGIS’ and ’GIS’ Format, 2018. R Package Version 1.1. Available online: https://cran.r-project.org/web/packages/tweet2r/tweet2r.pdf (accessed on 15 November 2018).
53. Morstatter, F.; Pfeffer, J.; Liu, H.; Carley, K.M. Is the Sample Good Enough? Comparing Data from Twitter’s Streaming API with Twitter’s Firehose. ICWSM. 2013. Available online: https://arxiv.org/abs/1306.5204 (accessed on 15 November 2018).
54. Hawelka, B.; Sitko, I.; Beinat, E.; Sobolevsky, S.; Kazakopoulos, P.; Ratti, C. Geo-located Twitter as proxy for global mobility patterns. Cartogr. Geogr. Inf. Sci. **2014**, 41, 260–271.
55. Steinert-Threlkeld, Z.C. Twitter as Data; Cambridge University Press: Cambridge, UK, 2018.
56. Yin, J.; Gao, Y.; Du, Z.; Wang, S. Exploring multi-scale spatiotemporal twitter user mobility patterns with a visual-analytics approach. ISPRS Int. J. Geo-Inf. **2016**, 5, 187.
57. Tsou, M.H.; Zhang, H.; Jung, C.T. Identifying Data Noises, User Biases, and System Errors in Geo-tagged Twitter Messages (Tweets). arXiv, 2017; arXiv:1712.02433.
58. Pebesma, E.J.; Bivand, R.S. Classes and methods for spatial data in R. R News **2005**, 5, 9–13.
59. Montgomery, D.C.; Peck, E.A.; Vining, G.G. Introduction to Linear Regression Analysis; John Wiley & Sons: Hoboken, NJ, USA, 2012; Volume 821.
60. Myers, R.H.; Montgomery, D.C.; Vining, G.G.; Robinson, T.J. Generalized Linear Models: With Applications in Engineering and the Sciences; John Wiley & Sons: Hoboken, NJ, USA, 2012; Volume 791.
61. Nelder, J.A.; Wedderburn, R.W.M. Generalized Linear Models. J. R. Stat. Soc. Ser. A **1972**, 135, 370.
62. Dobson, A.J.; Barnett, A.G. An Introduction to Generalized Linear Models, 4th ed.; Chapman & Hall/CRC Texts in Statistical Science; Chapman and Hall/CRC: London, UK, 2008.
63. Hilbe, J.M. Log negative binomial regression as a generalized linear model. Grad. Coll. Comm. Stat. **1993**, 1024, 1–16.
64. McCullagh, P.; Nelder, J.A. Generalized Linear Models; CRC Press: Boca Raton, FL, USA, 1989; Volume 37.
65. Hardin, J.W.; Hilbe, J.M. Generalized Linear Models and Extensions; Stata Press: College Station, TX, USA, 2012.
66. Katsouyanni, K.; Schwartz, J.; Spix, C.; Touloumi, G.; Zmirou, D.; Zanobetti, A.; Wojtyniak, B.; Vonk, J.; Tobias, A.; Pönkä, A.; et al. Short term effects of air pollution on health: A European approach using epidemiologic time series data: The APHEA protocol. J. Epidemiol. Community Health **1996**, 50, S12–S18.
67. Venables, W.N.; Ripley, B.D. Modern Applied Statistics with S; Springer: New York, NY, USA, 2002.
68. Cameron, A.C.; Trivedi, P.K. Econometric models based on count data. Comparisons and applications of some estimators and tests. J. Appl. Econom. **1986**, 1, 29–53.
69. Cressie, N. Statistics for Spatial Data; Wiley series in probability and mathematical statistics: Applied probability and statistics; John Wiley & Sons: Hoboken, NJ, USA, 1993.
70. O’Sullivan, D.; Unwin, D. Geographic Information Analysis; Wiley: Hoboken, NJ, USA, 2014.
71. Dale, M.; Fortin, M. Spatial Analysis: A Guide For Ecologists; Cambridge University Press: Cambridge, UK, 2014.
72. Illian, J.; Benson, E.; Crawford, J.; Staines, H. Principal component analysis for spatial point processes—Assessing the appropriateness of the approach in an ecological context. In Case Studies in Spatial Point Process Modeling; Lecture Notes in Statistics; Springer: New York, NY, USA, 2006; pp. 135–150.
73. Kokoszka, P.; Reimherr, M. Introduction to Functional Data Analysis; Chapman & Hall/CRC Texts in Statistical Science; CRC Press: Boca Raton, FL, USA, 2017.
74. Ramsay, J.; Hooker, G.; Graves, S. Functional Data Analysis with R and MATLAB; Springer: New York, NY, USA, 2009.
75. Ramsay, J.; Silverman, B. Functional Data Analysis; Springer Series in Statistics; Springer: New York, NY, USA, 2005.
76. Husson, F.; Lê, S.; Pagès, J. Exploratory Multivariate Analysis by Example Using R; CRC Press: Boca Raton, FL, USA, 2017.
77. Scott, D.W. Multivariate Density Estimation: Theory, Practice, and Visualization; John Wiley & Sons: Hoboken, NJ, USA, 2015.
78. Blangiardo, M.; Cameletti, M. Spatial and Spatio-Temporal Bayesian Models with R-INLA; John Wiley & Sons: Hoboken, NJ, USA, 2015.
79. Rue, H.; Martino, S.; Chopin, N. Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. J. R. Stat. Soc. Ser. B (Stat. Methodol.) **2009**, 71, 319–392.
80. Illian, J.B.; Sørbye, S.H.; Rue, H. A toolbox for fitting complex spatial point process models using integrated nested Laplace approximation (INLA). Ann. Appl. Stat. **2012**, 6, 1499–1530.
81. Bivand, R.S.; Gómez-Rubio, V.; Rue, H. Spatial Data Analysis with R-INLA with Some Extensions. J. Stat. Softw. **2015**, 63.
82. Meyer, S.; Held, L.; Höhle, M. Spatio-Temporal Analysis of Epidemic Phenomena Using the R Package surveillance. J. Stat. Softw. **2017**, 77.

**Figure 7.** Observed temporal variation of geolocated tweets (black dots) together with the fitted variation from a negative binomial regression model (deep-sky-blue lines).

**Table 1.** Schema of a possible temporal dataset. Columns $n_{-1}$ to $n_{-5}$ are the autoregressive terms, $n_{-24}$ to $n_{-29}$ the seasonal autoregressive terms, Tuesday to Sunday the day-of-the-week indicators, and 00:00 to 23:00 the hour-of-the-day indicators.

Date | $t_{N}$ | n | $n_{-1}$ | … | $n_{-5}$ | $n_{-24}$ | … | $n_{-29}$ | Tuesday | … | Sunday | 00:00 | … | 23:00 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2017-07-30 00:00 | 0 | $n_{0}$ | $n_{-1}$ | … | $n_{-5}$ | $n_{-24}$ | … | $n_{-29}$ | 0 | … | 1 | 0 | … | 0 |
2017-07-30 01:00 | 1 | $n_{1}$ | $n_{0}$ | … | $n_{-4}$ | $n_{-23}$ | … | $n_{-28}$ | 0 | … | 1 | 1 | … | 0 |
2017-07-30 02:00 | 2 | $n_{2}$ | $n_{1}$ | … | $n_{-3}$ | $n_{-22}$ | … | $n_{-27}$ | 0 | … | 1 | 0 | … | 0 |
⋮ | ⋮ | ⋮ | ⋮ | ⋱ | ⋮ | ⋮ | ⋱ | ⋮ | ⋮ | ⋱ | ⋮ | ⋮ | ⋱ | ⋮ |
2017-mm-dd hh:00 | h | $n_{h}$ | $n_{h-1}$ | … | $n_{h-5}$ | $n_{h-24}$ | … | $n_{h-29}$ | 0 | … | 0 | 0 | … | 0 |
⋮ | ⋮ | ⋮ | ⋮ | ⋱ | ⋮ | ⋮ | ⋱ | ⋮ | ⋮ | ⋱ | ⋮ | ⋮ | ⋱ | ⋮ |
2017-08-12 23:00 | 335 | $n_{335}$ | $n_{334}$ | … | $n_{330}$ | $n_{311}$ | … | $n_{306}$ | 0 | … | 0 | 0 | … | 1 |

**Table 2.** Schema of a possible spatio-temporal dataset.

East | North | Hour |
---|---|---|
$x_{1_{0}}$ | $y_{1_{0}}$ | 0 |
$x_{2_{0}}$ | $y_{2_{0}}$ | 0 |
⋮ | ⋮ | ⋮ |
$x_{n_{0}}$ | $y_{n_{0}}$ | 0 |
$x_{1_{1}}$ | $y_{1_{1}}$ | 1 |
$x_{2_{1}}$ | $y_{2_{1}}$ | 1 |
⋮ | ⋮ | ⋮ |
$x_{n_{1}}$ | $y_{n_{1}}$ | 1 |
⋮ | ⋮ | ⋮ |
$x_{1_{23}}$ | $y_{1_{23}}$ | 23 |
$x_{2_{23}}$ | $y_{2_{23}}$ | 23 |
⋮ | ⋮ | ⋮ |
$x_{n_{23}}$ | $y_{n_{23}}$ | 23 |

Metropolitan Area | Item | Lisbon | London | New York City |
---|---|---|---|---|
Bounding box | (Left, Bottom) | ($-9.503,38.35$) | ($-0.516,51.30$) | ($-73.995,40.523$) |
Bounding box | (Right, Top) | ($-8.4925,39$) | ($0.36,51.69$) | ($-73.695,40.923$) |
Number of collected tweets | Total | 213,253 | 1,084,059 | 1,370,963 |
Number of collected tweets | Not geolocated | 198,418 | 928,197 | 1,094,420 |
Number of collected tweets | Clean | 11,817 | 87,448 | 119,802 |

Test | Lisbon | London | Manhattan |
---|---|---|---|
Likelihood ratio ($LR$) | $26.25$ *** | $165.02$ *** | $281.97$ *** |
Deviance (D) | $363.77$ * | $397.32$ *** | $388.88$ ** |

**Table 5.** Estimated regression coefficients and 95% confidence intervals in the fitted negative binomial regression models for the number of geolocated tweets per hour.

(a) Lisbon

Parameter | Estimate | 95% CI |
---|---|---|
Intercept | $2.062$ | $(1.952,2.172)$ |
Tuesday | $0.237$ | $(0.112,0.362)$ |
Wednesday | $0.239$ | $(0.113,0.364)$ |
Thursday | $0.197$ | $(0.069,0.325)$ |
02:00 | $-1.368$ | $(-1.85,-0.934)$ |
03:00 | $-1.986$ | $(-2.604,-1.458)$ |
04:00 | $-2.536$ | $(-3.333,-1.893)$ |
05:00 | $-1.523$ | $(-1.977,-1.115)$ |
06:00 | $-1.033$ | $(-1.393,-0.698)$ |
11:00 | $0.531$ | $(0.321,0.741)$ |
12:00 | $0.736$ | $(0.53,0.942)$ |
13:00 | $0.585$ | $(0.374,0.795)$ |
14:00 | $0.723$ | $(0.515,0.929)$ |
15:00 | $0.712$ | $(0.505,0.919)$ |
16:00 | $0.865$ | $(0.653,1.076)$ |
17:00 | $0.751$ | $(0.533,0.97)$ |
18:00 | $0.932$ | $(0.725,1.14)$ |
19:00 | $1.161$ | $(0.959,1.365)$ |
20:00 | $1.144$ | $(0.941,1.348)$ |
21:00 | $1.036$ | $(0.825,1.249)$ |
22:00 | $0.731$ | $(0.513,0.948)$ |
23:00 | $0.471$ | $(0.229,0.711)$ |
$n_{-5}$ | $-0.018$ | $(-0.026,-0.01)$ |

(b) London

Parameter | Estimate | 95% CI |
---|---|---|
Intercept | $3.803$ | $(3.721,3.884)$ |
Thursday | $0.07$ | $(0.027,0.112)$ |
Friday | $0.125$ | $(0.08,0.17)$ |
Saturday | $0.174$ | $(0.122,0.226)$ |
Sunday | $0.156$ | $(0.106,0.206)$ |
01:00 | $-0.291$ | $(-0.414,-0.169)$ |
02:00 | $-0.859$ | $(-1.004,-0.717)$ |
03:00 | $-1.055$ | $(-1.214,-0.9)$ |
04:00 | $-0.741$ | $(-0.882,-0.603)$ |
06:00 | $0.685$ | $(0.578,0.791)$ |
07:00 | $1.019$ | $(0.909,1.129)$ |
08:00 | $1.077$ | $(0.958,1.196)$ |
09:00 | $1.078$ | $(0.954,1.203)$ |
10:00 | $1.163$ | $(1.037,1.289)$ |
11:00 | $1.289$ | $(1.165,1.412)$ |
12:00 | $1.33$ | $(1.204,1.456)$ |
13:00 | $1.251$ | $(1.12,1.382)$ |
14:00 | $1.201$ | $(1.073,1.33)$ |
15:00 | $1.292$ | $(1.169,1.414)$ |
16:00 | $1.37$ | $(1.249,1.491)$ |
17:00 | $1.496$ | $(1.371,1.621)$ |
18:00 | $1.401$ | $(1.263,1.54)$ |
19:00 | $1.327$ | $(1.188,1.466)$ |
20:00 | $1.248$ | $(1.111,1.384)$ |
21:00 | $1.119$ | $(0.991,1.247)$ |
22:00 | $0.998$ | $(0.883,1.114)$ |
23:00 | $0.646$ | $(0.537,0.755)$ |
$n_{-1}$ | $0.002$ | $(0.001,0.002)$ |
$n_{-3}$ | $0.001$ | $(0.0002,0.001)$ |
$n_{-5}$ | $-0.001$ | $(-0.001,-0.0003)$ |

(c) Manhattan

Parameter | Estimate | 95% CI |
---|---|---|
Intercept | $3.965$ | $(3.823,4.108)$ |
01:00 | $-0.321$ | $(-0.455,-0.187)$ |
02:00 | $-0.698$ | $(-0.854,-0.542)$ |
03:00 | $-0.813$ | $(-0.984,-0.644)$ |
04:00 | $-0.618$ | $(-0.789,-0.447)$ |
05:00 | $-0.252$ | $(-0.416,-0.089)$ |
06:00 | $0.318$ | $(0.166,0.47)$ |
07:00 | $0.697$ | $(0.555,0.839)$ |
08:00 | $0.824$ | $(0.693,0.956)$ |
09:00 | $0.837$ | $(0.714,0.959)$ |
10:00 | $0.848$ | $(0.728,0.968)$ |
11:00 | $0.955$ | $(0.835,1.076)$ |
12:00 | $0.869$ | $(0.739,0.999)$ |
13:00 | $0.791$ | $(0.661,0.922)$ |
14:00 | $0.841$ | $(0.714,0.968)$ |
15:00 | $0.82$ | $(0.691,0.95)$ |
16:00 | $0.884$ | $(0.756,1.013)$ |
17:00 | $0.919$ | $(0.787,1.052)$ |
18:00 | $0.976$ | $(0.837,1.114)$ |
19:00 | $0.795$ | $(0.645,0.945)$ |
20:00 | $0.731$ | $(0.586,0.877)$ |
21:00 | $0.711$ | $(0.577,0.846)$ |
22:00 | $0.572$ | $(0.445,0.699)$ |
23:00 | $0.354$ | $(0.233,0.474)$ |
$n_{-1}$ | $0.002$ | $(0.001,0.003)$ |
$n_{-2}$ | $0.001$ | $(0.000,0.002)$ |
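One way to read these coefficients is to exponentiate them. The Python sketch below assumes the models use a log link, as is usual for negative binomial regression, so that the exponential of a coefficient is a multiplicative effect on the expected hourly count relative to the omitted reference category. The link function and the reference category are not stated in the text available here, so this is an interpretation aid rather than the authors' procedure; the function name is hypothetical.

```python
from math import exp


def rate_ratio(estimate, ci):
    """Exponentiate a log-scale coefficient and its confidence limits."""
    lower, upper = ci
    return exp(estimate), (exp(lower), exp(upper))


# Lisbon, 19:00 indicator from Table 5(a): estimate 1.161, 95% CI (0.959, 1.365).
ratio, (low, high) = rate_ratio(1.161, (0.959, 1.365))
print(f"rate ratio {ratio:.2f}, 95% CI ({low:.2f}, {high:.2f})")
```

Under that assumption, the expected number of tweets at `19:00` in Lisbon would be roughly three times the expected number in the reference hour, all other terms held fixed.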

**Table 6.** Forecast accuracy evaluation of the fitted negative binomial regression models for the number of geolocated tweets per hour.

City | Pearson’s Correlation | RMSE | MAE | MAPE (%) | sMAPE (%) |
---|---|---|---|---|---|
Lisbon | $0.83$ | $3.95$ | $3.07$ | $79.06$ | $65.30$ |
London | $0.97$ | $32.71$ | $22.23$ | $19.89$ | $18.89$ |
Manhattan | $0.98$ | $20.17$ | $15.54$ | $19.15$ | $16.76$ |
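For readers who want to reproduce this kind of evaluation, the Python sketch below computes the four error measures from paired observed and fitted hourly counts. It uses common textbook definitions, which may differ from the ones the authors applied (sMAPE in particular has several variants). Skipping hours with zero observed tweets in MAPE is an assumption made here to avoid division by zero, and the function name is hypothetical.

```python
from math import sqrt


def forecast_accuracy(observed, fitted):
    """Return RMSE, MAE, MAPE (%) and sMAPE (%) for paired observed and fitted counts."""
    if len(observed) != len(fitted) or not observed:
        raise ValueError("observed and fitted must be non-empty and of equal length")
    errors = [y - f for y, f in zip(observed, fitted)]
    n = len(errors)
    rmse = sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # Assumption: hours with zero observed tweets are skipped in MAPE.
    ape = [abs(e) / abs(y) for e, y in zip(errors, observed) if y != 0]
    mape = 100.0 * sum(ape) / len(ape) if ape else float("nan")
    # Assumption: sMAPE variant with the mean of |observed| and |fitted| in the denominator.
    sape = [
        2.0 * abs(y - f) / (abs(y) + abs(f))
        for y, f in zip(observed, fitted)
        if abs(y) + abs(f) > 0
    ]
    smape = 100.0 * sum(sape) / len(sape) if sape else float("nan")
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "sMAPE": smape}
```

Percentage errors grow quickly when the observed counts are small, which is one possible reason for the much larger MAPE and sMAPE values in Lisbon, where far fewer tweets were collected.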

**Table 7.** Distance parameters for estimating the second-order summary statistics for the hourly multitype spatial point patterns of tweets in the three studied cities.

City | Length of the Shorter Side | $1/4$ of the Length | $r_{m}$ | m |
---|---|---|---|---|
Lisbon | 11,530.11 | 2882.53 | 2875 | 115 |
London | 44,819.03 | 11,204.76 | 11,200 | 449 |
Manhattan | 30,153.90 | 7533.96 | 7525 | 302 |

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Santa, F.; Henriques, R.; Torres-Sospedra, J.; Pebesma, E.
A Statistical Approach for Studying the Spatio-Temporal Distribution of Geolocated Tweets in Urban Environments. *Sustainability* **2019**, *11*, 595.
https://doi.org/10.3390/su11030595
