The Rio Olympic Games: A Look into City Dynamics through the Lens of Twitter Data

Abstract: Olympic Games have a huge impact on the cities where they are held, both during the event itself and before and after it. This study presents a new approach based on spatial analysis, GIS, and data from Location Based Social Networks to model the spatiotemporal dimension of impacts associated with the Rio 2016 Olympic Games. Geolocated data from Twitter are used to analyze the activity patterns of users from two different viewpoints. The first monitors the activity of Twitter users during the event: the arrival of visitors, where they came from, and the use that residents and tourists made of different areas of the city. The second assesses the spatiotemporal use of the city by Twitter users before the event, compared to the use during and after the event. The results reveal not only which spaces were the most used while the Games were being held but also changes in the urban dynamics after the Games. Both approaches can be used to assess the impacts of mega-events and to improve the management and allocation of urban resources such as transport and public services infrastructure.


Introduction
Mega-events can be defined as events that attract a large number of visitors, have a large mediated reach, come with high costs, and have a large impact on the built environment and the host population [1]. The importance of sporting, cultural, or political mega-events is confirmed by the competition between cities to host them [2]. A substantial body of literature about mega-events has emerged over recent decades, both theoretical [3,4] and empirical, based on case studies [5][6][7][8][9]. These studies have mainly focused on the impact of mega-events from different perspectives such as economic, tourism and commercial; infrastructure and physical resources; political; sport and recreation; environmental; and socio-cultural [10][11][7].
Most previous studies conclude that mega-events, especially mega-sporting events, benefit host regions in terms of city branding, urban regeneration, and international investment, and create substantial and long-term economic impacts [12]. The Olympic Games, in particular, constitute an opportunity to transform urban areas that have become obsolete in terms of use, for example, industrial areas, into areas related to the service economy [13-15]. Despite all these studies, there is limited knowledge about the effects of mega-events (particularly the Olympic Games) on the activity levels of the host city, including areas beyond the event facilities. This paper aims to analyze the Rio 2016 Olympic Games and its influence on the spatiotemporal dynamics of the host city. For that purpose, we explore the potential of Location Based Social Networks (LBSN), and particularly Twitter data. The following research questions are addressed:
• What was the impact of the Olympic Games on the number of Twitter users in the study area, in comparison to other events occurring in the city during the analyzed period?
• What was the proportion of visitors to the Olympic Games and the degree of language diversity?
• What was the daily dynamic of the Olympic venues in terms of user presence?
• Which areas of Rio de Janeiro were most frequented by users before, during, and after the Games?
• Which areas were more/less successful in retaining users' activity after the Olympic Games?
The remainder of this paper is organized into the following five sections: 1) a literature review on mega-events, event tourism, and LBSN data opportunities; 2) a description of the case study; 3) data sources and the methodological approach; 4) an analysis of the spatiotemporal distribution of Twitter users in Rio de Janeiro before, during and after the Olympic Games, and the impacts of this mega-event in terms of residents' and tourists' activity patterns; and 5) a discussion of main conclusions.

Mega-events, tourism, and impacts on the host-city
International mega-events, such as the Olympic Games, have become key for host cities seeking economic growth, urban development, and city image branding [16][17][18][12]. Mega-events are also frequently considered a highly significant tourist asset for a host city, firstly because they directly attract participants, and secondly because they increase general visitation as a result of the raised profile of the area [19]. The issue has aroused significant scientific interest and has spawned a substantial number of studies. There is a particular abundance of Olympics-related literature regarding economic impacts, tourism, and urban regeneration [20][21][22][23][24], or the attitudes of the host population concerning the Games and their impacts [25][26][27][28][29][30].
Nevertheless, mega-events also disrupt cities' routines and negatively affect mobility by increasing travel times [31], thereby decreasing the appeal of hosting a sports mega-event [32][33] due to the financial, environmental [34,26], and social costs [35][36]. Some authors have pointed out the dramatic overestimation of the economic benefits of mega-events, warning about the limited effects observed during post-event assessment and in the long term in general [37].
Tourism legacy is often used as a primary justification for staging mega-events [38]. A recurrent question among stakeholders and managers has been whether the Olympic Games can effectively be used to increase tourism in the host city or country, with an emphasis on the concept of event tourism; whether these levels of tourism can be maintained after the Games; or whether the boost given to tourism by mega-events is ephemeral and temporary [3][39][40].

Location Based Social Network data applied to study the impacts of mega-events
ready availability, the analysis of Twitter data has started receiving considerable attention from academics as a source for opinion mining and trend tracking, with a focus on message content and users' features [48][71][72][73][74][75][76]. Interesting applications can also be found in the field of identifying planned or unexpected events in space and time [77][78][79][80][81][82][83][84][85][86]. However, more research is needed at intra-urban levels of analysis, in particular for gaining an understanding of activity levels during mega-events and potential changes occurring after all urban regeneration processes associated with this type of event.
In the current study we go a step further in the analysis of the spatiotemporal distribution of Twitter users in the context of a mega-event such as the Rio Olympic Games, with several contributions to the existing literature on mega-events:
• The analyzed period includes the entire Olympic cycle, distinguishing between the periods before, during, and after the Games, in order to identify potential structural changes in the use of certain areas of Rio de Janeiro, directly linked with the investments made in the city.
• The collection of data is a key issue here. Most previous event-related analyses have relied on surveys. In this study, we use Twitter data, which have considerable potential for spatiotemporal assessment, given the broad geographic distribution of users and the high resolution of the data, both spatial (X, Y coordinates) and temporal (precise time of Twitter messages).
• We analyze Twitter users instead of Twitter messages (tweets). Users are a more meaningful unit of analysis in terms of urban planning, and counting them mitigates the effect of compulsive users posting several messages at similar places and times.
• During the Games, we differentiate between residents and visitors and analyze their different spatiotemporal patterns.
Nonetheless, the use of geolocated tweets implies certain limitations to our study that we would like to acknowledge here. First of all, geotagged tweets downloaded from the Twitter Streaming API account for only approximately 1% of all the messages that are sent using the Twitter service [86]. The strategy followed to mitigate this small proportion of tweets is to "harvest" messages for long periods. In our case, we downloaded data for almost a year, collecting a total of 2.9 million tweets, as we explain in section 4.1. A further consideration is that this social network is mainly used by young adults in their 20s and 30s [87], and this fact should be taken into account when interpreting the results.

Case study
In recent years, Rio de Janeiro (Brazil) has become a hub for world mega-events [32]: the Pan American Games, the 5th CISM Military World Games in 2007, the 2012 Rio + 20, the 2013 World Youth Day, the FIFA World Cup in 2014 and the Olympics in 2016. This paper focuses on the Rio 2016 Olympic Games, which led to extensive urban transformation, thanks to new sports venues and a series of projects associated with mobility, housing, and the environment. The highest amount of investment was received by four main areas: Barra, Deodoro, Maracanã, and Copacabana (Figure 1).

Figure 1. Main Olympic areas in Rio de Janeiro
1) Barra is a dynamic area with zones given over to both nature and leisure. In recent years it has experienced the greatest residential expansion in the city of Rio. The Olympic Park, one of the greatest legacies of the Olympic Games to Rio, was built in this area. As home to 14 of the 31 sports venues, it received massive investment in its residential and commercial areas and transport infrastructure. Investment in this area includes more than 2400 high-class homes [88] and two new Bus Rapid Transit (BRT) lines, namely the Transolímpica and the Transcarioca, which connect this area to the north and center of the city. It is important to note that before the Olympic Games this area had weak links to the city center and the most famous tourist sites.
2) The Deodoro area is a less economically developed and peripheral location with several nearby slum neighborhoods, favelas. It was the second major area in terms of Olympic sites, new leisure zones, and new BRT connections.
3) The Maracanã area includes some of Rio's most famous icons, such as the Maracanã stadium and the Sambodromo. It also includes the city center and port area. For the Olympic Games, a new middle-class residential area was built near the Maracanã stadium. Furthermore, a complete restructuring of the port area was undertaken, with new residential areas for the lower economic classes and sites destined for leisure and tourism.
4) Finally, Copacabana is a privileged area, thanks to its beaches, mountains, and famous tourist attractions. This densely populated area did not experience any noteworthy changes in terms of sports venues and residential areas. However, in terms of transport infrastructure, it benefited from the construction of subway line 4, which extended the existing line 1 and connected it to the Barra neighborhood.
Apart from these Olympic areas, other improvements in road infrastructure were carried out in different parts of the city. Transport infrastructure projects promote accessibility gains [89]. However, access inequalities in the city of Rio seem to remain [90].
After the Olympic period, some of the Olympic venues were opened to the general public, as in the case of the Olympic Park. In other cases, it was clear that the Olympic sites were not planned for the current and future needs of the city. One of the most paradigmatic examples was the Canoe Slalom venue (Deodoro area), which opened as a giant swimming pool just after the Games and closed in December 2016. Another case was that of the 31 residential towers opposite the Olympic Park (Barra), built as homes for the upper classes, many of which remained unsold long after the event.
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 12 July 2020 | doi:10.20944/preprints202007.0257.v1

Data and methods
Twitter data are well suited to our purpose of understanding the impacts of the Olympic Games on tourism and urban dynamics. Geotagged tweets, in particular, constitute a useful tool for urban studies since they allow spatiotemporal tracking of users.

Twitter data download
Twitter messages (tweets) were downloaded using a Python package that sends requests to the Twitter Streaming API, and were stored in MongoDB over one year (from April 2016 to March 2017) in order to identify changes in the spatiotemporal distributions of tweets between the periods before, during, and after the Games. Only geotagged tweets were downloaded, and those selected covered the cities of Rio de Janeiro and Niteroi. Besides the tweet coordinates, other relevant information for this study is the user ID, the date and time the tweet was posted, the device language setting, and the number of friends and followers. Over 2.9 million tweets were downloaded in the Rio and Niteroi municipalities. The data were loaded into a GIS (ArcGIS 10.4.1), creating a layer of points with the coordinates of the position from which each tweet was posted. The data were then cleaned of bots and of compulsive users that did not move, and over 36 thousand tweets were excluded. Bots were identified as users having a disproportionate number of friends (more than 1500) compared to the number of followers (fewer than 50). Compulsive users are defined here as those profiles posting an average of more than 10 messages per day.
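The two cleaning rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually run in the study: the record keys (`user_id`, `created_at`, `friends`, `followers`) are hypothetical, the daily posting rate is assumed to be averaged over the days on which a user posted, and the additional "did not move" condition for compulsive users is not implemented.

```python
from collections import defaultdict
from datetime import datetime

# Thresholds quoted in the text.
MAX_FRIENDS = 1500        # bots: more friends than this ...
MIN_FOLLOWERS = 50        # ... and fewer followers than this
MAX_TWEETS_PER_DAY = 10   # compulsive users: higher average daily rate


def is_bot(friends, followers):
    """Bot rule from the text: many friends, very few followers."""
    return friends > MAX_FRIENDS and followers < MIN_FOLLOWERS


def clean_tweets(tweets):
    """Drop tweets from bots and compulsive users.

    `tweets` is a list of dicts with hypothetical keys:
    user_id, created_at (datetime), friends, followers.
    Assumption: the daily average is taken over days with activity.
    """
    active_days = defaultdict(set)
    tweet_counts = defaultdict(int)
    for t in tweets:
        active_days[t["user_id"]].add(t["created_at"].date())
        tweet_counts[t["user_id"]] += 1
    compulsive = {
        user for user, n in tweet_counts.items()
        if n / len(active_days[user]) > MAX_TWEETS_PER_DAY
    }
    return [
        t for t in tweets
        if t["user_id"] not in compulsive
        and not is_bot(t["friends"], t["followers"])
    ]


# Made-up records for illustration only.
sample = [
    {"user_id": "a", "created_at": datetime(2016, 8, 5, 10, 0),
     "friends": 100, "followers": 80},
    {"user_id": "bot", "created_at": datetime(2016, 8, 5, 10, 5),
     "friends": 2000, "followers": 10},
] + [
    {"user_id": "spam", "created_at": datetime(2016, 8, 5, 11, m),
     "friends": 10, "followers": 10}
    for m in range(11)
]
cleaned = clean_tweets(sample)
```

With these made-up records, only the tweet from user `a` survives: `bot` fails the friends/followers rule and `spam` posts 11 messages on a single day.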

Spatial and temporal aggregation of data to obtain the number of unique active users
The same user often posts several tweets from the same location at approximately the same time. The number of such tweets from some users can be extremely high, leading to an overestimation of the presence of this type of user at these locations and times. It is, therefore, necessary to analyze unique users rather than tweets. To this end, the tweets were aggregated spatially and temporally (every quarter of an hour) by user ID, to obtain the presence of unique active users in each spatial unit, rather than the number of tweets posted. The spatial aggregation of Twitter data was based on nonoverlapping regular hexagons with sides measuring 400 m in diameter. This zoning was preferred to administrative boundaries, such as census tracts, because the latter are irregular in size and shape (they are based on population density). Regular hexagons have the advantage of mitigating the modifiable areal unit problem [91] since they all have the same size and shape.
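As a rough illustration of this aggregation step, the sketch below assigns each tweet to a hexagonal cell and a quarter-hour slot and then counts distinct user IDs per cell and slot. It is not the GIS workflow used in the study. It assumes coordinates already projected to meters, a pointy-top hexagonal grid anchored at the origin, and a hypothetical circumradius of 200 m (that is, 400 m from corner to corner); all function names are illustrative.

```python
import math
from collections import defaultdict
from datetime import datetime

HEX_SIZE_M = 200.0  # assumption: center-to-corner distance in meters


def hex_cell(x, y, size=HEX_SIZE_M):
    """Return the axial (q, r) index of the pointy-top hexagon containing (x, y)."""
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    s = -q - r
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    # Cube rounding: recompute the coordinate with the largest rounding error.
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return rq, rr


def quarter_hour(ts):
    """Floor a timestamp to the start of its 15-minute slot."""
    return ts.replace(minute=ts.minute - ts.minute % 15, second=0, microsecond=0)


def unique_users(tweets):
    """Count distinct users per (hexagon, quarter-hour slot).

    `tweets` is an iterable of (user_id, x, y, timestamp) tuples.
    """
    users = defaultdict(set)
    for user_id, x, y, ts in tweets:
        users[(hex_cell(x, y), quarter_hour(ts))].add(user_id)
    return {key: len(ids) for key, ids in users.items()}


# Made-up tweets: user "a" posts twice in the same cell and slot.
tweets = [
    ("a", 0.0, 0.0, datetime(2016, 8, 5, 21, 1)),
    ("a", 5.0, 5.0, datetime(2016, 8, 5, 21, 10)),
    ("b", 1.0, 1.0, datetime(2016, 8, 5, 21, 14)),
    ("a", 0.0, 0.0, datetime(2016, 8, 5, 21, 20)),
]
counts = unique_users(tweets)
```

In this toy input the two tweets that user `a` posts between 21:00 and 21:15 count as a single presence, which is the over-counting correction the paragraph describes.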

Temporal scenarios
The analysis considers three time periods: the period leading up to the Olympic Games, the period of the event itself, and the period after the Games. The periods analyzed before and after (4 and 9 months, respectively) are long enough to avoid the effect of temporary fluctuations, for example, holiday periods or other events. The Olympic Games period in this study, between July 31st and August 28th, covers the two weeks of the event, between August 5th and August 21st, to which one week before and one week after have been added, thereby including the more than probable presence of tourists associated with the Games in the weeks leading up to and following the event. Because the periods to be compared differ in duration, the monthly average of users is used as the indicator of the presence of Twitter users.
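One possible way to make periods of different length comparable is sketched below. The window dates come from the text, but the bucketing rule is an assumption made for illustration: the four-week Games window is treated as a single month-like bucket, calendar months are used elsewhere, and partial months at the window boundaries are treated as full buckets. The study's exact averaging procedure may differ.

```python
from collections import defaultdict
from datetime import date

# Olympic window as defined in the text (one week either side of the Games).
GAMES_START = date(2016, 7, 31)
GAMES_END = date(2016, 8, 28)


def period_of(day):
    """Label a date as before, during, or after the Olympic window."""
    if day < GAMES_START:
        return "before"
    if day <= GAMES_END:
        return "during"
    return "after"


def bucket_of(day):
    """Month-like bucket for a date (simplifying assumption, see lead-in)."""
    period = period_of(day)
    label = "games" if period == "during" else (day.year, day.month)
    return period, label


def monthly_average_users(records):
    """Average number of distinct users per bucket, for each period.

    `records` is an iterable of (user_id, day) pairs.
    """
    users_by_bucket = defaultdict(set)
    for user_id, day in records:
        users_by_bucket[bucket_of(day)].add(user_id)
    counts = defaultdict(list)
    for (period, _), users in users_by_bucket.items():
        counts[period].append(len(users))
    return {period: sum(c) / len(c) for period, c in counts.items()}


# Made-up records for illustration only.
records = [
    ("a", date(2016, 4, 10)), ("b", date(2016, 4, 12)), ("a", date(2016, 5, 3)),
    ("d", date(2016, 7, 31)), ("a", date(2016, 8, 5)),
    ("b", date(2016, 8, 6)), ("c", date(2016, 8, 20)),
    ("a", date(2016, 9, 15)),
]
averages = monthly_average_users(records)
```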

Analysis of spatial distribution patterns
To analyze the spatial distribution of users in the study area, density maps and spatial autocorrelation statistics were computed from the average number of Twitter users per hexagon and per month. Density maps provided an initial visual overview of the density distribution of users in Rio before, during, and after the Olympic Games. Unlike density maps, spatial autocorrelation techniques do not consider each location in isolation but in relation to its surroundings [92]. Anselin's Local Indicator of Spatial Association (LISA) was calculated separately for the three periods using inverse distance weighting (IDW). IDW is one of the most widely used conceptualizations of spatial relationships, since it reflects Tobler's first law, which states that everything is related to everything else, but near things are more related than distant things. A 1834 m radius was selected after a calibration procedure based on z-scores, as the distance with the highest spatial clustering in our study area. The LISA analysis identified the type of each area according to its activity pattern (high/low concentration of users). The results for the different periods were combined in order to determine possible changes in each area's activity pattern. For instance, areas can be classified as always having a high concentration of users (High-High clusters or HH), as High-High only during the Olympic Games, or as High-High clusters during and after the Olympic Games.
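To make the statistic concrete, the following sketch computes a local Moran's I for each cell using row-standardized inverse distance weights within a fixed radius, and labels the quadrant (HH, HL, LH, LL) from the sign of the cell's deviation and of its spatial lag. It is a simplified illustration rather than the ArcGIS procedure used in the study: significance testing (for example by conditional permutation, as offered by libraries such as PySAL's esda) is omitted, centroids are assumed to be distinct, and the input values are made up.

```python
import math

RADIUS_M = 1834.0  # distance threshold reported in the text


def local_moran(points, values, radius=RADIUS_M):
    """Local Moran's I with row-standardized inverse distance weights.

    points: list of (x, y) cell centroids in meters (assumed distinct).
    values: average number of users per cell.
    Returns a list of (I_i, quadrant) tuples, one per cell.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n  # variance of the values
    results = []
    for i, (xi, yi) in enumerate(points):
        weights = {}
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue
            dist = math.hypot(xi - xj, yi - yj)
            if dist <= radius:
                weights[j] = 1.0 / dist  # nearer neighbors weigh more
        total = sum(weights.values())
        # Spatial lag: weighted mean deviation of the neighbors.
        lag = sum(w * z[j] for j, w in weights.items()) / total if total else 0.0
        quadrant = ("H" if z[i] > 0 else "L") + ("H" if lag > 0 else "L")
        results.append((z[i] * lag / m2, quadrant))
    return results


# Made-up example: three busy cells close together, two quiet cells farther away.
centroids = [(0.0, 0.0), (400.0, 0.0), (800.0, 0.0), (5000.0, 0.0), (5400.0, 0.0)]
users = [10.0, 12.0, 11.0, 1.0, 2.0]
result = local_moran(centroids, users)
quadrants = [q for _, q in result]
```

In practice only cells whose statistic is significant would be reported as clusters; here every cell receives a quadrant label regardless of significance.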

Temporal dynamics
First, an analysis was carried out on the distribution of Twitter users throughout the whole period, focusing on possible changes in the periods before, during, and after the Olympic Games. The Olympic Games started on August 5th and ended on August 21st, and the event had an important impact on the number of users in the study area (Figure 2). The number of users increased just before the start of the Games, peaked in the third week, between August 14th and August 20th (70% above the average of the whole period), and decreased after that. A small increase can also be observed during the Paralympic Games, with around 18,000 users (an increment of 20%). We believe the increase in users to be mainly due to tourists, teams of athletes, and other professionals arriving in Rio for the Olympic Games. The data also reveal peaks at other times of year that are especially attractive to tourists, such as summer or Carnival, with smaller increases in the number of Twitter users (between 6 and 20%).

Visitors to the Olympics
The second step was to determine the differences between residents and tourists in terms of activity patterns. It was, therefore, necessary to differentiate between both groups of users. Tourists of the Olympic Games are defined as those users that only posted during the 4 weeks of the Games, including one week before and one week after the Games. It was assumed that a "normal tourist" would not come to Rio during this period, considering the elevated prices of travel tickets and accommodation. During this period, more than 15,000 tourists were captured in the study area, representing around 30% of all active users in our data (Table 1). The highest numbers of active users, both tourists and residents, were recorded in the second and especially the third weeks, coinciding with the central weeks of the Games. Users' self-declared user interface language can also be extracted from the dataset. This information was used to analyze the diversity of tourists during the Olympic Games. Figure 3 shows that one week before and after the Games, Portuguese was the prevalent language of tourists in Rio (especially in the week before). This indicates the importance of national tourism in the city, with 61% of tweets posted by tourists in Portuguese, followed by English (26%) and Spanish (8%). Once the Olympic competitions started, the situation was reversed, and English became the most used language (rising to 42%), followed by Portuguese and Spanish. Also, the mix of languages was much richer during the Games, with the appearance of other languages, such as Russian, Italian, French, or Indonesian, which had a high number of users.
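The tourist definition above amounts to a simple set operation: users seen inside the Olympic window minus users seen at any time outside it. The sketch below illustrates this rule and the tweet-level language shares. The tuple layout and function names are hypothetical, and the sample records are invented for illustration.

```python
from collections import Counter
from datetime import date

# Four-week Olympic window as defined in the text.
WINDOW_START = date(2016, 7, 31)
WINDOW_END = date(2016, 8, 28)


def in_window(day):
    """True if the date falls inside the Olympic window."""
    return WINDOW_START <= day <= WINDOW_END


def find_tourists(tweets):
    """Users who posted only inside the Olympic window.

    `tweets` is a list of (user_id, day, language) tuples.
    """
    inside, outside = set(), set()
    for user_id, day, _ in tweets:
        (inside if in_window(day) else outside).add(user_id)
    return inside - outside


def language_shares(tweets, tourists):
    """Share of tourists' tweets per interface language within the window."""
    counts = Counter(
        lang for user_id, day, lang in tweets
        if user_id in tourists and in_window(day)
    )
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()} if total else {}


# Made-up records: "r" also posts in May, so it is classed as a resident.
tweets = [
    ("r", date(2016, 5, 1), "pt"),
    ("r", date(2016, 8, 10), "pt"),
    ("t1", date(2016, 8, 10), "en"),
    ("t1", date(2016, 8, 11), "en"),
    ("t2", date(2016, 8, 1), "es"),
    ("t3", date(2016, 8, 27), "pt"),
]
tourists = find_tourists(tweets)
shares = language_shares(tweets, tourists)
```

A consequence of this rule, worth keeping in mind when reading the results, is that a resident who happened to tweet only during the window would also be labeled a tourist.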

Urban dynamics during the Olympic Games
The daily activity of users during the Olympic Games period differs depending on the site analyzed. Figure 4 presents the daily presence of users (%) at different Olympic venues. Resident and tourist users are jointly represented here since no remarkable differences were observed between their use of the Olympic venues. The Olympic Park area (Barra) follows the general pattern observed for the whole study area (dark line), with the highest activity registered around 9 PM. This pattern, which is also in line with other studies [49], represents activity in an area that is a mix of commercial and residential zones, including the Olympic Village, where most athletes and teams stayed. As a residential area, Copacabana shows similar behavior to that of the Olympic Park, registering another peak between 11 and 12 AM. Other sites show a much earlier peak around 12 PM and 1 AM for Maracanã and Sambodromo, and around 4-5 PM for Deodoro Park. These patterns reflect the increase in users attending sports events and leaving these areas afterward.

Figure 4. Daily distribution (hours) of Twitter users in different Olympic areas
Maps of the spatial distribution of residents and tourists in Rio during the Games (Figure 5) show that residents are present, logically, throughout the city, whereas tourists are concentrated in specific areas. Residents are concentrated in the most densely populated areas of the center of Rio and the Santo Cristo, Santa Teresa, Cadete, and Flamengo neighborhoods; tourists are also found in these areas, but they are present in much greater numbers in Copacabana and Ipanema. The Olympic areas show high densities of both tourists and residents, especially in the Barra and Deodoro areas. The third dimension in Figure 5 represents the number of users during the Games. Colors are derived from the LISA computation and confirm that the areas previously mentioned form clusters of statistically significant user concentration.

Changes in the territorial dynamics after the Olympic Games
By comparing the spatial distributions of Twitter users before, during, and after the Games, it is possible to analyze changes in the territorial dynamics of the city (Figure 6). Before the Olympic Games, the activity of users was concentrated in areas around the city center and Copacabana. During the Olympics, these areas remained active, but the greatest changes were found in the Barra and Deodoro areas, which showed an increase in the number of users. After the Olympics, users' activity in the Deodoro area decreased to pre-Olympic levels, whereas high activity levels were retained in Barra.

Figure 6. Daily pattern of Twitter users, comparing before, during and after the Olympic Games
Spatial autocorrelation was used to analyze the spatial concentration of Twitter users. LISA was calculated for the periods before, during, and after the Olympic Games. The index shows the spatial cluster distribution (significant at the 0.01 level), identifying those areas of high user concentration surrounded by similar areas (High-High cluster or HH) (Figure 7). As expected, HH areas are mainly located in the most dynamic and touristic parts of the city, such as the city center, Copacabana, and Barra. During the Olympic Games, HH clusters were more concentrated, and there were two new clusters: Barra and Deodoro. Part of the Barra Olympic area remained an HH cluster for Twitter users after the Olympic Games. Interestingly, the Deodoro area does not constitute an HH cluster after the Olympics, despite the leisure zones opened to the public after the Games. Notice that there were no Low-Low clusters in the whole study area.
Figure 7. Results of the LISA analysis before, during and after the Olympic Games
By combining the results of the LISA indicator for the situations before, during, and after the Games, it is possible to identify 3 main types of areas (Figure 8):
• Zones with a strong user presence before the Games that maintained this during the event and have continued to do so since (HH in the three situations: HH-HH-HH type). These are the most dynamic and touristic zones in the city center or around the beaches of Flamengo, Copacabana, and Ipanema.
• Zones with less user presence before the Olympics, where sports events held during the Games made them into hot spots and activity has subsequently been maintained (Not significant-HH-HH type). These are areas that received investment and events during the Games that have generated new spaces of activity in the city. The most obvious example is a part of the Barra area.
• Zones where there was no significant activity before the Games, where high activity levels took place during the Games, but which reverted to being unused afterward (Not significant-HH-Not significant type). These zones represent spaces that received investment, but this has not been able to attract enough people afterward. In the case of Rio, the neglect of many of the installations has led to substantial political repercussions in the city. The maps show situations of this type in the Deodoro area.
Figure 8. LISA cluster combination (before, during, and after the Olympics).
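The combination of the three LISA results can be expressed as a small lookup per cell. The sketch below is a hedged illustration of that idea: the dictionaries map hypothetical cell identifiers to cluster labels for each period, any cell that is not a significant HH cluster in a period is written as NS (not significant), and the example inputs are invented rather than taken from the study's maps.

```python
def cluster_type(before, during, after):
    """Build a label such as 'NS-HH-HH' from three per-period LISA labels.

    Anything other than 'HH' (including a missing value) is treated as
    not significant ('NS').
    """
    return "-".join(
        "HH" if label == "HH" else "NS" for label in (before, during, after)
    )


def combine_periods(before, during, after):
    """Combine per-period {cell_id: label} dictionaries into one type per cell."""
    cells = set(before) | set(during) | set(after)
    return {
        cell: cluster_type(before.get(cell), during.get(cell), after.get(cell))
        for cell in cells
    }


# Made-up cell identifiers and labels, for illustration only.
lisa_before = {1: "HH"}
lisa_during = {1: "HH", 2: "HH", 3: "HH"}
lisa_after = {1: "HH", 2: "HH"}
types = combine_periods(lisa_before, lisa_during, lisa_after)
```

Here cell 1 reproduces the HH-HH-HH type, cell 2 the Not significant-HH-HH type, and cell 3 the Not significant-HH-Not significant type described in the list above.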

Discussion
Mega-events have a huge impact on host cities, and not surprisingly, there is increasing interest in assessing both the way the city functions during the event and the changes in urban dynamics that follow. The Olympic Games are, without a doubt, among the events with the greatest repercussions on cities, owing to the enormous investment in infrastructure that they entail.
To carry out this study, geolocalized data from the Twitter social network have been used as a source. Data were gathered between April 2016 and March 2017 in order to determine the spatial patterns of users before, during, and after the Games. Given its spatiotemporal disaggregation, it is possible to know the spaces where users carry out their activities throughout the day. An analysis of this type would be very costly, if not impossible, to do if traditional data sources were used.
This work makes several contributions to the existing literature on Twitter data applied to the Olympic Games. First, we assess the spatiotemporal dynamics of the Games at an intra-urban level, focusing on the use of space made by residents and tourists. Other studies perform similar analyses at a broader (regional) scale, focusing mainly on content and sentiment analysis [63][64].
The temporal analysis performed in this study counts Twitter users instead of Twitter messages, which is the main approach used in other similar studies, thereby correcting for users that post several messages at a similar time and place. The data show a 70% increase in the number of Twitter users in the city during the third week of the event, which is much higher than for any other event held in Rio during the analyzed period. During the Olympic period, tourists accounted for around 30% of the users detected. During the Games, a greater variety of users' languages was observed and, unlike the periods outside the Games, in which Portuguese language users were predominant, English was the language most in evidence. Concerning the differences in the activity patterns of residents and tourists during the Games, it was found that in both cases the most frequented places were the city center and the Olympic areas, the difference being that tourists were spatially less widely spread over other areas of the city. A similar outcome was obtained by Kovacs-Gyori et al. [69] when analyzing the London 2012 Olympic Games. Our main contribution in relation to that study is that we include a longer period of analysis, in order to identify spatial patterns in the daily distribution of users and to analyze changes between the periods before, during, and after the Games. We found that user activity increased significantly during the Games in the Olympic areas, with some differences where there was a greater mixture of land uses. Zones dominated by Olympic installations registered peaks of activity associated with sports events, while zones with a mix of residential use had a more regular activity pattern throughout the day.
Finally, concerning the effect of the Games on users' patterns of activity and the use of space, our analysis shows an increase in activity in some Olympic areas, as in the case of Barra, if we compare the situation before and after the Games. In other zones, like Deodoro, where there was a serious attempt at urban restructuring, with the demolition of parts of some favelas, the activity expected after the Games had not materialized by the end of the analyzed period. Apart from the limitations of Twitter data already pointed out in section 2.2, we are aware that our study only analyzes the short-term impacts of the Olympic Games. Assessing long-term impacts would require data collected over a longer period, and we leave this analysis for further study. Further analysis could also extend this study with sentiment analysis of Twitter messages in order to analyze the opinion of users about the urban restructuring that took place in certain areas. Nonetheless, we believe the outcomes of this paper provide very detailed information that cannot be extracted from other sources of data, such as traditional surveys and mobility statistics. Combining LBSN data with more "traditional" surveys would unleash the full potential of mega-event impact analysis. However, such an extensive analysis goes beyond the scope of this study.

Conclusions
The methodology and outcomes presented here contribute useful information for urban planning. Firstly, they increase our understanding of urban dynamics during mega-events, revealing not only the size of the increase in visitors but also the use they make of the city during their stay, as well as the use made by residents during the event. This is of interest for the provision of public sector services (for example, risk assessment and evacuation plans for the population) and for private sector business activities (potential demand according to areas and time of day). Also, an understanding of the impact of investments after the event makes it possible to predict and assess future patterns of activity for new urban developments.
Funding: This research was funded by the European Social Fund, under the following Grant Agreements: H2015/HUM-3427; TRA2015-65283-R