Using Flickr Geotagged Photos to Estimate Visitor Trajectories in World Heritage Cities

Abstract: World tourism dynamics are in constant change and are deeply shaping the trajectories of cities. The “call effect” of World Heritage status has boosted tourism in many cities. The large number of visitors and the side effects, such as the overcrowding of central spaces, are creating the need to develop and protect heritage assets. Hence, the analysis of tourist spatial behaviour is critical for tackling the needs of touristified cities correctly. In this article, individual visitor spatiotemporal trajectories are reconstructed along the urban network using thousands of geotagged Flickr photos taken by visitors in the historic centre of the World Heritage City of Toledo (Spain). A process of trajectory reconstruction using advanced GIS techniques has been implemented. The spatial behaviour has been used to classify the tourist sites offered on the city’s official tourist map, as well as to identify the association with land uses. The results bring new knowledge about visitor spatial behaviour and new insights into the influence of the urban environment and its uses on that behaviour. Our findings illustrate how tourist attractions and the location of mixed commercial and recreational uses shape visitor spatial behaviour. Overcrowded streets and shadow areas underexplored by visitors are pinpointed.


Introduction
Urban tourism has been expanding globally since the 1980s. In the current mobilities era [1], characterised by the modelling force of the digital context and the global access to information [2,3], world tourism dynamics are in constant change while deeply shaping the trajectories of cities [4]. In the aftermath of the global financial crisis of the late 2000s, many urban destinations hailed tourism as the solution for many failing local economies to recover [5]. In this context, strategic planning in tourism has been based on the touristification of cities, by means of the creation of new (or the upgrading of already existing) environmental and cultural resources in order to open up new markets. Sport and culture-led urban regeneration projects [6], hotel development [7], the emergence of the so-called sharing economy of short-term tourist rentals [8], or over-tourism [9] are creating a day-to-day struggle for the management of tourist cities and, furthermore, they are shifting the nature of places to the extent of what has been termed in the literature the "tourismification of the quotidian" [10]. In tandem, tensions and conflicts between residential and tourist uses of the city's commons have started to appear. Many cities are experiencing the transformation of their everyday sites of activity because of visitor interaction, or how they gaze and go places [11,12]. These transformations might be structural (such as land use changes, building conversions, rent increases, store replacement), related to immediate nuisances created by the pressure exerted by the tourist activity (such as congestion, privatization, noise, litter, among others) or might be linked to the commodification of "the cultural", the homogenization of the urban space [13] and the subsequent loss of distinctive attractiveness [14].

ISPRS Int. J. Geo-Inf. 2020, 9, 646
Touristified cities are being adapted to tourists, partially in order to attract demand, but also because they are influenced by "the overwhelming economic success of mass tourism theme parks, cruise ships and historic shopping streets and shopping centres" [14]. These transformations are leading to the rise of anti-tourism movements [15] that question the management strategies and the sustainability of the tourist activity itself, in terms of balancing the benefit to the local economy, the preservation of endogenous resources and the quality of both the tourist experience and the daily life of the residents. In addition, the impact of tourism in cities with UNESCO World Heritage (WH) assets is heightened by the potential risk of losing their designation [15]. In many of these WH cities, tourism started to develop by taking advantage of the "call effect" of WH status. However, the large number of visitors and the side effects are creating the need to develop and protect heritage assets to serve the needs of the tourists and stakeholder communities [16]. It is in this context that destination management competitiveness [17] becomes essential.
One of the major challenges in urban tourism management is created by the overcrowding of central spaces and specific street networks [18,19]. Visitor spatiotemporal behaviour involves movement and multi-attraction [20] and, consequently, its analysis is critical for tackling the needs of the destinations correctly. Urban tourists tend to concentrate their movements in reduced areas within cities [21], where multiple kinds of tourist attractions are clustered. Moreover, the visitors' time restrictions at a destination and their interest in visiting/consuming as many sites of interest as possible make their behaviour a fast and intense experience [22]. It is in this framework that understanding visitors' spatial behaviour can bring new knowledge for flexible planning and management procedures, and probably new visions of the built environment, its uses and its benefits [4].
With the advent of position-tracking technologies [20] and the need for a detailed examination of tourist behaviour within destinations/cities [21,23], the scale of analysis has zoomed in progressively. During the last few years, there has been an important increase in studies on spatiotemporal tourist behaviour at destination [24,25] that have enabled new analyses (or improved/completed existing ones) on the use of space by tourists on different scales [26].
In this regard, our study contributes to this research field on urban tourist mobility studies, by providing not only useful information for public authorities aimed at both improving the tourist experience and mitigating the side effects on the local population, but also a cost-effective and agile instrumental method to analyse urban tourist mobilities that could be replicated elsewhere. In this vein, this study aims to analyse the visitor spatial behaviour within the historic centre of the World Heritage city of Toledo (Spain), using geolocated photos of one of the most popular and accessible photo-sharing websites, Flickr. According to existing literature, the excessive tourist pressure on certain places of the historic centre of Toledo is causing conservation problems for monuments, to which are added the drawbacks of excessive touristification [27] that tends to make the historical centre a "theme park in history", to the detriment of other land uses and activities [28].
Concretely, this study uses the reconstruction of individual trajectories at street level to analyse to what extent there is an overlap between the visitor spatial behaviour, the Toledo official tourist map (see Figure A1 in Appendix A) and the typology of land uses. We hypothesize that the definitive visitor flow map could match the location of the tourist sites promoted in the official tourist map of Toledo, as demonstrated by Paül i Agustí [29], but also the presence of mixed commercial and recreational land uses [30,31]. To achieve this objective, we implement a detailed methodology to monitor visitor behaviour in city centres, going beyond what has been done by studies that have used geolocated data from photo-sharing platforms, such as the detection of the most popular tourist attractions or travel recommendations [32-34]. Thus, we explore the potential of this data source to reconstruct physical trajectories on the street layout using the photographers' digital spatial footprints (considering the time and location data associated with their photos).
The fulfilment of the research objectives will open new opportunities for urban tourist mobility studies. The results will help to classify the tourist sites according to the flow of visitor trajectories per street segment where they are located. Therefore, this study could be of interest for public authorities to identify intensive uses of specific areas of the cities, as well as undervisited monuments that could help to both mitigate the impact of tourism growth and set up sustainable cultural destinations [35]. Furthermore, in the current era of COVID-19, this study may have interesting implications for academia, practitioners and policy managers, in a context in which international tourist mobility has almost come to a halt, but locally-based tourist flows have been reinforced [36]. Hence, the analysis implemented could be considered a rapid and economical way to monitor human mobility and social interaction in dense urban spaces in the new paradigm that has burst into the era of mobilities [1].
Following this introduction, the paper presents a review of the relevant literature on urban tourist mobility studies in cultural heritage cities, in order to highlight the opportunities, strengths and limitations of multiple data sources, especially data from social media and photo-sharing platforms such as the one used in this study, Flickr. Then, information regarding the city under analysis is provided. The data used and methods implemented are explained in the fourth section of the manuscript. The subsequent section presents the results obtained, along with a discussion in light of the findings of other similar studies. Finally, the conclusions of the work are presented.

Challenges for Tourist Mobility Studies in Cultural Heritage Cities
Although the main objective of the UNESCO listing is the identification and protection of sites of outstanding value, the designation has been widely used in marketing strategies to attract more tourists [37]. An increase in tourist arrivals also implies higher tourist flows within the historic centres designated as WH, which may have a twofold effect. On the one hand, the higher presence of tourists might have an economic impact on the local economy as well as encourage a positive cultural exchange. On the other hand, the visitor flows may represent an overload in areas with a complex structure, such as historic centres [38]. Thus, the cultural integrity of the heritage sites may be affected and damaged, since the carrying capacity of these sites is limited. Furthermore, the social cohesion and identity of the local community [11,12] may also be compromised because of the tourismification of historic centres [10]. Either structural transformations such as land use changes, building conversions, rent increases or store replacement, or immediate nuisances created by over-tourism might lead to a downward spiral of cultural degradation with negative influences on both the quality of life of citizens and the tourist experience [14], leading to an unsustainable model of tourism.
According to the World Tourism Organisation (UNWTO), nonetheless, the massive arrival of tourists at destinations or the over-tourism effect does not have to represent a threat if public authorities develop a smart and effective control of visitor flows in the destination [39]. In this context, therefore, the need emerges to manage visitor flows and set up sustainable cultural destinations [35]. Thus, a detailed analysis of the spatiotemporal behaviour of visitors should be implemented not only to identify intensive uses of specific areas and places, but also to design new strategies of redistribution via the promotion of undervisited sites or monuments that have potential [40,41]. Then, if we consider the centres of heritage cities as not only leisure and tourism places, but also as activity places [30] that intermesh formal and informal activities carried out by locals and visitors, there are key elements, such as primary attractions combined with characteristic features of the urban environment, that induce a certain spatial behaviour. However, the data available for the analysis have traditionally been limited and, while it is true that in recent years there has been a substantial increase in the sources available thanks to the emergence of positioning technologies (as we will review in the next subsection), important limitations and challenges remain; moreover, few analyses have been developed in small and medium-sized heritage cities, such as the case of the present study.

From Classical Tourism Statistics to Position-Tracking Technologies: Increasing Spatial and Temporal Granularity
Never before in history have researchers and practitioners had the mobility monitoring tools available that we have today. Since the last decades of the 20th century, scholars have been calling for more and better research on urban tourism. Previously, tourism studies had concentrated on distributions and flows on international, national and regional levels, neglecting to look into processes on a very localized scale (urban and neighbourhood levels). Some of the reasons for the traditional scarcity of specific research on urban tourism in general, and on spatiotemporal tourist behaviour in particular, were (and still are) the methodological challenges facing the task of gathering data. Classical tourism statistics cannot track tourist travel behaviour since they usually provide data from survey-based hotel occupancy which, in addition, miss those tourists who do not stay overnight at the destination (one-day visitors) or those who do not stay overnight in regulated accommodation (i.e., short-term rentals). In this regard, direct observation [42], travel diaries [43] and recent digital position-aware technologies are the three tracking methods available [44]. Even though both direct observation and travel diaries have traditionally constituted the primary data collection tool, they present some limitations such as low participation levels or inadequate/insufficient information on spatiotemporal visitor behaviour [25,45].
With the advent of position-tracking technologies, the scale of analysis has zoomed in progressively [20], though there are still several limitations related to the spatial resolution and the temporal granularity of the sources. Among the multiple sources of information that monitor human mobilities, the most used to track tourist behaviour at the urban scale have been positioning loggers, mobile phone satellite position records and geotagged content coming from social media. The three data sources have their strengths and limitations for the analysis of urban tourist mobilities within cultural heritage cities. Therefore, we review their main characteristics below, with special emphasis on geotagged social media data, since it is the data source used in this study.

Positioning Loggers
The positioning loggers have helped to explore visitor travel patterns on different scales, ranging from national/international tourism [46], to movements within urban destinations [25] or within confined recreational areas [47]. The main advantage provided by this method is the possibility to integrate the GPS tracks with ad hoc surveys answered by the participants. Hence, socioeconomic characteristics of the individuals can be crossed with their spatiotemporal behaviour. This methodology has been carried out previously in WH cities. Such are the cases of the studies developed in the historic centre of Melaka in Malaysia with 384 participants [48] or in the old city of Acre in Israel with 88 participants [49]. Similarly, both studies showed that different visitor profiles can be detected in WH cities according to their socioeconomic characteristics and, more importantly, based on their spatiotemporal behaviour. In this regard, they indicate that the study of visitor behaviour has to help city managers to improve mobility management plans to guarantee the quality of the tourist experience as well as the preservation of the heritage sites. However, despite the undeniable contribution of positioning loggers as a tracking methodology, they suffer some drawbacks [50]. Some of these disadvantages might be related to technical issues such as (1) transmission problems, (2) warm-up times before getting a valid position, (3) the cost of voluminous post-processing information from GPS loggers or (4) the non-applicability to indoor contexts. Nonetheless, the main limitation comes from a (5) potential selection bias (since certain population groups would be more participative, over-representing these individuals in the sample) and (6) a relatively low amount of observations that may also condition the representativeness and the analyses.

Mobile Phone Satellite Position Records
Mobile phone satellite position records and cell phone usage have also opened up multiple opportunities, such as identifying urban activities and their spatial-temporal evolution almost in real time [51] and understanding tourist travel behaviour [52]. The potential to identify different user profiles with this data source has been shown in heritage cities such as Rome [51], Venice [53] and Florence [54]. However, cell phone tracking has certain limitations related to its uneven spatial accuracy (limited by the density of cell towers over the study area, which poses a problem when studying movements on a local scale). Despite this limitation, the study of Mizzi et al. [53] achieved a spatial granularity not commonly seen in studies with mobile phone data. They were able to reconstruct the mobility paths on the road network of around 3000 devices during the Carnival of Venice and the Festa del Redentore in 2017. Every time a device was used for a phone call or to access the Internet, a GPS location was recorded, which allowed the authors to reconstruct trajectories. This reconstruction is presented by the authors as an innovative tool that must be used to analyse how tourist flows impact the quality of life of residents and the preservation of cultural heritage. Nevertheless, these data are not generally free of charge and the high cost might represent an important barrier for academia, practitioners and public authorities.

Geotagged Social Media Sources
Social media sources (photo sharing web sites such as Flickr, Twitter, Panoramio or Instagram; or social sport tracker sites and applications such as Wikiloc or Strava) connected with tourism activity, have contributed greatly to ameliorating both data collection and analysis issues, potentially contributing to urban tourist research. As stated by Chareyron et al. [55], big data unquestionably represents a new challenge for tourism. In addition, the recent widespread use of camera devices with GPS (including smartphones and tablets) enables storing geographical information for each photo taken. These geolocated photos allow estimating national tourist statistics [56,57], identifying points of interest or (tourist) landmarks by selecting representative and relevant photographs from a particular spatial region [58], depicting tourist concentrations and spatial-temporal movement trajectories within urban environments [34], even quantitatively ranking them [59] or recommending travel paths based on the previous travel information of tourists [60][61][62]. In addition, these data sources have been used to analyse the extent to which there is an overlap in the territorial distribution of tourism images promoted through official tourist brochures and travel guides [29], and to compare the spatial interactions between tourists and locals [63].
The geolocated social media data can be considered a valuable proxy of human movement since it provides detailed spatial information (up to street level precision) for a wide range of applications, such as identification of anomalous movements [64], point of interest categorisation [65] or community detection [66]. In fact, according to García-Palomares et al. [67], it is on an urban scale that these sources show the greatest potential, since they gather users' travel experiences.
Despite all these advantages, multiple drawbacks also arise when using geolocated social media data. The main limitations are user penetration [68] and the potential unreliability of the information provided by the users [69]. Both limitations could lead to a representativeness bias. In the first place, the territorial context under analysis is subject to the popularity of the social network from which data are analysed. For instance, the number of Flickr users is highly correlated with the number of tourists from North American and European countries, where the social network is more popular [57]. However, this correlation is much lower for Chinese users due to lower user penetration in that country. Secondly, photo-sharing networks differ in nature. In fact, the most widely used social networks for research purposes have been open access repositories that allow the data to be downloaded and analysed. Such is the case of Flickr, which, in contrast to other photo-sharing social networks such as Instagram, offers much less noise when analysing data. This is related to the type of users and to the multiple factors that push them to upload content on photo-sharing platforms, such as attention seeking, social influence, disclosure or information sharing [70]. Another drawback of geolocated social media data is related to the reconstruction of trajectories, which are much coarser than tracking trajectories obtained by positioning loggers or mobile phones [71], since the photos taken by the users may not be very continuous in time, or the users may not upload all the photos taken to the social media platform. An additional limitation of this kind of data is that there is no detailed information on the socio-economic and demographic profile of the tourist, or about their previous travel experiences.
Despite its limitations, the large amount of data uploaded onto the cloud makes it possible to approach the routes that visitors follow and to detect the most visited points of interest. In this regard, previous studies have defined routes as a sequence of previously identified city landmarks/regions of interest that a person visits [32,34]. Most of these studies represent straight connections as routes between tourist attractions [32,72,73]. Meanwhile, some of them are instead travel recommendation systems based on minimizing distances or optimizing the number of visited attractions [33,60,61]. Only a limited number of them provide a visitor flow along with the network. For instance, Orsi and Geneletti [74] used 3656 photos from Panoramio and designed a methodology to reconstruct hiker flows along with the trail network of the Dolomites natural park (north eastern Italy). In addition, Yin et al. [59] used Flickr photos from 12 cities and proposed a method to identify the most repeated travel sequences between main attractions. However, to the best of the authors' knowledge there is not any study that reconstructs individual visitor trajectories and infers them to the urban network, which is the methodological contribution of our study. Furthermore, none of the mentioned studies analyse the association between the visitor flows and the land uses in order to detect how the spatial behaviour is shaped by the presence or absence of specific characteristics of the urban environment. In fact, the studies that explore this dimension do so by means of aggregating the photos at the cell level (not at the level of visitor flows through the streets as it is done in the present study), and checking if there is a spatial correlation between the spatial distribution of photos and the presence of attractions, services and facilities linked to the tourist offer [75,76].

Study Context
Toledo, with a population of 84,282 inhabitants (according to the register of inhabitants of 2018), is the capital city of the Autonomous Community of Castilla-La Mancha (Spain). It is located at the centre of the Iberian Peninsula, just 70 km south of Madrid, the largest Spanish metropolitan area (see Figure 1).
Toledo is known as the city of the three cultures (Christian, Jewish and Muslim) and was declared a World Heritage Site by UNESCO in 1986 for its exceptional history, traditions and valuable monumental and architectural heritage. Every year for the last five years its singularity has attracted around 600,000 tourists staying overnight in hotels, 65% of whom are Spaniards [77]. As can be seen in Figure 2, which shows the percentage change in tourists staying overnight in hotels (Base 100 = 2005), significant growth was registered in the aftermath of the global financial crisis, with a remarkable increase of up to 80% in international tourists. This international projection is explained by its central position in the Iberian Peninsula and its proximity to Madrid (less than 30 min by High-Speed Train and less than 1 h by road). Even more than tourists, one-day visitors are the population segment with the greatest presence in the city. In fact, it is estimated that around 3 million visitors passed through Toledo in 2018 [78].
The historic centre of Toledo not only stands out for the value of its monuments, but also for its geographical location and landscape value: it extends over a steep and irregular rock that is surrounded and isolated by the Tagus River (Tajo in Spanish; Tejo in Portuguese), the longest river in the Iberian Peninsula. Mobility within the historic centre might be complicated by the nature of its own location, by the structure of a network of Muslim heritage and by its large extent (122 ha), the largest in Spain. Due to these attributes, the historic centre is mainly pedestrianised.
The historic centre has been losing resident population since the mid-20th century, going from 29,184 inhabitants in 1950 to 10,441 inhabitants in 2018 [27]. At the same time, public policies have revitalized economic dynamics in the historic centre that have promoted museification and touristification, but which consequently have also promoted processes of commodification of space and gentrification [79]. As the economic activity linked to tourism has strengthened, the numbers of tourists have been increasing substantially. Therefore, tourism and hospitality activities constitute the sector with the highest number of establishments in the historic centre [28]. At the same time, these tourism-oriented economic activities have grown following clear polarized spatial patterns in comparison with the location of the traditional and resident-oriented activities [27,79]. In this regard, multiple challenges arise for the management of visitor flows in the face of the pressure, saturation and congestion that takes place in certain areas of the historic centre [28]. In addition to this problem linked to the management of visitor flows, there are social organizations that have been warning of urban projects that could imply the disappearance of cultural vestiges in the surroundings of the historic centre and that, consequently, would jeopardize the status of WH, and with itself the quality tourist image of the city [80,81]. Although this is, for the time being, simply a theoretical problem (since only two WH sites have been delisted to date), the perception of the residents and other stakeholders about the situation is not a minor issue [16].

Data Collection and Cleaning
The data used in this study to mine visitor routes within the historic centre of Toledo were downloaded from the photo-sharing site Flickr, thanks to the possibility of obtaining geolocated photos via the Flickr API (https://www.flickr.com/services/api/). The geolocated photos used to reconstruct spatial trajectories cover eight consecutive years (photos taken from January 2010 to December 2017). The previously mentioned advantages of this photo-sharing website (see Section 2.2.3), including a large volume of available (free) data, influenced our decision to use it as our main data source. The "flickr.photos.search" and "flickr.photos.getInfo" methods of the Flickr API were called from a Python script to gather a set of 57,824 geolocated photos ($P$), taken by 3077 users ($U$) within the administrative boundaries of the city, and to store them in a MongoDB collection. Each geolocated photo ($p_{iu}$) has the following attributes, among others:
• id: unique ID of the uploaded photo ($p_{id}$).
• owner-id: unique user ID of the person who uploaded the photo ($u_p$).
• longitude: (geotag information) x coordinate ($x_p$).
• latitude: (geotag information) y coordinate ($y_p$).
• dates-taken: date and time when the photo was taken ($t_p$).
• dates-posted: date and time when the photo was uploaded.
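As an illustration of the collection step, the following minimal Python sketch builds the query parameters for one page of "flickr.photos.search" and maps a returned photo to the attributes listed above. It is not the authors' script: the bounding box is a hypothetical rectangle around Toledo (the study used the administrative boundaries, which are not given in the text), the helper names are invented for this example, and the parameter and response field names should be checked against the Flickr API documentation before use. No network request is made here.

```python
from datetime import datetime, timezone

# REST endpoint of the Flickr API (assumption: verify against the official docs).
FLICKR_REST_ENDPOINT = "https://api.flickr.com/services/rest/"

# Hypothetical bounding box around Toledo: (min_lon, min_lat, max_lon, max_lat).
TOLEDO_BBOX = (-4.10, 39.82, -3.95, 39.90)


def build_search_params(api_key, bbox, start, end, page=1):
    """Return the query parameters for one page of flickr.photos.search."""
    return {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "bbox": ",".join(f"{v:.4f}" for v in bbox),
        "min_taken_date": start,   # e.g. "2010-01-01 00:00:00"
        "max_taken_date": end,     # e.g. "2017-12-31 23:59:59"
        "has_geo": 1,              # only geotagged photos
        "extras": "geo,date_taken,date_upload",
        "per_page": 250,
        "page": page,
        "format": "json",
        "nojsoncallback": 1,
    }


def to_record(photo):
    """Map one raw photo dict from the JSON response to the attributes used in the text."""
    return {
        "id": photo["id"],
        "owner_id": photo["owner"],
        "longitude": float(photo["longitude"]),
        "latitude": float(photo["latitude"]),
        "accuracy": int(photo["accuracy"]),
        "date_taken": datetime.strptime(photo["datetaken"], "%Y-%m-%d %H:%M:%S"),
        # The upload date is assumed to arrive as a Unix timestamp string.
        "date_posted": datetime.fromtimestamp(int(photo["dateupload"]), tz=timezone.utc),
    }
```

The resulting dictionaries could be passed to any HTTP client and stored in a document database such as the MongoDB collection mentioned above; paging through results would repeat the call with increasing `page` values.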
The data were uploaded into the R environment in order to clean and filter the dataset (see diagram flow presented in Figure 3). Out-of-date range photos were removed, since some photos uploaded in the reference period may have been taken prior to the defined temporal frame. A yearly exploratory analysis was carried out to identify whether the spatial patterns of the photos and users were different between years and the same patterns were found (see Figures A2 and A3 in the Appendix A). In addition, a similar number of users per year was detected (see Table A1 in the Appendix A). Then, only geolocated photos suitable for extracting visitor travel trajectories were selected. In this regard, we established a criterion to determine whether photos were taken by one-day visitors, tourists or local inhabitants since their behaviour and routes within the city are expected to be different. To this end, three indicators were calculated that allowed us to classify users according to their use of the social network (see Table 1). They are based on the average number of months active per year in Toledo, the difference between the maximum and the minimum dates of the photos taken during the same month in the city, and the total number of years active in the city.
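The three indicators can be sketched in a few lines of Python (the authors worked in R; this is only an illustrative equivalent under stated assumptions). The function name `user_indicators` is hypothetical, the input is assumed to be the list of capture timestamps of one user's photos in Toledo, and the classification thresholds of Table 1 are deliberately not reproduced because they are not stated in the text.

```python
from collections import defaultdict
from datetime import datetime


def user_indicators(dates_taken):
    """Compute the three activity indicators for one user's photo timestamps."""
    if not dates_taken:
        raise ValueError("at least one photo timestamp is required")

    months_by_year = defaultdict(set)    # year -> set of active months
    days_by_month = defaultdict(list)    # (year, month) -> photo dates
    for taken in dates_taken:
        months_by_year[taken.year].add(taken.month)
        days_by_month[(taken.year, taken.month)].append(taken.date())

    years_active = len(months_by_year)
    # Average number of active months per active year.
    months_per_year = sum(len(m) for m in months_by_year.values()) / years_active
    # Difference (in days) between the last and first photo of each active month,
    # averaged over all active months.
    spans = [(max(days) - min(days)).days for days in days_by_month.values()]
    mean_days_difference = sum(spans) / len(spans)

    return {
        "months_active_per_year": months_per_year,
        "days_difference": mean_days_difference,
        "years_active": years_active,
    }
```

Under this reading, a user with photos on 1 and 3 May 2015 and on 10 July 2016 would be active in two years, one month per year on average, with a mean within-month span of one day.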


Table 1 notes: 1 Months active/Year = number of months active per year; 2 Days difference (max-min) = difference between the maximum and minimum dates of a specific month, or mean of the difference between the maximum and minimum dates of n specific months in n years; 3 Years active = number of years active. Source: Authors' own elaboration.
It is worth mentioning that Flickr provides information about the accuracy level of the geographical coordinates of each photo, ranging from 1 to 16: world level = 1; country level = 2-3; regional level = 4-6; city level = 7-11; street level = 12-16. Our database consisted of photos with the two maximum levels of spatial accuracy (15 and 16). Hence, we did not have to exclude photos due to inaccurate coordinates.
To detect potential spatiotemporal trajectories (STTs), each user's photo collection was denoted as P_u ⊆ P, where all the photos p_id ∈ P_u were taken by the same user (u_p) and were chronologically sorted as a spatial and temporal sequence (see conceptual scheme in Figure 4). For each user, multiple STTs can be reconstructed, as they may have taken the photos at different times of the day or on different days. Each STT must have a minimum of 2 photos (with different spatial coordinates) to reconstruct a spatiotemporal trajectory, and the time difference between two consecutive photos in the same STT cannot be greater than two hours, as a longer gap could severely distort the trajectories obtained. An inactivity period of two hours or less could represent the time that a user spends visiting a specific space of interest, or stopping for lunch or dinner. For example, a user with ten photos taken between 10 a.m. and 12 p.m., and five photos taken between 5 p.m. and 7 p.m., would have two sequences, since the time difference between the last photo taken at midday (12 p.m.) and the first one taken mid-afternoon (5 p.m.) is greater than two hours. After cleaning the data and implementing all the steps presented in Figure 3, a total of 33,051 photos taken between 2010 and 2017 (both years included) belonging to 1565 visitors were kept for analysis. Their spatial distribution can be observed in Figure 5.
As can be observed in Table 2, in which the distribution per type of user of the Flickr accounts and photos kept for analysis is presented, more than two thirds of the users were classified as one-day visitors.
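The two-hour rule described above can be illustrated with a short sketch. It is not the authors' R implementation: the function name `split_into_stts` and the `(timestamp, lon, lat)` tuple layout are assumptions made only for this example, while the two-hour gap and the requirement of at least two distinct coordinates come from the text.

```python
from datetime import datetime, timedelta

# Maximum gap allowed between consecutive photos of one STT (from the text).
MAX_GAP = timedelta(hours=2)


def split_into_stts(photos, max_gap=MAX_GAP):
    """Split one user's photos into candidate STTs.

    `photos` is a list of (timestamp, lon, lat) tuples; this layout is a
    hypothetical choice for the sketch, not the structure of the Flickr data.
    """
    ordered = sorted(photos, key=lambda p: p[0])  # chronological order
    sequences, current = [], []
    for photo in ordered:
        # Start a new sequence when the gap to the previous photo exceeds max_gap.
        if current and photo[0] - current[-1][0] > max_gap:
            sequences.append(current)
            current = []
        current.append(photo)
    if current:
        sequences.append(current)
    # Keep only sequences with at least two distinct coordinate pairs.
    return [s for s in sequences
            if len({(lon, lat) for _, lon, lat in s}) >= 2]


# Morning and afternoon photos separated by a five-hour gap.
example = [
    (datetime(2015, 5, 1, 10, 0), -4.0245, 39.8570),
    (datetime(2015, 5, 1, 11, 0), -4.0238, 39.8575),
    (datetime(2015, 5, 1, 12, 0), -4.0220, 39.8581),
    (datetime(2015, 5, 1, 17, 0), -4.0290, 39.8560),
    (datetime(2015, 5, 1, 18, 30), -4.0301, 39.8552),
]
stts = split_into_stts(example)
```

With this input the function returns two sequences, of three and two photos, mirroring the worked example in the text.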

Empirical Approach
The empirical approach followed in this article consisted of reconstructing the STTs between consecutive photos in order to build a city flow map. Owing to the urban characteristics of the historic centre of Toledo, presented in Section 3, and the fact that it is mainly a pedestrianised area, travel mode information is not considered. Subsequently, statistical procedures were used to determine whether there was an association between visitor mobility and land uses in the city centre.

Reconstruction of Spatiotemporal Trajectories
With the photo sequences, the ArcGIS Network Analyst extension was used to reconstruct each STT on the street network. First, the total number of Flickr photos per street segment was calculated via map-matching [82]. This procedure could also have been carried out with the number of unique users per street segment. However, based on studies that used the total number of photos to define the popularity of sightseeing hotspots [67,73], and considering that, after an exploratory analysis, we did not detect differences in the spatial distribution of photos and users, we opted for the first option. This variable was used to establish the hierarchy of the street network (see Figure 6). The street network used in this study was downloaded from the download centre of the National Geographic Institute of Spain (http://centrodedescargas.cnig.es) and belongs to the CartoCiudad project (https://www.cartociudad.es). The average length of a street segment in the historic centre of Toledo is around 40 m, as is its standard deviation. In our study, we used quintiles to define the thresholds between hierarchies: the highest level (1) was assigned to those street segments with more than 37 photos; hierarchy 2 to those with 12 to 36 photos; hierarchy 3 to those with 5 to 11 photos; hierarchy 4 to those with 2 to 4 photos; hierarchy 5 to those with 0 to 1 photos. The hierarchy used a heuristic method that mostly limited the route search to the highest levels of the hierarchy [83]. This means that the streets with a higher number of Flickr photos are also more likely to be selected as routes by each user sequence.
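As an illustration of the thresholds just listed, the following sketch maps a photo count to a hierarchy level. The function name `street_hierarchy` is hypothetical, and the routing itself (done in ArcGIS Network Analyst) is not reproduced. One assumption is needed: the text assigns level 1 to segments with "more than 37" photos and level 2 to those with 12 to 36, so a count of exactly 37 is treated here as level 1.

```python
def street_hierarchy(photo_count):
    """Return the hierarchy level (1 = highest) of a street segment.

    Thresholds follow the quintile breaks reported in the text.
    Assumption: a segment with exactly 37 photos is placed in level 1,
    because the text leaves that value unassigned.
    """
    if photo_count < 0:
        raise ValueError("photo_count must be non-negative")
    if photo_count >= 37:
        return 1
    if photo_count >= 12:
        return 2
    if photo_count >= 5:
        return 3
    if photo_count >= 2:
        return 4
    return 5


# Hypothetical photo counts per street segment id.
segment_photos = {"seg_a": 52, "seg_b": 14, "seg_c": 7, "seg_d": 3, "seg_e": 0}
levels = {seg: street_hierarchy(n) for seg, n in segment_photos.items()}
```

A lookup of this kind would only prepare the hierarchy attribute; the route solver then decides how strongly to favour the higher levels.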

Table 3 shows the number of visitors that took photos in the historic centre of Toledo at any time between 2010 and 2017, with their corresponding number of potential spatiotemporal trajectories (STTs).
After running the network analysis, 50% of the potential STTs (1048) were reconstructed, since two additional debugging processes were carried out:
• Around 2% of the potential STTs could not be reconstructed due to sequence errors between photos (illogical distribution).
• Around 48% of the potential STTs were discarded since they had an insufficient spatial distance between photos (trajectories that were no longer than 1 km).
Table 3 notes: 1 Potential = maximum possible number of spatiotemporal trajectories to be reconstructed (derived from the analysis of the photos kept for analysis); 2 SV = Spatial Variability between photos forming the same sequence. If the length of the sequence is shorter than 1 km, it is discarded. Source: Authors' own elaboration.
Finally, with the 1048 STTs reconstructed, a city flow map was built and the tourist sites offered on the official tourist map of the historic centre of Toledo were classified according to the percentage of STTs that passed through the streets where they are located. In other words, the number of trajectories passing through each street of the heritage centre of Toledo was calculated as a share of the total number of STTs reconstructed, following this formula:

$$\%\,\mathrm{STT\ per\ street} = \frac{n_s}{n_t} \times 100$$

where n_s is the number of STTs per street and n_t is the total number of STTs reconstructed. Then, the 32 tourist attractions promoted on the official tourist map of Toledo (see Figure A1 in Appendix A, corresponding to the official map) were classified based on this indicator, establishing the following criteria:
• >30% STT per street: primary attractions of 1st order.
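The indicator above (the number of STTs on a street divided by the total number of reconstructed STTs, as a percentage) can be sketched as follows. The function name `stt_share_per_street` and the representation of each STT as a list of street identifiers are assumptions for illustration. Only the percentage is computed; the class thresholds are not applied because the full list of criteria is not available here.

```python
from collections import Counter


def stt_share_per_street(stt_routes):
    """Percentage of reconstructed STTs passing through each street.

    `stt_routes` is a list with one iterable of street ids per STT
    (a hypothetical representation of the routed trajectories).
    """
    n_t = len(stt_routes)  # total number of STTs reconstructed
    counts = Counter()
    for route in stt_routes:
        # Count each STT at most once per street, even if it passes twice.
        counts.update(set(route))
    return {street: 100.0 * n_s / n_t for street, n_s in counts.items()}


# Four toy trajectories over three streets.
routes = [["a", "b"], ["a"], ["a", "c", "c"], ["b"]]
shares = stt_share_per_street(routes)
```

Counting each trajectory once per street keeps the indicator bounded at 100%, which is what a percentage of STTs implies.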

Association of Visitor Mobility and Land Uses
Finally, two statistical procedures were implemented with the number of visitor STTs per street.
On the one hand, it was correlated with the land uses (http://www.catastro.meh.es/esp/acceso_infocat.asp) in the historic centre of Toledo using a rank correlation test. Specifically, we calculated the Spearman correlation coefficient between the number of visitor STTs per street and the total number of square metres allocated to the following land uses (both in absolute terms and in relative terms, that is, as a percentage of the total surface):
• Warehouse-parking: garages, storage rooms and parking lots.
Furthermore, with these data related to the land uses, the Shannon index [84] was calculated to assess the diversity and mixture of land uses (see Figure 7), and this was also correlated with the number of visitor STTs per street.
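The two measures named above can be sketched with the standard library alone. This is not the authors' code. The Shannon index is written in its usual form, H = -Σ p_i ln p_i, where p_i is the share of land use i on a street; the natural logarithm is an assumption, since the base is not stated. The Spearman coefficient is computed as the Pearson correlation of average ranks. All function names and the sample figures are hypothetical.

```python
import math


def shannon_index(areas):
    """Shannon diversity of land-use areas (e.g. m2 per use on one street)."""
    total = sum(areas)
    if total <= 0:
        return 0.0
    # Uses with zero area contribute nothing to the sum.
    return -sum((a / total) * math.log(a / total) for a in areas if a > 0)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation (undefined if either input is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Made-up values: STTs per street and retail floor area (m2) on the same streets.
stts_per_street = [120, 45, 300, 10, 80]
retail_m2 = [900, 350, 2500, 0, 400]
rho = spearman(stts_per_street, retail_m2)
```

In practice a statistics package would also report the significance of the coefficient, which this sketch leaves out.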
On the other hand, two linear regression models (Ordinary Least Squares, OLS) were applied according to the nature of the explanatory variables (absolute terms: square metres; relative terms: percentage of square metres with respect to the total). A stepwise method for fitting the models was selected, and hence only significant and explanatory variables were kept in the models' specifications. The OLS model can be defined as follows:

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \varepsilon$$

where Y is the dependent variable that we are modelling (number of STTs per street); β_0 is the constant (intercept); β_n are the coefficients that determine the relationship and intensity of each explanatory variable (x_n) with respect to the dependent variable (Y), with the sign (+/−) of each coefficient indicating whether the relationship is positive or negative; and ε is the residual error (the portion of the dependent variable that is not explained by the model).
Table 4 presents the average length of the 1048 visitor STTs that were reconstructed with their corresponding standard deviation.
Overall, the average length of the STTs amounts to 4.9 km. Visitors who do not stay overnight in Toledo have a shorter average trajectory length than those who do. This result is in line with the literature on visitor spatial behaviour, which identifies visitors with shorter stays as those who tend to stay centrally and visit outstanding attractions, while those with longer stays tend to visit a greater range of attractions and have wider mobility patterns [25,85]. By contrast, we have not detected a clear difference between first-time visitors and repeaters, as previous research did [20]. This may be related, on the one hand, to the fact that there are few STTs corresponding to repeating users (135 repeat vs. 913 first-time). On the other hand, some of the users classified as first-time visitors are likely to be actual repeat visitors, simply because we do not know the complete travel history of individuals and rely instead on the proxy that Flickr offers us in this regard. The number of STTs per street segment is mapped in Figure 8, together with the 32 tourist attractions promoted on the official tourist map of Toledo. The STTs classified according to the type of user are also presented in Figure 9. In each map presented in Figure 9, the STTs per street were mapped as a percentage of the total reconstructed STTs of each type of user to ensure comparability.

Visitor Mobility Patterns in the Historic Centre of Toledo
Existing literature on visitor spatial behaviour indicates that the spatial distribution of primary and secondary attractions has an essential influence over visitor spatial behaviour; in particular, whether the attractions are clustered or dispersed influences whether visitors move widely or narrowly within a given destination [25]. Every tourist city/destination has its idealised sight images or icons that are promoted by both public and private tourism operators. This generates a pushing effect towards these sites that leads to a repetition of itineraries, and hence the creation of mobility patterns that are transversal across visitor types [86].
In the particular case of the historic centre of Toledo, the visitor flows are characterized by forming a spine that runs from east to west (or vice versa) and by a peripheral distribution towards the north and, to a lesser extent, towards the south. Apart from the central spine of Toledo, the viewpoint located in the south of the historic centre (just on the other side of the Tagus River) and the access gates in the north of the city centre (Puerta de Bisagra and Puerta del Sol) also stand out.
Therefore, from the maps, it is possible to identify the streets that absorb more visitor flows, while revealing at the same time those areas within the city that are underexplored by visitors. The streets strolled and places visited within the city tend to be concentrated [21]. Outside these streets, the city is notably quiet. In this regard, considering that there are multiple underexplored monuments and numerous potential itineraries, a strategy to manage visitor flows should be carried out [28,41]. Accordingly, from the maps shown, it is possible to classify the tourist sites according to the percentage of STTs per street segment where they are located (see Table 5).
Hence, iconic tourist sites occupying a key position along the busiest streets, such as the Catedral de Santa María (Saint Mary's Cathedral), in the city centre, or the Puerta de Alcantara (Alcantara's Gate), at the east access to the city centre, have been classified as primary attractions of first order. Other important sites such as the Puerta de Bisagra (Bisagra's Gate) or the Alcázar have been classified as primary attractions of second order. Then, sites such as the Greco and the Santa Cruz museums have been classified as secondary attractions. Other sites located in peripheral areas or adjacent streets, such as the Tavera Hospital or the Transito synagogue, have been classified as complementary attractions. Meanwhile, those places located on streets through which fewer than 2.5% of the STTs pass have been assigned the category of "off the beaten track", since these places have an almost non-existent incidence on visitor spatial behaviour. From the perspective of the design and planning of strategies for visitor flow management, these are precisely the attractions that should gain visibility in order to deconcentrate the flows from primary attractions.

Land Uses Association with Visitor Spatial Behaviour
As demonstrated in the previous section, the promotion of tourist sites on the official map shapes visitor spatial behaviour. However, little is known about the association of land uses with the spatial behaviour of the visitor within the heritage city. In this regard, the Spearman correlation test (see Table 6) revealed a positive, moderate and statistically significant association between the number of STTs per street segment and both the number of square metres assigned to retail and commercial uses (0.457) and their relative presence (percentage) per street (0.416). The office land uses also present a statistically significant coefficient, although a weaker one. This is due to the fact that most of the administrative offices are located in historical buildings in the city centre, and also because of the location of bank offices in key central areas. Furthermore, as explained in the methods section, the Shannon index [84] was calculated in order to assess the diversity and mixture of land uses. The correlation with the Shannon diversity index was also positive and moderate (0.403). This indicates that visitors tend to be attracted by streets with a wide variety of facilities and a higher diversity of land uses [30,31].
Finally, two stepwise linear regressions (OLS) were applied in order to determine which of the land uses are more determinant of the visitors' spatial behaviour in the historic centre of Toledo (see results in Table 7). In the first OLS model, in which the explanatory variables were included in relative terms (% of m² with respect to the total m²), only two land uses were included as explanatory variables, since the stepwise method excluded other uses for collinearity reasons. Thus, according to the first model, the percentage of retail and commerce and the residential use are the two land uses with the highest explanatory incidence on the dependent variable (number of STTs per street). Specifically, their coefficients have opposite signs, reinforcing the results obtained in the Spearman correlation test.
In the second OLS model, in which the explanatory variables were expressed in absolute terms (m²), the stepwise method kept three explanatory variables, with the retail and commerce square metres being, as in the first model, the variable with the highest standardised coefficient (Std. β), followed by the singular buildings and religious uses.
Both OLS models underline that retail and commerce is the land use most strongly associated with the presence of visitor routes. However, the presence of religious uses and singular buildings (positively) and of residential uses (negatively) are also important explanatory variables of the spatial behaviour of visitors.
The high association between the visitor spatial behaviour and the commerce use is related to the fact that commerce uses tend to be spatially located in neuralgic areas with great potential to attract demand (or with an already existing one). Hence, not only the concentration of commerce activities acts as a push factor of the spatial behaviour of visitors, but also the coexistence with other land uses with interest for sightseeing determines the visitor flows. In fact, these results are supported by the existing literature related to the particular case of Toledo [27,79], that indicates that the commercial axes of the historic centre of Toledo concentrate most of the tourist-related economic activities, services and facilities, and coincide with the tourist route that visitors usually follow.

Implications of Our Findings and Main Contribution to the Field
This article has demonstrated that geolocated data, in particular that from big data and social media, is an undeniable and promising data source for geographical and tourism research, especially in the field of urban tourist mobilities. The article reveals that the reconstruction of visitor spatiotemporal trajectories from geotagged Flickr photos can be developed with a granularity and geographical level not addressed previously. Most of the earlier studies have concentrated on major cities [34] and in natural areas [74,82]. This research, on the contrary, has focused on the city centre of a medium-sized historic city. In fact, previous research showcased how to identify tourist attractions and generate frequency graphs between them [32,34,73]. However, none of them attempted to connect these tourist graphs/routes with the street layout.
The availability of data to develop studies on urban tourist mobility is often limited, and, therefore, this study is one more step towards integrating the opportunities opened up by big data sources into the management of flows in touristified cities. Understanding visitors' spatial behaviour has brought new knowledge and new visions about the built environment and its uses [4], and, consequently, new flexible planning and management procedures could be developed. Our results have identified overflowed streets, and also shadow areas underexplored by visitors. Accordingly, the tourist sites have been classified based on the percentage of flows channelled through the streets where they are located.
In line with the study by Paül i Agustí [29], the map of visitor flows has allowed us to see that there is a partial overlap between the Toledo official tourist map and visitor spatial behaviour. In addition, the results showed that visitor mobility patterns tend to be concentrated in streets where the commercial and recreational uses are greater [31], especially when these are related to the provision of products and services to the tourists and, therefore, are spatially concentrated in key central areas [27,79]. The results provide clues on how to manage visitor flows [18,19] in order to protect heritage assets and to serve the needs of the tourists and stakeholder communities [15,16]. The creation of alternative routes to promote undervisited tourist sites and the suggestion of alternative locations for tourist-oriented economic activities (such as leisure or tourist information areas) could help to this end [87]. Those underused tourist sites could be earmarked for tourism promotion and marketing strategies among visitors [28] in order to redistribute and redirect visitor flows, manage overcrowding and, ultimately, configure a more sustainable and competitive tourism model [17,35]. In this regard, the influence of markers and signals in the city should be studied, since they can shape visitor spatial behaviour, distributing mobility flows towards less-visited tourist attractions [40,41].

Limitations
The relevance of the results obtained is related to the fact that, by using the location, date and time associated with photos uploaded by visitors to a sharing platform, we can reconstruct detailed spatiotemporal trajectories along the urban network. This is especially important if the limitations of the data source are considered. In the first place, there might be a bias induced by the low user penetration of image-sharing websites. Although previous studies revealed that half of the users taking photos during a trip posted them online, and that the portion of those who did so on sites like Flickr was considerable [88], we do not have this certainty for the particular case of the historic centre of Toledo. The number of tourists and the number of Flickr users have been demonstrated to be highly correlated, particularly in North American and European countries [56,57]. However, considering the scale of our analysis (urban level), in future lines of research the results obtained should be cross-checked with other sources of information, such as counting sensors or video cameras, that would allow us to refine the methodology used. Secondly, photographers do not consider everything equally worthy of capturing, and they are highly selective about where, with whom and through which channels they communicate their activities and the places they visit [70,89,90]. On the positive side, Flickr, as an open access repository designed to share photos of places of interest, induces less noise than other less professionalized social networks where users tend to pay more attention to themselves than to the space visited. This makes it possible to identify the sightseeing hotspots precisely [75,76] and to approximate tourist behaviour with higher accuracy. Linked to the previous point, there is a third limitation that relates to the impossibility of gathering information about the purpose of the displacements.
For instance, there are some trips that might not be reflected in our spatiotemporal trajectories, such as displacements from the parking area or the train station to the city centre, or round trips to the accommodation, among others. Finally, an additional drawback of this kind of data is that there is no detailed information on the demographic and socio-economic profile of the visitor, or about their previous travel experiences. To illustrate this limitation, around 50% of our sample did not include information about the country of origin of the user, a fact that limits the possibility of segmenting the reconstructed trajectories by tourist markets. Nevertheless, despite these limitations, with our method we have been able to reconstruct 50% of the potential visitor spatiotemporal trajectories (1048), which, in the end, represents a higher number of trajectories, at a lower cost, than those that can be obtained using other data sources (e.g., GPS devices, travel diaries, among others). It should be noted that, in the present study, analysing a pedestrianised historic centre has allowed us to reconstruct the trajectories without considering modal distributions. Thus, in centres that do not have restrictions on private vehicles it will be important to take this aspect into account.

Future Research Directions
In future lines of research, interactions between visitors and locals could be analysed [63] and complemented with qualitative research that helps to measure the perception, of both tourists and residents, of the everyday sites of activity that are being transformed as a consequence of how visitors gaze and go places [11,12]. Hence, potential coexistence problems could be addressed accordingly. Furthermore, another research direction should explore the potential of data from photo-sharing platforms, in particular to what extent it is possible to detect different spatial behaviour on the basis of the origin of the tourists, considering the information provided in the profile of the users and the historical register (if available), and on the basis of the type of visit undertaken, considering the potential detection of organised trips thanks to the datetime and geographical attributes of the photos.
Moreover, the reconstruction of visitor spatial behaviour following the method presented could be applied in other contexts of heritage cities in order to test the validity of the data source. Although it is true that Flickr is one of the most used platforms [88], especially in western countries [56,57], the use that users make of the internet is everchanging and academia and practitioners must be up to date with new sources that can improve the analysis of mobility. The integration of different data sources, the analysis of how managers and public authorities market the destination and the evolution of the destination must be contemplated to correctly measure the mutual and reciprocal shaping between visitor behaviour and cities' characteristics [4].
This need to keep up to date with the opportunities that big data offers for monitoring human mobility in general, and visitor mobility more specifically, gains relevance in the current context of health emergency marked by the COVID-19 pandemic. Our study shows that the data from open access repositories of geolocated photographs allow the development of agile and cost-effective analyses of the mobility of visitors. Therefore, new emerging lines of research are related to knowledge acquisition from the analysis of how the pandemic is generating a disruptive effect (or not) on mobility patterns at a local scale. This is of significant value in dense urban contexts, where a high volume of population converges and related challenges in the use of the space may emerge. In addition, since international mobility has been at minimal levels for some months due to global mobility restriction measures, strategies to promote locally-based tourism flows and activities have been boosted [36], which could also generate other, perhaps more intense, ways of consuming cities that deserve to be analysed.