Visualizing Digital Traces for Sustainable Urban Management: Mapping Tourism Activity on the Virtual Public Space

Abstract: One of the challenges of heritage cities is sustainably balancing mass tourism and the daily life of their residents. Urban policies can modulate the impact of tourism through regulations focusing on areas with outstanding visitor pressure, which must consequently be delimited accurately and objectively. Within a traditionally data-scarce discipline, urban practitioners can currently employ a wide range of tracking technologies, but because of their limitations can also greatly benefit from new sources of data from social media. Using Barcelona as a testbed, a methodology is presented to identify and visualize hot spots of visitor activity using more than a million public geotagged images collected from the Flickr photo-sharing community. Multiple complementary visualization approaches are discussed that are suitable for different scales of analysis, from global to sub-block resolution. The presented methodology is firmly grounded in a well-established spatial statistics framework, adapted to a "big data" environment, to extract knowledge from social media. It is designed to generalize to other urban settings, providing substantial advantages over other surveying methods in terms of cost-efficiency, scalability, and accuracy, while capturing the behavior of a larger number of participants and covering more extensive areas or temporal spans.


Urban Planning and Sustainable Tourism Management
Since the mid-20th century, the number of people living in urban areas has steadily increased in absolute and relative terms [1]. With the majority of the world's population living in urban areas, and projections indicating that over the next 30 years, most of the population growth will occur in cities, urban practitioners (planners and designers) require a deeper understanding of urban phenomena [2] and the mechanisms that drive their processes [3].
One of the challenges modern cities face is the management of mass tourism [4], which despite being capable of driving substantial economic growth [5], can hamper the daily life of their inhabitants [6] and negatively influence the experience of visitors themselves [7], as well as the environment [8].
However, the effects of these impacts are not distributed uniformly across the city [9], since the physical presence of visitors tends to concentrate on cultural heritage sites in city centers [10] while the effects on the residential market [11] or the environmental impact of the floating population [12] can affect other neighborhoods across the city.
Therefore, the accurate delimitation of outstanding visitor pressure areas is crucial for implementing policies that modulate these activities and keep them within their carrying capacity [13]. The taxation of specific uses, or subjecting tourism-related businesses to the procurement of a mandatory license within these zones, can sustainably balance local and visitor presence in the public space.

New Challenges in Urban Management
Urban planning and urban design shape the complexity of cities through building codes, urban regulations, or housing policies. While these disciplines have traditionally been data-scarce [14], the growing complexity of urban phenomena requires complementing traditional approaches [15,16] with data-driven methodologies [17], providing public authorities with actionable information [18] to respond to new challenges and guide urban policies [19].
In this context, urban planners and designers can significantly benefit from new sources of data from social media to complement long-established methodologies based on cadastral data [20]. However, this approach requires addressing several issues in the methodological framework [21], derived not only from the increased amount of data but also from the quality assurance necessary [22] because of the informal nature of these sources [23].
This paper analyzes the tourist activity from an urban management perspective using more than a million picture locations of Barcelona collected from the Flickr service [24]. The methodology focuses on the visualization aspects to extract knowledge from social media data effectively [25] as a communication tool to make its complexity interpretable intuitively for stakeholders, using a principled approach to data representation [26] and reproducibility [27].

Research Questions and Hypotheses
The main aim of our research is to investigate whether visualizing the digital traces of tourist activity can provide an advantage over established methods of tracking visitor behavior. The paper focuses on two main issues from a sustainable urban management perspective to understand tourism activity using social media data: the origins of the visitors at a global scale and the location of the hot spots of visitor activity in the public space. The main hypothesis that supports this research question is that the location of digital traces in the virtual space is a proxy that reflects the actual behavior of users in the physical public space, and in particular that picture-taking activity is closely aligned with visitor behavior.
A secondary research goal focuses on identifying the methodological aspects required to extract actionable knowledge from raw unstructured data from social media. The paper investigates the complete workflow developed in a case study, from the initial data collection, followed by the data cleaning and transformation, and finally the most effective visualization strategies to communicate the results to the intended audience. The main hypothesis of the secondary research goal is that a graphical summary can provide understanding to a non-expert audience, despite the size and complexity of the data sources.

Capturing Human Spatial Behavior
The strategy of recording human activity in public spaces to support planning decisions, pioneered by Pushkarev [28] and Whyte [29], has progressively percolated in the practice of urban planning and design [30], adopting a multifaceted approach [31]. This approach is critical to understand spatial behavior at the three main scales of analysis of the built environment: regional planning [32], urban planning and design [33], and architectural spaces in building interiors [34].
Recent technological advances in optical tracking [35] allow tracking behavior using cameras but raise privacy concerns, which can be mitigated using thermal imaging technologies [36] or motion sensing devices [37]. Other approaches rely on GPS technology [38] through handheld units [39], but require a sufficient number of volunteers who agree to participate in experiments [40]. The recruitment of participants for extended periods is also an issue in the emerging technologies of eye-tracking [41] and virtual reality simulations [42].
In contrast, the indirect tracking of spatial behavior can cover a broader temporal and spatial scope at the expense of reduced accuracy. The classic approach has relied on demographic [43] or transportation [44] statistics, but despite current advances in open data initiatives they are not universally applicable. Recent approaches aggregate data from cell towers [45], requiring ad hoc partnerships with telecommunications companies [46] at urban [47,48] or regional scales [49,50].
These technologies have been applied to tourism research [51], but can be expensive and time-consuming [52]. Moreover, they cannot be used in some research areas as they require designing and conducting an experiment, which is not always feasible, especially for retrospective or spatially extensive studies [53], in contrast to social media data [54].

Digital Traces, Picture Sharing and Tourism
Photography is inextricably associated with tourist activity, from the initial choice of destination browsing images on the Internet [55], through the motivations of the decision of taking a picture of oneself [56] or at a particular location [57], to the final selection of pictures posted online [58].
Current mobile devices are equipped with cameras, accurate GPS, and pervasive Internet connections, with user-friendly applications that allow users to post content on social networks. Since the introduction of the iPhone in 2007 and the generalization of smartphone use [59], the production and distribution of user-generated content of Web 2.0 services provide unique opportunities to gather data on user behavior at urban [60], regional [61], and global [62] scales.
Before geotagged media became pervasive, social media research in the tourism and hospitality field focused on travel planning from the consumer perspective, or the promotion of related services from the supplier perspective [63]. At the time of writing, location data from social media [64] can be mined from multiple platforms [65], and these emerging sources of data [66] can be used as a substitute for travel diaries [67] or replace resource-intensive field surveys [68].
The developed methodology focused on geotagged pictures collected from a photo-sharing community, as picture-sharing services typically provide a better location accuracy [69] than messaging platforms [70], which was required for the intended resolution of the analysis.

The Case of Barcelona
Promoted as a tourist destination since the beginning of the 20th century [71], and following several decades of international tourism expansion in Spain [72], Barcelona experienced an exceptional growth in the number of visitors [73] after the celebration of the Summer Olympic Games of 1992 [74] and is currently the 12th most visited urban destination in the world [75], holding the 6th position in the Airbnb top destination cities [76].
This sustained popularity as a tourist destination has situated Barcelona as an excellent candidate to test the proposed methodology, providing a significant volume of data for analysis as a thoroughly visited and photographed destination; according to Sightsmap and based on data from the discontinued Panoramio service, Barcelona was ranked as the third most photographed city in the world, while according to Flickr usage it was the 6th city in terms of the ratio of pictures of visitors versus its official population [77].
Moreover, from the perspective of its physical structure, the diversity of the urban fabric of Barcelona [78], where the historical evolution of the city has shaped a complex built environment [79] with a collage of different neighborhoods with their own distinct character and functional mix [80], has ensured that the developed methodology was suitable in a wide variety of urban morphologies and was capable of capturing the spatial dynamics of a rapidly changing city [81].

Retrieval of Picture Locations
Flickr is a photo-sharing community founded by the Canadian company Ludicorp in 2004, subsequently acquired by Yahoo in 2005, and owned by SmugMug since April 2018. The service allows users to share image content and receive feedback from its online community. Although it is possible to browse its catalog of public images without registration, the creation of an account is mandatory to upload content and use its social networking features.
The service supports storing spatial and temporal metadata if available, which are included by most location-enabled devices and stored in the Exchangeable Image File Format (EXIF) header of geotagged image files by default. Data can be accessed using a well-documented Application Programming Interface (API), which replicates most of the functionality available on its website through its entry point, which offers 222 methods at the time of writing.
Data collection was performed in March 2017 through the Flickr API using a custom script developed in the R programming language [82]. Using the flickr.photos.search method of the Flickr API, images were requested within a bounding rectangle encompassing the city limits of Barcelona in the WGS 84 global reference system for geospatial information (EPSG:4326). The collected data consisted of 1,166,704 unique geotagged pictures and covered a period of 12 full years (from the beginning of 2005 to the end of 2016).
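As an illustration of the request described above, the following minimal sketch builds a flickr.photos.search query URL for geotagged pictures inside a bounding box. It is written in Python rather than the R used in the study, it performs no network call, and the bounding box, the placeholder API key, and the helper name build_search_url are assumptions made for this example only.

```python
from urllib.parse import urlencode

FLICKR_REST_ENDPOINT = "https://api.flickr.com/services/rest/"

# Hypothetical bounding box roughly enclosing Barcelona, in WGS 84 degrees:
# (min_lon, min_lat, max_lon, max_lat). Not the rectangle used in the study.
BARCELONA_BBOX = (2.05, 41.31, 2.23, 41.47)


def build_search_url(api_key, bbox, min_taken, max_taken, page=1):
    """Build a flickr.photos.search URL for geotagged photos inside bbox."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "bbox": ",".join(str(v) for v in bbox),
        "has_geo": 1,
        "min_taken_date": min_taken,
        "max_taken_date": max_taken,
        "extras": "geo,date_taken",  # ask for coordinates and capture date
        "per_page": 250,
        "page": page,
        "format": "json",
        "nojsoncallback": 1,
    }
    return FLICKR_REST_ENDPOINT + "?" + urlencode(params)


# Placeholder key; a real key is required to actually send the request.
url = build_search_url(
    "YOUR_API_KEY", BARCELONA_BBOX, "2005-01-01 00:00:00", "2016-12-31 23:59:59"
)
```

A complete collection script would iterate over the page parameter and, most likely, over shorter date ranges, since a single query only returns a limited number of results.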
After parsing the 69 fields of the collected records in the JavaScript Object Notation (JSON) format, the coordinate pair of each geotagged picture was projected into the spatial reference used by the cartographic services of the city of Barcelona (EPSG:25831) with the maximum accuracy possible using the PROJ.4 library [83].
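The projection step can be sketched without external libraries. The function below applies the standard transverse Mercator series (Snyder's formulation) on the GRS80 ellipsoid for UTM zone 31N, which is the projection behind EPSG:25831. It is an approximation for illustration and not the PROJ.4 implementation used in the study; it also assumes that WGS 84 and ETRS89 coordinates coincide, ignoring the small datum shift between them. The function name to_utm31n is hypothetical.

```python
from math import cos, radians, sin, sqrt, tan

# GRS80 ellipsoid (used by ETRS89) and UTM zone 31N constants.
A_AXIS = 6378137.0
FLATTENING = 1 / 298.257222101
K0 = 0.9996                # UTM scale factor on the central meridian
LON0 = radians(3.0)        # central meridian of UTM zone 31
FALSE_EASTING = 500000.0


def to_utm31n(lon_deg, lat_deg):
    """Approximate (easting, northing) in metres for a lon/lat pair in degrees."""
    e2 = FLATTENING * (2 - FLATTENING)   # first eccentricity squared
    ep2 = e2 / (1 - e2)                  # second eccentricity squared
    lat, lon = radians(lat_deg), radians(lon_deg)
    n = A_AXIS / sqrt(1 - e2 * sin(lat) ** 2)
    t = tan(lat) ** 2
    c = ep2 * cos(lat) ** 2
    a = (lon - LON0) * cos(lat)
    # Meridian arc length from the equator to the given latitude.
    m = A_AXIS * (
        (1 - e2 / 4 - 3 * e2**2 / 64 - 5 * e2**3 / 256) * lat
        - (3 * e2 / 8 + 3 * e2**2 / 32 + 45 * e2**3 / 1024) * sin(2 * lat)
        + (15 * e2**2 / 256 + 45 * e2**3 / 1024) * sin(4 * lat)
        - (35 * e2**3 / 3072) * sin(6 * lat)
    )
    easting = FALSE_EASTING + K0 * n * (
        a
        + (1 - t + c) * a**3 / 6
        + (5 - 18 * t + t**2 + 72 * c - 58 * ep2) * a**5 / 120
    )
    northing = K0 * (
        m
        + n * tan(lat) * (
            a**2 / 2
            + (5 - t + 9 * c + 4 * c**2) * a**4 / 24
            + (61 - 58 * t + t**2 + 600 * c - 330 * ep2) * a**6 / 720
        )
    )
    return easting, northing


# Illustrative coordinates near the centre of Barcelona (assumed values).
easting, northing = to_utm31n(2.17, 41.38)
```

For production work a maintained projection library is preferable, since it also handles datum transformations and points far from the central meridian.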
The locations outside the city limits, which included the points in the harbor within the Mediterranean Sea, were discarded with an overlay operation. Outliers that could bias the results were excluded, in particular users with a disproportionate number of pictures (super-users) and locations with an abnormal number of exactly coincident coordinate pairs, which were indicative of unreliable location precision. The resulting dataset contained 834,328 locations, a 28% reduction from the total points collected.
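The outlier exclusion can be sketched as two counting passes, one per user and one per exact coordinate pair. The thresholds are not reported in the text, so max_per_user and max_per_location below are hypothetical parameters, as is the function name.

```python
from collections import Counter


def filter_outliers(records, max_per_user, max_per_location):
    """Drop pictures from super-users and from over-represented exact locations.

    records is a list of (user_id, x, y) tuples; both thresholds are
    hypothetical, since the cutoffs used in the study are not stated.
    """
    per_user = Counter(user for user, _, _ in records)
    per_location = Counter((x, y) for _, x, y in records)
    return [
        (user, x, y)
        for user, x, y in records
        if per_user[user] <= max_per_user
        and per_location[(x, y)] <= max_per_location
    ]


# Toy data: user "u1" is a super-user and location (3, 3) is over-represented.
sample = [("u1", 1, 1)] * 5 + [
    ("u2", 2, 2), ("u3", 3, 3), ("u4", 3, 3), ("u5", 3, 3), ("u6", 4, 4),
]
kept = filter_outliers(sample, max_per_user=3, max_per_location=2)
```

Both counts are taken on the unfiltered data, so the order in which the two criteria are applied does not affect the result.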
These raw point locations provided a detailed overview of the spatial distribution of the pictures publicly shared by Flickr users (Figure 1). However, the picture locations in this map did not provide information on whether the user who uploaded the picture to the Flickr service was a local or a visitor, making it unsuitable to identify the specific behavior of either group. In addition, while the map was capable of identifying the areas with a greater concentration of pictures, the large number of overlapping points could not provide an accurate representation of the true density in the most popular areas.

User Location Data
While it was possible to classify users into locals and visitors by analyzing their temporal behavior [84], this approach could incorrectly label a significant number of commuters from the metropolitan area as visitors, or conversely label recurring visitors taking advantage of short-haul low-cost flights as locals. To increase the accuracy of the classification, the information of the public profiles of the 34,283 users who posted the collected pictures was queried using the Flickr API, obtaining the time zone [85] for 72% of them using the flickr.people.getInfo method and, more importantly, the place of residence [86] of roughly 50% of the profiles, combining the information from the flickr.profile.getProfile and flickr.people.getInfo methods (Table 1).
Table 1. Flickr users' location-related data available using the discussed methods. The sets overlap partially, and therefore the combined data collected from the flickr.profile.getProfile and flickr.people.getInfo methods yield location data for 50% of the users.
The anonymity of these users was preserved at all times during research using only their Flickr-assigned unique identifier (NSID). While most users shared the same time zone as Barcelona (Figure 2a), the proportion of users with location data appeared constant across all time zones (Figure 2b). The maps indicated that the time zone corresponding to Barcelona had the largest number of users, which was consistent with an overrepresentation of locals or visitors from locations close to the destination. In contrast, the maps suggested that the proportion of users that provided location data in their profiles was consistent across all time zones.
This visual hypothesis was confirmed using Cohen's d statistic [87], which showed no statistically significant difference in user engagement (measured as the number of geotagged pictures uploaded) when the user's profile included time zone information (p < 0.01) or when it included their location (p < 0.001), compared to users that provided no public data. It was therefore considered that there was not a consistent bias in the data.
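For reference, Cohen's d is an effect size: the difference between two group means expressed in units of their pooled standard deviation. The sketch below shows the textbook definition on toy data. It is not the authors' code, and it does not produce the p-values quoted above, which would come from a separate significance test that the text does not specify.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_sd = sqrt(
        ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    )
    return (mean(a) - mean(b)) / pooled_sd


# Toy picture counts per user for two hypothetical groups.
d = cohens_d([2, 4, 6], [1, 3, 5])
```

By the usual convention, absolute values around 0.2 are read as small effects, 0.5 as medium, and 0.8 as large.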

Mapping Cities of Origin
The publicly shared location information collected from Flickr through its API consisted of unstructured text strings, with the name of the locality and/or the name of the country of residence of each user. This location information was not usable in the provided raw format, and it was necessary to extract the geographic coordinates corresponding to the named locations. This process required processing 10,908 unique user location strings with a custom script, obtaining 5693 unique geographic locations from which country and city of residence could be extracted and assigned to the corresponding users and, by extension, their pictures.
The coordinates of the users' locations showed the spatial distribution of the visitors who posted a geotagged picture of Barcelona on Flickr, which were concentrated in the European Union (Figure 3), and in particular on the 'blue banana' European Megalopolis of the Manchester-Milan axis (which is well connected by land and air to Barcelona), but also on the East and West coasts of the United States of America. While this visualization approach provided informative maps of the visitors' locations, its interpretation can be challenging for stakeholders, because the spatial distribution of settlements is not uniform across all areas of the world and can suggest a higher concentration of visitors from areas where the population resides in small towns, and conversely underestimate areas with fewer but more populated cities. It also requires significant geographic knowledge to interpret the maps and extract meaningful conclusions.
Furthermore, the representation of locations as dots in a map makes overlapping locations in densely populated areas challenging to identify, although this can be partially addressed by overlaying geometries using transparency, or using the screen blending mode in image processing. In addition, encoding the number of users as the size of the dots increases the overplotting of locations, and introduces additional interpretation challenges because of the wide range of the values to represent (more than three orders of magnitude), as well as perceptual biases in symbol size judgment [26].

Treemap Representation
A treemap is a space-filling visualization [88] of hierarchical structures using a series of nested rectangles. The areas of those rectangles are proportional to an associated numerical value, and each rectangle can be recursively tiled into smaller rectangles corresponding to a sub-level. This representation summarizes the information hierarchically and with an efficient use of space, allowing the detection of patterns present in the data that could otherwise be impossible to identify during exploratory analysis. In contrast, presenting the information as a bubble map can hide patterns present in world-scale data, as discussed above.
The treemapify 2.5.5 R package [89] was used to summarize a global perspective of the visitors' origins in a compact and intuitive visualization. The representation was split according to the spatial granularity of the smaller aggregation unit, corresponding either to the city or the country of origin. To avoid clutter, only the units above a minimum user threshold were represented (1/1000 of their corresponding group), while the rest were consolidated into an overflow category.
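The consolidation of small units into an overflow category can be sketched as follows for a single group. The 1/1000 fraction comes from the text; the function name, the "Other" label, and the toy counts are assumptions of this example.

```python
def consolidate_small_units(counts, fraction=1 / 1000, overflow_label="Other"):
    """Keep units above fraction * group total and pool the rest.

    counts maps a unit name (city or country) to its number of users
    within one group; the overflow label is hypothetical.
    """
    threshold = sum(counts.values()) * fraction
    kept = {}
    overflow = 0
    for name, users in counts.items():
        if users > threshold:
            kept[name] = users
        else:
            overflow += users
    if overflow:
        kept[overflow_label] = overflow
    return kept


# Toy group of 10,000 users: the threshold is 10 users.
group = {"City A": 9985, "City B": 9, "City C": 6}
summary = consolidate_small_units(group)
```

Because the threshold is relative to each group, the function would be applied once per geographic scope or region rather than to the whole dataset.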
The treemap corresponding to the smaller aggregation unit (city) was grouped according to seven geographic scopes around the city (Figure 4): municipal (corresponding to residents), metropolitan (well-connected cities in the functional area of influence), regional (level-2 administrative level), autonomic (level-1 administrative level), national (level-0 administrative level), and the countries that belonged to the Schengen area at the time of data collection. These scopes were chosen to reflect the different costs in the mobility patterns and other limitations in the travel (language, nationality, and border checks). The results showed that Barcelona residents were the largest group, but their share was relatively small (13% of users). The most populous European and North American cities were well represented, as well as some smaller Catalan localities near Barcelona (Badalona, Sabadell, Terrassa), which were however surpassed in number by travelers from Madrid, suggesting a strong connection between the two largest metropolises in Spain, which are well-connected by air through one of the world's busiest air shuttle routes and, more recently, through high-speed rail, in operation since 2008.
The treemap corresponding to the larger aggregation unit (country) was grouped according to the seven main regions [90] defined by the World Bank (Figure 5). This classification was designed to help the stakeholders focus their promotion efforts on specific countries, train the staff to handle specific languages, and tailor their offerings to cater to different cultural backgrounds. This representation summarizes visually that the majority of users resided in the Europe and Central Asia macro-region, which is home to more than 3/4 of the users. Spaniards were expectedly the most represented group, and beyond domestic visitors most users in this region originated from the United Kingdom, followed by Italy, France, Germany, and the Netherlands. In North America, the number of visitors from the United States was comparable in size to the United Kingdom, and Canadian visitors were similar in number to the ones coming from the Netherlands. In contrast, visitor counts from the whole of Latin America were about the same size as visitors from France, despite including Brazil, which accounted for almost the same number of users as Canada or Russia. Asian visitors were comparable in number to Germans. In contrast, visitors from the Middle East and North Africa, South Asia, or Sub-Saharan Africa were almost absent.

Picture Locations and User Origins
With these data, users were classified as locals when they belonged to one of the 164 municipalities of the Barcelona Metropolitan Region defined by the Regional Plan of Catalonia [91] and visitors otherwise. The origin of each user was assigned to all their pictures, resulting in 140,566 pictures from locals (17%), 348,197 from visitors (42%), and 345,565 for which the origin was not determined (41%). According to this classification, the number of pictures uploaded by locals and visitors did not show a statistically significant difference between the two groups (p < 0.001), using Cohen's d statistic.
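The reported shares can be reproduced from the picture counts with a short arithmetic check. The figures are taken from the text; only the variable names are introduced here.

```python
# Picture counts by user origin, as reported in the text.
counts = {"locals": 140_566, "visitors": 348_197, "undetermined": 345_565}

total = sum(counts.values())
shares = {group: round(100 * n / total) for group, n in counts.items()}

# Size of the raw collection before cleaning, also from the text.
collected = 1_166_704
reduction = round(100 * (1 - total / collected))
```

The three groups add up to the 834,328 locations retained after cleaning, and the rounded shares are 17%, 42%, and 41%, with a 28% reduction from the raw collection.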
The point locations were classified according to the origin of the corresponding user who shared the picture. The results were visualized in a grid of maps focusing on 12 landmarks of the city at the same scale (Figure 6), overlaid on a grayscale aerial photo. To avoid occlusions, the points were overlaid with the locals drawn on top of the visitors (because the smallest group was less likely to obscure the larger), with the pictures from users of unknown origin behind all of them.
The grid of maps revealed that the spatial distribution of the different user profiles was different across the city, exhibiting a higher intensity of visitors concentrated in the most recognizable landmarks, but also a higher spread around the areas of interest, while the local population tended to disperse more evenly across the city. However, beyond this visual assessment it was necessary to quantify the degree of clustering and dispersion to accurately measure the distinct spatial patterns of the behavior of locals and visitors.

Point Patterns
The visualization approach of the mosaic of maps of landmarks was very effective in showing the distribution of user origins in detailed scales, in particular when sorting the points in separate layers. However, it could not be applied to the whole area of study because of the differences in point density across the city.
The optimal visualization settings were dependent on both density and scale, because the point size and transparency designed for sparser areas were not suitable for the higher-density clusters, and vice versa. Likewise, increasing the magnification was not adequate for lower-density areas, and while it was adequate for interactive visualizations where users could focus on an area of interest it was not an option for printed output, where the scale and map extent are fixed.
One of the approaches to visualize the full extent of the city was using small multiples [25], separating the map representation into a lattice of equally sized panels with the same resolution and extent, each containing the point pattern for each of the user profiles. In this case, the representation was sensitive to the size of the point symbols, because larger symbols tended to overlap in higher-density areas. This issue was alleviated using small-sized points for the maps and using transparency to identify areas with different degrees of overlap.
The resulting maps (Figure 7) showed that the point pattern corresponding to the local population exhibits a more dispersed pattern, and while it concentrates in the central areas, it is spread more evenly throughout the city. In contrast, the visitors' behavior is much more spatially clustered around touristic landmarks. The pictures taken by users whose origin could not be determined seem to exhibit a pattern between both extremes, indicating that they might correspond to a mixture of both profiles.
These qualitative judgments based on the visual analysis of the maps were adequate for exploratory analysis, but a more accurate characterization required the point pattern analysis framework.

Clustering
Although the spatial distribution of the points can be summarized into a single intensity value, the complexity of planar point patterns requires a different approach. However, purely graphical methods such as Fry plots [92] are not practical to compute for large sample sizes and are challenging to interpret visually. Therefore the majority of these summaries take the form of a function of some metric of the point pattern over a radius [93], generally plotted against the expectation of a Poisson process of the same intensity under the null hypothesis of Complete Spatial Randomness (CSR).
The K(r) function is the cumulative average number of points within a distance r of a typical point. In order to make the comparison of point patterns possible, the estimation is corrected for edge effects and divided by the intensity. The L(r) function is a common transformation of the K(r) function, which transforms the theoretical reference Poisson K-function into a straight line (Figure 8). The estimation was computed using the spatstat 1.64-1 [94] R package for the point patterns corresponding to the pictures taken by locals and visitors, and compared to the pictures without origin data. The corresponding function for the combined point patterns was also computed as a reference.
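The definitions above can be made concrete with a naive estimator. The sketch below counts ordered pairs of points closer than r, scales the count by the window area, and applies the transformation L(r) = sqrt(K(r) / pi), under which a Poisson pattern gives L(r) = r. Unlike the spatstat estimators used in the study, it applies no edge correction and has quadratic cost, so it is only suitable for small illustrative patterns; the function names are hypothetical.

```python
from math import hypot, pi, sqrt


def ripley_k(points, r, window_area):
    """Naive K(r) estimate without edge correction (illustration only)."""
    n = len(points)
    close_pairs = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and hypot(xi - xj, yi - yj) <= r:
                close_pairs += 1
    # Dividing by n * (n - 1) / area is equivalent to dividing the mean
    # neighbour count by the estimated intensity.
    return window_area * close_pairs / (n * (n - 1))


def besag_l(points, r, window_area):
    """L(r) transformation of K(r); equals r under complete spatial randomness."""
    return sqrt(ripley_k(points, r, window_area) / pi)


# Toy pattern: the four corners of a unit square window.
corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
k_value = ripley_k(corners, 1.0, window_area=1.0)
l_value = besag_l(corners, 1.0, window_area=1.0)
```

Values of L(r) above r indicate clustering at that distance, and values below r indicate a more regular pattern than expected under CSR.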
The results confirmed that the local pictures were less clustered than the rest of the patterns, as they were closer to the expected value under CSR. In contrast, the visitors exhibited a significantly higher clustering at all distances, most markedly at smaller ranges (up to 100 m), while the locals were more regular across all distance bands. The pictures from users whose origin could not be determined matched the global clustering pattern almost exactly, confirming the hypothesis that they consisted of a mixture of both profiles.

Rasterization
Compared to the maps, the graphical summary provided by the L(r) function described the clustering behavior of the different user profiles visually, and was supported by a robust spatial statistics framework. However, it could not provide an accurate representation of the specific areas of the city with a higher intensity of picture-taking activity. Therefore, it was necessary to develop a map representation to overcome the issues identified in the point maps while keeping the advantages of small multiples. The approach used rasterization, counting the number of pictures in a grid of pixels (quadrat counts), and addressed the main issues in the discretization process [95]:

• The size of the aggregation unit;
• The transformation of the aggregated values;
• The representation of the values using a color scale.
The size of the aggregation unit (pixel) was determined from the spatial distribution of the points themselves, with the objective of optimizing pixel size: grids of very small pixels tend to contain mostly empty cells, while grids of very large pixels do not provide enough spatial resolution at the intended scale of analysis. In the limit, the discretization of points into an infinitely fine grid of pixels would consist of only zeroes and ones, and conversely, a single pixel that covered the whole area of study would consist of a summary of the density without any variance.
The chosen criterion was based on the "empty space" F(r) function, sometimes called "spherical first contact distribution" or the "point-to-nearest-event distribution". This function computes the probabilities (in the 0 to 1 range) of finding a point within a radius of any fixed reference location (Figure 9).
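A minimal empirical version of the empty space function can be sketched by placing reference locations on a regular grid and recording the distance from each one to its nearest point. The study computed F(r) with spatstat, which also handles edge effects; the brute-force function below ignores them, and its name and parameters are assumptions of this example.

```python
from math import hypot


def empty_space_f(points, radii, window, step):
    """Empirical F(r): share of grid reference locations with a point within r.

    window is (xmin, ymin, xmax, ymax); reference locations are the centres
    of a regular grid with the given spacing. No edge correction is applied.
    """
    xmin, ymin, xmax, ymax = window
    nx = int((xmax - xmin) / step)
    ny = int((ymax - ymin) / step)
    nearest = [
        min(
            hypot(xmin + (i + 0.5) * step - px, ymin + (j + 0.5) * step - py)
            for px, py in points
        )
        for i in range(nx)
        for j in range(ny)
    ]
    return [sum(d <= r for d in nearest) / len(nearest) for r in radii]


# Toy pattern: one point at the centre of a unit square, four reference locations.
f_values = empty_space_f([(0.5, 0.5)], [0.3, 0.4], (0.0, 0.0, 1.0, 1.0), 0.5)
```

Reading the curve in reverse, the radius at which F(r) reaches a target fraction gives a length scale at which roughly that share of locations has a picture nearby.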
Because F(r) is defined as a cumulative distribution function, it was possible to define, for each profile, the pixel sizes at which a certain fraction of the cells would have a point count different from zero. These thresholds are shown as points on their respective curves, and their cutoffs and corresponding pixel sizes are listed in Table 2.
The pixel values were transformed to convert the counts into an attractiveness metric [96], which consisted of a multiplier in relation to a baseline, defined as a hypothetical uniform distribution of pictures. This metric was invariant by construction to both the pixel size and the number of pixels in the observation window. The resulting values were log-transformed to fit the computed magnitudes into the dynamic range of the color scale of the map, where some locations have a picture interest more than a thousand times larger than the reference baseline.
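One way to read the attractiveness metric is as the ratio between the observed count in a pixel and the count expected if the same pictures were spread uniformly over the observation window. The sketch below follows that reading; the exact formula in [96] may differ, and the base-10 logarithm, the function name, and the toy data are assumptions.

```python
from collections import Counter
from math import floor, log10


def log_attractiveness(points, window_area, pixel_size):
    """Log10 of observed over expected pixel counts under a uniform baseline.

    Returns a dict keyed by (column, row) pixel index. Empty pixels are
    omitted because the logarithm of zero is undefined.
    """
    counts = Counter(
        (floor(x / pixel_size), floor(y / pixel_size)) for x, y in points
    )
    expected = len(points) * pixel_size**2 / window_area
    return {cell: log10(n / expected) for cell, n in counts.items()}


# Toy data: a 100 m x 100 m window with 25 m pixels and four pictures.
pictures = [(5.0, 5.0), (10.0, 20.0), (24.0, 1.0), (80.0, 90.0)]
scores = log_attractiveness(pictures, window_area=100.0 * 100.0, pixel_size=25.0)
```

Since the expected count scales with the pixel area, the ratio behaves as a relative density: a pixel with three of the four toy pictures scores log10(12), meaning twelve times the uniform baseline.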
The numeric values of the cells were visualized in a perceptually uniform color scale (Parula). The results showed a very detailed spatial distribution of events (Figure 10), but without the issues discussed in the plot of individual points [97]. The maps visually confirm the clustering of visitors in the major landmarks corresponding to the most famous buildings of the architect Antoni Gaudí, as well as their concentration around a few streets with a marked touristic character. In contrast, the panel corresponding to the locals confirms a more dispersed pattern around the central areas of the city but also reveals a high number of pictures taken by residents on the main landmarks of the city, indicating that the photographic attractiveness of these hotspots is aligned with the interests of both locals and tourists, and that locals behave like tourists in these locations.

Kernel Density Estimation
The rasterized maps discussed above are useful to identify the landmarks in the city with the highest picture interest, but cannot produce the required delimitation of the areas of outstanding intensity. This delimitation is crucial to designate the areas where it is necessary to prioritize the implementation of policies that modulate tourist pressure, and requires a methodology that produces regions that can be delimited accurately, as a complement for maps that highlight specific landmarks.
The kernel density estimation (KDE) computes non-parametrically the intensity function of a point process [98] as a density map, and is suitable if this process is suspected to be inhomogeneous (Figures 8 and 9). Although the result is a raster composed of small-sized pixels, the underlying estimated intensity function can theoretically be discretized into an arbitrarily fine grid of pixels.
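The estimate at a single location can be written as a sum of isotropic Gaussian kernels centred on the points. The sketch below evaluates it directly, without the edge correction or the fast convolution that a spatial statistics package would apply; the function name and the toy point are assumptions.

```python
from math import exp, pi


def gaussian_kde_intensity(points, x, y, sigma):
    """Kernel estimate of the intensity at (x, y), in points per unit area."""
    norm = 1.0 / (2.0 * pi * sigma**2)
    return sum(
        norm * exp(-((x - px) ** 2 + (y - py) ** 2) / (2.0 * sigma**2))
        for px, py in points
    )


# Toy pattern: a single picture at the origin, with a 250 m bandwidth.
peak = gaussian_kde_intensity([(0.0, 0.0)], 0.0, 0.0, sigma=250.0)
one_sigma_away = gaussian_kde_intensity([(0.0, 0.0)], 250.0, 0.0, sigma=250.0)
```

Evaluating the function at the centre of every pixel of a grid yields the density raster; each kernel integrates to one, so the surface integrates approximately to the number of points.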
The point patterns were smoothed with a Gaussian kernel with a standard deviation (sigma) of 250 m, so that 68.2% of the kernel mass along each axis lay within 250 m of each point, and 95.4% within 500 m (one and two standard deviations, respectively). The bandwidth was chosen following the L(r) function (Figure 8), considering the radius where the curves approximately began to increase monotonically. This method was also effective in smoothing small inaccuracies in the coordinates of the points.
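The share of kernel mass within a given number of standard deviations can be checked numerically. The familiar figures of about 68% and 95% for one and two standard deviations apply to each axis separately; for an isotropic two-dimensional Gaussian, the mass inside a circle of the same radius is smaller. The function names below are hypothetical.

```python
from math import erf, exp, sqrt


def mass_within_1d(k):
    """Mass of a one-dimensional Gaussian within k standard deviations."""
    return erf(k / sqrt(2.0))


def mass_within_radius_2d(k):
    """Mass of an isotropic 2D Gaussian inside a circle of radius k * sigma."""
    return 1.0 - exp(-k * k / 2.0)


# With sigma = 250 m, k = 1 corresponds to 250 m and k = 2 to 500 m.
per_axis = [mass_within_1d(1), mass_within_1d(2)]
radial = [mass_within_radius_2d(1), mass_within_radius_2d(2)]
```

With a 250 m bandwidth this gives roughly 68.3% and 95.4% per axis at 250 m and 500 m, against roughly 39.3% and 86.5% of the mass inside circles of those radii.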
The KDE results were computed on a grid with the same 25 × 25 m resolution as the rasterized maps, and the resulting pixel values were classified by quantiles. The classification was diverging and symmetric around the median, and defined a neutral (non-significant) area corresponding to the interquartile range, as well as two extremes to identify cold and hot spots (lower and upper quartiles, respectively). These cold and hot areas were each classified into three intensities, with the extreme values corresponding to the lower and upper 1% (Figure 11).
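A diverging quantile classification of this kind could be sketched as follows. The breaks at 1%, 25%, 75%, and 99% follow the description above; the intermediate breaks at 10% and 90% are not stated in the text and are hypothetical placeholders, as are the function name `classify_hot_cold` and the integer class codes from -3 (coldest) to 3 (hottest).

```python
import bisect
import statistics

# Break percentiles: 1, 25, 75 and 99 follow the text; 10 and 90 are
# hypothetical placeholders for the unstated intermediate breaks.
BREAK_PERCENTILES = (1, 10, 25, 75, 90, 99)

def classify_hot_cold(values):
    """Map each pixel value to a class from -3 (coldest) to 3 (hottest).

    Class 0 is the neutral band between the lower and upper quartiles.
    A value exactly equal to a break is assigned to the class above it.
    """
    # 99 percentile cut points; cuts[p - 1] is the p-th percentile.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    breaks = [cuts[p - 1] for p in BREAK_PERCENTILES]
    return [bisect.bisect_right(breaks, v) - 3 for v in values]

# Hypothetical pixel values: the integers 0..1000 stand in for a KDE raster.
pixel_values = list(range(1001))
classes = classify_hot_cold(pixel_values)
```

Because the breaks are computed from the pixel values themselves, the classes describe relative rather than absolute intensity, so maps classified separately for two groups are not directly comparable in absolute terms.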
The results clearly showed the hot spots of the visitors' activity, which had much more pronounced areas of interest around the landmarks in the city compared to the local population. In particular, it was possible to identify the areas around two Gaudí landmarks (Sagrada Família and Park Güell), which in contrast did not appear as points of interest for the local population. The old quarter (Ciutat Vella) was identified as a very popular area for both locals and tourists, but in the case of tourists the area of interest was more compact and extended along the axis of Passeig de Gràcia through another Gaudí landmark (La Pedrera). In addition, this approach was also capable of identifying and outlining the "deserts", or areas devoid of tourist activity.

Discussion
Although digital traces provide an unprecedented opportunity for data-driven research on tourist activity in urban settings, they pose significant challenges regarding collection, processing, modeling, and visualization. This paper describes all the steps in this process in a coherent methodology, but focuses on the visual aspects, which are often neglected but constitute the critical last step of the data analysis process [99]: communicating the results to the stakeholders effectively.
Multiple complementary visualization approaches were discussed, which are suitable for different stages of the analysis, from the exploratory phase where hypotheses are formulated, to the delimitation of the areas of outstanding visitor pressure. The analysis also follows a multi-scale approach, from the global to the detail scale, aligned with the workflow requirements of urban planning practice.
The presented methodology is firmly grounded in a well-established spatial statistics framework, adapted to a "big data" environment, to extract knowledge from social media, condensing loosely structured data into effective and interpretable visualizations, following the principles of the grammar of graphics [100] in a reproducible computational environment.
The results reveal the distinct digital traces of locals and tourists, adapting the visualization strategy to the specific requirements of multiple scales and resolutions of analysis. Despite the complexity of the raw data, the resulting maps condense social media activity into informative representations that reduce the cognitive load required of the intended audience to interpret the results.
The discussed methodology addresses the main research question regarding the suitability of social media data as an emerging source to investigate tourism behavior from a sustainable urban management perspective. The methodology was designed to be generalized to other areas of study and provides a substantial advantage over other surveying methods in terms of economy, feasibility, and accuracy. In particular, identifying the geographic origin of individuals is not possible with camera-based tracking, and obtaining data on the behavior of a large number of participants is not practical at the urban scale.
Despite these advantages, the collected data can be considered a convenience sample in which WEIRD (western, educated, industrialized, rich, and democratic) subjects can be overrepresented because of differences in Flickr usage across regions, and can therefore introduce biases in the results [101]. This paper assumes that visitor behavior can be considered independent of cultural background; where this assumption does not hold, corrective measures such as stratified sampling should be implemented in future research.
Another limitation concerns privacy: while the research preserved the privacy of all users at all times, it was evident that tracking the behavior of a single individual would be trivial with the publicly available data collected. The increasing sensitivity of users to this topic can prevent further bona fide research if these data become unavailable.
The secondary research question focused on the data processing required to graphically summarize the unstructured data collected from social media into informative representations adapted to the requirements of the practice of urban planners and designers. The research indicates that there is no single strategy that can be applied in all situations; on the contrary, the scale of analysis and the intended application dictate the most effective visualization approach. For this reason, the paper presents different methods to address the issues identified in this case study.
Despite the contributions presented to the sustainable tourism management field, many avenues of investigation remain open for further research, such as the application to other case studies or the inclusion of the temporal variable in addition to the spatial location, which would result in significant contributions to the field and provide a more comprehensive understanding of visitor behavior in urban settings.

Informed Consent Statement: Informed consent was waived because the data were publicly available, and the anonymity of subjects was preserved at all times.

Data Availability Statement: Data not available due to privacy restrictions.