Combining Social Media and Mobile Positioning Data in the Analysis of Tourist Flows: A Case Study from Szeged, Hungary

Abstract: Despite the growing importance of mobile tracking technology in urban planning and traffic forecasting, its utilization in the understanding of the basic laws governing tourist flows remains limited. Knowledge regarding the motivations and spatial behavior of tourists has great potential in sustainable tourism studies. In this paper, we combine social media (Twitter) and mobile positioning data (MPD) in the analysis of international tourism flows in Szeged, a secondary urban center in Hungary. First, the content of geotagged and non-geotagged Twitter messages referring to Szeged in a six-month period of 2018 was analyzed. In this way specific events attracting foreign tourists were identified. Then, using MPD data of foreign SIM cards, visitor peaks in the investigated period were defined. With the joint application of the social media and mobile positioning analytical tools, we were able to identify those attractions (festivals, sport and cultural events, etc.) that generated significant tourism arrivals in the city. Furthermore, using the mixed-method approach we were also able to analyze the movements of foreign visitors during one large-scale tourism event and evaluate its hinterland. Overall, this study supports the idea that social media data should be combined with other real-time data sources, such as MPD, in order to gain a more precise understanding of the behavior of tourists. The proposed analytical tool can contribute to methodological and conceptual development in the field, and information gained by its application can positively influence not only tourism management and planning but also tourism marketing and placemaking.


Introduction
Social media data have gained increasing popularity among researchers for analyzing the spatiotemporal activities of people [1,2]. Open-access geographic data provided by large online social networking systems (e.g., Twitter, Facebook, Instagram) have been proposed as proxies for human mobility dynamics, offering high resolution in both space and time [3][4][5][6]. Social media covers a wide range of topics, from simple matters such as products, events, or services to more complex issues related to culture, sports, politics, the environment, or pandemics. Such data offer policy makers a good opportunity to improve their knowledge of the local environment and to detect flows of people and the use of services in a certain area. Using this knowledge, they are able to make better and more sustainable decisions [1,7]. In the case of tourism, the use of information and communication technologies (ICT) on both the consumer and the management sides can contribute to more sustainable development by increasing efficiency and optimizing tourists' decisions [8,9]. By processing and analyzing real-time data, and by developing and implementing appropriate responses, the tourism products of a destination can be flexibly adapted to the needs and desires of consumers [10]. Collecting and analyzing data from social media not only requires fewer financial resources than surveying visitors, but such data are also more accurate and, thanks to their space and time stamps, have fewer limitations in spatial and temporal analysis [11].
One of the most popular online social networking and microblogging systems is Twitter, with 500 million tweets transmitted each day by over 340 million total users, of whom 186 million are active daily (Twitter, 2020). An important feature for research purposes is that it is possible to capture a small fraction of the streams of Twitter data, subject to limitations imposed by Twitter, and most users understand that contributions to Twitter are published and can be subject to legal sanctions as well as discussion and scrutiny [12]. These features make Twitter an unusually public and readily available form of social media, especially when compared to more tightly controlled platforms, such as Facebook or Instagram. Because of the structure of its content (i.e., the messages are exclusively text strings restricted to 144 characters), it is inherently hard to interpret and analyze Twitter data. However, users can opt to geotag their tweets with their current location via GPS (Global Positioning System), which makes certain spatiotemporal analyses possible. Although this option is often turned off by users (only 1 percent of tweets are geotagged globally according to Lansley and Longley [13]), geotagged tweets are still a rich source of information in tourism research. Moreover, various text mining techniques allow a quantifiable interpretation of tweet content (e.g., geographical locations, events), making non-geotagged tweets useful as well.
The main objective of this paper is to present a novel methodology combining two types of tracking technologies (Twitter and mobile phones) in the analysis of international tourist flows in Szeged, a major city of Hungary. Local events are important animators of destination attractiveness and play a key role in destination branding, which is essential given the increasing global competition among touristic destinations [14]. In this study, first Twitter data were used to analyze the temporal patterns of users in order to capture specific events in the investigated period between 1 July and 31 December 2018. We assumed that text messages (tweets) posted by foreigners indicate where they actually are as well as which place they are visiting and for what purpose; we also assumed that periods of mass arrival of foreign tourists can indicate specific events. Second, mobile-cell data of foreign tourists staying in Szeged over the investigated period were analyzed with a special focus on tourist peaks generated by events. We were particularly interested in the spatial patterns and behavior of foreign tourists. Such insights can be useful for understanding the spatiotemporal processes of international tourism and can enhance sustainable tourism planning.
The rest of this paper is organized as follows. The second section discusses existing works that focus on the use of social media and mobile phone data in the study of human mobility with special attention to tourism. The third section introduces the study area and its role in the tourism sector of Hungary, and it presents the methods of data acquisition and the filtering process. A section then follows with the main findings of the research, presenting and comparing the social media and mobile-cell data of foreign tourists visiting Szeged. Finally, we present our conclusions, discuss their wider implications, highlight the limitations of our method, and explore possible future work in the field.

The Use of Social Media and Mobile Positioning Data in Tourism Research: A Literature Review
Social media enables individuals to virtually interact and share content about their whereabouts, providing a source of information for understanding and assessing human behavior in space. Such information is especially beneficial to the study of mobility patterns because of the volume and quality of the available geotagged, real-time data [15]. Consequently, social media data have gained increasing popularity in the study of tourism as an alternative data source. Understanding the spatial and temporal aspects of the travel behavior of tourists is essential to comprehending their travel activities [16]. For example, managers and practitioners seeking sustainable tourism development can make especially good use of such user-generated data in their work [11]. Social media data can support product development and destination management in tourism in many different ways [8,17]. In local and low-traffic areas, social media provides a platform for direct communication between the host and the tourist [18]. Information gathered from social media and tourism applications can help track the movements and the consumption of tourists. Destinations and firms can use refined and accumulated data to make accurate predictions and to respond to new demands effectively [8].
Moreover, the image of a particular tourist destination is influenced by information appearing on social media (i.e., placemaking, city branding), which can even increase demand [19,20]. Targeted marketing strategies can also be effectively implemented through social media platforms, making tourism consumption more sustainable at the individual level [21]. Social media can also serve as a platform for local residents to protest against overtourism, as they did in Barcelona in connection with Airbnb [22]. One of the major critiques of social media data is the bias of particular online platforms regarding the age, geographical location, and social class of the users. Combining different social media sources can mitigate this limitation, as demonstrated by Salas-Olmedo et al. [23] in their study using Twitter, TripAdvisor, Panoramio, and Foursquare data simultaneously.
Among the various social networking and microblogging systems, Twitter is especially favored in tourism research. Although there are several methodological problems regarding data acquisition, Twitter data can provide useful information about tourist flows [24]. As Guo and Chen [25] argue, a strong filtering of information from Twitter is important, because non-personal accounts, spam users, and junk tweets distort information generated by the real users. Twitter can also be a reliable source for studying the use of urban spaces by different groups of people (e.g., locals and visitors). Su et al. [26] used Weibo, a Chinese microblogging website launched by Sina Corporation in 2009 as an alternative to Twitter, to compare the spatiotemporal concentration and dispersion in Hong Kong of day trippers and tourists from Shenzhen as well as from the rest of mainland China.
Twitter is an important source of information not only in research but also in tourism management. According to Bigné et al. [27], destination marketing organizations (DMOs) that actively use Twitter to promote their own products are able to achieve higher occupancy rates because the number of retweets and replies by the users and the number of event tweets, tourist attraction tweets, and retweets by destination marketing organizations can predict the hotel occupancy rate for a given destination. In addition, Wood et al. [28] were able to estimate the number of visitors to natural areas (quantification of nature-based tourists) that are difficult to measure statistically, based on information from Flickr.
Alongside social media, mobile positioning data (MPD) increasingly tends to complement conventional data sources in tourism studies and destination management, as these data are becoming crucial to how a destination utilizes its resources [9,29]. Passive mobile-cell information is generated each time a mobile phone communicates with a network (i.e., by calling, sending messages, or using cell data). Based on the location of the mobile phones, it is possible to track human movement. This information can be extended with personal features of the SIM card owners, such as ethnicity, age, and gender, making the space-time analyses of movements of different groups of people possible [30]. By anonymizing these data, researchers are able to study human mobility with higher efficiency and accuracy compared to traditional research methods, such as surveys, without having to worry about privacy infringement [31,32]. Passive MPD can be used to measure the volume (arrivals and departures) of tourists (both domestic and inbound) and to separate tourist movements from the daily rhythms of local residents [33,34], as well as to analyze tourist- and trip-related characteristics (e.g., country of origin, length of stay, and travel diary). It is also possible to create models for the spatiality of users [35] and make typologies for different tourist flows [36,37]. The use of mobile phone tracking technologies in tourism studies has swiftly expanded since the turn of the millennium [30,38], enabling more sophisticated analysis of the spatial behavior of tourists [39]. The combination of GPS and mobile phone-based tracking enabled researchers to study various aspects of tourism more effectively [40], such as the seasonality of foreign tourists [41], cross-border mobility [42,43], and customer loyalty to a destination [44]. Using mobile positioning data, Nilbe et al. [45] were able to detect differences in distances travelled by foreign visitors to events and regular (non-event) visitors in Estonia, while Raun et al. [40] identified tourist destinations based on visitor flows. There has been a lot of technological advancement in the field [46][47][48][49][50][51], and there are already countries in the world (Estonia and Indonesia) where MPD is used to produce national tourism statistics [9].
Despite the widespread success in predicting specific aspects of human behavior via social media data or MPD separately, little attention has been paid to the joint application of social media and mobile phone data. A combination of social media (Twitter) and mobile phone data was first applied by Botta et al. [52] for the estimation of the number of real-time visitors in a specific area. Using a football stadium and an airport as case studies, the authors presented evidence of a strong relationship between the number of people in restricted areas and the activity recorded in mobile phone and Twitter data. Such estimations can not only play an important role in the avoidance of crowd disasters and the facilitation of emergency evacuations, but also offer practical value for a range of business and policy stakeholders. In a case study in the Friuli-Venezia Giulia region in Italy, Sowkhya et al. [1] used geolocated tweets and mobile-cell data to determine people's presence, movements, and the number of flows among different tourist destinations, with the help of QuantumGIS. Using the mixed method, the authors were able to capture and explain the presence and the spatiotemporal flows of both foreigners and Italians, considering different weather conditions, holidays, and specific tourist events (e.g., the Friuli Doc wine and food festival). Despite these examples, the advantages of combining mobile positioning and social media data in studying tourism have hardly been explored.

Study Area
The study area, Szeged, is the third largest city in Hungary, located in the south-eastern part of the country close to the Serbian and Romanian borders. According to the Hungarian Central Statistical Office (KSH), the city had a total population of 164,000 in 2018. Szeged is a major administrative, cultural, and business center in Hungary; the economy of the city is dominated by tertiary activities, including tourism [53,54]. According to official statistics, the number of nights spent by tourists in the city at commercial accommodations was 442,000 in 2018, one third of which was realized by foreign tourists. Disregarding smaller spa towns, such as Hévíz, Hajdúszoboszló, and Balatonfüred, in 2018 Szeged was the third most popular tourist destination among Hungarian cities, after Budapest and Győr. The tourist attractions of the city comprise several festivals (the most important of which is the Szeged Open-Air Festival) as well as sport and cultural events, concentrated mainly in the summer season. In addition, Szeged is one of the major university and research centers in Hungary, and international conferences also contribute to a thriving tourist economy there. Yet, there is no reliable information on the spatial and temporal behavior of tourists arriving in the city or on the extent to which the events (festivals, fairs, conferences, etc.) organized by local stakeholders contribute to the tourism industry of Szeged.
Figure 1 describes the data acquisition and processing flow developed for this study as well as the data removed in each step. The preprocessing made the two datasets (Twitter and mobile phone) reliable and useful in studying the behavior of foreign visitors and analyzing their spatial activity pattern.

Twitter Data
Twitter messages (tweets) used for this study were downloaded via the Twitter API using a PHP script over six consecutive months (from 1 July 2018 to 31 December 2018). Tweets were selected according to the following two criteria: on the one hand, geotagged tweets within a 10 km buffer from the center of Szeged (geotagged and assigned to a location); and on the other hand, tweets mentioning the word "Szeged" in their text or metadata. Since the study focuses on international visitors, only non-Hungarian-language tweets were collected. As Twitter is one of the least widespread global social networking sites in Hungary, the probability of mixing Hungarian users with foreigners was very low. From the raw data, which were in JSON format, we extracted the necessary information for the analysis using a PHP script and arranged it into CSV format. These data were stored in a MySQL Server, and a significant part of the preprocessing was also performed in this environment. The dataset was filtered to eliminate tweets according to the following criteria:
• users with over 100 messages within the sample (compulsive users), who would otherwise have dominated the analysis,
• Hungarian-language tweets, since we were interested in the spatiotemporal mobility of foreign visitors,
• retweets, because they do not add new content to the discourse about Szeged, and
• tweets from users who have posted identical messages more than three times in the data, as these were likely to be fake accounts.
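The four exclusion criteria can be sketched in a few lines of Python. This is only an illustration, not the PHP/MySQL pipeline used in the study: the dictionary keys (`user`, `text`, `lang`, `is_retweet`), the function name `filter_tweets`, and the choice to count messages on the unfiltered sample are all assumptions made for the example.

```python
from collections import Counter

def filter_tweets(tweets, max_per_user=100, max_identical=3):
    """Apply the four exclusion rules to a list of tweet dicts.

    Each tweet is assumed (hypothetically) to look like
    {"user": str, "text": str, "lang": str, "is_retweet": bool}.
    """
    # Messages per user, counted on the unfiltered sample (an assumption).
    per_user = Counter(t["user"] for t in tweets)
    # Users who posted the same text more than `max_identical` times.
    per_user_text = Counter((t["user"], t["text"]) for t in tweets)
    repeaters = {user for (user, _), n in per_user_text.items() if n > max_identical}

    kept = []
    for t in tweets:
        if per_user[t["user"]] > max_per_user:
            continue  # compulsive user
        if t["lang"] == "hu":
            continue  # Hungarian-language tweet
        if t["is_retweet"]:
            continue  # retweet adds no new content
        if t["user"] in repeaters:
            continue  # likely fake account
        kept.append(t)
    return kept

# Toy sample: one valid tweet, one Hungarian tweet, one retweet,
# and one account repeating the same message four times.
sample = (
    [{"user": "a", "text": "Handball in Szeged!", "lang": "en", "is_retweet": False}]
    + [{"user": "b", "text": "Szeged", "lang": "hu", "is_retweet": False}]
    + [{"user": "c", "text": "RT Szeged", "lang": "en", "is_retweet": True}]
    + [{"user": "d", "text": "Buy now", "lang": "en", "is_retweet": False}] * 4
)
filtered = filter_tweets(sample)
```

In this toy sample only the tweet from user `a` survives; the thresholds are exposed as parameters because the text notes that such values are empirical.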
Altogether 16,082 tweets from about five thousand users were collected for the research. After filtering, 3724 tweets remained. Based on these, the average number of tweets was 20 per day in the investigated six-month period. To define specific events as possible tourist attractions, we applied the threshold of 30 tweets or more per day. This threshold was selected empirically; however, we emphasize that this value may vary from city to city. Of the remaining and analyzed tweets, 313 (i.e., 8.5 percent) were geotagged. The share of geolocated tweets can be considered high [55], since normally only 1% of users opt to share their locations based on the coordinates of their devices globally [13]. Thus, our dataset was rather small-scale compared to traditional approaches [13,56] and comprised predominantly (91.5%) non-geotagged tweets; nevertheless, it seemed to be reliable and useful for the purpose of investigation. The filtered tweets were visualized in the open-source QGIS software environment.
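The daily average and the peak-day threshold can be illustrated with a short Python sketch. The function name `peak_days` and the toy dates are hypothetical; only the figures 3724 tweets, the six-month period, and the threshold of 30 come from the text.

```python
from collections import Counter
from datetime import date

# 1 July to 31 December 2018, both days included.
n_days = (date(2018, 12, 31) - date(2018, 7, 1)).days + 1
mean_per_day = 3724 / n_days  # filtered tweets per day

def peak_days(tweet_dates, threshold=30):
    """Return the sorted days on which at least `threshold` tweets were posted."""
    daily = Counter(tweet_dates)
    return sorted(day for day, n in daily.items() if n >= threshold)

# Toy input (hypothetical): 35 tweets on 7 July and 12 tweets on 9 July.
dates = [date(2018, 7, 7)] * 35 + [date(2018, 7, 9)] * 12
peaks = peak_days(dates)
```

With 184 days in the period, the mean is roughly 20 tweets per day, so the threshold of 30 corresponds to about one and a half times the daily average.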

Mobile Positioning Data
For the study, we used the depersonalized passive mobile phone data from one of the three major mobile phone operators in Hungary covering six consecutive months (from 1 July 2018 to 31 December 2018). The raw dataset contains mobile-cell information at the event level, i.e., any occasion when the mobile device connects to the network (text messages, calls, or data usage). The events are geotagged based on the position of the nearby broadcast towers and the signal intensity. Each event contains a depersonalized user ID, so trajectory analysis can be conducted. At this point, the huge number of events (80 million per day) makes the analysis cumbersome; hence, data reduction was needed. Consecutive events that were generated at one specific location by the same user were merged into one event. Additionally, the elapsed time between events of the same user was added to the event as a new attribute.
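The data reduction step can be sketched as follows. This is a minimal illustration under stated assumptions: events are represented as hypothetical `(user_id, timestamp, cell_id)` tuples, a run of same-cell events is represented by its first event, and the elapsed time is measured from the start of the previous merged event; the operator's actual schema and the exact definition used in the study are not given in the text.

```python
from datetime import datetime

def merge_consecutive(events):
    """Collapse runs of same-user, same-cell events and add elapsed time.

    `events` is a list of (user_id, timestamp, cell_id) tuples.
    Returns one dict per run, holding the first timestamp of the run and
    the seconds elapsed since the user's previous merged event
    (None for the user's first event).
    """
    merged = []
    last = {}  # user_id -> that user's most recent merged event
    for user, ts, cell in sorted(events, key=lambda e: (e[0], e[1])):
        prev = last.get(user)
        if prev is not None and prev["cell"] == cell:
            continue  # same location as before: fold into the existing run
        elapsed = None if prev is None else (ts - prev["time"]).total_seconds()
        event = {"user": user, "time": ts, "cell": cell, "elapsed_s": elapsed}
        merged.append(event)
        last[user] = event
    return merged

# Toy input (hypothetical IDs): user u1 generates two events in cell A,
# then moves to cell B; user u2 has a single event.
raw = [
    ("u1", datetime(2018, 7, 16, 9, 0), "A"),
    ("u1", datetime(2018, 7, 16, 9, 5), "A"),
    ("u1", datetime(2018, 7, 16, 10, 0), "B"),
    ("u2", datetime(2018, 7, 16, 9, 30), "A"),
]
reduced = merge_consecutive(raw)
```

Here the four raw events reduce to three, and the move of `u1` from cell A to cell B carries an elapsed time of one hour.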
In this research only the events of foreign SIM cards were used. It was necessary to filter out any data errors. Thus, for those events that were generated during user movement, we assigned a speed value calculated from the distance between the previous and next event and the time elapsed between them. If this speed value exceeded 300 km/h between two places more than 10 km apart, we deleted these events, since they are most likely data errors. In addition, only those users for whom we had sufficient information were retained. Therefore, the users with fewer than 5 events in total and an event density below 2.5 were filtered out (see Figure 1). Furthermore, additional filtering methods were used to narrow down the dataset to users who possibly arrived for touristic purposes. During the filtering, only those users were kept who did not spend more than 25 days in Hungary (i.e., the difference between the first and last events did not exceed 25 days). In this way, events by foreigners who stayed in Hungary for a longer period (e.g., Erasmus students) or by long-distance drivers returning regularly were excluded. There is no consensus in the literature on the maximum length of stay that distinguishes tourists from other (non-tourist) visitors. Depending on the purpose of the research, it can be 14 [57] or even 20 days [50]. In this research we applied the 25-day limit as the maximum length of stay in Hungary to identify foreign tourists. At the other end of the scale, one-day tourists were also considered in the research [33], as local events may attract visitors from other destinations in the country (e.g., Budapest). As Lamp et al. [58] demonstrated, it is difficult to separate short-term visitors from transit travelers. Indeed, in the case of Szeged, two nearby motorways leading to Serbia and Romania serve as main transit corridors and concentrated a lot of foreign SIM-card events. Therefore, users who crossed the administrative boundaries but did not have an event within the built-up area of Szeged were disregarded. With this single procedure, 60 percent of users were filtered out from the dataset. Altogether 417,387 events from about 176,000 foreign SIM cards were considered first for the study, which decreased to 358,126 events from 28,000 users after the filtering process.
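Two of the filters described above, the speed check and the 25-day stay limit, can be sketched in Python. This is an illustration rather than the study's implementation: the haversine formula is swapped in as the distance measure (the text does not say how distances were computed), the event tuples and function names are hypothetical, and the event-density criterion is left out because its definition is not given.

```python
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two WGS84 points."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def is_implausible_jump(prev, nxt, max_kmh=300.0, min_km=10.0):
    """True if travelling from `prev` to `nxt` implies more than `max_kmh`
    over a distance of more than `min_km`.

    Each argument is a (timestamp, lat, lon) tuple.
    """
    dist = haversine_km(prev[1], prev[2], nxt[1], nxt[2])
    if dist <= min_km:
        return False  # short hops are kept regardless of speed
    hours = (nxt[0] - prev[0]).total_seconds() / 3600
    if hours <= 0:
        return True  # a long jump in zero time is treated as a data error
    return dist / hours > max_kmh

def is_short_stay(timestamps, max_days=25):
    """True if a user's first and last events are at most `max_days` apart."""
    return max(timestamps) - min(timestamps) <= timedelta(days=max_days)

# Approximate coordinates of Szeged and Budapest (illustrative values).
szeged, budapest = (46.253, 20.148), (47.498, 19.040)
t0 = datetime(2018, 7, 16, 12, 0)
fast = is_implausible_jump((t0, *szeged), (t0 + timedelta(minutes=10), *budapest))
slow = is_implausible_jump((t0, *szeged), (t0 + timedelta(hours=2), *budapest))
```

A jump from Szeged to Budapest within ten minutes is flagged, while the same trip in two hours is not; a user seen on 1 July and again on 1 August would fail the 25-day test.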

Analysis of Twitter Data
In the first phase of research the temporal distribution of the tweets was analyzed. Based on the described methodology, altogether 23 days could be identified in the period when the number of tweets exceeded 30, of which five were omitted from the study because they were dominated by one person (a compulsive user) who generated a lot of tweets without any focus on a tourist attraction (Figure 2). From the remaining 18 days, two were adjacent (7-8 July) and the contents of the tweets referred to the same event. These two days were merged, and 16-17 July and 19-21 July were merged for the same reasons. Thus, through the applied classification we were able to identify 13 distinctive tweeting peaks that were considered in the next stages of research.
Sustainability 2021, 13, 2926
By analyzing the content of the tweets and checking the events calendar of Szeged, we were able to identify the main reasons behind the spikes of tweets. Surprisingly, out of the 13 spikes, 12 could be related to sport events, and there was only one case for which we could not find a dominant topic. Moreover, ten of the twelve sport events could be associated with handball. The local handball team, Pick Szeged, played several (partly Champions League) games in the half-year period, some at home and some away. Handball games played away did not generate tourist arrivals in the city; therefore, we omitted them from further investigation (Table 1). Altogether, seven events with intense tweeting by foreigners could be identified in the studied period. Besides the five handball games, there were two major events: the Rugby Europe Women's 7s Trophy, with 12 national teams from different countries including Hungary, held on 7-8 July 2018; and the 11th International Dragon Boat Federation Club Crew World Championships, in which 6200 paddlers from 140 clubs from 28 different countries competed over six days, from 16 to 21 July 2018. Both events attracted not only players and paddlers from different countries but also managers, family members, fans, and spectators.

Analysis of Mobile-Cell Data
The daily amount of cellular network data of foreign mobile phones in Szeged shows that the number of foreign tourists was significantly higher in the summer months (July-August), with a secondary peak in December around the Christmas holidays. This is in accordance with the seasonality of tourism in Hungary in general [59]. As a second step of research, we compared peak periods of tweeting with mobile-cell data of foreigners in Szeged. In addition to the seven sport events found on the basis of tweets, we were able to recognize eleven other spikes when the density of foreign SIM cards was above average (Figure 3). Again, we thoroughly checked the list of local events and we were able to specify four distinct festivals, cultural and youth programs that provided possible major attractions for foreign visitors (Table 2).
Table 2 (excerpt).
5. 3-4 November: no specific event
6. 9-12 November: music festival (Jazz Days of Szeged)
7. 26-30 December: no specific event
Between 28 July and 5 August, the summer school of Hungarology was held for foreign students interested in studying the Hungarian language. At the beginning of the academic year (7-8 September), the Welcome Camp of Freshmen was organized by the university with concerts and other cultural events that attracted foreign visitors (e.g., friends or relatives of foreign students studying in Szeged). The days of 24-28 October 2018 were the peak of the annual cultural festival in the city, with international programs like the Bulgarian folk music festival. The Jazz Festival of Szeged, held from 9 to 12 November 2018, also attracted many foreigners. These cultural events could hardly be found in the tweets. The peaks of cellular network-based data of foreign mobile phones that could not be associated with any event normally fell on weekends; thus, these data can be related to ordinary tourists. The spikes of foreign SIM cards in December could also be attributed to the famous Christmas market of the city and the holiday itself, as the end-of-year season is traditionally a secondary peak in the annual tourist turnover [59].
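The comparison of tweeting peaks with days of above-average foreign SIM-card density can be illustrated with a small Python sketch. All numbers, dates, and names below are hypothetical; the text only states that spikes were defined as days above the average.

```python
from statistics import mean

def above_average_days(daily_sims):
    """Return the days whose foreign SIM-card count exceeds the period mean."""
    avg = mean(daily_sims.values())
    return {day for day, n in daily_sims.items() if n > avg}

# Hypothetical daily counts of distinct foreign SIM cards.
daily_sims = {
    "2018-07-07": 900,
    "2018-07-10": 300,
    "2018-11-10": 700,
    "2018-11-14": 250,
}
twitter_peaks = {"2018-07-07"}  # days already flagged from tweets (toy value)

mpd_peaks = above_average_days(daily_sims)
mpd_only = mpd_peaks - twitter_peaks  # spikes visible only in the mobile data
```

Days that appear in `mpd_only` are the candidates that would then be checked against the local events calendar, which is how events that were hardly tweeted about can still be detected.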

Tracking the Spatial Movements of Tourists
To extend our results on tourist behavior, we considered a case study of a specific event for which relevant data existed: the 11th International Dragon Boat Federation Club Crew World Championships. Szeged is known as the cradle of paddling sports, with its international standard course, the National Kayak-Canoe and Rowing Olympic Centre at Maty-er, which is located outside the compact city at the western fringe of Szeged. In this study, the passive mobile positioning data of international tourists (i.e., non-Hungarian SIM cards) staying in Szeged during the event were analyzed. Since the location of the dragon boat race (Maty-er) is outside Szeged, approximately 7 km from the main square, we were able to identify those foreign mobile phone users who stayed at the location of the race. As Table 3 shows, their share reached ca. 60 percent of all foreign SIM cards in the period of the event. Knowing how foreign tourists behave in space and which sites they visit while staying in the city is important for tourism planning. Therefore, we analyzed the movements of those SIM cards that were captured at the location of the dragon boat race (i.e., Maty-er). Altogether, we were able to identify 626 foreign visitors in our dataset during the race (i.e., 10 percent of the registered participants). The greatest numbers arrived from Germany, Canada, the USA, Czechia, and Poland. As mobile phone providers and their international roaming partners differ significantly from country to country, nationality cannot be considered as an aspect in the analysis. Figure 4 shows the spatial concentrations of identified foreign visitors inside Szeged.
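The share of event visitors among all foreign SIM cards can be computed as sketched below. The cell identifiers and the toy events are hypothetical, and treating a single event in a venue cell as "staying at the location of the race" is an assumption; only the figures 626 and 6200 come from the text.

```python
def venue_share(events, venue_cells):
    """Share of users with at least one event in a cell covering the venue.

    `events` is an iterable of (user_id, cell_id) pairs.
    """
    users = {user for user, _ in events}
    at_venue = {user for user, cell in events if cell in venue_cells}
    return len(at_venue) / len(users) if users else 0.0

# Toy events (hypothetical cell names): three of five users appear at the venue.
events = [
    ("u1", "MATY"), ("u1", "CENTER"),
    ("u2", "CENTER"),
    ("u3", "MATY"),
    ("u4", "ZOO"),
    ("u5", "MATY"),
]
share = venue_share(events, venue_cells={"MATY"})

# Identified visitors relative to registered paddlers, as reported in the text.
coverage = 626 / 6200
```

In the toy data the share is 0.6, mirroring the roughly 60 percent reported for the event period, and the 626 identified visitors correspond to about 10 percent of the 6200 registered paddlers.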
The spatial activity of foreign visitors attending the dragon boat race shows a clear pattern within the city. Besides the Kayak-Canoe and Rowing Olympic Centre at Maty-er, they visited the downtown area where most of the tourist attractions and the bulk of the hospitality industry (restaurants, bars, cafes, etc.) of the city are located. Additionally, some parts of the race were held on the Tisza river near the city center, attracting many visitors. Apart from the downtown, the Napfényfürdő Aquapolis spa and wellness center in Újszeged and the Zoo embedded in large green areas [60] were frequented by foreign tourists during the period of the dragon boat race. These two locations can be perceived as additional tourist attractions, alongside the paddling race.
We were also able to capture the movements of tourists in the surroundings of the city (Figure 5). Here, the activity pattern of foreigners is spatially more dispersed and forms several hotspots. Among them, the most important are the border crossing to Serbia (Röszke) and the transit motorway, where some of the foreigners entered or left the country.
Hódmezővásárhely is a medium-sized city near Szeged with local tourist attractions. There are two thermal spas in Mórahalom and Makó, and there is a famous fish restaurant at the northern edge of Szeged, near Fehértó lake. These locations provided leisure opportunities for tourists while they were visiting for the dragon boat race.
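The selection logic described in this section can be illustrated with a minimal sketch. It is not the processing pipeline used in the study: the record layout, the cell identifiers (e.g., `matyer`), and the helper names `venue_visitors` and `visitors_per_cell` are hypothetical and serve only to show how foreign SIM cards observed at the race venue could be isolated, how their share could be computed, and how distinct visitors could be counted per cell to outline hotspots.

```python
from collections import defaultdict

# Hypothetical passive MPD records: (anonymised SIM id, SIM home country, cell id).
# The real data structure of the operator is not described in the paper.
records = [
    ("a1", "DE", "matyer"),
    ("a1", "DE", "downtown"),
    ("b2", "CA", "matyer"),
    ("b2", "CA", "aquapolis"),
    ("c3", "RS", "downtown"),
    ("d4", "HU", "matyer"),         # domestic SIM, excluded below
    ("e5", "PL", "roszke_border"),
]

VENUE_CELL = "matyer"   # assumed id of the cell covering the race venue
HOME_COUNTRY = "HU"     # SIM cards from this country are not treated as foreign


def venue_visitors(records, venue_cell=VENUE_CELL, home=HOME_COUNTRY):
    """Return (foreign SIMs seen at the venue, all foreign SIMs)."""
    foreign = {sim for sim, country, _ in records if country != home}
    at_venue = {
        sim for sim, country, cell in records
        if country != home and cell == venue_cell
    }
    return at_venue, foreign


def visitors_per_cell(records, sims):
    """Count the distinct selected SIMs observed in each cell."""
    seen = defaultdict(set)
    for sim, _, cell in records:
        if sim in sims:
            seen[cell].add(sim)
    return {cell: len(ids) for cell, ids in seen.items()}


at_venue, foreign = venue_visitors(records)
share = len(at_venue) / len(foreign)           # share of foreign SIMs at the venue
hotspots = visitors_per_cell(records, at_venue)  # where venue visitors were also seen
```

Counting distinct SIM cards per cell, rather than raw records, avoids overweighting devices that generate many network events in one place.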

Discussion and Conclusions
Big-data approaches provide new opportunities for sustainable tourism studies and open up new ways of understanding key aspects of tourists' behavior. Methods of traditional travel diary construction rely on surveying or interviewing, which are not only time-consuming and costly but also limited in scale and number of samples [16]. However, the continuous collection of digital data with fine spatial and temporal resolution opens new opportunities to capture reliable information on the spatiotemporal dynamics of tourism.
The main aim of this study was to combine social media and mobile positioning data for the analysis of tourist flows. Such an approach can contribute not only to methodological but also conceptual developments in the discipline. To the best of our knowledge, except for the pioneering experiment of Sowkhya et al. [1], no attempt in tourism literature has been made toward the simultaneous utilization of the data available on social media platforms and the data stored by mobile phone operators, at least not at such fine geographical resolution.
Our results show that data retrieved from social media (Twitter) and mobile phone networks may allow us to gain fine-grained information on the behavior of tourists, their motivations, the purpose of their stay, the places they visit, and the role of tourist events. With the joint application of the social media and mobile positioning analytical tools, we could identify and geographically locate attractions (festivals, sport and cultural events, etc.) that generated international tourism arrivals in the case-study city, Szeged. Our findings offer several opportunities for decision makers, planners, and other stakeholders to expand the tourism potential of a city, forecast travel demands, develop synergies among destinations, combat seasonality, and improve the utilization of tourism resources in city regions.
First of all, tourism management organizations and tourism planners can utilize the knowledge on the spatial and temporal behavior of tourists to appropriately plan and manage tourist flows. The proposed approach can help practitioners better understand the spatiotemporal features of urban tourism and formulate a more reasoned tourism planning policy. The digital movement tracking of visitors offers possibilities to study the routes of tourists inside the city and its surroundings, to identify places that they visit, to delimit tourist hotspots in the city and its hinterland, and to examine the role of different attractions. This knowledge can help local stakeholders organize specific tourist events outside the peak season and better regulate tourist flows both spatially and temporally to mitigate possible conflicts related to tourism (e.g., crowding in public spaces, noise, and overtourism at certain locations) [61][62][63][64]. Festivals and events may lead to crowding in specific areas [65]; therefore, such knowledge is important not only for the planning of tourism development but also for sustainable infrastructure and transport planning and environmental management in city regions.
In addition, planners can trace the perception of urban spaces by various groups of tourists, and they can identify positive images attached to their city, which is very useful for tourism marketing, placemaking, and place branding [47,66]. According to Lew [67], the planner's role as placemaker and the local's role as placemaker are well understood; however, the tourist's role in placemaking, and placemaking itself, is more ambiguous. Sharing images [68] and stories through social media [69] means that tourists not only consume place but also actively contribute to placemaking. The construction of positive and charming images [70] is a fundamental tool for attracting global flows of tourists. Such images are important because they help other people to make generalizations and they influence intentions and decisions regarding holiday destinations. The proposed mixed use of social media and mobile positioning data can contribute to a better understanding of the role of tourists in placemaking and can support place-branding activities to enhance the competitiveness of cities.
The lessons learnt and conclusions drawn from this study clearly reach beyond the town of Szeged. The method outlined may be easily applied in other cities to analyze their tourism potential, to trace the spatial behavior of tourists with special attention to local attractions, and to involve surrounding municipalities in tourism development. The results also have some implications for peripheral rural municipalities seeking to strengthen regional collaboration, to intensify their economic growth through tourism development, and to create synergies with attractions organized by cities in the region [71]. Moreover, such information can also help tourism planning to define new event-related Tourism Products (4A) inside cities and in their surroundings.
Despite its novelty and usefulness, our methodological approach has certain limitations that should be tackled in future research. As was demonstrated, information derived from Twitter messages enabled us to identify only certain events (predominantly sport events). This is probably linked to the demographic and socioeconomic background of the users. In addition, social media platforms have different levels of penetration across countries. Therefore, the use of a single social media website may lead to bias in the sampling and incomplete conclusions [72]. To get more detailed and balanced information on tourist motivations and the role of attractions, a mix of different social media sources (Twitter, Instagram, Facebook, TripAdvisor, etc.) should be applied. By combining different sites, a more balanced sample of social media users can be achieved. Besides data collection, another important question relates to data processing and filtering. As has been demonstrated, with a strict and well-defined preprocessing of data (i.e., the removal of retweets and junk tweets, or mobile cell data of non-tourist visitors) the accuracy of the whole dataset can be improved. On the basis of our research findings, we argue that combined big-data approaches with careful preprocessing of data can offer useful information about the background effects of tourism and the factors influencing tourist flows [73].
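The kind of tweet preprocessing referred to above can be sketched as follows. This is an illustrative example only, under stated assumptions: the junk patterns, the keyword filter, the duplicate check, and the function name `preprocess` are hypothetical and do not reproduce the filtering rules applied in the study.

```python
import re

# Hypothetical markers of "junk" tweets (e.g., automated weather or job posts);
# the actual junk criteria used in the study are not specified here.
JUNK = re.compile(r"\b(weather|forecast|job|hiring)\b", re.IGNORECASE)


def preprocess(tweets, keyword="szeged"):
    """Drop retweets, off-topic tweets, junk tweets, and repeated texts."""
    kept, seen = [], set()
    for text in tweets:
        # Normalise case and whitespace before applying the filters.
        norm = " ".join(text.lower().split())
        if norm.startswith("rt @"):   # conventional retweet prefix
            continue
        if keyword not in norm:       # assumed topical filter on the city name
            continue
        if JUNK.search(norm):         # assumed junk filter
            continue
        if norm in seen:              # repeated text after normalisation
            continue
        seen.add(norm)
        kept.append(text)
    return kept


sample = [
    "Dragon boat finals in Szeged today!",
    "RT @someone: Dragon boat finals in Szeged today!",
    "Szeged weather forecast: sunny, 31C",
    "dragon boat finals in  Szeged today!",
    "Great concert in Budapest",
]
cleaned = preprocess(sample)
```

In practice the junk criteria would need to be tuned to the collected corpus, since overly broad patterns can discard genuine tourist messages.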