Using Social Media in Tourist Sentiment Analysis: A Case Study of Andalusia during the Covid-19 Pandemic

This paper explores the role of social media in tourist sentiment analysis. To do this, it describes previous studies that have carried out tourist sentiment analysis using social media data, before analyzing changes in tourists’ sentiments and behaviors during the COVID-19 pandemic. In the case study, which focuses on Andalusia, the changes experienced by the tourism sector in the southern Spanish region as a result of the COVID-19 pandemic are assessed using the Andalusian Tourism Situation Survey (ECTA). This information is then compared with data obtained from a sentiment analysis based on the social network Twitter. On the basis of this comparative analysis, the paper concludes that it is possible to identify and classify tourists’ perceptions using sentiment analysis on a mass scale with the help of statistical software (RStudio and Knime). The sentiment analysis using Twitter data correlates with and is supplemented by information from the ECTA survey, with both analyses showing that tourists placed greater value on safety and preferred to travel individually to nearby, less crowded destinations since the pandemic began. Of the two analytical tools, sentiment analysis can be carried out on social media on a continuous basis and offers cost savings.


Introduction
The use of information and communications technology (ICT) in tourism destination management has become an essential strategy in ensuring the sustainability of these destinations, making them more competitive and facilitating their long-term survival [1]. Tourism evolves quickly and constantly, throwing up new challenges that must be addressed by implementing sustainable tourism models and new ways of doing business [2].
In today's information society, tourists are more experienced, have greater access to information, and hold greater negotiating power through the use of the latest ICT [3]. As a result, competition is growing in the tourism sector, and effective use and management of ICT are very important for tourist destinations seeking to develop sustainable forms of tourism.
In this context, big data, the internet, and social media have changed the way we travel, influencing all three phases of a trip: the pre-travel phase, when we begin to think about travelling somewhere; the travel phase itself; and the post-travel phase, when we share our experiences [3].
These factors all help to influence decision-making among tourists as they plan their trips [3].
For those looking for leisure, entertainment, new destinations, and adventures, social media is the most common source of information for obtaining inspiration or opinions from other users [4].
These sources of information are also becoming key marketing resources for companies in the sector [5]. Marketing departments seek to exploit social media using natural language processing and machine learning techniques to segment campaigns, retain customers, and identify trends, among other activities. These techniques have proliferated as a result of easier access to the big data technologies used in most smart data analysis [6].
These changes make it especially important to analyse ICT management in general and social media in particular to facilitate sustainable development in tourist destinations.
Against this backdrop, this study seeks to demonstrate the potential contribution of social media in analyses of tourist behaviour and sentiments by tourist destinations. To do this, it uses social media to analyse changes in sentiments and behaviours among tourists in Andalusia (Spain). This overall objective is broken down into the following theoretical and empirical objectives.
The first theoretical objective is to analyse recent research on tourism and sentiment analysis using social media and to classify the main themes investigated. The second theoretical objective is to describe changes in tourist behaviour as a result of the COVID-19 pandemic according to research carried out on the subject.
Meanwhile, the empirical objective is to analyse changes in sentiments and behaviour among tourists visiting Andalusia in summer 2020 compared to summer 2019, using surveys carried out by the Andalusian regional authorities and comments made by tourists on Twitter.
Through these objectives, we aim to test the hypothesis that sentiments and behaviours among tourists visiting Andalusia changed as a result of the COVID-19 pandemic, using survey data and a sentiment analysis of the social network Twitter. This analysis will demonstrate the potential for tourist destinations to use social media as a way of detecting tourist behaviours and sentiments and any short-term changes in them.
In order to fulfil these objectives, the study is divided into the following sections: following this introduction, the second section examines other studies which have used social media to perform tourist sentiment analysis and describes changes to tourism and tourists' behaviours and sentiments in the COVID-19 era. The third section presents the case study, which focuses on analysing changes in tourist behaviour in Andalusia before and during the COVID-19 pandemic via statistical analysis and a sentiment analysis using Twitter data. The fourth and final section sets out the study's main conclusions, identifies its limitations, and makes several recommendations for the public authorities.

Use of Big Data in Tourism Destination Management
The tourism industry relies intensively on large amounts of information, and information and communications technology (ICT) is therefore of great importance to the sector from the perspective of both consumers and providers [7].
According to [8], there are three phases in the development of internet use in the tourism sector. The first phase took place in the 1990s, when the internet was used as a communication tool, and destination management organisations (DMOs) became information brokers. In the second phase, spanning the period from 2000 to 2010, the internet came to be used more for marketing than for communication. At this time, e-commerce was beginning to take off, and there was a demand for more personalised, aggregated experiences, giving rise to a new type of consumer [9]. Meanwhile, the third phase, since 2010, has seen progress in areas such as search engines, social media, the internet of things, data analysis, and mobile technology. During this period, the concept of 'smart tourism' emerged to describe "the increasing reliance of tourism destinations, their industries, and their tourists on emerging forms of ICT that allow for massive amounts of data to be transformed into value propositions" [10].
Reference [11] reports the three levels of application for the concept of 'smart tourism' that correspond to tourist experience, business, and destination: smart experience, smart business ecosystem, and smart destination. At each level, big data is captured, exchanged, and processed.
The concept of the 'smart tourism ecosystem' is part of a systemic approach and refers to "tourist systems that take advantage of smart technology to create, manage, and deliver smart tourist services/experiences, which are characterised by an intense exchange of information and co-creation of value" [4,12].
The spread of information technologies throughout the travel cycle and the digital records derived from them offer a new and highly valuable source of data. This represents an opportunity in view of the sparse information available locally due to shortcomings in statistical systems for assessing movements without overnight stays [13].
Big data is the cornerstone of smart tourism destinations. Destination intelligence is powered by a smart information system allowing data to be collected, processed, and analysed to supply the necessary information at the appropriate time to the people who need the data to make informed decisions [14].
An innovative use of data offers added value by revealing connections that were previously undetectable, giving rise to a new debate around the nature of decision-making [11]. Data enable efficient management, greater transparency, and enhanced knowledge [4]. Smart data thus represent an extremely useful tool for boosting tourism competitiveness and sustainability [15]. Indeed, big data offers a more holistic, insightful overview of tourist activity, giving actors in the tourism industry the opportunity to streamline procedures, drive innovation, and deliver improved experiences [16].

Tourist Sentiment Analysis
The term 'social media' covers a variety of online platforms allowing users to create and share content and interact socially [17]. Several different categories of social media may be identified: social networks (Facebook and LinkedIn), blogs (Blogger and Wordpress), microblogs (Twitter and Tumblr), social news sites (Digg and Reddit), bookmarking sites (Delicious and StumbleUpon), shared media (Instagram and YouTube), question and answer sites (Yahoo! Answers and Ask.com), review sites (Yelp and TripAdvisor), and sites with mobile apps such as 'Find My Friend' [18].
Social media is currently one of the fastest growing marketing channels [10]. User-generated content (opinions, images, videos, etc.) and interactions between users (people, organisations and products) are the two types of information available on social media, offering large volumes of unstructured, dynamic content. This content can be analysed to generate knowledge.
According to data from TripAdvisor (2016), 77% of travellers check the comments left by former guests at the hotels they are thinking of staying at before booking [19]. Travellers have become the greatest influencers, as social media allows consumers to obtain first-hand information on the quality and prices of hotels at a click.
Sentiment analysis or opinion mining analyses people's utterances, including opinions, feelings, evaluations, attitudes, emotions, and appraisals of products, services, organisations, individuals, subjects, events, and their attributes. The emergence and rapid growth of this field of study coincided with the boom in online social media; for the first time in history, a large volume of digitally recorded data and opinions is available [20].
With a summary of opinions, consumers can share their perceptions of certain products or experiences with potential tourists planning to purchase them. On the other hand, companies can identify the most popular and unpopular features of their products among consumers [19].
In short, consumers are no longer obliged to ask their relatives or neighbours if they are thinking about purchasing a product as they can obtain evaluations and reviews online and on social media [21]. At the same time, organisations and tourist destinations no longer need to conduct surveys or questionnaires, which take longer and require more resources [22].
However, the large number of websites and volumes of content generated demand the use of automated systems to collect and analyse the information available online [23]. Technorati estimates that 75,000 new blogs with 1.2 million posts are created each day, many of which share opinions on products and services; 60% of consumers in the USA have researched products online [24].
Digital media represents a kind of infostructure for the tourism industry, within which social media acts as a producer and distributor of active tourist information [25,26].
Most studies using mass data from social networks have focused on Twitter [27] due to the global nature of the platform and the fact that the data generated in the form of tweets are available for free in real time. Each geolocalised tweet leaves a digital footprint of the time and place when it was sent [28]. If the data are processed by user name, it is possible to draw up a space-time profile for each user showing the places they have visited at different times. Social media activity can thus be used to analyse changing population densities in a city throughout the day [29], as well as mobility patterns among the population [30].
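The idea of a space-time profile can be illustrated with a minimal Python sketch. The records, user names, and the `build_profiles` helper below are hypothetical and serve only to show the grouping-and-ordering step described above; they are not taken from any of the studies cited.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical geolocated tweets: (user name, ISO timestamp, latitude, longitude).
tweets = [
    ("user_a", "2020-08-02T18:30:00", 36.7213, -4.4214),
    ("user_b", "2020-08-01T09:00:00", 37.3891, -5.9845),
    ("user_a", "2020-08-01T10:15:00", 37.1773, -3.5986),
]

def build_profiles(records):
    """Group tweets by user and order each user's points chronologically."""
    profiles = defaultdict(list)
    for user, timestamp, lat, lon in records:
        profiles[user].append((datetime.fromisoformat(timestamp), lat, lon))
    for points in profiles.values():
        points.sort(key=lambda point: point[0])  # earliest tweet first
    return dict(profiles)

profiles = build_profiles(tweets)
```

Each entry in `profiles` is then a time-ordered list of places for one user, which is the kind of structure from which density or mobility indicators could be derived.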
It is also possible to use geolocalised tweets to analyse the degree of social mixing in the use of space, tracking the movements of social groups in highly segregated cities such as Rio de Janeiro [31,32].
Unlike the information supplied by official sources offering data by place of residence, the indicators of multiculturalism and mixing analysed in these studies using big data refer to the use of space throughout the day. For example, studies have examined linguistic diversity in cities and regions based on the languages used in tweets as an indicator of cultural diversity [33]. In the field of tourism studies, very few studies have used geolocalised tweets; those that have focus on comparing tourists' spatial behaviour at the national or global scale [34,35,36], but not at the intraurban scale.
Other studies such as [37] analyse the way in which potential tourists used social media to make travel decisions during the Zika pandemic in the context of widespread disinformation where the authorities failed to provide sufficient information regarding tourism.
The work of [38] uses an automated process to analyse the cognitive, affective, and conative components of perceptions of the Basque Country as a tourist destination based on posts in the travel community www.minube.com. It concludes that the region's natural and cultural resources have the greatest influence on its image as a tourist destination [39].
Reference [40] adopted a methodology in which information was automatically downloaded and processed, and the content of 85,000 reviews by tourists who visited Catalonia between 2004 and 2013 from four different travel websites (TripAdvisor, TravelBlog, VirtualTourist and TravelPod) was analysed [41]. The authors used a combination of online resources and open access software, concluding that this methodology can be used for different locations, languages, and topics. The study provides relevant information for destination management offices, allowing them to identify their brands' positioning through sentiments and opinions posted by tourists on travel blogs [42].
Reference [43] tested the Destination Management Information System (DMIS) in Åre, Sweden, applying a business intelligence approach to organisational learning in tourist destinations. The system provides real-time information about indicators in three different areas: economic performance, with data on occupancy, price, stays, bookings, and sales; consumer behaviour, with data on consumer profiles, web browsing, and the purchase process; and brand management, analysing loyalty, value, satisfaction, and brand awareness [44].
A gradual evolution may be observed in the studies of tourist destinations based on content analysis and social media carried out by [45,46,47,48,49], which, although they fulfil their objective of analysing perceptions of the different components of a tourist destination's image, remain rather "homespun" [50] in terms of the methods used to capture, clean, and process data. The work of [51,52,53] is qualitatively different, using automated processes to extract, clean, process, and analyse data.
In order to expand upon and update this body of literature, a bibliographic analysis was carried out on conceptual and empirical studies analysing tourist behaviour via social media published in the last two years (2019-2020) on the Web of Science, with a view to identifying their contributions and analytical procedures [54]. The studies covered by this bibliographic analysis are summarised in Table 1.
Table 1. Studies using sentiment analysis in tourism research (2019-2020).

Authors | Title | Objective | Methodology
[42] | Exploring best practices for online engagement via Facebook with local destination management organisations (DMOs) in Europe: a longitudinal analysis | To supply evidence of a positive trend in online engagement among tourists. (Tourists) | The study is based on the use of Facebook pages by the destination management organisations in question.
[9] | City characteristics that attract AirBnB travellers: evidence from Europe | To determine the characteristics prioritised by customers and draw up a typology of cities from the traveller's perspective. (Tourists) | Data collection and most of the analysis were carried out using R, a very flexible method and trend programme offering specific packages for data capture and mining.
[39] | What do people think about this monument? Understanding negative reviews via deep learning, clustering, and descriptive rules | To collate negative opinions about three cultural monuments to detect the characteristics in need of improvement. | A deep learning method based on a CNN and SD methods for aggregating information was used.
[44] | Business information architecture for successful project implementation based on sentiment analysis in the tourist sector | To provide an architecture of principles, a strategy to meet the needs of tourism companies in the Peruvian market, so that when problems arise in tourism management processes, there are good practices available to improve these processes and develop technological solutions. (Methodological) | Due to the current situation of tourism companies and the use of cloud services, Google and services such as Cloud Data Store API and Machine Learning are used as a case study due to the need for a platform for developing the solution.
[18] | A machine learning approach for the identification of deceptive reviews in the hospitality sector using unique attributes and sentiment orientation | To identify differences and characteristics allowing deceptive and truthful reviews to be successfully classified using a text-based machine learning approach. (Methodological) | A text-based machine learning approach provides an automatic tool capable of processing a large volume of reviews.
[28] | Design and validation of annotation schemas for aspect-based sentiment analysis in the tourism sector | To compile a bilingual corpus (Spanish-English) of user opinions in the Andalusian tourism sector, provisionally entitled SentiTur. | Tourist destinations were downloaded from the TripAdvisor website using a custom scraper built from the infrastructure provided by the Scrapy tool.
[12] | Inconsistencies in TripAdvisor reviews: a unified index between users and sentiment analysis methods | The study analyses opinions in six reviews of Italian and Spanish monuments and detects inconsistencies between sentiment analysis methods and user polarity methods that automatically extract polarities. | TripAdvisor is used as a data source. Results showing inconsistencies between polarities are presented, before the Polarity Aggregation Model is proposed to address this issue, and its outcomes are assessed using an aspect extraction approach.
[21] | Semantic analysis in social media for digital tourism communication | To establish a methodology for ascertaining whether consumer opinion has a positive or negative effect on recommending tourism services and attracting customers. | The quantitative part of the study consists of quantifying and comparing data from tourism companies in terms of numbers of followers, likes, comments, and shares on the social network Facebook.
[41] | A proposal for sentiment analysis on Twitter for tourism-based applications | To create a structure based on independent, interchangeable components to allow research to be conducted in a more uniform, open, and transparent manner. (Methodological) | The study focuses on comments about hotels, proposing a platform that classifies tweets as positive, negative or neutral based on the author's opinion.
[23] | Using deep learning to predict sentiments: a case study in tourism | To use different deep learning techniques and architectures to address the issue of classifying comments posted by tourists online, which are used by other tourists to inform their decision-making. | To extract the information, scripts were developed in Python based on the Scrapy framework, and information from reviews of hotels on the island of Tenerife in English was extracted from the websites.
Source: elaborated by authors.
Table 1 shows that most recent studies performing sentiment analysis using social media in the field of tourism studies have focused on the social networks Facebook, TripAdvisor, and Twitter.
The most relevant themes identified in these studies were sentiment analysis, identification of tourist sites based on digital impressions using social media, tourist preferences harvested from social media, social media communication strategies, use of geographical labels, web platforms as a communication tool, tourism recommendation systems, cultural exposure to a foreign city through the media in particular, definition of smart tourism, and current trends.
Another theme emerging from the studies analysed was the need to detect messages with the greatest influence on purchase behaviours and contradictory messages, as well as to conduct comparative analysis of different methodologies to observe the existence of contradictory messages when different analytical methods were applied.
On the other hand, as Table 1 shows, many studies on sentiment analysis in the tourism sector are primarily methodological, and their objectives focus on data processing using different techniques, frameworks, methods, etc., for the following purposes: to detect contradictory messages, to classify different types of messages, to conduct research in a uniform, open manner, and to create a methodology for classifying the positive or negative impact of tourists' opinions on other tourists' decision-making.
Analysis of tourists' sentiments and opinions to identify the characteristics of destinations, resources, services, etc., that are most important to them and enable improvements to tourism management is another, less studied theme.
This study will therefore focus on the latter, aiming to analyse the behaviours and sentiments of tourists travelling in Andalusia using the social network Twitter and to identify differences between 2019 and 2020 due to the COVID-19 pandemic. To measure these emotions, machine learning algorithms will be used to automatically extract the sentiments expressed by tourists.
However, before moving on to the sentiment analysis, the changes in the performance of the tourism sector in general and in tourist demand during the COVID-19 pandemic in particular will be described, as well as the impact of the pandemic on tourism in Andalusia.

Tourist Behaviour during the COVID-19 Era
According to the World Tourism Organisation (UNWTO), the COVID-19 pandemic has had a huge impact on the global economy, with tourism among the worst-affected sectors. The travel statistics survey showed a reduction of 49.6% in March 2020 compared to the same month the previous year. The total number of foreign visitors declined particularly dramatically, with an 85.9% drop in inbound tourism [2].
It is hoped that the tourism sector can recover and overcome these challenges, adopting tourism development strategies that encompass economic, social, and environmental aspects and encourage more sustainable tourism in the future. It is also important to understand the positive environmental impact caused by the pandemic in a relatively short space of time [2].
As previous studies have shown, tourist demand is highly sensitive to any type of risk [43]. Faced with even a minor risk, potential tourists change destination or modify their travel plans [55].
The SARS virus that emerged in China in 2002-2003 and quickly spread around the world [56] led to warnings not to travel to certain Asian countries on health grounds. This led to the loss of thousands of jobs in the Asian tourism sector [57]. A number of studies have analysed the impact of the epidemic on the sector [58].
The H1N1 influenza (swine flu) pandemic that broke out in 2009 had a significant impact on international tourism in 2010. Tourist demand decreased across all continents, with the exception of Africa and South America. Lee, Y. et al. showed that tourism declined the most in the five countries whose governments adopted the most restrictive measures to stop the virus from spreading, including quarantining patients, closing schools, cancelling public events, and controlling international borders.
Niewiadomski, P. analysed the role of the tourism industry in response to the SARS (2003) and H1N1 (2009) health crises. The MERS-CoV virus that emerged in Saudi Arabia in 2012 also had an impact on the tourism sector. The disease was particularly widespread in South Korea, which experienced a dramatic drop in tourist demand [59], especially from China, the main country of origin of tourists to South Korea [60].
The studies cited here all analysed the impact of pandemics on the tourism industry. However, studies on crisis communication management are few and far between [61]. Some studies have analysed crisis communication by public institutions and governmental bodies during health crises, highlighting the best practices adopted in these cases, although they focus on communication relating to pandemics and public health rather than the tourism sector. Ritchie, Brent W. studied the British tourist board's crisis communication management following the 2001 health crisis.
With regard to the tourism crisis caused by the COVID-19 pandemic, there appears to be no doubt that tourists will return, as holidays are an essential expenditure for many families. The tourists travelling after the pandemic will no longer be the same, however. According to a survey carried out by Ernesto C. et al., 80% of people surveyed in April last year expressed a desire to travel. The main criteria in the choice of destination were low numbers of people, the characteristics of the destination, and the public health measures in place. Price was the fourth most important consideration. In addition, none of the respondents said that they would travel with organised groups, and 77% said that they would travel within Spain [62].
The July 2020 report 'Tourism After COVID-19: Reflections, Challenges and Opportunities' suggests a change in tourists' behaviour following the COVID-19 pandemic: demand for less crowded destinations will grow, people will travel individually rather than in groups, demand for tourist products allowing flexible cancellation will rise, demand for hygiene and social distancing measures will grow, demand for better travel insurance covering pandemics will increase, people will eat in their accommodation instead of going to restaurants, and demand for outdoor activities will rise.
In short, an analysis of studies and reports on tourism during the COVID-19 pandemic reveals that tourists' habits, behaviours, and sentiments have changed substantially, and many of these changes will persist into the future. This may have a significant impact on the restructuring of the sector.

Tourist Behaviour in Andalusia before and during the COVID-19 Pandemic According to Survey Data
According to data from the Andalusian Tourism Situation Survey [63], tourism declined by 47.5% in the third quarter of 2020 compared to the same period in 2019, falling from 11,425,437 tourists to 6,000,293.
With regard to the origin of these tourists, 45.3% were from Andalusia and 42.0% were from elsewhere in Spain. Domestic tourism thus accounted for 86% of tourism in the region, whereas this figure was only 64% prior to the COVID-19 pandemic, confirming the observation made in this study that tourists prefer to visit nearby destinations. The Andalusian provinces that suffered the smallest declines in tourism during the same quarter, all below the average for the region, depend to a greater extent on domestic tourism and were less crowded: Jaén (−16.2%), Cádiz (−30.5%), Huelva (−34.0%), and Almería (−44.8%). The cities of Granada and Córdoba maintained their market share from previous years, falling to around the average for Andalusia.
With regard to the tourists' ages, the pattern was as expected, with the greatest decline in travel this quarter observed among people aged older than 65. This was followed by those aged 30-44 years and those younger than 18 years, pointing to a substantial, above-average decline in trips taken by couples with underage children.
According to [64,65], the main reason for travelling remained holidays and leisure, which was cited by around 90% of tourists, while the prevalence of visits to family and friends rose to surpass other reasons.
In terms of accommodation, stays in hotels and apartment hotels fell below the average in comparison with the third quarter of 2019, followed by stays at friends' and relatives' homes, hostels, guest houses, and bed and breakfasts. To a lesser extent, stays in rented apartments, second homes, and campsites also declined.
The average stay in Andalusia dropped from 10.1 days to 9.4 days (−7%). The average daily expenditure also decreased by 7%, with expenditure rising among domestic tourists and falling among international tourists.
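The percentage changes quoted in this section can be checked from the underlying figures. The following Python sketch recomputes the change in tourist numbers and in average stay; the helper name `percent_change` and the variable names are ours and are used only for illustration.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, expressed in percent."""
    return (after - before) / before * 100

# Figures reported in the text for the third quarters of 2019 and 2020.
tourists_q3_2019 = 11_425_437
tourists_q3_2020 = 6_000_293
stay_2019_days = 10.1
stay_2020_days = 9.4

tourist_change = percent_change(tourists_q3_2019, tourists_q3_2020)
stay_change = percent_change(stay_2019_days, stay_2020_days)

print(f"Tourists: {tourist_change:.1f}%")    # prints "Tourists: -47.5%"
print(f"Average stay: {stay_change:.0f}%")   # prints "Average stay: -7%"
```

Both results agree with the rounded values given above.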
Finally, the qualitative scores from 0 to 10 assigned by tourists to different aspects of their experience (accommodation, food, leisure, transport, safety, service, cleanliness, etc.) stayed around the same as in the third quarter of 2019. Only the following aspects received a lower score: public transport by bus (2%), public safety (2%), and public transport by train (1%). The remaining aspects received a higher score, including transport by taxi (7%), quality of beaches (6%) and natural parks (4%), car hire services (4%), cleanliness (4%), golfing facilities (4%), and ports and nautical activities (4%).
The impact of the COVID-19 pandemic is clear in these scores. While services perceived as more unsafe (public transport) scored the lowest, individual transport, cleaning services, and certain facilities that make tourists feel safer (beaches and protected natural areas) increased their scores.
The tourist behaviour identified in this analysis of the data from the Andalusian Tourism Situation Survey is perfectly aligned with the characteristics and changes in tourist demand during the COVID-19 pandemic identified in the bibliographic analysis. These characteristics include a steep decline in tourist activity, with trips to nearby, less crowded, safer destinations using private transport and accommodation preferred over shared accommodation. In addition, tourists give a higher score to aspects relating to safety, such as cleanliness, the natural environment, and certain facilities, than to riskier, less safe aspects of the tourist experience such as public transport.

Methodological Approach
This study is based on an exploratory analysis using the R programming language in the RStudio environment, with the rtweet package used to extract tweets. Machine learning sentiment analysis algorithms were then applied to the resulting data. Machine learning is a type of artificial intelligence in which a model is trained on data to automate data analysis procedures. Among other features, it allows tweets to be classified as positive, negative, or neutral, as shown in the results section.
The social network selected for this study was Twitter, due to its capacity to reach a wide audience and its anonymous nature, which have led to exponential growth on a global scale and have transformed the platform into an alternative source of information alongside more traditional media.
The user accounts used for the sentiment analysis were geolocated in Málaga, which is the main hub for tourists in Andalusia. Accounts within a 500 km radius of Málaga were included in an attempt to cover the whole region.
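A radius filter of this kind can be sketched with the haversine formula. The Python example below is illustrative only: the coordinates given for Málaga are approximate values assumed here, and the helper names are hypothetical rather than part of the study's R workflow.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0
MALAGA = (36.7213, -4.4214)  # approximate (lat, lon), assumed for illustration
RADIUS_KM = 500.0            # radius stated in the text

def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))

def within_study_area(point):
    """True if the point lies within the 500 km radius around Málaga."""
    return haversine_km(MALAGA, point) <= RADIUS_KM

seville = (37.3891, -5.9845)    # approximate coordinates
barcelona = (41.3874, 2.1686)   # approximate coordinates
print(within_study_area(seville), within_study_area(barcelona))
```

A circle of this size also reaches beyond Andalusia, so a pipeline built along these lines might need an additional regional filter on the account location.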
To extract the data, we connected to Twitter's open API (Application Programming Interface), which allows applications to be developed that take advantage of the information available online. In this way, it was possible to perform a search on Twitter and compile all the messages linked to certain terms, which acted as filters. These terms were 'my trip', 'my experience', 'my holidays', 'as a tourist', 'as a visitor', and 'as a traveller'; the data were extracted using the R programming language. Table 2 details the process.
The tweets that were extracted contained comments and interactions by users from Andalusia about tourism in Andalusia. The data collection process was divided into two phases: (a) phase 1, in August, September, October, and November 2019, and (b) phase 2, in August, September, October, and November 2020.
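The filtering logic described above can be sketched as follows. This Python example works on text and dates already in memory and does not call the Twitter API; the function names are hypothetical, and the study itself performed the extraction in R with rtweet.

```python
from datetime import date

# Search terms as listed in the text.
SEARCH_TERMS = (
    "my trip", "my experience", "my holidays",
    "as a tourist", "as a visitor", "as a traveller",
)
COLLECTION_MONTHS = {8, 9, 10, 11}            # August to November
PHASES = {2019: "phase 1", 2020: "phase 2"}   # one phase per year

def matches_search_terms(text):
    """True if the tweet text contains at least one search term."""
    lowered = text.lower()
    return any(term in lowered for term in SEARCH_TERMS)

def collection_phase(day):
    """Return the collection phase for a date, or None if outside both phases."""
    if day.month in COLLECTION_MONTHS:
        return PHASES.get(day.year)
    return None

print(matches_search_terms("Loved MY TRIP to Granada"))  # prints True
print(collection_phase(date(2020, 9, 15)))               # prints phase 2
```

Case-insensitive substring matching is an assumption made for this sketch; the matching rules applied by the Twitter search endpoint may differ.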
A descriptive analysis of the tweets collected in 2019 and 2020 was then carried out. Once the data had been cleaned and filtered, they were processed using the statistical software Knime. This data mining platform facilitates the tasks of data analysis, modelling, processing, and visualisation.
On the Knime platform, modelling is carried out in process blocks which can be executed separately to reduce processing time. The model is presented in Figure 1 and contains the following phases: (a) the xls file is read; (b) in the Document Creation block, the file is converted to text; (c) the column to be evaluated is selected; (d) the Text Preprocessing block cleans the data before they are divided by relevant words and classified. Once the data have been processed, they are assigned a colour for the Analyse Network phase, in which the data matrix is divided into training data and testing data. In Knime, the algorithm consists of a learner and a predictor; once the data have been processed, one block to plot the data and another to display the results must be added.
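The division into training and testing data and the learner/predictor pair can be illustrated outside Knime. The text does not state which classification algorithm was used, so the sketch below assumes a multinomial naive Bayes classifier with add-one smoothing purely for illustration. The function names, the 70/30 split, and the random seed are likewise assumptions.

```python
import math
import random
from collections import Counter, defaultdict


def split_data(rows, train_fraction=0.7, seed=42):
    """Shuffle labelled rows and split them into training and testing sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def learn(train_rows):
    """Learner: count documents and words per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in train_rows:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def predict(model, tokens):
    """Predictor: return the class with the highest log-probability."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for token in tokens:
            if token in vocabulary:  # ignore words never seen in training
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, test_rows):
    """Share of test rows whose predicted label matches the true label."""
    correct = sum(predict(model, tokens) == label for tokens, label in test_rows)
    return correct / len(test_rows)
```

Each row is a `(tokens, label)` pair, where `tokens` is the preprocessed word list of a tweet and `label` is one of Neg, Neu, or Pos.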
The database obtained allows for quick classification using filters for the following variables: user name, message (tweet), date and time, latitude, longitude, favourite, retweeted, and retweeted from. Figure 2 shows data preprocessing on the platform, in which punctuation marks and numbers are removed and all letters are converted to lowercase. The connector words, uploaded in a list in advance, are then removed. Finally, the tokenisation process is carried out. It is important to note that the tool uses several process blocks to complete this action.
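The preprocessing steps described above can be sketched in a few lines of Python. This is an illustration only, not the Knime workflow itself. The stop-word list is a small placeholder rather than the list loaded in the study, and for simplicity the sketch splits the text on whitespace before filtering out connector words.

```python
import re

# Illustrative connector words only; the study loaded its own list in advance.
STOP_WORDS = {"the", "a", "an", "to", "of", "in", "and", "was", "is", "my"}


def preprocess(text, stop_words=STOP_WORDS):
    """Remove punctuation and numbers, lowercase, tokenise, drop stop words."""
    # Replace punctuation, digits and underscores with spaces.
    text = re.sub(r"[^\w\s]|\d|_", " ", text)
    text = text.lower()
    tokens = text.split()  # simple whitespace tokeniser
    return [token for token in tokens if token not in stop_words]


print(preprocess("My trip to Málaga in 2020 was great!!!"))
# ['trip', 'málaga', 'great']
```

Accented characters such as the 'á' in 'Málaga' are preserved because `\w` matches Unicode letters in Python 3.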

Results of Analysis
Once the methodological process of data collection, extraction, filtering, and cleaning was complete, a sentiment analysis was performed, and the average veracity was obtained using Knime. This task consisted of assigning an overall polarity to each tweet on a three-level scale: negative, neutral, and positive (Neg, Neu, and Pos). The first set of data analysed contains 11,532 tweets from 2019, of which 21% were classified as neutral, 73% as positive, and 6% as negative. This three-way classification of tourists' opinions is shown in Table 3.
Table 3. Classification of sentiments in 2019.
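The percentages reported for each polarity class can be derived from the labelled tweets with a simple count. The following Python sketch assumes that each tweet has already been assigned one of the labels Neg, Neu, or Pos. The function name `sentiment_shares` and the toy data are hypothetical.

```python
from collections import Counter

LABELS = ("Neg", "Neu", "Pos")


def sentiment_shares(labels):
    """Return the percentage of tweets in each polarity class."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {label: 0 for label in LABELS}
    return {label: round(100 * counts[label] / total) for label in LABELS}


# Toy data reproducing the 2019 proportions on 100 tweets.
toy_labels = ["Pos"] * 73 + ["Neu"] * 21 + ["Neg"] * 6
print(sentiment_shares(toy_labels))
# {'Neg': 6, 'Neu': 21, 'Pos': 73}
```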
Table 3 shows the analysis of the first set of data. From a positive starting point, the sentiment varies on the basis of the tourist's experiences, emerging news stories and other factors, predicting future sentiment among tourists. The scores indicate the polarity of the sentiment. These classifications allow us to observe that a high and low sentiment leads to a sentiment of 4 or 5, whereas a moderately low sentiment but very negative high sentiment leads to a final sentiment of −4 or −5. Cases in which the final sentiment is 5 (positive) tend to be characterised by moderately high and low sentiment over time and a volume of positive news.
Figure 3 shows that the words most commonly used by tourists on Twitter include 'Málaga' and 'Benidorm' as destinations, followed by 'Spain'. 'Beach' and 'holidays' are mentioned as activities, with comments revolving around 'sun', 'people', 'sea', and 'sand'.
Table 4 shows the results of the analysis of tourist sentiment expressed from July to October 2020 based on the second set of data. This set contains 14,000 tweets, of which 12% were classified as neutral, 30% as positive, and 58% as negative. As Table 4 shows, a greater number of negative sentiments were identified during the COVID-19 era.
A word cloud indicating the words most commonly used during the third quarter of 2020 is shown in Figure 4. The words most commonly used by tourists on Twitter during the pandemic were identified, with 'Granada' appearing as a new destination alongside new words relating to different types of activities such as 'culture', 'city', and 'mountain'. Other words appearing in the tourists' tweets included 'travel', 'experiences', and 'tourism', as well as indications of change. These are the most common themes across all sentiments expressed by tourists on the social network. Table 5 shows that tourists comment on 'sun', 'beach', and, to a lesser extent, 'cultural tourism'.
These results were as expected for a beach destination in summer. The destinations with the most classified comments were 'Cádiz', 'Huelva', 'Málaga', 'Almería', and 'Almuñécar'. During the pandemic, the most commonly used expressions revolved around culture and the adaptation of tourism and leisure to the new conditions imposed by COVID-19. These expressions are shown in Table 6.
10. #Gibralfaro Castle: The castle has a wall rising up over the city, it's worth climbing it for the stunning views of the city and the surrounding area.
Source: prepared by the authors.
The following matrix (Table 7) shows the most popular words in 2019 and the relationship between them. The words listed appear at least ten times in the tweets.
Table 7. Matrix of words appearing at least 10 times in 2019.
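A word matrix of this kind can be built by keeping the words that reach a minimum frequency and counting how often each pair appears together. The text does not specify how the relationship between words was measured, so the Python sketch below assumes co-occurrence within the same tweet. The function name `cooccurrence_matrix` and its parameters are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_matrix(token_lists, min_count=10):
    """Count, for each pair of frequent words, the tweets containing both."""
    # Total occurrences of each word across all tweets.
    frequency = Counter(token for tokens in token_lists for token in tokens)
    frequent = sorted(word for word, n in frequency.items() if n >= min_count)
    frequent_set = set(frequent)

    pair_counts = Counter()
    for tokens in token_lists:
        present = sorted(set(tokens) & frequent_set)
        pair_counts.update(combinations(present, 2))

    # Symmetric matrix stored as nested dictionaries, zero on the diagonal.
    matrix = {w: {v: 0 for v in frequent} for w in frequent}
    for (w, v), n in pair_counts.items():
        matrix[w][v] = n
        matrix[v][w] = n
    return frequent, matrix
```

With the default `min_count=10`, only words appearing at least ten times are retained, matching the threshold stated above.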
The comparative analysis of the sentiments displayed by travellers in Andalusia based on tweets from the two periods under study shows that negative sentiments became more prevalent during the COVID-19 pandemic, as did references to inland and city destinations (Córdoba, Granada, Málaga, and Seville). During the previous period, there were more references to coastal destinations (Cádiz, Almería, Málaga, and Huelva), especially long-standing, mass tourism destinations such as Málaga.
Other words that were more common prior to COVID-19 were 'beach' and 'sand', confirming that beach holidays were predominant during that quarter in 2019. Meanwhile, the words appearing most often during the COVID-19 period were 'cultural' and 'city', demonstrating the need for the tourism sector to adapt to the health crisis and to the rise in travel to less crowded, inland, and city destinations on both the demand and supply side.
Analysis of tourists' behaviours and sentiments using social media data provides similar, complementary information to the survey data.
Both analytical tools highlight the increased importance of cultural tourism and visits to less crowded destinations, including cities (Granada, Málaga, and Córdoba) and inland destinations (mountains, natural parks), during the COVID-19 period in comparison with the previous summer. In summer 2020, there were far fewer references to 'sand', 'sun', and 'beach'. Words such as 'Benidorm', 'Spain', 'pool', 'beach', 'sun', and 'sand', which are associated with mass beach tourism, were largely absent in 2020.
In addition, the sentiment analysis indicates that tourists travelling in Andalusia expressed more negative sentiment in summer 2020 than in the previous year. This is also apparent from the survey, with lower scores for certain aspects of the tourist experience (public transport and tourist accommodation especially), and from some of the ten most widely cited comments, which highlight the experience of visiting less crowded destinations.
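The shift in sentiment between the two periods can be summarised as a change in percentage points, using the class shares reported for the 2019 and 2020 data sets. The short Python sketch below only restates that arithmetic. The function name `point_change` is hypothetical.

```python
# Shares of tweets per polarity class, as reported in the text (%).
SHARES_2019 = {"Neg": 6, "Neu": 21, "Pos": 73}   # 11,532 tweets
SHARES_2020 = {"Neg": 58, "Neu": 12, "Pos": 30}  # 14,000 tweets


def point_change(before, after):
    """Change in percentage points for each polarity class."""
    return {label: after[label] - before[label] for label in before}


print(point_change(SHARES_2019, SHARES_2020))
# {'Neg': 52, 'Neu': -9, 'Pos': -43}
```

On these figures, the share of negative tweets rose by 52 percentage points, while the shares of positive and neutral tweets fell by 43 and 9 points respectively.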

Conclusions
In terms of the first theoretical objective, this paper describes the recent rise in studies performing sentiment analysis using social media data in relation to the tourism sector. The majority of the articles published on the topic in the Web of Science in the last two years have focused on methodology, although studies classifying tourists' opinions of certain resources, destinations or experiences are also common. This study falls into the second category.
With regard to the second theoretical objective, this study has shown that COVID-19 has seriously affected international tourism in terms of numbers of trips and tourist behaviour and sentiments, with a greater impact than other health crises. Tourists have begun to demand safer, healthier destinations, and it is likely that these changes will persist in the future.
As for the empirical objectives, a case study has been used to show how sentiment analysis can be used to supplement or even replace surveys as a tool for analysing tourists' behaviour and opinions. This type of analysis offers cost savings and reveals tourists' behaviour and opinions on a continuous basis in real time, as shown by authors such as [22].
In the Andalusian case study, both the surveys and the sentiment analysis using Twitter data show how COVID-19 has changed the behaviour and sentiments of tourists travelling in the region. By combining the data from both sources, it is apparent that tourism declined in Andalusia in summer 2020, especially in crowded beach destinations.
This analysis demonstrates the need for public and private stakeholders in the Andalusian tourism sector to promote the region as a safe destination and to implement strategies and measures to enhance safety, such as monitoring visitor flows to certain locations, maintaining social distancing, and cleaning facilities and infrastructure. These measures should be more visible in mass tourism destinations, as they are considered less safe by visitors and are at greater risk of declining numbers of arrivals during the pandemic.
An analysis of the data from the two tools offers complementary information. Whereas the survey provides information on behavioural changes in relation to quantitative variables (visitor numbers, average stays, average expenditure, most visited provinces and destinations, etc.), the sentiment analysis reveals subjective utterances and emotions and classifies them as positive, negative or neutral. It also allows us to analyse changes in these opinions on a continuous basis over time at a low cost.
Indeed, this study demonstrates the value of a simple exploratory data analysis in obtaining important information on potential causes of problems, changes in demand, etc. Data visualisation using algorithms, tables, word clouds, and simple graphs is a key element of this exploratory analysis, allowing us to detect possible changes in tourists' behaviour, opinions, and sentiments. Moreover, this type of analysis is inexpensive and affordable for small-scale tourist destinations due to the availability of free software, such as RStudio and Knime, which makes data analysis and graphic representation accessible to any individual or organisation.
This article aims to help fill the gap in the literature on the use of social media by tourist destinations to analyse tourist behaviours and sentiments. This area of study has huge potential for growth in the coming years, given the significant progress made in the use of big data and social media [66,67].
The study also contributes to a greater understanding of changes in behaviours and sentiments among tourists visiting Andalusia as a result of the COVID-19 pandemic, using a combination of survey data and analysis of comments by tourists on the social network Twitter.
With regard to the limitations of the study, opinions were only analysed using one social network: Twitter. Although this is one of the most widely used social networks, it would be interesting to compare the data with other social networks such as Facebook or Instagram. This is a potential area for further research, which could be extended by comparing information and content on social media with news from traditional media outlets to observe differences in the handling of themes, coverage, and other aspects.