Social Media Data-Based Sentiment Analysis of Tourists’ Air Quality Perceptions

Abstract: Analyzing tourists’ perceptions of air quality is of great significance to the study of tourist experience satisfaction and the image construction of tourism destinations. In this study, using the web crawler technique, we collected 27,500 comments regarding the air quality of 195 of China’s Class 5A tourist destinations posted by tourists on Sina Weibo from January 2011 to December 2017; these comments were then subjected to a content analysis using the Gooseeker, ROST CM (Content Mining System) and BosonNLP (Natural Language Processing) tools. Based on an analysis of the proportions of sentences with different emotional polarities with ROST EA (Emotion Analysis), we measured the sentiment value of texts using the artificial neural network (ANN) machine learning method implemented through a Chinese social media data-oriented Boson platform based on the Python programming language. The content analysis results indicated that in the adaption stage in Sina Weibo, tourists’ perceptions of air quality were mainly positive, and tourists had poor awareness of the air pollution crisis. Objective emotion words exhibited a similarly high proportion as subjective emotion words, indicating that taking both objective and subjective emotion words into account simultaneously helps to comprehensively understand the emotional content of the comments. The sentiment analysis results showed that for the entire text, sentences with positive emotions accounted for 85.53% of all sentences, with a sentiment value of 0.786, which belonged to the positive medium level; the direction of the temporal “up-down-up” changes and the spatial pattern of high in the south and low in the north (while having little difference between the east and the west) were basically consistent with reality. A further exploration of the theoretical basis of the semi-supervised ANN approach or the introduction of other machine learning methods using different data sources will help to analyze this phenomenon in greater depth. The paper provides evidence for new data and methods for air quality research in tourist destinations and provides a new tool for air quality monitoring.


Introduction
Air quality, climate change and ozone depletion are three important indicators for assessing the sustainable development of the atmospheric environment [1], of which the first two have become major challenges in tourism development [2]. Coping with these challenges must involve tourists as core stakeholders. Compared with the extensive body of work on tourists' perceptions of climate change [3], tourists' perceptions of air quality have attracted attention only recently, and the findings so far derive mainly from data acquired through traditional survey methods [4][5][6][7].
As an increasing number of people are voicing their feelings and opinions about their living environments on social media, it has proven feasible and effective to use social media data to monitor the quality of the environment, including air pollution, which has long been a serious public health issue [8][9][10]. Performing a sentiment analysis on social media data to extract public opinions also provides a new data source and a new method for tourism research [11]. Sentiment analysis is the computational study of people's perceptions, evaluations, attitudes, and emotions concerning entities, individuals, problems, events, themes, and attributes [12]. The purpose of a sentiment analysis is to extract the emotional orientation of a text toward a certain thing based on the words in the unstructured text [13]. Sentiment analysis has already been applied widely in tourism studies [14][15][16][17][18][19][20], and existing UGC (user-generated content) data can effectively support sentiment analysis of tourists' overall environmental perception of tourism destinations [21]; however, whether the existing data can help understand tourists' perceptions of air quality at the level of environmental components through sentiment analysis is still a proposition worth examining.
Since the Office of the State Council issued the Action Plan for Air Pollution Prevention and Control in 2013, the Chinese government has made remarkable progress in air quality control, and significant improvements have been made in key areas [22]. China is both a major tourist destination and a social media superpower; tourists have been very keen on sharing a wide range of tourism content on social media [11]. Therefore, China is an ideal pilot area for verifying whether the existing data can support the study of tourists' perceptions of air quality. In summary, in this study, we collected Sina Weibo data on China's Class 5A tourism destinations and applied an artificial neural network (ANN) machine learning method to perform a sentiment analysis, in order to verify whether the existing data can support the study of tourists' perceptions of the air quality of various tourism destinations. After demonstrating feasibility, we comprehensively and systematically summarized the features and patterns of the perception at three levels, i.e., the macro, meso and micro levels; we also made comparisons with cases from the United States to introduce a new research approach to this field and enrich the content system from points to area, providing implications for the control of air pollution crises.

Tourists' Perception of Air Quality
Air is an important natural resource; air quality has an important impact on tourism development [23]. To tourists, air pollution is more detrimental to the atmospheric environment than ozone depletion [4]. Before going to Hong Kong, tourists generally have a neutral attitude towards the air quality in the region [24] but often express dissatisfaction after a tour [25]. Most tourists in Beijing are aware of the risks of haze to health [5], and a structural equation model further revealed that concerns about haze directly affect their environmental risk perceptions and tourism satisfaction [6]. Becken et al. [7] conducted a survey on residents in the United States and Australia and found that air quality plays a major negative role in the decision process regarding China as a potential tourist destination. These studies on tourists' perceptions of air quality show that the research subjects have mostly been countries or cities at the macro and meso levels; the studies have rarely focused on tourism destinations, i.e., the micro level, have covered a limited number of cases, and have relied mainly on statistical methods applied to survey data.

Sentiment Analysis in Tourism
Sentiment analysis has gained increasing interest among researchers in recent years [26]. Current sentiment analysis methods fall into two categories: Lexicon matching methods based on a sentiment dictionary and corpus-based machine learning methods [27,28]. According to different algorithms (e.g., geometry, probability and artificial intelligence), machine learning methods can be classified into three categories: (1) The support vector machine (SVM) method, which looks for the best linear separation between the data expressing positive emotions and those expressing negative emotions; (2) the naïve Bayesian (NB) method, which estimates the probability of a certain emotion based on the attributes of a text; and (3) the artificial neural network method, which simulates the biological brain through data processing using self-organizing networks of "neurons." Sentiment analysis varies with the granularity of the processed text at various levels, such as word, phrase, sentence and text [29].
In recent years, sentiment analysis has been applied extensively in tourism studies. Existing studies have mainly focused on the overall perception of hotels [14,16,28,30,31,32,33] and tourism destinations [15,17,18,20,34]. In terms of methodology, the dictionary matching method and machine learning methods have been applied equally frequently. The dictionaries used mainly include English and Chinese dictionaries, such as WordNet, SentiWordNet, and HowNet [15,20,35,36], and some dictionaries used in studies, such as the Valence Aware Dictionary for Sentiment Reasoning (VADER), even include a tourism vocabulary [21,31]. Among the aforementioned three types of machine learning methods, the support vector machine and the naïve Bayesian methods have been widely used in tourism research due to their fast processing speeds [18]. The artificial neural network has yielded the most accurate results, but its high computational requirements have resulted in relatively few studies employing it [17,18,37]. The data used in the above-described sentiment analyses of tourism have often come from multilingual travel websites and social media data in English, while social media data in Chinese have rarely been addressed. The majority of studies discuss and compare analysis methodology rather than using sentiment analysis methods for research [11,18].
In addition to the abovementioned general studies, in recent years, special studies using sentiment analyses of social media data have also attracted the attention of tourism researchers. Among them, studies of tourists' perception of environmental quality are highly representative. Becken et al. [21], Saura et al. [38], and Saura et al. [39] conducted sentiment analyses on tweets (comments on Twitter) regarding the overall environmental quality of tourism destinations, including the natural environment and social environment of Australia's Great Barrier Reef tourism destination and Spain's and Switzerland's hotels, using the vocabulary matching method, the unsupervised support vector machine method, and a Python-programmed algorithm; they argued that air quality has a very important impact on the destinations' environments. Unlike the high correlation between tourists' emotions and air quality based on questionnaire survey data [40], the correlation obtained through the analysis of large data collected from Weibo using the dictionary matching method is rather weak [41]; that study is one of the few to have analyzed tourists' perceptions of air quality based on social media data, and it focused on only a single city. In summary, using a sentiment analysis of social media data to understand tourists' perception of air quality has been covered in a few environmental thematic studies, and a small number of studies focusing only on this environmental component have been published. However, whether existing data can support general research and a systematic summary of the characteristics and laws of this perception has yet to be verified.

Research Methods
In the content analysis, we adopted three word segmentation methods: the widely used dictionary matching-based ROST CM (Content Mining System) Chinese and English word segmentation method and two machine learning algorithm-based Chinese word segmentation methods, Gooseeker and BosonNLP (Natural Language Processing). We used them to conduct a word frequency analysis on the social media data and used the semantic network diagram generated through ROST CM to compare and analyze the content and structure of the comments. In the sentiment analysis, to address the issue that existing studies have rarely used social media data in Chinese, we adopted the semi-supervised, Chinese corpus-based artificial neural network machine learning method through the Boson platform, programmed using Python 3.0, building on an analysis of the proportions of sentence polarities of the text using the emotional dictionary method of ROST EA (Emotion Analysis), i.e., a hybrid method in which the artificial neural network is primary and ROST EA is secondary.
The artificial neural network method has two distinct characteristics. First, it can interface effectively with data from Weibo. The semantic corpus of the platform was automatically constructed based on data from channels such as Weibo and online forums. In the analysis, we set the URL (Uniform Resource Locator) parameters to define the corpus as the Weibo corpus, thus achieving seamless integration. Second, the artificial neural network method has a high accuracy. Kirilenko et al. [18] established the evaluation criteria of sentiment analysis, and the calculation formulas are presented in Table 1. The accuracy (A) represents the ratio of the number of correct topics to the total number of topics in the test results. The precision (P) indicates the ratio of the number of correctly classified topics of a specific polarity to the number of topics assigned that polarity in the detection results. The recall (R) indicates the ratio of the number of correctly classified topics in the test results to the total number of topics that should be of the same classification. The F1 value synthesizes the precision and recall values to capture the overall pros and cons of the algorithm. There are two reasons for the high accuracy of the artificial neural network method. First, the size of the training corpus reaches the level of tens of millions. With machine learning methods, the larger the training corpus is, the higher the accuracy of the analysis is [26,34]. Alaei et al. [11] also noted that machine learning methods would be heavily used for more complex sentiment analysis with more extensive training corpuses in the future. Second, the artificial neural network method not only effectively identifies various parts of speech of general vocabularies, such as nouns and verbs, but also recognizes proper nouns such as Timothy Easton, comprehends peculiar glossaries such as slang terms, and assesses the sentiment of online buzzwords. The method can even uncover the implicit emotions of neutral comments. The results of the artificial neural network method showed that the sentiment score of the proverb "liu wan'er" (go for a stroll) was 0.73, the sentiment scores of the online buzzwords "555" and "666" were 0.04 and 0.89, respectively, and those of the sentences "The smog in Beijing cannot be avoided" and "Beijing makes people hide from smog" were 0.25 and 0.80, respectively.
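The evaluation criteria named in this section (accuracy, precision, recall and F1) can be made concrete with a short sketch. Table 1 is not reproduced in this text, so the function below uses the standard definitions of these measures for one target polarity; the function name `evaluate`, the label strings and the sample data are illustrative assumptions, not the authors' program.

```python
from typing import Dict, Sequence


def evaluate(gold: Sequence[str], pred: Sequence[str], target: str = "positive") -> Dict[str, float]:
    """Standard accuracy, precision, recall and F1 for one target polarity.

    `gold` holds the manually assigned labels and `pred` the labels produced
    by a classifier (both hypothetical inputs for this sketch).
    """
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")

    pairs = list(zip(gold, pred))
    correct = sum(g == p for g, p in pairs)                         # all correct labels
    true_pos = sum(g == target and p == target for g, p in pairs)   # correct hits on target
    predicted = sum(p == target for p in pred)                      # items labelled as target
    actual = sum(g == target for g in gold)                         # items that truly are target

    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels for five comments, for illustration only.
gold_labels = ["positive", "positive", "negative", "positive", "negative"]
pred_labels = ["positive", "negative", "negative", "positive", "positive"]
scores = evaluate(gold_labels, pred_labels)
print({name: round(value, 3) for name, value in scores.items()})
```

F1 is the harmonic mean of precision and recall, so it is high only when both are high, which is why it serves as a single summary of an algorithm's overall quality.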
There were two stages to the Python 3.0 program. The first entailed calling the artificial neural network method on the Boson platform to analyze chapter-level review texts. The second involved generating the annual, quarterly, and monthly sentiment scores of the scenic destinations. Considering that the sentiment score was between 0 and 1 and the threshold value to distinguish positive and negative images was 0.5, we established the following grade criteria based on the equidistance principle: Extremely poor (0~0.100), very poor (0.101~0.200), poor (0.201~0.300), somewhat poor (0.301~0.400), slightly below average (0.401~0.500), slightly above average (0.501~0.600), somewhat good (0.601~0.700), good (0.701~0.800), very good (0.801~0.900), and excellent (0.901~1).
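The second stage and the grade criteria can be sketched as follows. This is a minimal illustration under stated assumptions: the per-comment scores are made up, the helper names `grade` and `period_means` are hypothetical, and the call to the Boson platform (the first stage) is not shown because its interface is not described in the text.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Ten equidistant grades, in the order given in the text.
GRADES = (
    "extremely poor", "very poor", "poor", "somewhat poor",
    "slightly below average", "slightly above average",
    "somewhat good", "good", "very good", "excellent",
)


def grade(score: float) -> str:
    """Map a sentiment score in [0, 1] to its grade (bands are right-inclusive)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("sentiment score must lie in [0, 1]")
    thousandths = round(score * 1000)           # work in integers to avoid float edge cases
    return GRADES[max(thousandths - 1, 0) // 100]


def period_means(scored, key):
    """Average the scores of (posting date, score) pairs grouped by key(posting date)."""
    groups = defaultdict(list)
    for day, score in scored:
        groups[key(day)].append(score)
    return {period: mean(values) for period, values in sorted(groups.items())}


# Made-up (posting date, sentiment score) pairs standing in for stage-one output.
scored = [
    (date(2013, 4, 2), 0.82),
    (date(2013, 11, 9), 0.74),
    (date(2014, 5, 1), 0.90),
]

by_year = period_means(scored, key=lambda d: d.year)
by_quarter = period_means(scored, key=lambda d: (d.year, (d.month - 1) // 3 + 1))

for period, value in by_year.items():
    print(period, round(value, 3), grade(value))
```

Under these criteria, the overall value of 0.786 reported in the abstract falls in the "good" band (0.701~0.800).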

Data Source
Sina Weibo was launched in August 2009, and by 2013, it had 536 million registered users [42]. Like other online social media platforms such as Facebook and Twitter, Sina Weibo has a large text corpus and is an important data source for the study of sentiment analysis [43,44]. The social media data were collected from Sina Weibo using a web crawler, "Bazhuayu" (octopus). Given that the time units used in this study were mainly years and quarters, we set the time period from January 1, 2010 to December 31, 2017 when collecting the data in 2018. The data were collected using the query keywords "air" and "name of the 5A tourism destination," and denoised as follows: (1) Redundant comments were deleted; (2) the data from 2010 were discarded due to the data size being too small; (3) information from government agencies, enterprises and other organizations, which also have Weibo accounts, was excluded because we wanted to focus on public opinions that are disseminated online; and (4) among 249 tourism destinations, those that had fewer than ten comments were excluded. The data cleansing was performed manually, which reflected the semi-supervised nature of the machine learning method adopted in this study and provided a more accurate and larger dataset despite being time intensive. After cleaning, 27,500 comments on 195 tourism destinations made from January 1, 2011 to December 31, 2017, containing a total of 2.05 million words (141 comments per tourism destination and 74 words per comment, on average), were obtained. Compared with the dataset collated by Alaei et al. [11], the dataset collected in this study was larger in size. In addition, three points should be noted. First, although comments after denoising were mostly from tourists, some of them were from stakeholders, such as residents and tourism practitioners, who also commented on the air quality of tourism destinations from perspectives inconsistent with those of tourists. Second, to facilitate machine recognition of the comments by the Python program, we normalized some fields, such as the date. Third, to compare with cases from the United States, we also collected 372 comments from Chinese tourists regarding US destinations.
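The four denoising rules can be expressed programmatically as a sketch. The authors state that they performed the cleansing manually, so the code below is only an illustration of the rules; the `Comment` record, its `account_type` field and the choice to treat identical text about the same destination as redundant are assumptions introduced here.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class Comment:
    destination: str
    posted: date
    account_type: str   # hypothetical field: "individual", "government", "enterprise", ...
    text: str


def clean(
    comments: Iterable[Comment],
    min_per_destination: int = 10,
    start: date = date(2011, 1, 1),
    end: date = date(2017, 12, 31),
) -> List[Comment]:
    """Apply the four denoising rules described in the text."""
    seen = set()
    kept = []
    for comment in comments:
        key = (comment.destination, comment.text)
        if key in seen:                                  # (1) drop redundant comments
            continue
        seen.add(key)
        if not start <= comment.posted <= end:           # (2) drop the 2010 data
            continue
        if comment.account_type != "individual":         # (3) drop organizational accounts
            continue
        kept.append(comment)

    # (4) drop destinations with fewer than `min_per_destination` comments
    counts = Counter(comment.destination for comment in kept)
    return [c for c in kept if counts[c.destination] >= min_per_destination]
```

Rule (4) is applied last because the per-destination counts are only meaningful once the other three rules have removed noise.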

Analysis of the Number of Comments
As mentioned above, in 2010, there were very few comments on Weibo about the air quality of tourism destinations, and, as such, these comments were excluded from this study. Starting in 2011, this topic attracted more attention, and in 2013, the number of comments on this topic peaked; during 2011-2013, Weibo underwent a surge period, and the number of comments increased by 233% (Figure 1). In 2014, activity on Weibo returned to a more rational level, and in subsequent years, the number of comments remained high, indicating that tourists continued to pay attention to air quality after Weibo entered the adaption stage. In terms of months, the number of comments made in the period from March to October was high, especially in April and October, and the number made at other times, especially in January and December, was low; the highest number of comments was made in April, and the lowest was made in January (the number of comments made in January was only 36.82% of that made in April), indicating that the strong seasonal characteristics of tourism activities are also reflected in the number of comments on the air quality of tourism destinations.
In terms of the spatial distribution of comments, the top five provinces in terms of popularity were Zhejiang, Jiangsu, Fujian, Beijing, and Sichuan, four of which are in the south; the top ten tourism destinations were West Lake, Juzizhou, Dujiangyan and Qingchengshan, the Olympic Park, Gulangyu Island, Taihu Lake, Qiandao Lake, the Forbidden City, Sun Yat-sen Mausoleum and Wuzhen, eight of which are in the south, indicating that at both the provincial level and the tourism destination level, most of the destinations with high tourism strength are in the south, and only a few are in the north.

Content Analysis
Analyzing high-frequency words in text is an important means to study large datasets, and the numbers of high-frequency words can effectively reflect the key information in the comments [45]. Relative to the two types of word frequency analysis software for bilingual (English and Chinese) or Chinese texts, i.e., the currently widely used ROST CM and BosonNLP (the latter has attracted more attention in recent years), Gooseeker can accommodate a larger scale when conducting analysis on word frequency statistics and thus was adopted to process the high-frequency words in the 27,500 comments in Chinese. The statistics showed that in addition to words that are of little emotional significance, such as "air," "here," and "today," image-related words with a word frequency of greater than 2000 mainly included "freshening," "fresh," and "not bad." To further analyze word frequencies, we examined the parts of speech of the high-frequency words. Alaei et al. [11] and Kirilenko et al. [18] pointed out that the vast majority of studies on tourism sentiment analysis only took into account subjective emotion words such as "happy" and "sorry." In fact, objective words such as "family" and "war" also carry emotions and should be included in the scope of sentiment analysis. Subjective emotion words are mostly adjectives, while objective emotion words are mostly nouns [46,47]. These two types of emotion words can be further categorized into two types: Positive and negative. Therefore, the words in the comments can be divided into five categories: Positive subjective emotion words, positive objective emotion words, negative subjective emotion words, negative objective emotion words, and non-emotional words. It was argued that "forest," "sunrise," "landscape," "diversity," etc., are positive objective emotion words that reflect the quality of the natural environment [39], while "low temperature," "high temperature," "precipitation," "cloudiness," etc., are negative objective emotion words that reflect weather conditions [48]. However, the word frequencies of these five categories have not been comprehensively analyzed.
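The five-category scheme can be illustrated with a toy example. The mini-lexicon below is hypothetical: it reuses a handful of words that the text itself assigns to categories, translated into English, and it counts tokens rather than comments, so it is a sketch of the idea and not the dictionaries used by Gooseeker, ROST CM or BosonNLP.

```python
from collections import Counter
from typing import Dict, Iterable

CATEGORIES = (
    "positive subjective",
    "positive objective",
    "negative subjective",
    "negative objective",
    "non-emotional",
)

# Hypothetical mini-lexicon: word -> (polarity, kind), following the text's examples.
LEXICON = {
    "fresh": ("positive", "subjective"),
    "comfortable": ("positive", "subjective"),
    "forest": ("positive", "objective"),
    "sunrise": ("positive", "objective"),
    "pollution": ("negative", "subjective"),   # classed this way in the text
    "cloudiness": ("negative", "objective"),
}


def categorize(word: str) -> str:
    """Return the category of a word; unknown words are non-emotional."""
    entry = LEXICON.get(word)
    if entry is None:
        return "non-emotional"
    polarity, kind = entry
    return f"{polarity} {kind}"


def category_shares(tokens: Iterable[str]) -> Dict[str, float]:
    """Share of tokens falling into each of the five categories."""
    counts = Counter(categorize(token) for token in tokens)
    total = sum(counts.values())
    return {cat: (counts[cat] / total if total else 0.0) for cat in CATEGORIES}


# Made-up, already segmented tokens.
tokens = ["air", "fresh", "forest", "here", "pollution", "fresh", "cloudiness", "today"]
for category, share in category_shares(tokens).items():
    print(f"{category}: {share:.1%}")
```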
First, to gain a preliminary understanding of the perception of air quality by Chinese tourists at the macro level, we obtained the statistics on the most important subjective emotion words in the comments among the five categories using Gooseeker, and the results showed that among the first 100 high-frequency words, the positive subjective emotion words "freshening," "fresh," "good," "like," "beautiful," and "comfortable" ranked 2nd, 6th, 15th, 20th, 32nd and 35th, respectively, and the only negative subjective emotion word, "pollution," ranked 71st. In both quantity and ranking, positive subjective emotion words exceeded negative subjective emotion words, and tourists' perceptions of air quality were dominated by positive images. The comments containing "pollution" only accounted for 2.63% of the total comments, indicating that on social media, tourists have poor awareness of the air quality crisis.
Second, to further analyze the compositions of the five word categories and compare the word segmentation accuracies of the three word frequency analysis tools, i.e., Gooseeker, ROST CM and BosonNLP, while considering that ROST CM and BosonNLP have been mostly used in small-scale data processing (BosonNLP exports its emotion dictionary to ROST CM to achieve word frequency analysis, so the two have the same capacity when processing text), we chose different tourism destinations above the meso level; these were Heilongjiang, which is dominated by rural tourism destinations, and Shanghai, which is dominated by urban tourism destinations. Table 2 indicates that the numbers of comments with non-emotional words, positive subjective emotion words, positive objective emotion words, negative subjective emotion words and negative objective emotion words on tourism destinations in Heilongjiang accounted for 68.69%, 12.14%, 19.16%, 0% and 0%, respectively, of the total comments, while those on tourism destinations in Shanghai accounted for 92.24%, 4.77%, 1.20%, 1.35% and 0.43%, respectively, of the total comments. The two regions were similar in that non-emotional words dominated and positive emotion words were far more common than negative emotion words; this outcome is consistent with the conclusion from other related studies that tourism reviews are dominated by comments with positive polarity [18,20,49,50]. It is also worth noting that comments with objective emotion words and those with subjective emotion words accounted for similarly high proportions of the total comments. The two regions differed in that sites in Shanghai exhibited comments with all five word categories, while those in Heilongjiang lacked negative high-frequency words and exhibited a lower proportion of comments with non-emotional words and higher proportions of comments with positive emotion words. Clearly, the tourism image of Heilongjiang is significantly better than that of Shanghai.
Table 2 also indicates that the word segmentation accuracies of Gooseeker, ROST CM and BosonNLP on comments on Heilongjiang destinations were 93.65%, 94.43% and 93.09%, respectively, while those on Shanghai destinations were 92.14%, 96.10% and 94.51%, respectively, indicating that all three tools could achieve a high word segmentation accuracy; of these tools, the most widely used, ROST CM, had the highest accuracy. BosonNLP followed, and Gooseeker, which has the highest data processing capacity, had the lowest accuracy, though it exhibited only a slight accuracy difference from BosonNLP. These results support the results of Cong et al. [51] obtained using ROST for Chinese word frequency analysis while providing implications for choosing appropriate tools in future studies.

Sentiment Analysis
The above content analysis concluded that tourists' perceptions of air quality were mainly positive, so we further conducted a quantitative study through sentiment analysis. Among the above-described methods, Gooseeker was not able to perform a sentiment analysis, ROST EA could generate a general assessment of the ternary polarity proportion of each sentence of the comments using the sentiment dictionary method, and the artificial neural network machine learning method on the Boson platform was able to estimate the sentiment value of each or all comments with spatiotemporal message tags. In this study, we adopted a hybrid method in which the artificial neural network was the primary method, and ROST EA was the secondary method.

ROST Sentiment Analysis
The results of the ROST EA analysis showed that in the sentences of comments on the air quality of China's 195 Class 5A tourism destinations, those with positive emotions, neutral emotions and negative emotions accounted for 85.53%, 7.21% and 7.26%, respectively, of the total number of sentences, indicating that positive emotions dominated, negative emotions accounted for a proportion that should not be ignored, and neutral comments exhibited the lowest proportion; therefore, the comments posted by tourists on social media mostly carry emotional content. Due to differences in subjects, directly comparing these proportions with those in other studies is meaningless. However, comparing the proportions of sentences with different emotional polarities can facilitate the generalization of these patterns. As in this study, in the sentences of the comments about the hotel accommodations in Spain and Switzerland and the Great Barrier Reef in Australia, the proportion of positive comments was also significantly higher than that of negative comments [21,38,39], and in some special circumstances, tourists' emotions regarding tourism destinations may also be more negative than positive [34]. Therefore, we postulate that tourists' comments regarding the environment, including various aspects such as eating, lodging, traveling, touring, shopping and entertaining, are mostly positive, although not absolutely positive.
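The idea of a ternary polarity proportion obtained with a sentiment dictionary can be sketched as follows. ROST EA's actual dictionary and scoring rules are not described in the text, so this is a deliberately simple stand-in: the two word sets are hypothetical, and a sentence is labelled by the sign of its positive-minus-negative word count.

```python
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical English stand-ins for a Chinese sentiment dictionary.
POSITIVE = {"fresh", "good", "beautiful", "comfortable"}
NEGATIVE = {"pollution", "smog", "haze"}


def sentence_polarity(tokens: Sequence[str]) -> str:
    """Label a segmented sentence as positive, neutral or negative."""
    balance = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if balance > 0:
        return "positive"
    if balance < 0:
        return "negative"
    return "neutral"


def polarity_shares(sentences: List[Sequence[str]]) -> Dict[str, float]:
    """Proportion of sentences carrying each of the three polarities."""
    counts = Counter(sentence_polarity(sentence) for sentence in sentences)
    total = len(sentences)
    return {
        label: (counts[label] / total if total else 0.0)
        for label in ("positive", "neutral", "negative")
    }


# Made-up, already segmented sentences.
sentences = [
    ["air", "fresh"],
    ["smog", "everywhere"],
    ["we", "arrived", "today"],
    ["beautiful", "but", "pollution"],
]
print(polarity_shares(sentences))
```

A sentence with equal numbers of positive and negative words is counted as neutral here; a real dictionary method would typically also weight words and handle negation.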

ANN Sentiment Analysis
We wrote Python code to use the artificial intelligence algorithm-based artificial neural network method on the Boson platform to further estimate the sentiment value of the text that contains both objective and subjective emotion words. Prior to the analysis of sentiment values at three levels, i.e., overall, annual and quarterly, we first analyzed the parameters of the assessment criterion.
We obtained a series of assessment criterion coefficients by analyzing the data using the semi-supervised artificial neural network method, and the accuracy, precision, recall rate and F1 were all 0.87. Given that the parameters generally varied little (Table 3), we adopted accuracy, the most representative parameter, to assess performance. Table 3 indicates that the accuracies were generally high. Given that only coefficients based on the same data are comparable, we focused on the analysis of the results from social media data that were only in English rather than Chinese. In a previous study, the accuracy of an analysis on 116 tweets about Iizuka City, a tourist city in Fukuoka Prefecture of Japan, using an adjusted unsupervised naïve Bayesian method was 0.78-0.92 [15]. In a recent study, the supervised support vector machine, naïve Bayesian, and unsupervised deep learning methods were used to analyze 762,475 tweets, achieving accuracies of 0.50-0.51, 0.54-0.55, and 0.39, respectively [18]. Generally, an accuracy value of 0.7 can be used as a baseline for the accuracy of a sentiment analysis [52]. In this study, we analyzed a total of 27,500 comments, so an accuracy of 0.87 could be regarded as rather high.

(1) National Sentiment Value
The results showed that the overall sentiment value of tourists for the air quality of China's Class 5A tourism destinations was 0.786, which was at the positive and medium level. Compared with the proportion of sentences with positive emotion in the total sentences (85.53%) obtained through ROST EA, this sentiment value reflected a slightly decreased positive image, while the result that the comments with positive images dominated is consistent with the conclusion in the content analysis section, as well as with those of previous studies, that emotion words were mostly positive and only slightly negative [18,20,49].
Sustainability 2019, 7, 24 FOR PEER REVIEW 12 of 24
(1) National Sentiment Value
The results showed that the overall sentiment value of tourists for the air quality of China's Class 5A tourism destinations was 0.786, which was at the positive and medium level. Compared with the proportion of sentences with positive emotion in the total sentences (85.53%) obtained through ROST EA, this sentiment value reflected a slightly less positive image. The result that comments with positive images dominated is nevertheless consistent with the conclusion in the content analysis section, as well as with those of previous studies, that emotion words were mostly positive and only slightly negative [18,20,49].
From the time evolution point of view, the sentiment values of 2011-2017 were 0.801, 0.770, 0.787, 0.798, 0.806, 0.774 and 0.777, respectively, all dominated by positive images, and the highest value was greater than the lowest value by 4.71%. After removing the "abnormal" value of 2011, when Weibo was in its initial overzealous stage, the sentiment values exhibited slight fluctuations (up-down-up, with a standard deviation σ of 0.014), assuming an overall stable trend (Figure 2). The linearity shown in Figure 2 indicates that 2012-2017 was a stable stage of sentiment value, in which slight increases in perceived air quality alternated with slight drops. In 2013, the General Office of the State Council of the People's Republic of China issued the Action Plan for Air Pollution Prevention and Control, and this year may have also been the "first year" in which China took air pollution seriously. The relatively stable sentiment value reflected, to some extent, the effectiveness of China's control of the atmospheric environment. In terms of quarterly changes, the sentiment values of the first, second, third and fourth quarters were 0.777, 0.798, 0.789 and 0.775, respectively, of which the highest value (the second quarter) was greater than the lowest value (the fourth quarter) by 2.98%, indicating a rather small seasonal fluctuation.
We then compared the sentiment values with the actual situation. According to the Report on the State of the Environment in China issued by the Ministry of Ecology and Environment, except for 2016, the national air quality in each year during 2012-2017 was generally better than that of the previous year. The comparison showed that the direction of sentiment value change was consistent with the actual situation. This provides evidence for the credibility of the method that uses a sentiment analysis of existing social media data to understand tourists' perceptions of air quality. The sentiment value of 2016 decreased by 4.02% compared with that of 2015, while the national average air quality index (AQI) over the same period fell by 5.4% [61], and the difference between the two numbers could mean that in terms of the level of air quality change, tourists' perceptions may be worse than the actual situation.
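The fluctuation statistics for the 2012-2017 period can be checked against the published annual values with a short sketch. It assumes that σ is the sample standard deviation (n − 1 in the denominator), which is the variant that matches the reported 0.014 when computed from the rounded annual values; the variable names are illustrative.

```python
from statistics import stdev

# Annual sentiment values as reported in the text (rounded to 3 decimals).
annual = {2011: 0.801, 2012: 0.770, 2013: 0.787, 2014: 0.798,
          2015: 0.806, 2016: 0.774, 2017: 0.777}

# Drop the "abnormal" 2011 value, as described in the text.
stable = {year: value for year, value in annual.items() if year >= 2012}

# Sample standard deviation (assumption: n - 1 denominator).
sigma = stdev(list(stable.values()))

# Direction of change relative to the previous year within 2012-2017.
years = sorted(stable)
directions = {
    year: "up" if stable[year] > stable[prev] else "down"
    for prev, year in zip(years, years[1:])
}

print(round(sigma, 3))
print(directions)
```

The only year marked "down" is 2016, which is the one year in which the text reports that national air quality did not improve on the previous year.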
(2) Regional Sentiment Value
Based on the sentiment value of each province, we generated China's sentiment map of tourists' perceptions of air quality (Figure 3). With the exception of Shanghai, the provinces with the lowest sentiment values (Gansu, Shanxi, Shaanxi and others) were connected to form the north-Northwest China region. Beijing, Hebei, Shandong, Tibet, Sichuan and other provinces with low sentiment values formed the North China region and the Southwest China region. Provinces with moderate sentiment values were numerous and mainly included Henan, Hubei, Hunan, Guangdong, Jiangxi, Zhejiang and Jiangsu, which extend inland from the southeast coast and form the vast Southeast China region, while Liaoning, Qinghai and Yunnan are relatively isolated. Except for Guizhou and Guangxi, provinces with high sentiment values, such as Jilin, Xinjiang and Fujian, are relatively scattered, and Heilongjiang, Inner Mongolia and Ningxia, which had the highest sentiment values, constitute the northeast-North China region. The three regions with the lowest or lower sentiment values (north-Northwest China, North China and Southwest China) were further combined into a northeast-southwest low-value area. In other words, the regions with the lowest and the second-lowest sentiment values roughly formed the dividing line of the sentiment distribution. The regional distribution pattern indicated the regional characteristics of tourists' perceptions of air quality, while the presence of some scattered provinces reflected the complexity of the perception.
Figure 3 also shows that except for Shanghai, provinces with the worst perceived air quality were mostly in the north, those with the best perceived air quality were in both the north and south, and those with medium-level perceived air quality were mostly in the south, indicating that the perception assumed a pattern of lower air quality in the north but higher air quality in the south; divided by the Hu Huanyong Line, the east-west difference was small. Previous studies revealed that air pollution in China's cities is characterized by the spatial pattern of "high in the north and east but low in the south and west," of which the Beijing-Tianjin-Hebei region, the Northwest China region, the Shandong Peninsula, and the middle reaches of the Yellow River are highly polluted areas, and the southern coastal areas, the Yunnan-Guizhou Plateau and the Qinghai-Tibet Plateau are clean areas [62][63][64]. The comparison indicates that the result of tourists' perceptions was essentially consistent with the actual situation of regional air pollution. The existence of the above-described large regions as well as the consistency between their sentiment values and the actual air pollution situation provide additional evidence for the reliability of the sentiment analysis results.
The perception result of each tourism destination was basically consistent with the actual air quality of the region where the tourism destination was located, but there were still some differences, for the following reasons. First, air pollution is an important part of air quality perception, as reflected in the fact that among the top 100 high-frequency words nationwide, "pollution" was the only negative subjective emotion word; however, other aspects, such as air thinness and air temperature, also affected the sentiment value of perception. For example, comments about "air thinness" accounted for 0.50% of the total comments nationwide but 19.33% of the comments in Tibet, and the thin air lowered the sentiment value. This explains why, although there is only mild air pollution in Western China [62][63][64], the sentiment value of this region was not significantly different from that of Eastern China, i.e., Western and Eastern China did not exhibit the air pollution pattern of "high in the east but low in the west." Second, the reference studies covered only cities, whereas tourism destinations, as the carriers of the perceived content, included both urban and rural areas, and the sentiment values of urban and rural tourism destinations differed significantly. The statistical results showed that the average sentiment value of urban tourism destinations was 0.772, while that of rural tourism destinations was 0.800 and thus higher than that of urban tourism destinations by 3.57%. The northeast-North China region showed the highest sentiment value, which is related to the fact that its tourism destinations are mainly located in rural areas. One of the reasons for the formation of the "enclave-type" provinces, such as Chongqing and Anhui, that had significantly different sentiment values from their surrounding regions is that the vast majority of their tourism destinations had rural characteristics. Third, the tourism destination areas are generally discontinuous in space, and only a small part of the region had better air quality. Fourth, tourism activities have profound seasonal characteristics, so the perception was uneven or even biased. Fifth, the characteristics of tourism activities themselves are an important cause. The sentiment value of "enclave-type" provinces such as Shanghai was much lower than that of Beijing, which is also a megacity and located in a heavily polluted area, likely because the comments about Shanghai were mostly on the Oriental Pearl TV Tower, and the tourism activities at this particular destination have a high requirement for air visibility so that visitors can have a bird's-eye view from the towering height. Low visibility has a significant negative impact on tourists' perceptions of the atmospheric environment [65]. The above analyses of the causes provide some explanation for the "distortion" of the perception in addition to its "truthfulness."
In addition to the regional perspective derived from the provincial sentiment values, China's eight-region division scheme based on geographical similarities and provincial-level administrative division integrity [66,67] provides another perspective. The eight major regions are the Northeast China region (Heilongjiang, Jilin and Liaoning), the North China region (Beijing, Tianjin, Hebei, Shanxi, Henan, and Shandong), the East China region (Shanghai, Jiangsu, Zhejiang, and Fujian), the Central China region (Anhui, Jiangxi, Hubei, and Hunan), the South China region (Guangdong, Guangxi, and Hainan), the Southwest China region (Yunnan, Guizhou, Sichuan, and Chongqing), the Northwest China region (Shaanxi, Gansu, Ningxia, Inner Mongolia, and Xinjiang) and the Qinghai-Tibet region (Qinghai and Tibet). Based on this division scheme, in terms of sentiment value, the regions were ranked in the following descending order: Northeast China, South China, Central China, East China, Northwest China, Southwest China, North China and the Qinghai-Tibet region. Among them, the Northeast China region ranked first, mainly because its tourism destinations are mostly in rural areas; the South China and North China regions ranked the second highest and the second lowest, respectively, which is consistent with the rankings in the national air quality list reported by others [62][63][64]; although the Qinghai-Tibet Plateau had little air pollution, it still ranked last, largely because it has thin and cold air.
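The eight-region scheme described above lends itself to a simple lookup table. The sketch below encodes the province lists exactly as given in the text and ranks regions by the unweighted mean of their provincial sentiment values; the unweighted mean is an assumption (the study may have weighted by comment counts), and the function name and toy scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Eight-region division as listed in the text.
REGIONS = {
    "Northeast China": ["Heilongjiang", "Jilin", "Liaoning"],
    "North China": ["Beijing", "Tianjin", "Hebei", "Shanxi", "Henan",
                    "Shandong"],
    "East China": ["Shanghai", "Jiangsu", "Zhejiang", "Fujian"],
    "Central China": ["Anhui", "Jiangxi", "Hubei", "Hunan"],
    "South China": ["Guangdong", "Guangxi", "Hainan"],
    "Southwest China": ["Yunnan", "Guizhou", "Sichuan", "Chongqing"],
    "Northwest China": ["Shaanxi", "Gansu", "Ningxia", "Inner Mongolia",
                        "Xinjiang"],
    "Qinghai-Tibet": ["Qinghai", "Tibet"],
}
PROVINCE_TO_REGION = {
    province: region
    for region, provinces in REGIONS.items()
    for province in provinces
}


def rank_regions(province_scores):
    """Rank regions by the unweighted mean of provincial sentiment values.

    province_scores: dict mapping province name -> sentiment value.
    Returns a list of (region, mean value) pairs in descending order.
    """
    grouped = defaultdict(list)
    for province, score in province_scores.items():
        grouped[PROVINCE_TO_REGION[province]].append(score)
    region_means = {region: mean(scores) for region, scores in grouped.items()}
    return sorted(region_means.items(), key=lambda item: item[1], reverse=True)


# Toy placeholder scores for illustration only; not values from the study.
toy_scores = {"Jilin": 0.9, "Liaoning": 0.7, "Qinghai": 0.5}
ranking = rank_regions(toy_scores)
print(ranking)
```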
Quarterly sentiment value can provide a criterion for assessing the quarterly air quality. Table 4 indicates that the highest quarterly sentiment value was 0.805 (the fourth quarter in the South China region), which was at a very high level and significantly greater than that of other regions. Tourists are more satisfied with warm weather in the cold winter than in the summer [3]. The result for Shanghai was only at the moderate level because it is a megacity and has its own characteristics of tourism activities, as mentioned above. Among the 17 provinces rated at the "high level," those with low sentiment values, such as Gansu, Shanxi, Shaanxi and Beijing, are mainly in the northern regions, while those with high sentiment values, such as Guangdong, Jiangxi, Jiangsu and Yunnan, are mainly in the southern regions, which is largely consistent with the spatial pattern of air pollution in China, i.e., heavily polluted in the north but mildly polluted in the south [62][63][64].
Haze is a key indicator of current air quality in China and was found to be significantly correlated with public sentiment [68]. Comparing the sentiment value of each province with the proportion of its comments containing "haze," we found a correlation coefficient of −0.54 (p < 0.05), indicating a significant negative correlation between haze and tourists' sentiment value. Therefore, using "haze" as a representative to analyze tourists' perceptions of the air pollution crisis is more convincing. Of the 27,500 comments made nationwide, only 522 contained the word "haze," accounting for 1.90% of the total comments. The proportions of comments mentioning "haze" made by tourists to the Shanghai, Gansu, Shanxi, Shaanxi, Tibet and Beijing destinations in their respective total comments were 3.89%, 0.81%, 2.81%, 2.96%, 1.12% and 3.63%, respectively. This indicates that, contrary to the conclusion drawn from suggestive questionnaires that tourists' perceptions of haze are strong [5,40,41], tourists' perceptions of the air pollution crisis on the internet are more relaxed, more free, and not strong; they are, however, more truthful and reliable, which is consistent with the conclusion in the content analysis section based on the word "pollution."
(4) Sentiment Value of Scenic Spot
Accurately grasping the overall polarity of tourists' online reviews of tourism destinations is a key to tourism analysis and application [69]. At the tourism destination level, in terms of sentiment value, the top ten tourism destinations were ranked in descending order as follows: Huanglong, Yunnan Stone Forest, Bailidujuan, Qingyan ancient town, Langshan, Dajueshan, Keketuohai, Kunming Exposition Garden, Dongjiang Lake and Jingpo Lake. Among them, except for Keketuohai and Jingpo Lake, all the tourism destinations are in the south (Figure 5). The bottom ten destinations in terms of sentiment value were ranked as follows: Terracotta Warriors and Horses Museum, Oriental Pearl TV Tower, Eastern Qing Mausoleum, Xiangsha Bay, Qiao Family Courtyard, Famen Temple, Jiayuguan, Yueya Spring, Minggucheng (Cemetery of Confucius, Confucius Family Mansion, Confucian Temple) and Pingyao Ancient City. Except for the Oriental Pearl TV Tower, all the tourism destinations are in the north, concentrated in areas with severe air pollution, such as Shaanxi, Shanxi, Gansu and Hebei (Figure 6). These results show that tourism destinations with high sentiment values are mainly in the south and that those with low sentiment values are mainly in the north, which further confirms that the air quality of tourism destinations in China assumes a spatial pattern of low pollution in the south and high pollution in the north. In particular, the Terracotta Warriors and Horses Museum exhibited the lowest sentiment value, possibly because of the poor overall regional air quality, heavy smell in the air, oxidation of terracotta figurines, and other factors.
The reliability of the sentiment value at the macro level was confirmed above through the AQI. We also used this index to examine the reliability of the tourism destination sentiment value at the micro level. The characteristics of atmospheric movement made such an analysis rather difficult, so we chose the island-type destination Putuoshan, which is relatively untouched, as a case study. A comparison of the daily sentiment value in 2017 with the AQI of the corresponding date from actual monitoring showed that the two sets of data exhibited a low correlation that was close to a moderate correlation (correlation coefficient −0.45, n = 25, p < 0.05), which provides a third piece of evidence for the reliability of the sentiment analysis.
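Both correlation checks reported above (sentiment value against the share of "haze" comments, and daily sentiment value against AQI) pair two numeric series. The text does not state which coefficient was used, so the sketch below assumes the Pearson product-moment coefficient and implements it directly; the function name and the toy series are hypothetical and do not reproduce the study's data or its p-values.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Toy series for illustration only: sentiment falls linearly as AQI rises,
# so the coefficient is -1 up to floating-point error.
toy_aqi = [30, 50, 70, 90]
toy_sentiment = [0.9, 0.8, 0.7, 0.6]
r = pearson_r(toy_aqi, toy_sentiment)
print(round(r, 6))
```

A negative coefficient is the expected sign in both analyses, because a higher AQI or a larger share of "haze" comments indicates worse air. Significance testing is deliberately left out of this sketch.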
(5) Sentiment Values in China and the US
Based on Sina Weibo comments, it was also possible to analyze the air quality sentiment values of a few important destination countries. Considering that the US datasets are large and that China and the US are similar in terms of geographical location and land area, we focused on comparing the air quality difference of the two countries, which enabled us to examine some propositions such as "the American moon is brighter than the Chinese moon" from the perspective of the human sensor. The sentiment value of air quality perception of the US by Chinese tourists was 0.803, which was greater than that of China (0.786) by 2.22%, indicating that Chinese tourists have a higher regard for US air quality, with a higher level of satisfaction.
The semantic network diagrams of the Chinese and American comments generated by ROST showed that compared with that of the US, the overall air quality image of China was slightly poorer, with a higher level of diversity and structural complexity (Figures 7 and 8). In Figures 7 and 8, dark green represents positive subjective emotion words, sky blue represents positive objective emotion words, red represents negative subjective emotion words, orange represents negative objective emotion words, and dark blue represents non-emotional words. From the perspective of positive and negative emotion words, in the case of China, the number of positive emotion words was higher than that of negative emotion words, whereas in the case of the US, only positive emotion words were present. In terms of content, in the case of China, more names of tourism destinations showed up, reflecting that tourists' attention to air quality was mainly at the tourism destination level, while in the case of the US, the names of tourism destinations were rarely mentioned; instead, names of cities were frequently mentioned, indicating that tourists' attention was mainly on the regional environment, not yet at the tourism destination level. In terms of structure, the emergence of negative subjective emotion words, such as "unfortunately" and "pollution," in the case of China increased emotion word diversity as well as complexity; "fresh" (a positive subjective emotion word) and "breathing" (a non-emotional word) co-occurred frequently in both the US and China cases, showing a remarkable aggregation structure and constituting a subcore.

Conclusions
(1) In this study, using Chinese social media data about the air quality of 195 tourism destinations, we performed a content analysis using the Gooseeker, ROST CM and BosonNLP tools and a sentiment analysis using the artificial neural network machine learning method on the Boson platform through Python programming. We confirmed that the social media data posted by tourists online can provide a new perspective and data source for the study of tourists' air quality perception at the micro, meso and macro levels. The Chinese social media data-oriented artificial neural network application helps enrich the existing tourist emotion research methodology, while the findings, which have high reliability, can provide implications for the management of the air pollution crisis in the tourism industry.
Based on these conclusions, the method of performing a sentiment analysis on social media data posted by tourists online can provide a new tool for air quality monitoring in tourist destinations; for example, the new tool can be used to some extent to evaluate the implementation outcomes of environmental protection policies. The Scheme for Building the Ecological Environment Monitoring Network issued by the State Council of the People's Republic of China in 2015 stated that the Ecological Environment Monitoring Network will be fully constructed by 2020. Our study found that tourists' air quality perception can largely reflect the air quality status of the area where the scenic spots are located. This finding indicates that the method can assist air quality monitoring in the background of economic activities in the whole of society, can be a vital component in the Ecological Environment Monitoring Network, and can be interconnected with other monitoring measures. In addition, for tourist destinations without an air quality monitoring station, air quality monitoring can be achieved using this method. Furthermore, a supervision network that covers major tourist destinations can be constructed to provide a reference for the formulation of air quality protection policies.
(2) The results of the content analysis showed that when investigating tourists' air quality perception, we should include objective emotion words in addition to subjective emotion words. When perceiving air quality, tourists in China used positive emotion words primarily and negative emotion words secondarily, and they exhibited a weak awareness of the air pollution crisis on the internet, which means that it is easy to find positive messages about air quality in the tourists' comments and difficult to find negative comments on air pollution. We compared word frequencies using the ROST CM, BosonNLP and Gooseeker tools and found that the word segmentation accuracies of the three tools were similarly high, while Gooseeker is more capable of processing large text datasets; these findings provide some references for future related research. The tourism industry requires a high-quality air environment. Therefore, it is crucial to strengthen education about atmospheric environment protection, enhance tourists' consciousness of the atmospheric environment crisis and their responsibility, and help them practice air pollution control measures while enjoying tourism activities.
(3) In the sentiment analysis section, the results of ROST EA showed that Chinese tourists' comments with positive emotions on air quality accounted for 85.53% of the total comments, and it was confirmed that tourists' comments on social media on environmental topics, including eating, lodging, traveling, touring, shopping, and entertaining, were dominated by positive images; however, they were not absolutely positive. The sentiment value obtained through the artificial neural network was 0.786, which was slightly lower than the proportion obtained through ROST EA and was at the positive and medium level; temporally, it had the same direction of change as the actual air quality, which to some extent reflects the effectiveness of the atmospheric environment control by the Chinese government. Spatially, it exhibited the pattern of high in the south but low in the north, which is in line with China's actual air pollution situation. At the tourism destination level, the sentiment value was higher in rural areas than in urban areas and was negatively correlated with the actual AQI measurements; the sentiment value for China was also lower than that for the US. Therefore, the comparison of the change directions and spatial patterns at the national macro level, the detection of the presence or absence of multiple regions that reflect the regional characteristics of air quality at the meso level, and the correlation analysis of sentiment value and AQI at the micro level, as well as the word frequency analysis results, provide evidence for the feasibility of using social media data to conduct a sentiment analysis to examine air quality.

Discussion
(1) In terms of methodology, we used an artificial neural network method oriented toward Chinese social media data to conduct a sentiment analysis of tourists' perceptions of air quality, but this study is still at the exploration stage. In addition, various basic theoretical aspects, e.g., the definitions of objective emotion words and non-emotional words as well as the validity of the conclusions, need to be strengthened. Moreover, in the future, the results obtained using the artificial neural network can be compared with those obtained using other Chinese artificial intelligence tools, such as Baidu AI (Artificial Intelligence) and Tencent Wenzhi, to further enrich the emotional research methodology while cross-checking the results against each other. An English-language version of the machine learning software could be introduced to process data from social media such as Twitter in order to verify whether such data and methods can support the relevant studies, especially large-scale perception studies. Future studies will improve the support vector machine, naïve Bayes, artificial neural network, and deep learning algorithms to improve the recognition of emotions, including implicit emotions.
(2) In terms of data collection, the data acquired using "tourism destination name" and "air" as keywords could reflect the air quality of tourist destinations. However, they were not solely from tourists; a small portion of the data came from residents and other stakeholders, which conflicts with the tourist-centered analytical perspective. When screening the data, we adopted manual identification, which is accurate but inefficient; in the future, machine learning methods can be applied to denoise the data at the same time [70]. In addition, the data collected in this study contained only text and excluded other forms such as pictures and videos. In the future, these multisource data can be integrated to improve the reasonableness and accuracy of the study [19]. In future studies, text data from travel websites and social media will be analyzed, and the pros and cons of different data types will be compared; in addition, different methods will be used to process different types of data so that the match between data and method can be better verified.
(3) In terms of result analysis, since data from only a short period of time were acquired, the data-derived perception characteristics (especially the evolutionary patterns) are not comprehensive enough. In the future, the inclusion of data from a longer time period or the addition of more spatial subjects can help draw more reasonable and complete conclusions. This descriptive investigation did not examine the details. For example, the sentiment characteristics and patterns in China's sentiment map of tourists' perceptions of air quality await further study. Moreover, the underlying causes of certain phenomena were not comprehensively analyzed. For example, the underlying cause of the downward trend of the sentiment values from 2015 to 2016 was only briefly mentioned, and there was a lack of in-depth analysis regarding the impact of the thin air on tourists' perception of air quality in Tibet. An analysis of the underlying mechanisms will deepen the explanatory studies on the basis of the methodological and descriptive studies.
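The keyword-based retrieval described in point (2) of this discussion, which pairs a tourism destination name with "air", can be sketched as a simple filter. This is only an illustration under stated assumptions: the function names are hypothetical, the Chinese keyword 空气 ("air") is assumed, the sample comments are invented, and the filter does not reproduce the crawler or the manual screening used in the study.

```python
def matches_keywords(comment, destination, air_keyword="空气"):
    """True if the comment mentions both the destination name and the air keyword."""
    return destination in comment and air_keyword in comment


def filter_comments(comments, destinations, air_keyword="空气"):
    """Keep (destination, comment) pairs for comments that match a destination."""
    kept = []
    for comment in comments:
        for destination in destinations:
            if matches_keywords(comment, destination, air_keyword):
                kept.append((destination, comment))
                break  # assign each comment to the first matching destination
    return kept


# Hypothetical inputs for illustration only.
destinations = ["黄山", "西湖"]
comments = [
    "黄山的空气真清新",  # mentions Huangshan and air: kept
    "西湖边人太多了",    # mentions West Lake but not air: dropped
    "今天空气不错",      # mentions air but no destination: dropped
]

kept = filter_comments(comments, destinations)
print(kept)
```

A filter of this kind cannot tell tourists from residents, which is why a separate screening step, manual or learned, is still needed.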
Author Contributions: The data analysis and writing of the article were done by Y.T. The data collection, method application and analysis were done by F.Z. and C.S. The results interpretation and English editing were done by Y.C. All authors have read and approved the final manuscript.

Figure 1. The numbers of Weibo comments on the air quality of Class 5A tourism destinations in 2011-2017.

Figure 2. Sentiment values and the numbers of comments on air quality in 2012-2017.

Figure 3. China's sentiment map of tourists' perceptions of air quality.

Figure 5. Top ten tourism destinations in terms of sentiment value.

Figure 6. Bottom ten tourism destinations in terms of sentiment value.

Figure 7. Semantic network diagram of air quality of China's tourism destinations by Chinese tourists.

Figure 8. Semantic network diagram of air quality of the US's tourism destinations by Chinese tourists.

Table 1. Confusion matrix of the results obtained for a general three-class classification problem.
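As a reading aid for a three-class confusion matrix of this kind, the sketch below derives accuracy and per-class precision and recall. The class labels (positive, neutral, negative), the row/column convention, and the counts are assumptions made for illustration; they are not the contents of Table 1.

```python
LABELS = ["positive", "neutral", "negative"]  # assumed class names


def per_class_metrics(matrix):
    """Return (accuracy, {label: (precision, recall)}).

    Assumed convention: matrix[i][j] counts items whose true class is
    LABELS[i] and whose predicted class is LABELS[j].
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    metrics = {}
    for i in range(n):
        true_positive = matrix[i][i]
        actual = sum(matrix[i])                          # row total
        predicted = sum(matrix[r][i] for r in range(n))  # column total
        precision = true_positive / predicted if predicted else 0.0
        recall = true_positive / actual if actual else 0.0
        metrics[LABELS[i]] = (precision, recall)
    accuracy = correct / total if total else 0.0
    return accuracy, metrics


# Hypothetical counts for illustration only.
example = [
    [80, 15, 5],
    [10, 70, 20],
    [5, 10, 85],
]

accuracy, metrics = per_class_metrics(example)
print(f"accuracy = {accuracy:.3f}")
for label, (precision, recall) in metrics.items():
    print(f"{label}: precision = {precision:.3f}, recall = {recall:.3f}")
```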

Table 2. Tourists' high-frequency words about air quality perception for Heilongjiang and Shanghai sites and their deviations (first 50 words). Note: OB represents the actual frequency of each word recognized in the Word document composed of all comments on tourism sites in Heilongjiang and Shanghai by tourists; GO represents the deviation between the word segmentation results through Gooseeker and the actual frequency (the same below); RO represents the deviation of ROST CM; BO represents the deviation of BosonNLP; ** represents positive subjective emotion words; * represents positive objective emotion words; represents negative subjective emotion words; represents negative objective emotion words; and non-emotional words are unlabeled.
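The deviation columns described in the note to Table 2 can be illustrated with a short sketch. It assumes that a deviation is the tool's counted frequency minus the actual frequency (OB); the sign convention, the function names, the words, and the counts are all hypothetical and do not come from the table.

```python
def deviations(actual, tool_counts):
    """Deviation per word: tool-counted frequency minus actual frequency (OB).

    The sign convention is an assumption; words the tool missed count as 0.
    """
    return {word: tool_counts.get(word, 0) - ob for word, ob in actual.items()}


def mean_abs_deviation(devs):
    """Average absolute deviation across words, as a rough accuracy summary."""
    return sum(abs(d) for d in devs.values()) / len(devs)


# Hypothetical frequencies for illustration only.
actual = {"fresh": 120, "haze": 40, "comfortable": 65}  # OB
tool_a = {"fresh": 118, "haze": 40, "comfortable": 66}  # counts from one tool
tool_b = {"fresh": 110, "haze": 37}                     # this tool missed a word

dev_a = deviations(actual, tool_a)
dev_b = deviations(actual, tool_b)
print(dev_a, mean_abs_deviation(dev_a))
print(dev_b, mean_abs_deviation(dev_b))
```

A smaller mean absolute deviation would suggest that a tool's word segmentation tracks the actual frequencies more closely for the listed words.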