Reconsidering Tourism Destination Images by Exploring Similarities between Travelogue Texts and Photographs

Abstract: With the rise of user-generated content (UGC) and deep learning technology, more and more researchers construct and measure the tourism destination image (TDI) through online travelogues. However, due to the impact of COVID-19 prevention and control, the number of online travelogues has decreased significantly and, therefore, the scientific validity of the TDI based only on text or photos has been questioned. This research fills a gap by comparing the differences between visual and semantic images in terms of the overall image perception and image formation through natural language processing technology and image caption technology in obtaining TDIs, taking Tiantai County in Zhejiang Province of China as a case. Our results show that the texts and photographs shared major similarities in the overall TDI, but from the perspective of interest, they reflect differently. Therefore, when considering the data source selection for TDI research with a small number of travelogues, texts should be the main content, supplemented by photographs.


Introduction
With the popularization and maturity of the tourist market, the competition of tourist destinations in recent years is more reflected in image competition. Tourism is a unique visual experience [1]. The arrival of the COVID-19 epidemic disrupted people's normal lives, and the development of offline tourism was challenged by the restrictions of epidemic prevention. In the meantime, people's demand for travel is suppressed and pent up [2]. With the expansion of online community users, people's image perceptions of tourist places tend to rely heavily on travelogues comprising photos and text [3], and these UGCs (user-generated contents) tend to be more authentic [4], which brings new opportunities for the image building of tourist places [5][6][7][8] and the development of tourism in the post-epidemic era. The development and utilization of cultural tourism resources in China are still in a rough stage, and most cultural tourism products lack brand image building. However, as tourism continues to develop and progress, the importance of brand images will become increasingly prominent and will gradually become a priority in promoting the development of tourism. Brand image is an important concept in consumer behavior [9], and destination image has been the dominant area of tourism research [10]. As tourism services are intangible, images become more important than reality [11]. According to Urry's [12] theory of the tourist gaze, tourists are "directed to features of the landscape that, which separate them off from everyday experience. Such aspects are viewed because they are taken to be in some sense out of the ordinary". That is to say, tourists' expectations and experiences of tourism are not natural but constructed. 
The concept of branding is an important part of regional development, especially for the undeveloped regions that need to pay more attention to the construction of place branding through place marketing and place branding approaches to improve their competitiveness [13]. Ebrahimi et al. [14] surveyed users on the Telegram and Instagram social networks, and the effect of place branding on the place image shows a higher path coefficient.
Therefore, place branding needs to supplement the activities of the administration of the province to improve the place's image and attract tourists. Therefore, in order to appeal to potential tourists and achieve sustainability in tourism industry development, cities need to figure out their own tourism image comprehensively, thus, making the tourism destination image (TDI) research a hot topic for scholars, tourism managers, and destination marketing organizations (DMO).
Even if the collective image of a destination is generally hard to change at a population level [15,16], it could be dynamic and altered by others' experiences from the perspective of individuals [11]. In the era of Web 2.0, the Internet has become an important vehicle for transmitting information. As for tourism activities, sharing experiences online has become another important purpose of travel today. In addition to the traditional promotional materials that influence TDI construction, such as brochures and television advertisements [17][18][19], more and more people are enthusiastic about recording and sharing their travel experiences on social media in the form of reviews, travelogue texts, and photographs. The gaze of tourists on tourism destinations is mediated through the Internet and mobile technology, which makes the narratives of people who share travelogues online a rich resource of insight for creating better travel experiences, together with a new direction: researchers use these online data for the research of TDI. However, due to the influence of COVID-19 prevention and control policies, short-term travel and peripheral travel have become the characteristics of tourism in the post-epidemic era. The tourist gaze, as Urry clarifies, is finely coordinated with the psychological needs of consuming to maximize pleasurable experiences. Smith et al.'s study show that tourists realize a preconceived notion of the destination image [20]. Jutla compared tourists' and residents' image perception of the same tourist destination, Simla, and showed disagreement on the image perception of popular spots [21]. As the similarity between the surroundings and daily life is extremely high, the purpose of tourism has changed from exploring different signs (symbols) to different experiences. 
Therefore, the number of travelogues of destinations has dropped significantly in the post-epidemic era, and the scientific and comprehensive validity of TDI research based only on text content has been questioned. This is why some scholars use other elements in travelogues, such as geotagged photographs [22] and trajectories [23], as an effective supplement to the limited data source. Unfortunately, most scholars use either texts or photographs alone as the data source for TDI research; the differences between the two and their substitutability have not yet been studied.
With the popularization of Python and other coding techniques, interdisciplinary cooperation between computer and tourism research is inevitable. It is possible to identify the contents of UGC photographs based on deep learning technologies, such as computer vision and image captioning. However, the similarities and differences between TDI research based on this and current textual methods have not been confirmed. The purpose of this study is to compare the differences between visual and semantic images in TDI research by analyzing the travelogues uploaded to online travel platforms by tourists. More concretely, natural language processing (NLP) and image content analysis techniques are used to analyze the text and photographs in travelogues, respectively, to compare the heterogeneity and substitutability of TDIs. As for NLP, we summarize the hot topics based on the LDA (latent Dirichlet allocation) topic model; meanwhile, co-occurrence semantic network construction and adjective extraction are used to conduct social network and emotion analyses to explore the correlation of specific scenic spots. For image content analysis, the encoder-decoder framework with an attention mechanism is trained to describe the image content in the form of text. Then, natural language processing technology is used to analyze this.
Based on the above discussion, the main contributions are as follows:


(1) We use NLP and image caption technology to analyze the destination image perception from texts and photographs as data sources in travelogues, respectively.
(2) We compare the differences exhibited by the two forms of data, text and image, in the perception of TDIs.
(3) We propose a scientific basis for the selection of data sources and measurement methods for future TDI perception.

Theoretical Background
Since the 1970s, many scholars have focused their research on image formation [24], the influencing factors [25], and market research [26] on TDIs. All these analyses need to be based on scientifically rigorous measurement methods. Pike reviewed 142 papers about TDIs published from 1973 to 2000 and found that 114 papers used structured techniques, while 63 papers used qualitative methods to operationalize the TDI construct [27]. In contrast, Echtner and Ritchie advocate for a combination of quantitative and qualitative approaches [28,29]. At present, the measurement methods of TDIs mainly focus on questionnaire surveying [30][31][32] and web text mining [33][34][35].
Among the existing studies, there is an increasing number of studies on the construction of tourism images of destinations based on tourism texts or images. Athena [36] explored the differences between "perceptual images" and "projected images" in eastern Taiwan through the analysis of image content, photographic materials, and textual materials. Marine-Roig et al. [37] compared the "projected image" of tourism administration documents and tourist guides with UGC on the Internet, using Catalonia as an example. Other scholars, such as Hunter [38], used semiotic analysis to explore the tourism photos and texts that represent tourists' perceptual images. We, in turn, will summarize and analyze the relevant literature in the following subsections.

Tourism Destination Image and Marketing Communication
The construction of local brands can contribute to the development of tourism and a regional economy. Additionally, the role of the brand and regional image shaping in the target population has also been fully confirmed [39]. Ashworth and Voogt [40] described the destination product as predominantly "a bundle" of services and experiences. Similar to most product and service brands, before having actual consumption, tourists develop a brand image of a destination [41] which can evoke their idea, belief, feeling, or attitude [42]. According to Reynolds [43], the brand image is constructed through significant details. To ensure the effectiveness of marketing communications, marketers must fully understand the inner structure of the image because the brand image of a destination, as perceived by tourists, influences their choice of destination and their willingness to travel [44]; interested parties in tourist places will use appropriate brand communication tools to promote and improve their brand image, thus attracting more visitors [45].
Brand communication plays a role in the marketing of a brand to convey value [46]. Gunn [47] believes that "destination brand image is the totality of what a person already knows or perceives about that destination from newspapers, radio and TV news, documentaries, periodicals, dramas, novels, and non-fictional books and classes on geography and history". With the development of the times, communication methods have been gradually enriched, and destination marketing methods have evolved from traditional paper-based promotion to multimedia promotion. With the development of new media, represented by self-media, the mechanism of forming the image of a destination has become more complex [48].
Semiotics is an important element of marketing communication, which relies on symbols to convey information. A symbol is not a perfect copy of the object it represents but a symbolic expression of important features that establish a connection with the territory [49]. The "consumer", as the recipient of this information, forms the destination image by decoding, selecting, and adding to it [50]. Semiotics therefore offers a reasonable way to deconstruct photos and texts. For a region or city, the most common iconic symbols of the brand are prominent elements of the area, such as buildings, bridges, architecture, rivers, and lakes, which are described in text or shown in photos. These symbols are an important source of information for the image perception of tourist places.

Conceptualizing Tourism Destination Image Formation
Signs and symbols play an important role in communication by helping humans convey meaning and understand the world. In the field of tourism studies, the symbols refer to an advertisement, while the interpreters are the potential tourists [51]. Generally speaking, the signs and symbols convey information and can be translated into natural language so that they can be mined by related texts. Song and Jeon [52] evaluated the slogans in terms of the semantic and morphological aspects of the texts to understand the construction of local governments in Korea. Meanwhile, according to Tresidder [53], photographs posted by marketers are received by members of the community, each interpreting, negotiating, and finding meaning in their personal sphere. Thus, the semiotic analysis of photographs displays what travel to a certain destination should look like [38]. Above all, starting from the notion of semiotics, identifying the unique content of textual and photographic aspects of a travelogue can reflect how a destination is constructed by not only marketers but also tourists based on their actual travel experiences [54].
The formation of an image has been described by Reynolds [43] as the development of a mental construct based on a few impressions chosen from a flood of information. Tourism promotion, as part of the image-building process, is not isolated; rather, it is interdependent with many other available sources of information that are often perceived as biased in nature [55], and it is also dynamic [11]. Therefore, it is valuable to study TDIs from an information theory perspective. The mechanism of brand image formation is relatively complex and does not follow a single pattern. For example, destinations with a long history and cultural heritage are often more likely to have a positive destination brand image [47]. Morgan et al. [56] argue that brands have functional and symbolic image attributes, and these attributes are partly derived from the visitor's imagination of the destination. Studies on tourism destination image formation from the perspective of tourists can be grouped into the following two aspects.
On the one hand, scholars focus on the factors affecting TDI formation. According to the general theoretical model of image-formation factors by Baloglu and McCleary [24], these factors can be summarized as stimulus factors (information sources, previous experience, and distribution) and personal factors (psychological and social) in the absence of actual visitation or previous experience. For example, Charkbarty and Sadhukhan's [57] study of the Tibetan region of Mount Gang Rinpoche reveals that believers from different backgrounds conceal their own spiritual narratives in the destination image, and the study highlights the role of sacred elements and geographical features in the image of the destination. Tasci et al. [58,59] demonstrate that in addition to age, gender, income, and other basic characteristics of tourists, familiarity through a previous visitation, ad exposure, media, and travel context are also important influencing factors.
On the other hand, scholars have investigated the process of TDI formation. From an information theory perspective, due to the limited breadth of absolute human judgment and immediate memory, there is a limit to the amount of information we can receive, process, and remember [60]. Tourists are also limited in their perception of the image of tourist places and have biases in information processing [61]. Gunn conceptualizes it as the seven steps of the travel experience: "accumulation, modification, decision, travel to destination, participation, return travel, and new accumulation." According to whether the source is non-commercial information, TDIs are classified as "organic" and "induced" images [62].
The studies of the influencing factors and processes of TDI formation provide important implications for strategic image management, thus helping in designing and implementing marketing programs for creating and enhancing TDIs [63]. Understanding the formation process of the TDI provides a solid theoretical basis for a better understanding of the role of text and pictures in information transmission.

Travelogues in TDI Research
Travelogues, in the form of textual and visual information, are "Covert Induced" tourism destination image-formation agents [64]. The content of numerous travelogues on the same destination could reflect tourists' overall preference and experience, thus serving as a scientific and important data source for TDI research [65]. Photographs, conveying the theme and story of a place, are the main carrier of visual information in a travelogue [66,67]. As for textual information, not only can it tap into richer travel-related topics and specific locations through text mining technologies, but it also covers abstract aspects, such as history and culture. That is to say, textual information supports more comprehensive descriptions of destinations than visual ones [64].
With the rise of social media that strongly influences the channels where people acquire tourism information today, tourists are increasingly involved in constructing the TDI and adding content based on their experience through social media options, such as sharing, commenting, and recommending places and activities to do [33]. For example, Wise and Farzin [34] showed that user-generated content (UGC) on the Facebook page "See you in Iran" has positively affected the willingness to visit Iran. Lin et al. [35] compared social media analytics and intercept surveys in TDIs. The results indicated that the survey data and social media data shared major similarities in the identified key photography phrases; however, the social media data revealed more diverse and specific aspects of the destination. In other words, in the Web 2.0 era, TDI research, based on the textual and visual information contained in UGC, is scientific and reliable and has become the mainstream method of TDI research today.

Research on Photograph-Based TDIs
Photographs, used as visual materials that convey the theme and story of a place [63,65], are an important source of data in tourism-related studies, which can motivate visitors to a destination. The research on the relationship between photography and tourism has always been a hot topic in academia. According to Urry [12], the practice of photography is closely related to the conditions of being a tourist and constitutes a self-reinforcing "closed circle of representation" in which travel photographs reflect and convey the TDI. Some scholars applied content analysis to the photographs from tourists and confirmed the existence of Urry's circle of representation [66][67][68]. Ryan and Cave [69] believe that values and emotions are given by humans in pictures. In this sense, photographs can be understood as a compression of the TDI [70], as their effect on people's memories and attitudes is more pronounced than other forms of information, such as texts and sounds [71]. Originally, visitor-employed photography (VEP), first used as a practical research technique in the early 1970s by Cherem and Traweek [72] and developed by Cherem and Driver [73] and Chenoweth [74], was one of the most common ways to capture photographs in tourism research, and it is used in the analysis of outdoor experiences and landscape preferences [75,76].
The Web 2.0 era has catalyzed the "Travel 2.0" phenomenon [77]. Online photographs have become one of the main information carriers of UGC and an important medium for tourists to perceive the topographical images of tourist destinations [78]. Tourists have the right to freely take photographs and upload them to the Internet [79]. The content of these photographs, such as landscapes, events, and people, exposes the travel experience; that is, it concretizes and visualizes the tourist gaze and can both reflect and inform destination images [12]. In other words, an online photograph is an important carrier of the tourist gaze in the Web 2.0 era and can be used as a data source for today's TDI research, thus offering larger sample sizes with greater objectivity at a lower cost.
At present, the analysis of photograph-based TDI research can be divided into two categories: the analysis of photograph information only, such as geographic analysis and metadata, to analyze tourists' temporal-spatial behavior; a comprehensive analysis combined with text content.
For the analysis of photo information only, scholars usually use maps [80] or photo-sharing sites as the object, such as Flickr [81] and Instagram [65,82], to analyze the spatial-temporal behavior of tourists through geographic location information, shooting time information, etc., and then summarize the tourists' perceived images of tourist destinations. Flickr's application programming interface (API), which provides user profile information, including the user's permanent location, is one of the primary data sources in photograph-based TDI research. Deng et al. [83] proposed a novel TDI measurement method based on a photo's metadata. They used large-scale metadata of user-generated photos to retrieve the TDIs of inbound tourists in Shanghai as an example based on a photo-metadata set from Flickr. This method has the advantages of easy access to data and uniform data formats but ignores the text information that contains visitors' emotions.
With the maturation of deep learning algorithms, such as the convolutional neural network (CNN) and recurrent neural network (RNN), image content can be analyzed efficiently and accurately, attracting more and more scholars to conduct text-and-photograph-based TDI research. Huang et al. [22] obtained pictorial data from online travel agencies (OTAs), including Ctrip and Mafengwo, and adopted exploratory research methods, such as image analysis, text analysis, and the IPA model, to explore the representation and construction process of tourists' images of health tourism in Bama. Xiao et al. [84] identified the content of tourist photographs with a CNN and showed that photographs can be used to understand the TDI and to reveal the temporal and spatial heterogeneity of the image. This analysis method combines photograph data with text-based TDI research to address the shortcomings of small sample sizes in some niche destinations.
Overall, the perception analysis of the TDI by using photographs as a data source enriches the data dimension and measurement method of TDI research; however, the differences between the text-based and photograph-based analyses in TDI studies, how to use them in combination and the advantages of each, have rarely been analyzed by scholars. Hence, this research conducts TDI research on the textual and photograph contents in travelogues separately and analyzes the similarities and differences to provide a scientific basis for the future measurement methods of tourism destination image perceptions.

Destination Choice
This study conducts comparative research on travelogue texts and photographs. The selected research object was Tiantai County in the Zhejiang Province of China, which has been selected as one of the top 100 counties in National County Tourism Competitiveness for four consecutive years. As the opening place of Xu Xiake's travels, Tiantai has beautiful scenery and popular religious culture. It is renowned, at home and abroad, for Tiantai Mountain, a 5A tourist scenic spot. The scenic spot of Tiantai Mountain comprises 13 scenic spots, including the Guoqing Scenic area, Chicheng Scenic area, and Huading Scenic area. In addition to the magnificent scenery and fresh air, Guoqing temple, known as the "source of Buddhism and Taoism", attracts many tourists every year to visit and pray. In recent years, the government of Tiantai has incorporated the development of tourism into the "modernization and harmony of the city" strategy, focusing on high-quality development. It can be seen that the Tiantai County government attaches great importance to the planning of tourism resources. Unfortunately, no relevant literature was retrieved from CNKI about Tiantai, and no researchers have conducted research on the TDI of Tiantai based on either online texts or photographs.
For the government of Tiantai, a comprehensive analysis of the TDI, in a fast and scientific way, is conducive to formulating appropriate marketing plans and promoting sustainable development and common prosperity for the whole tourism area. As a non-Internet famous tourist county, the number of related travelogues on the Internet is small, which will influence the design of subsequent marketing plans. That is to say, rapid and accurate TDI construction for the Tiantai government is necessary and imminent. Above all, we chose Tiantai, which has insufficient travelogues and little TDI research, as the object of this research.

Data Acquisition and Preprocessing
This research selects the travelogues on Ctrip Travel (the largest online travel platform in China) as the data source. Using Python packages, we collected 414 travelogues and removed 61 that were duplicates or had no analytical value, leaving 353 travelogues with a Chinese word count of over 100,000 and 14,710 photographs. Figure 1 shows photographs of the representative spots of Tiantai County. Data preprocessing consists mainly of Chinese word segmentation, part-of-speech (POS) tagging, destination extraction, the removal of stop words, and the integration of synonyms. For Chinese word segmentation and POS tagging, we used Jieba, one of the most popular Python packages for Chinese natural language processing. To keep the names of attractions, hotels, and similar entities from being split during segmentation, we extracted the words carrying internal links in the travelogues, deduplicated them, and loaded them into Jieba as a dictionary of proper nouns. We formed the initial stop-word list from the Baidu stop words list and the Chinese stop words list and then, based on the results of the word frequency analysis, added the meaningless high-frequency words we identified to obtain the final list. The last step was to merge words with similar meanings, completing the semantic deduplication and integration. This preprocessing reduced the interference of noisy data.
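The stop-word removal and synonym integration steps described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes the tokens have already been produced by a segmenter such as Jieba, and the stop-word set, the synonym map, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical resources; the study builds its lists from the Baidu and
# Chinese stop-word lists plus a manual review of high-frequency words.
STOP_WORDS = {"the", "of", "we"}
SYNONYMS = {"guoqing_si": "guoqing_temple"}  # variant -> canonical term


def preprocess(tokens, stop_words=STOP_WORDS, synonyms=SYNONYMS):
    """Drop stop words and map each synonym to one canonical term."""
    cleaned = []
    for tok in tokens:
        tok = tok.strip().lower()
        if not tok or tok in stop_words:
            continue
        cleaned.append(synonyms.get(tok, tok))
    return cleaned


def frequent_words(docs, top_n=3):
    """Count tokens over all documents to spot candidate stop words."""
    counts = Counter(tok for doc in docs for tok in doc)
    return counts.most_common(top_n)


# Toy, already-segmented travelogues (assumed input).
docs = [
    ["we", "visited", "guoqing_si", "the", "temple"],
    ["guoqing_temple", "of", "tiantai", "visited"],
]
cleaned_docs = [preprocess(d) for d in docs]
top_words = frequent_words(cleaned_docs)
```

Inspecting `top_words` after a first pass is one way to find the meaningless high-frequency words that are then added to the stop-word list before a second pass.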

Image Caption Generation
Image caption generation, which refers to the understanding of image contents by machines and then generating corresponding natural language description texts, is one of the research hotspots in the field of image processing. The emergence of this multimodal image semantic technology, which represents the image content in the form of text, has important implications in the areas of life assistance for visually impaired people and medical CT-image reporting [85,86].
In recent years, with the development of deep learning techniques, neural networks have been widely used in the field of natural language processing (NLP). Similar to machine translation, deep learning methods can learn the mapping from images to text descriptions from large amounts of data to improve the image description. Generally, Image Caption extracts the image features with a convolutional neural network (CNN) and then translates them into human-understandable natural language descriptions using a recurrent neural network (RNN). Therefore, Image Caption mostly adopts the seq2seq learning method and the encoder-decoder framework with CNN-RNN as the basic model (Figure 2). To save training time, the encoder uses transfer learning: it loads a pretrained model and trains the parameters for its own task on that basis. However, the machine interpretation of images based only on the encoder-decoder model is usually disturbed by irrelevant information, such as the image background, which makes the text description biased and less effective. To solve this problem, an attention mechanism is usually introduced in the decoder so that it focuses on different parts of the image when generating different words (Figure 3). The decoder keeps the words already output by the RNN and, when generating a new word, first applies the attention mechanism to obtain the weights of the image feature vectors. The next word is then predicted jointly from the previously generated words and the newly weighted features. Wang et al. [87] used the encoder-decoder framework and introduced a three-layer top-down multiple attention mechanism, which performs well in both datasets. In the field of image captioning, common datasets include the Microsoft COCO (MSCOCO) caption dataset, the Flickr dataset, and the AI challenge Chinese dataset.
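To make the attention step in the decoder concrete, the following sketch computes attention weights over a few image-region feature vectors and the resulting context vector. It is a simplified illustration under stated assumptions: the scoring function is a plain dot product between each region feature and the decoder state (captioning models typically learn a small scoring network instead), and the vectors and function names are invented for the example.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, hidden):
    """Dot-product soft attention (an assumed, simplified scoring rule).

    features: list of region feature vectors from the CNN encoder
    hidden:   current decoder (RNN) state, same dimension as each feature
    """
    scores = [sum(f_k * h_k for f_k, h_k in zip(f, hidden)) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    # Context vector: attention-weighted sum of the region features.
    context = [sum(w * f[k] for w, f in zip(weights, features)) for k in range(dim)]
    return weights, context


# Three toy image regions and one decoder state.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [2.0, 0.0]
weights, context = attend(features, hidden)
```

Regions whose features align with the decoder state receive larger weights, so the context vector passed on to predict the next word is dominated by those regions rather than by the background.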
The MSCOCO caption dataset includes 330,000 images and 1.5 million corresponding text descriptions, most of which are complex daily scenes. The images of the Flickr dataset come from Flickr, Yahoo's photo-sharing website. Most of them show scenes of human participation in a certain activity, but the number is far less than that of the MSCOCO caption dataset. The AI challenge dataset provides Chinese descriptions of images: the training set includes 210,000 images with Chinese descriptions, and the validation set includes 30,000 images with Chinese descriptions. The images and description texts in a dataset need to be preprocessed before training. Image preprocessing is relatively simple and is based on transfer learning; that is, the images are fed into ResNet, and the output of the specified layer is obtained and saved. Text preprocessing is more involved. Taking Chinese text as an example, it requires word segmentation, filtering of low-frequency words, and padding of the descriptions to equal length.
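The text preprocessing steps just listed (filtering low-frequency words and padding descriptions to equal length) can be sketched as follows. This is an illustrative sketch only: the special tokens, the `min_count` threshold, and the function names are assumptions, and the captions are assumed to be segmented already.

```python
from collections import Counter

# Assumed special tokens; actual names vary between implementations.
PAD, UNK, START, END = "<pad>", "<unk>", "<start>", "<end>"


def build_vocab(captions, min_count=2):
    """Keep only words seen at least min_count times; others become <unk>."""
    counts = Counter(tok for cap in captions for tok in cap)
    vocab = [PAD, UNK, START, END]
    vocab += sorted(tok for tok, c in counts.items() if c >= min_count)
    return {tok: idx for idx, tok in enumerate(vocab)}


def encode(caption, vocab, max_len):
    """Map a caption to ids, truncate, and pad to exactly max_len (>= 2)."""
    tokens = [START] + caption[: max_len - 2] + [END]
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokens]
    ids += [vocab[PAD]] * (max_len - len(ids))
    return ids


# Toy segmented captions (assumed input).
captions = [
    ["a", "temple", "on", "a", "mountain"],
    ["a", "waterfall", "on", "a", "cliff"],
]
vocab = build_vocab(captions, min_count=2)
encoded = [encode(c, vocab, max_len=8) for c in captions]
```

Padding every description to the same length lets the captions be stacked into one batch for training, while the frequency filter keeps the vocabulary small.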
PyTorch is a new machine learning toolkit released by Facebook in early 2017 for Python. It has the advantages of GPU acceleration and is easy to get started. It is one of the most popular deep learning frameworks in recent years. Image Caption needs to complete the training of huge amounts of data. PyTorch's GPU acceleration can significantly reduce the training time, so it is widely used. Due to the hardware and other constraints, the complete training of high-accuracy models in PyTorch requires a lot of resources and time. At the same time, the threshold of a self-training model is high for researchers of other disciplines who have weak computer programming skills but need a large number of image analyses. For these reasons, some people train the Image Caption models in advance and provide interfaces [88], making it more convenient for Image Caption to be used in research.

Natural Language Processing (NLP) Technology
Both the textual content of travelogues and the textual description of Image Caption need to rely on NLP techniques. Many researchers at home and abroad have conducted diverse studies on TDIs through various NLP techniques.
Generally, content analysis is used to analyze and mine the text, and the TDI is obtained from three aspects: emotion, high-frequency vocabulary, and co-occurrence semantics. High-frequency words are obtained from word frequency statistics after a dictionary of stop words and proper nouns has been built. A co-occurrence semantic network is built as a network graph based on the number of times words occur together. Emotion analysis can be done with self-trained models or with Python packages that offer emotion analysis functions, such as SnowNLP. In addition, high-frequency word extraction, emotion analysis, and co-occurrence analysis of Chinese text can be carried out quickly and accurately with the ROST software developed by Wuhan University [89].
Since the content of travelogue texts is usually long, some researchers use the textual topic analysis methods of term frequency-inverse document frequency (TF-IDF) and latent Dirichlet allocation (LDA) for keyword and topic extraction to identify the TDI [90]. By contrast, the descriptions produced by Image Caption are mostly short sentences, so TDI research on them can be carried out by identifying the nouns and adjectives. As models trained on the MSCOCO caption dataset output text descriptions in English and those trained on the AI challenge dataset output Chinese, the descriptions can be analyzed quickly and accurately with Python's NLP packages NLTK and Jieba, respectively.
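As an illustration of TF-IDF keyword extraction, the sketch below scores each word by its frequency within a document multiplied by the logarithm of the inverse document frequency. It assumes the plain, unsmoothed variant of the formula (libraries often apply smoothing), and the toy documents and function name are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: score} dict per document (unsmoothed TF-IDF)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            tok: (c / total) * math.log(n_docs / doc_freq[tok])
            for tok, c in counts.items()
        })
    return scores


# Toy segmented travelogues (assumed input).
docs = [
    ["temple", "mountain", "temple"],
    ["waterfall", "mountain"],
    ["hotel", "mountain", "food"],
]
scores = tf_idf(docs)
keywords = [max(s, key=s.get) for s in scores]
```

A word such as "mountain" that appears in every document receives a score of zero, which is why TF-IDF surfaces the terms that distinguish one travelogue from the others.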

LDA Topic Model
Latent Dirichlet allocation (LDA) is a document topic generation model, also known as a three-layer Bayesian probability model, containing a three-layer structure of word, topic, and document, and is one of the most representative text-topic mining methods [91]. In other words, each word in an article is assumed to be obtained by choosing a topic with a certain probability and then choosing a word from that topic with a certain probability. The probability distribution function of the LDA model is shown in Formula (1), where w denotes a word, d a document, and t a topic.
P(w|d) = P(w|t) × P(t|d) (1)
The identification of a TDI depends on the result of the LDA topic classification. To obtain as many topics as possible without overlap between them, we visualized the LDA model in advance using pyLDAvis together with the Gensim package. According to the visualization results, the number of topics with a good classification effect was determined, and the final LDA topic model was established. Using the LDA model, we can obtain the probability distribution of the topic feature words so as to analyze the semantic characteristics of the feature words under each topic. The topic names are summarized manually, and each represents a perception dimension of the TDI.
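To make Formula (1) concrete, the following sketch evaluates it on toy numbers. Formula (1) gives the contribution of a single topic; in the standard LDA formulation, the probability of a word in a document is obtained by summing that product over all topics, which is what the code does. The topic names and probability values are hypothetical and are not results of this study.

```python
# Hypothetical topic-word and document-topic probabilities (illustrative values).
p_word_given_topic = {
    "scenery": {"waterfall": 0.5, "temple": 0.3, "hotel": 0.2},
    "services": {"waterfall": 0.1, "temple": 0.1, "hotel": 0.8},
}
p_topic_given_doc = {"scenery": 0.75, "services": 0.25}


def word_probability(word, p_w_t, p_t_d):
    """P(w|d) as the sum over topics t of P(w|t) * P(t|d)."""
    return sum(p_w_t[topic].get(word, 0.0) * weight for topic, weight in p_t_d.items())


for word in ("waterfall", "temple", "hotel"):
    print(word, word_probability(word, p_word_given_topic, p_topic_given_doc))
```

Because each topic's word probabilities and the document's topic probabilities each sum to one, the resulting word probabilities for the document also sum to one.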

Social Networks Analysis
Topic mining alone makes it difficult to find the relationships between the topics and the emotional tendency of the respondents. Therefore, based on the application of the probabilistic topic model, this research further constructs social networks with the help of social network analysis. Specifically, the feature words are sorted, and then the top feature words in each perceptual dimension are combined into a feature word dictionary. After that, the co-occurrence matrix of feature words is calculated. The co-occurrence matrix counts how often two feature words appear together and thus describes the closeness between them. After the co-occurrence matrix is constructed, the social network analysis tools UCINET and NetDraw are used to intuitively show the feature word social network relationships of the TDI.
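The construction of the co-occurrence matrix can be sketched as follows. This is a minimal illustration that assumes co-occurrence is counted once per travelogue in which both feature words appear; the counting unit actually used (travelogue, paragraph, or sentence) is not stated here, and the toy documents and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents, feature_words):
    """Count, for each unordered pair of feature words, the documents containing both."""
    features = set(feature_words)
    pair_counts = Counter()
    for tokens in documents:
        # Sort so that each pair is always stored in the same order.
        present = sorted(features.intersection(tokens))
        pair_counts.update(combinations(present, 2))
    return pair_counts


def to_matrix(pair_counts, feature_words):
    """Expand pair counts into a symmetric matrix ordered like feature_words."""
    index = {word: i for i, word in enumerate(feature_words)}
    size = len(feature_words)
    matrix = [[0] * size for _ in range(size)]
    for (a, b), count in pair_counts.items():
        matrix[index[a]][index[b]] = count
        matrix[index[b]][index[a]] = count
    return matrix


features = ["Guoqing Temple", "Huading Temple", "Chicheng Mountain"]
docs = [
    ["Guoqing Temple", "Huading Temple", "hotel"],
    ["Guoqing Temple", "Chicheng Mountain"],
    ["Guoqing Temple", "Huading Temple", "Chicheng Mountain"],
]
matrix = to_matrix(cooccurrence_counts(docs, features), features)
for row in matrix:
    print(row)
```

A symmetric matrix of this kind is the usual input format for network tools, where each nonzero cell becomes an edge whose weight is the co-occurrence count.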

Results and Analysis
According to Urry [12], tourism symbolizes the tourism resources and the process of constructing tourist-gaze objects. Burgess and Wood [92] have deciphered the advertising messages promoting the territory of the London Docklands by identifying the three types of signs (symbols) contained in the ads, namely iconic signs, indexical signs, and symbolic signs. Matlovičová et al. [49,93] took Prague as an example to analyze the city's brand image from the perspective of such three signs. In order to ascertain the signs of a tourism destination, thus identifying its image, our research analyzes the TDI of Tiantai from the perspectives of the overall image and image formation. Figure 4 reflects the changes in the number of travelogues, the average number of photographs per travelogue and the emotional tendency of Tiantai. The number of travelogues published can reflect the hotness of the destination, while the emotional tendency can reflect the tourists' emotions. Photographs are the product of tourists' photographic behavior in travel behavior, and the content can reflect the focus of the tourists' attention [94]. In other words, the higher the number of photographs of a certain scene, the higher the interest and freshness of tourists to that type of scene.

Analysis of Overall Perception
In terms of publication year, the number of annual travelogues and the average number of photographs rose from 2012 and peaked in 2017. In the following 2 years, the number showed a significant downward trend. In 2020, affected by the COVID-19 epidemic, epidemic prevention policies, such as "not leaving the province" and "home isolation", placed China's tourism industry in a predicament. However, since 2020, the number of annual travelogues in Tiantai has increased significantly, while the increase in the average number of photographs is not significant. This indicates that the post-epidemic era has brought more traffic and attention to Tiantai, and the demand for local and peripheral tours will grow further. However, because transportation is convenient, local tourists have already gazed at some famous scenic spots many times, so their gaze is less fresh when they visit.
From the perspective of seasonal changes, the peak of tourism in Tiantai mainly occurs in the spring and summer. The month with the highest heat is October, while the month with the highest interest and freshness is August. However, the emotional tendency is more positive in winter. Compared with the summer vacation in August, the National Day holiday in October is short. The tourists in Tiantai are mainly from the surrounding urban areas; that is, there are more peripheral tourists. This also explains why the average number of photographs in the travelogues is not high, despite the high heat. In winter, most of the trips occur during the Spring Festival holidays, and the scenic spots often plan special activities to create a festive atmosphere. During the Spring Festival, most tourists are family travelers. As middle-aged and elderly tourists are not used to editing and publishing travelogues online, this is not reflected in the total number of travelogues. However, the emotional tendency is more positive, indicating that the activities to celebrate the festival also affect the TDI of Tiantai.
In summary, the changes in the number of travelogues, text emotion, and the average number of photographs per travelogue are not entirely consistent, reflecting the overall TDI from different aspects. On the whole, affected by the COVID-19 epidemic prevention and control measures, peripheral tourism has brought a surge of interest to Tiantai tourism; however, the freshness and interest of peripheral tourists are not as high as those of distant tourists. In the context of regular epidemic prevention and control, Tiantai needs to consider how to launch its activities to create unique signs and atmosphere, thus constructing a Tiantai tourism destination image different from the daily perceptions of local people.
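The yearly and monthly indicators discussed above (number of travelogues and average number of photographs per travelogue) can be derived from the crawled records with a simple grouping step. The sketch below is only an illustration: the record layout, the toy values, and the function name `summarize` are hypothetical and do not reproduce the data of this study.

```python
from collections import defaultdict

# Hypothetical travelogue records: (publication year, month, number of photographs).
records = [
    (2017, 8, 30),
    (2017, 10, 12),
    (2020, 10, 8),
    (2020, 10, 10),
]


def summarize(records, key_index):
    """Group records by the field at key_index; return {key: (count, mean photos)}."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record[key_index]].append(record[2])
    return {key: (len(photos), sum(photos) / len(photos)) for key, photos in grouped.items()}


by_year = summarize(records, 0)   # heat and interest per publication year
by_month = summarize(records, 1)  # heat and interest per calendar month
print(by_year)
print(by_month)
```

In this toy data, October has the most travelogues but a lower average number of photographs than August, which mirrors the kind of divergence between heat and interest described in the text.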

Analysis of Image Formation
In order to ascertain the image formation and specific tourism signs of Tiantai, we conducted content analyses of the text and photographs in the travelogues, respectively. The photograph-based analysis counts the photo content descriptions in each travelogue by accessing the trained image caption interface and treats the high-frequency descriptions as image formations. The text-based analysis extracts high-frequency adjectives through the Python package Jieba and applies the LDA topic model and social network analysis for a more comprehensive analysis of the image formation. The analysis results display the top ten feature words and photograph descriptions according to frequency, from high to low, as shown in Table 1.
The most frequent adjectives describe the scenery as "nice", "spectacular", "fresh", and "clear", describe accommodation as "clean" and "comfortable", and describe general popularity as "famous". In contrast, the high-frequency sentences generated by Image Caption for the travelogue photographs are mainly descriptions of atmospheric scenery, along with a few sightseeing-derived activities, such as food, clothing, housing, and transportation. High-frequency adjectives can intuitively reflect TDIs; however, they do not account for the semantic association among words, nor for the fact that several high-frequency words may originate from the same topic. By processing the co-occurrence information of words through the LDA topic model, the topic vector containing the topic probabilities and corresponding keywords can be calculated, and the topic clustering results can be derived to reflect the TDI. Based on the visualization results in pyLDAvis, we built the LDA topic model with K = 4, for which the topics showed no overlap and good independence. Each topic's top ten feature words were selected as representatives (Table 2). We then analyzed the semantic features of the feature words, manually identifying and summarizing the four types of image formation.
Topic1, which accounts for 51%, represents the iconic signs. It is dominated by the small scenic spots in the Tiantai Mountain scenic area, including landscape themes, such as "Shilang Waterfall" and "Huading Mountain", and religious themes, such as "Guoqing Temple" and "Jigong's former residence". Topic2, which accounts for 22.4%, is the theme of the tour-derived activities that form the indexical signs, including "tickets", "highway", "hotel", and other feature words related to clothing, food, housing, and transportation. This shows that, while improving the tourist experience of scenic spots, we should also pay attention to the experience of supporting facilities. Both Topic3 and Topic4 are part of the symbolic signs, which indicate a connection with Tiantai tourism indirectly. Topic3 is the theme of the sightseeing scenes, with keywords such as "waterfall", "tea leaves", "canyon", etc. Topic4, which accounts for the smallest share, is the theme of the surrounding attractions. Most of its feature words are famous scenic spots around Tiantai, such as Shepan Island, Sanmen, Taizhou City, Ziyang Street, Linhai, etc. As for the iconic signs, such as the specific scenic spots, we used social network analysis to study their hotness and interactive relationships. Since the textual descriptions of images, such as "waterfall" and "mountain", are vague, it is impossible to analyze specific scenic spots based on photographs. In other words, we needed to analyze the hotness and co-occurrence mechanism of the scenic spots based on the texts with internal links.
The most commonly perceived iconic sign that differentiates Tiantai from other similar places is Tiantai Mountain, which is also considered famous, at home and abroad, for its "origin of Buddhism and Taoism, and beautiful scenery". Within it, the most popular scenic spots (Table 3) revolve around the three themes of "religion", "harmony culture", and "tea culture". "Qiongtai Fairy Valley", "Shilang Waterfall", and "Chicheng Mountain" are the most popular scenic spots associated with the movie and TV drama Fuyao. In the co-occurrence network diagram of scenic tourist spots in Tiantai (Figure 5), a node represents a high-frequency scenic spot mentioned in the travelogues; blue nodes represent the local scenic spots in Tiantai; red nodes represent other cities and their scenic tourism spots. The lines between the nodes represent the interactions between scenic spots, and the thickness indicates the strength of the interaction. Tiantai's tourism revolves around the mountain and water line of "Chicheng Mountain-Qiongtai Fairy Valley-Shilang Waterfall" and the religious line of "Guoqing Temple-Huading Temple-Jigong's former residence". Meanwhile, the "Shenxianju Scenic Spot" in Xianju County and the "Taizhou Fucheng cultural tourism area" in Linhai County, as 5A scenic spots around the Tiantai Mountain scenic spot, do not have a strong interaction with it.
Through comparison, it was found that the image formation of Tiantai, obtained from the two data sources, showed relative consistency; that is, iconic signs, such as the magnificent landscape sceneries of Tiantai Mountain and Qiongtai Fairy Valley, leave the deepest impression on tourists. Meanwhile, tourism is a comprehensive socio-cultural activity integrating food, accommodation, transportation, travel, shopping, and entertainment. Besides the iconic signs, tourists also attach great importance to the indexical signs, including food, accommodation, and transportation.
The difference, however, is that the photograph-based TDI is more clearly directed; that is, the magnificent landscape scenery is visualized as waterfalls and mountains. Through the extraction of high-frequency adjectives in the text, we can extract more dimensions of the perceived image. For example, "famous" can reflect the TDI in more detail from the aspect of popularity, but such adjectives lack a subject, and their directivity is not clear enough. The LDA model can further reflect the specific signs of tourism in Tiantai, but its output is mainly nouns, which need to be combined with the extracted adjectives for a comprehensive analysis. Moreover, text-based TDI research can further provide the popularity and interaction relationships of specific attractions.

Discussion
The applicability and scientific combining of texts and photographs in TDI research is one of the objectives of this study. Compared with previous studies [35,95,96], the innovation of this research method is to replace natural intelligence analysis with artificial intelligence analysis, which facilitates objectivity of analysis and time saving when processing a large number of photos. Specifically, combined with natural language processing technology and deep learning technology, this study carries out social network analysis and content analysis based on the LDA topic model and image caption generation and, therefore, compares the heterogeneity and substitutability between text-based and photograph-based TDI research from an overall perception and image formation. It helps to provide scientific support for expanding the TDI research data sources, which are meaningful when promoting tourism marketing and enhancing niche destination attractiveness.
Meanwhile, our research is designed based on the theories of past scholars. Garrod [66] explores the relationship between TDIs and tourist photographs, which is one of the key theoretical foundations of this study. Garrod combines visitor-employed photography (VEP) with content analysis and quantitative statistical techniques to validate Urry's "a closed circle of tourism representation", in which the imagery used by the tourism industry to attract tourists is the object of the tourist's gaze and thus, the subject of the tourist's photography. As photography can reflect the image of a destination as well as text, it is meaningful for us to compare the similarities and differences between the texts and photography in TDI research. Wise and Farzin [34] used the user reviews of 'See You in Iran' on Facebook as the subjects to evaluate the UGC in terms of authentic inquiry (the need for unknown insight into a new awareness), authentic encounter (through relationships, connections, communities, and belonging), and authentic production (based on feelings, emotions, and sensations) to explore the tourism image. This perspective is one of the important references for us to consider when analyzing the UGC content in Ctrip. Smith et al. [97] used Blackberry technology to analyze the experiences and photographic images recorded by a group of students at various stages of their travels to examine how the image of the destination changed throughout the travel experience. On the basis of these theories, this study explores the text and photographs in online travelogues, data that have different connotations in tourism semiotics, and TDIs and tourism marketing.
Artificial intelligence technology provides abundant evidence and information for revealing the differences between text-based and photograph-based research in the tourists' perceptions and behavioral preferences for tourism destinations from the perspective of semiotics. The main discussion points are as follows: First, the overall perception of Tiantai, based on texts and photographs, shares major similarities, and its mixed use can highlight key points. When using more than one type of data to analyze the same issue, there are overlapping parts, and these are the aspects that need to be focused on for TDIs. Whether based on texts or photographs, the image formation shows that the iconic signs, such as Tiantai's magnificent scenery and Buddhist culture, leave the deepest impression on tourists. This is different from the city image of Prague, where the association of the city was more about the architectural features and buildings [93], while Tiantai County is considered more attractive for its natural beauty and culture. Meanwhile, within the context of tourism, the TDI is usually related to the perception of whether a destination has enough available resources to ensure tourists' comfort and safety [98] or the perception of whether it is more or less friendly, accessible, overcrowded, etc. [99]. Furthermore, the effective image is usually a subjective and perceptual response to the cognitive knowledge of a tourist destination. In other words, the indexical signs of tourism, such as food, accommodation, and transportation, are also very important for an image's construction. What is a little different, however, is that the TDI based on photographs mainly covers indexical signs, such as waterfalls and mountains, which cannot be as specific as the text-based research on iconic signs. 
Through the extraction of adjectives in the text content and the construction of the LDA topic model, the research on the TDI has been carried out from two aspects, emotional tendency and theme clustering, and the destination image analyzed is more diverse. In other words, the photograph-based TDI research is reliable and representative. It can validate the text-based TDI research and can also be used as a supplementary data source when the sample size is too small.
Second, different data sources reflect different aspects of the TDI, and all of them are important components of the overall perception. For the overall perception, the number of travelogues released can reflect the popularity of the destination, the emotional tendency of text content can reflect the emotional situation of tourists, and the number of photographs can reflect the interest level of tourists. Meanwhile, the statistics and the arrangement of the number of texts and the average number of photographs per travelogue are simple and easy to organize, and the artificial intelligence technology, such as the sentiment of the text's content and LDA topic analysis, can be done efficiently and with high quality when based on a Python package. Therefore, in the future, the data from texts and photographs can be comprehensively used when measuring the overall perception, which not only ensures the authenticity, reliability, and integrity of the research results but also reduces the analysis time and research costs.
Based on the above analyses of the tourism image, we suggest that, although photography can be a good reflection of the tourist gaze, it is not enough to rely only on its content for TDI research. Meanwhile, for the selection of data sources for TDI research with a small number of travelogues, texts should be the main source, supplemented by photographs. Taking Tiantai as an example, even if the description of the photograph content can identify the landscape of "waterfall" and "mountain", it is almost impossible to further distinguish whether it is Tiantai Mountain or Chicheng Mountain. On the contrary, text-based TDI research can reflect, in a more fine-grained way, multiple perspectives, such as hotness and co-occurrence. In other words, indexical signs can be revealed by both texts and photographs, but iconic signs can only be recognized through texts. Furthermore, image caption generation cannot reveal the specific scenic spots from photographs; therefore, future research on the TDI should focus on the textual content supplemented by photographs to verify and supplement the textual analysis results.
According to Urry's tourist gaze theory [12], the tourism economy and culture have become inseparable from photography: tourists gaze at the landscape through photographs, influencing potential tourists' image perceptions of that destination through social media. However, the differences between text-and photograph-based tourism destination image research have not been studied yet. This study not only compares the differences exhibited by the two forms of data but also provides a new big data-based methodological framework for the comprehensive perception of the tourism image. The theoretical implications are as follows: (1) Unlike the TDI research conducted by Cherem et al. [72,73,84], which was based on mere text or photograph data sources, our research focuses on both the textual and visual information of UGC travelogues to present a framework of summarizing TDI, thus enriching the data sources and methods. Similar to the content analysis methods by mining the user-generated textual and visual information of Pang et al. [33,55,99], our research considers two types of data in travelogues, namely texts and photographs, and compares their similarities and differences in TDI research. Somewhat differently, we used a machine learning approach and image caption technology to describe the content of a photograph in natural language, allowing for a more efficient and objective analysis of larger volumes of photographic data. The similarities and differences of these data sources in the content analysis, from the perspective of semiotics, show the effectiveness and science of the proposed TDI summarization framework for tourism destination image mining and marketing. (2) Meanwhile, as cities imposed COVID-19-related quarantine measures, long-distance travel came to a complete halt. 
Many scholars examined the tourists' perception of the destination image while stranded during the COVID-19 pandemic and analyzed the factors influencing the destination image and visit intention in the post-COVID-19 crisis recovery [100][101][102]. However, from the perspective of methods, traditional questionnaire surveys and expert interviews are still the mainstream. To fill this gap, we compared the heterogeneity and substitutability between text-based and photograph-based TDI research through NLP and image caption technologies, taking the niche destination Tiantai as an example. Hence, scholars can examine and construct TDIs during and after the COVID-19 pandemic with more objective and comprehensive methods.
In addition, based on the current state of tourism development during the epidemic, the findings of this paper have practical implications for scenic spots and related enterprises.
(1) In order to find their own brand image positioning, scenic spot operators and other relevant parties need to make reasonable use of UGC and other data sources through more effective data processing techniques. After summarizing the brand image perception of a location based on the results, especially image formation, it shows that the iconic signs, such as Tiantai's magnificent scenery and Buddhist culture, leave the deepest impression on tourists. Thus, marketing suggestions can be given to the relevant parties of the tourism location to improve the image of the tourism destination and further improve the willingness to travel: when branding in the local area, according to the current TDI of the public, the consistency and continuity of the brand image of the tourism destination should be kept [103]. (2) On the basis of respecting tradition and inheriting the original brand image, it is more important to create a new destination brand image. That is, developing and discovering potentially popular future brand attributes based on our research findings. The homogenization of the Chinese tourism market is serious, which will make tourist destinations less competitive. The changing times and the changing characteristics of the visitor group will bring new meanings to the destination brand image. It is especially important to learn from the successful experiences of other tourism destinations' development and find individual competitive advantages.
From the government's perspective, this study has policy implications.
(1) Government should pay more attention to the "peripheral tourism" market and build the associated infrastructure. In the era of big data, we should not only consider the resources and cultural characteristics of the tourist destination itself but also fully consider the tourists' perception of the destination [90]. From the results of the study, we fully affirm the importance of tourists' perceptions of TDIs. However, from an objective point of view, we cannot deny that, in the context of the epidemic, tourists will be more sensitive to the distance of the destination, the choice of transportation, and the decision about the length of stay. Self-driving and public transportation have replaced traditional tourist trunk transportation in the peripheral tour, and the role of tour guides will basically fade out of the peripheral tour market with the change in the high frequency and lifestyle of consumers' peripheral tours. At the same time, the accommodation factor has become the most important issue for peripheral tourism users. There will be a huge potential for precise guidance on food and beverages, tickets, entertainment, and leisure items based on accommodation. To revitalize the tourism industry, governments and stakeholders should meet the needs of "customers" based on the image of the destination. (2) It is significant for the government to play a role in precise marketing promotion and, thus, help build TDIs. For the government, tourism activities are mostly planned for specific scenic spots; thus, developing a five-stage location marketing and digital marketing strategy to enhance the overall competitiveness of local tourism will be effective [104,105].

Conclusions
Combining natural language processing (NLP) and image caption technology, this study compares the differences in TDIs between travelogue texts and photographs by carrying out social network analyses as well as latent Dirichlet allocation (LDA) topic modeling and looks into the practical case of tourism destination image perception in Tiantai, a county of Taizhou City in Zhejiang Province.
Specifically, Python packages were used to crawl the travelogues about Tiantai posted on Ctrip Travel, including the initial heading, text contents, and photographs. Then, the different data sources were compared from the perspective of semiotics, including the overall perception and image formation. As for the overall perception, the quantitative characteristics displayed by both texts and photographs at different time points can be summarized efficiently and conveniently from a macro perspective based on data analysis tools, such as Python. Then, from the perspective of semiotics, image formation can be analyzed more specifically with three types of tourism signs, where high-frequency words in texts are counted, and the LDA topic model is established. Meanwhile, the high-frequency image descriptions of the photographs are counted. These methods can identify indexical signs as well as symbolic signs well. Even if text- and photograph-based indexical and symbolic signs have a high similarity, the text is necessary for iconic signs: frequency analysis and social network analysis should be employed for specific scenic spots. As for the image caption technology, a deep learning model with an attention mechanism based on a large amount of sample data can be trained, and the photographs' content recognition can also be realized quickly by using the Image Caption model trained by others through the interface [88]. As for the other analysis methods, such as LDA topic analysis, they can basically be implemented quickly through Python packages. As short-term and peripheral travel have become characteristic of tourism today due to the effect of COVID-19, reconstructing the image of those tourist destinations familiar to tourists, based on photographs and texts, will also become a hot topic for future TDI research. However, traditional questionnaire surveys and expert interviews are still the mainstream [101][106][107][108].
Our method and framework of content analyses in this study provide a scientific basis for selecting data sources.
Above all, our research compares and analyzes TDIs through two data sources, travelogue texts and photographs, to verify their scientific representativeness. Our contribution is summarized as follows.
First, from a theoretical point of view, this study mainly expands the research methodology of TDI perception and provides theoretical implications for the effectiveness of marketing communication. Second, from a practical point of view, our results also contribute to the construction of an effective TDI for scenic spots and related companies. Finally, from a policy perspective, the findings also provide the government with relevant recommendations for revitalizing the tourism industry, especially in the context of the epidemic. In general, our study is meaningful for future research and tourism development.

Limitation and Future Work
Despite its contributions, this study is not without limitations. First, as for the description sentences after the image caption, this study mainly uses frequency statistics combined with literature analysis, which is highly subjective. Referring to Veronika et al. [109], unsupervised machine learning methods such as clustering can be considered for photo classification in the future, making the analysis results more objective. Moreover, with the rise of life-sharing social platforms such as Weibo and TikTok, the travelogue has now morphed into vlogs that articulate the many aspects of tourism, such as travel activities, accommodations, food, and adventures, which were once limited to static content [110]. Technological advancements make it possible for tourists to produce vlogs more efficiently, which has encouraged more and more travelers to record their travel experiences in the form of videos and share them on the Internet [111]. That is to say, the research based on the deep learning model, which analyzes the content of videos and identifies signs so as to examine and construct the TDI, will become hot research in the future.
In the future, with the development of deep learning and big data technology, the accurate recognition of massive photograph contents through artificial intelligence will continue to deepen the research of TDIs. Therefore, we plan to explore the tourism image from more perspectives: we will analyze the contents of photographs in a more specific way, such as identifying the gender and number of people [112][113][114] appearing in the photos, and combining the recognition of the photograph content with a geographical location for analysis, etc. Meanwhile, the tourism industry is at a low ebb due to COVID-19.
Furthermore, cutting-edge technologies, such as virtual technologies, artificial intelligence (AI), 5G technologies and robotic automation technologies, could benefit the tourism industry [115]. As for us, how to perceive the image of new concept tourism destinations, such as virtual tours [116], will become one of our future research directions. Additionally, it is expected that we analyze the new mode of online travelogues, such as a vlog, etc. This way, the number of research samples can be further expanded, and it is more suitable for today's people's entertainment mode of watching short videos in their leisure time.