Article

Chinese Tourist Motivations for Hokkaido, Japan: A Hybrid Approach Using Transformer Models and Statistical Methods

1
Department of Computer Science, Kitami Institute of Technology, Kitami 090-8507, Hokkaido, Japan
2
Department of Administrative Studies, Prefectural University of Kumamoto, Kumamoto 862-0920, Kyushu, Japan
*
Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Tour. Hosp. 2025, 6(3), 133; https://doi.org/10.3390/tourhosp6030133
Submission received: 9 May 2025 / Revised: 11 June 2025 / Accepted: 20 June 2025 / Published: 11 July 2025

Abstract

The COVID-19 pandemic severely impacted Japan’s inbound tourism, but recent recovery trends highlight the growing importance of Chinese tourists. Understanding their motivations is crucial for revitalizing the industry. Building on our previous framework, this study applies Transformer-based natural language processing (NLP) models and principal component analysis (PCA) to analyze large-scale user-generated content (UGC) and identify key motivational factors influencing Chinese tourists’ visits to Hokkaido. Traditional survey-based approaches to tourism motivation research often suffer from response biases and small sample sizes. In contrast, we leverage a pre-trained Transformer model, RoBERTa, to score motivational factors like self-expansion, excitement, and cultural observation. PCA is subsequently used to extract the most significant factors across different destinations. Findings indicate that Chinese tourists are primarily drawn to Hokkaido’s natural scenery and cultural experiences, and that the relative importance of these factors differs by season. While the model effectively aligns with manual scoring, it shows limitations in capturing more abstract motivations such as excitement and self-expansion. This research advances tourism analytics by applying AI-driven methodologies, offering practical insights for destination marketing and management. Future work can extend this approach to other regions and cross-cultural contexts, further enhancing AI’s role in understanding evolving traveler preferences.

1. Introduction

The COVID-19 pandemic brought significant disruptions to the global tourism industry, with Japan experiencing a dramatic decline in inbound visitors due to prolonged travel restrictions and public health concerns (UN Tourism, 2023). However, as borders reopened and international travel gradually resumed, Japan has witnessed a strong resurgence in tourism, particularly from China, which has historically been one of its largest and most influential tourist markets (JNTO, 2024). Within Japan, the Hokkaido region stands out as a major attraction, known for its natural beauty, seasonal festivals, hot springs, and distinct cultural offerings. As competition intensifies in the global tourism market, understanding what motivates Chinese tourists to visit specific destinations in Hokkaido has become a critical challenge for policymakers, local governments, and businesses seeking to revitalize and sustain the tourism sector.
Following the post-pandemic recovery, Japan has experienced a surge in inbound tourism, with Chinese visitors once again making up a significant portion of the international traveler population (The Japan Times, 2024). However, this growth has not been evenly distributed across the country. Government data released in mid-2024 revealed that tourism in Japan’s major metropolitan areas, namely Tokyo, Osaka, and Aichi, has seen notable growth since 2019, while rural regions have suffered a marked decline in visitor numbers (Japan Tourism Agency, 2024). This widening urban–rural divide underscores the critical need to better understand what drives tourists to explore less-traveled destinations like Hokkaido. Given the importance of enhancing destination marketing strategies and improving visitor experiences, a deeper understanding of Chinese tourists’ motivations for traveling to Japan, especially to regions outside the urban core, is essential for promoting more balanced and sustainable tourism development (Chul Oh et al., 1995; Yuan & Mcdonald, 1990).
Traditional approaches to tourism motivation research have primarily relied on surveys (Cohen, 1972; Crompton, 1979), structured interviews, and classic theoretical models such as push–pull theory (Dann, 1981) and means–end chain theory (Gutman, 1982). While these methods have contributed significantly to our understanding of tourist behavior, they are constrained by inherent limitations: small sample sizes, potential respondent biases, and difficulties in capturing dynamic and evolving preferences. In the digital era, user-generated content (UGC) from social media platforms and travel review sites presents a rich and underutilized resource for uncovering insights into tourist motivations (Ana & Istudor, 2019). However, manually analyzing such large-scale textual data is impractical, prompting the need for more sophisticated and automated solutions.
Transformer models, particularly pre-trained large language models (LLMs) based on the Transformer architecture like BERT (Devlin et al., 2019) and RoBERTa (Y. Liu et al., 2019), have achieved impressive results in understanding deep text semantics, yet their application to tourism motivation research remains relatively rare. In addition, principal component analysis (PCA) has proven useful in tourism analytics for dimensionality reduction, helping to distill key motivational factors from large datasets processed by Transformer models.
To address this gap, we build on our previous research (Z. Liu et al., 2025, 2023) by applying an automatic system to extract tourism motivations from online reviews, with a focus on identifying dominant motivational factors at different tourist spots. Specifically, we employ RoBERTa to semantically analyze and score reviews according to predefined motivational dimensions, such as self-expansion, excitement, natural experience, local interaction, cultural observation, and health recovery. These dimensions are grounded in the established tourism motivation literature and refined through our earlier studies. After computing individual motivation scores for each review, we apply principal component analysis (PCA) to reduce data dimensionality and identify the most prominent motivational factor for each destination.
The goal of this study is to apply and validate an AI-driven framework capable of automatically identifying and analyzing tourism motivations from large-scale textual data, with a focus on Chinese reviews of Hokkaido. By integrating advanced natural language processing (NLP) with statistical modeling techniques, we seek to improve the efficiency and depth of motivation analysis in tourism research.
This study makes several key contributions:
  • We develop an end-to-end framework that combines LLM-based motivation scoring with principal component analysis (PCA) and clustering to automatically extract latent motivational factors from user-generated tourism reviews.
  • We apply this framework to Chinese tourist reviews of Hokkaido, uncovering spatial and seasonal patterns in motivational drivers, while evaluating the model’s interpretive performance against expert-coded annotations to assess its strengths and limitations in capturing abstract, context-sensitive constructs.
  • We translate these findings into data-driven recommendations for tourism practitioners to inform destination branding, enhance visitor experience design, and tailor marketing strategies more effectively.
By bridging the gap between AI technologies and tourism studies, this research highlights the potential of automated text analysis in capturing evolving traveler preferences and contributes to the growing body of research on AI-enhanced tourism analytics.
This study aims to understand how tourism motivations vary across different seasons and locations in Hokkaido, and how a hybrid computational approach can uncover meaningful behavioral patterns. We pose the following research questions:
  • RQ1: What are the dominant motivational themes expressed by Chinese tourists across key destinations in Hokkaido?
  • RQ2: How do these motivational patterns shift by season, and what spatial–temporal trends emerge?
  • RQ3: How can transformer-based models and statistical techniques be integrated to reveal latent motivation structures?
  • RQ4: What implications do these seasonal and locational motivation profiles have for targeted tourism development and rural revitalization strategies in Japan?
These questions are rooted in classical tourism motivation theories (e.g., push–pull dynamics and socio-psychological motives) and are extended through data-driven methods that allow for detailed analysis across different seasons and geographic locations.

2. Previous Research

2.1. Tourism Motivation

Understanding the motivations underlying tourist behavior has long been a central concern within tourism studies. Motivation is typically defined as the set of internal and external forces that initiate, direct, and sustain travel-related behavior. Foundational frameworks in this area include the push–pull model proposed by Dann (1981), which distinguishes between internal “push” factors (e.g., escape, relaxation, self-exploration) and external “pull” factors (e.g., destination attributes, cultural attractions). This dichotomy has served as a basis for numerous studies investigating how motivations influence destination choice and tourist behavior.
Pearce’s Travel Career Ladder (TCL), developed through the extension of Maslow’s hierarchy of needs, posits that travel motivations evolve with a tourist’s level of experience (Pearce, 2012). For instance, novice travelers may prioritize safety and relaxation, whereas experienced tourists are more likely to seek self-actualization and culturally enriching experiences. Similarly, Crompton’s typology of socio-psychological motives, comprising dimensions such as novelty, social interaction, and cultural enrichment, remains influential in understanding the diversity of motivational orientations among tourists (Crompton, 1979).
Empirical research continues to substantiate the relationship between motivation and tourist behavior. For example, Devesa et al. (2010) and Dunn Ross and Iso-Ahola (1991) found that motivations related to knowledge acquisition and social interaction significantly influenced visitor satisfaction. A segmentation study by Park and Yoon (2009) revealed that Korean rural tourists could be classified into four motivational types, with relaxation emerging as the most prevalent. In Vietnam, Su et al. (2020) reported that tourist motivation influenced levels of engagement and experience quality, which in turn affected satisfaction and behavioral intentions.
Recent work has also emphasized the mediating role of motivation in shaping other aspects of travel behavior. Chi and Phuong (2022), for instance, demonstrated that motivation mediated the influence of time orientation and destination image on intention to visit urban destinations. In the context of global disruptions such as COVID-19, Annika Aebli and Taplin (2022) found that mental well-being and social connectivity became prominent motivators, even as health and safety concerns limited mobility.
Studies focusing on Chinese outbound tourists have identified culturally specific motivational patterns. For example, Wen et al. (2019) and Jiang et al. (2020) have highlighted motivations such as self-enhancement, spiritual fulfillment, and familial harmony. For Japanese tourists, motivations are similarly shaped by cultural preferences, including interest in seasonal phenomena (e.g., cherry blossoms), shopping, and cultural events (Zeng, 2021; Zeng & He, 2019). These motivational profiles differ significantly between first-time and repeat visitors and are influenced by logistical and contextual factors such as transportation access, political climate, and environmental conditions (Lin et al., 2017).

2.2. Data-Driven Methods in Tourism Research

Traditional approaches to studying tourist motivation have primarily relied on surveys, interviews, and statistical techniques such as factor analysis and cluster analysis. While effective for generating insights into motivational typologies and segmentation, these methods are inherently constrained by sample size, cost, and scalability.
For example, Hayashi and Fujihara (2008) surveyed over 1000 Japanese travelers, identifying seven motivational dimensions: self-expansion, excitement, unexpectedness, natural engagement, local immersion, cultural appreciation, and health recovery. Similarly, Carvache-Franco et al. (2020) used multivariate statistical techniques to segment international tourists in Ecuador into three motivational clusters: beach-oriented, eco-coastal, and multi-purpose travelers. A study by Valverde-Roda et al. (2022) on gastronomic tourism in Granada identified three tourist types, survivors, enjoyers, and experiencers, based on their culinary motivations and preferences.
Research on Last Chance Tourism (LCT), a phenomenon where tourists are motivated by the impending loss of natural or cultural heritage, further illustrates the diversity of motivational constructs. Salim and Ravanel (2023) examined LCT motivations at Montenvers-Mer-de-Glace, France, revealing key drivers such as a desire to witness environmental change and learn about glacial retreat.
Shayegan and Dastan (2024) developed a multilingual sentiment analysis model for TripAdvisor hotel reviews using an extended ABCDM deep neural network with LASER embeddings and transfer learning. Their model achieved strong performance across nine languages, outperforming their CNN and BiLSTM baselines. While focused on sentiment classification rather than motivational analysis, the study highlights the effectiveness of Transformer-based architectures in multilingual tourism contexts.
Despite their contributions, these studies are limited by their reliance on time-consuming data collection procedures and their focus on specific destinations or tourist demographics. This has led to increasing interest in data-driven and computational approaches that can analyze tourist motivations at scale.

2.3. Large Language Models in Tourism Research

The emergence of large language models (LLMs) and other AI-based tools has opened new avenues for tourism research, particularly in the analysis of unstructured data such as user-generated content. Recent studies have begun to leverage these models for extracting latent motivational themes from large corpora of online reviews, many concentrating on tourism forecasting and sentiment analysis.
Transformer-based models have been employed for tourism demand forecasting, such as the Tsformer model by Yi et al. (2021), which enhances performance in long-range time series prediction. Additionally, Wu et al. (2023) proposed an interpretable forecasting model, ADE-TFT, which optimizes the Temporal Fusion Transformer (TFT) using an adaptive differential evolution algorithm to improve both predictive performance and interpretability. By incorporating historical tourism data, COVID-19 case statistics, and sentiment information extracted from travel forums and search engines, the study demonstrated that combining quantitative and emotion-based features significantly enhances forecasting accuracy during crisis periods. In a similar manner, Diao et al. (2025) improved forecasting accuracy using virtual data augmentation and enhanced Transformer architectures. While these models demonstrate strong performance in behavioral prediction, they operate primarily at the macro-level of observed activity (e.g., arrivals, occupancy rates) and do not engage with the underlying psychological or motivational states that drive tourist behavior.
Viñán-Ludeña and de Campos (2022) analyzed large-scale Instagram and Twitter data to extract key places and perceptions associated with tourism in Granada, Spain. Using deep learning models, including a Spanish-Tourism-BERT and Tweeteval, the study identified both positive and negative sentiments linked to destinations, offering practical tools for tourism management and marketing improvements.
Further advancing sentiment analysis applications, Srinivasan et al. (2025) introduced the Hybrid Transformer-Attention Model (HTAM), combining transformer-based contextual learning with attention mechanisms to achieve superior classification accuracy in tourist reviews. HTAM demonstrated improved performance and interpretability, contributing to smarter, more adaptive tourism service strategies.
Sentiment analysis using Transformers has also been explored in tourism by Alamsyah et al. (2024). They applied a BERT-based model to classify tourist reviews from Indonesia and Thailand, revealing high satisfaction levels and identifying entertainment as the dominant experiential factor. While these studies demonstrate the effectiveness of Transformer models in processing tourism-related text, their focus remains on emotional polarity and perception, rather than the multi-dimensional motivational constructs grounded in psychological theory that our study addresses.
Ramadhani et al. (2025) conducted a large-scale, cross-cultural tourism study using BERT-based Transformer models and network analysis on over 387,000 TripAdvisor reviews across Indonesia, Thailand, and Vietnam. Their research explored how cultural backgrounds influence perceptions of entertainment experiences and travel mobility. The study implemented a three-stage classification pipeline to analyze sentiment, experience dimensions, and entertainment subtypes, achieving high accuracy. While it did not explicitly model motivational factors, the work demonstrates how LLMs can uncover cultural preferences and movement patterns in tourism contexts, offering insights for targeted destination marketing and policy planning. Similarly, Lan et al. (2025) proposed a dual-method framework that combines unsupervised LLM-based expectation mining with survey-guided fine-tuning. While their study identifies expectation patterns, it does not ground its categories in psychologically validated motivation taxonomies, nor does it attempt multi-dimensional scoring of tourism motivation.
Additionally, in our previous research, we proposed an automated pipeline for detecting dominant tourist motivations from Chinese-language reviews of attractions in Hokkaido (Z. Liu et al., 2023). The approach involved extracting key n-grams and scoring them against a predefined lexicon based on the motivational dimensions articulated by Hayashi and Fujihara (2008). The resulting scores were visualized using principal component analysis (PCA) and subsequently clustered to identify patterns across tourist sites.
Furthermore, we used a pre-trained Transformer model (RoBERTa) to automatically assess reviews across seven motivation factors (Z. Liu et al., 2025). The model closely matched manual assessments and significantly reduced the human effort required to analyze tourist preferences. This method allows for a scalable and replicable analysis of underlying motivational trends, rather than of visitor sentiment alone, and is particularly well-suited to examining temporal and spatial dynamics in tourist behavior. The study illustrates the further potential of LLMs for addressing some of the limitations of conventional survey-based methods, such as limited generalizability and manual data coding.
While recent studies have employed Transformer-based models for tasks such as sentiment classification (Alamsyah et al., 2024; Srinivasan et al., 2025; Viñán-Ludeña & de Campos, 2022) and demand forecasting (Diao et al., 2025; Wu et al., 2023; Yi et al., 2021), the application of LLMs to tourism motivation remains sparse. For instance, Ramadhani et al. (2025) applied a BERT-based framework to examine cross-cultural perceptions and entertainment-related preferences across Southeast Asian destinations, combining text mining with network analysis. Although their study provided valuable insights into cultural sentiment and experiential dimensions, it did not model underlying psychological motivations. Similarly, Lan et al. (2025) proposed a dual-method LLM framework to extract tourist expectations from online content, guided in part by survey-informed fine-tuning. While their approach addressed expectation-level analysis, it did not explicitly engage with theory-grounded motivational constructs or offer multidimensional scoring. To our knowledge, few, if any, existing works have fine-tuned LLMs to model multidimensional, psychologically grounded motivation constructs. Our study is among the first to do so, using a RoBERTa model to predict seven motivation categories grounded in established theory, and validating results through both PCA and expert scoring. This represents a significant advance in applying LLMs to higher-order cognitive constructs within tourism behavior research.

2.4. Research Gap

While tourism motivation has been extensively studied, much of the existing work remains grounded in traditional methodologies that are constrained by sample size, generalizability, and temporal relevance. Studies such as those by Devesa et al. (2010), Park and Yoon (2009), and Su et al. (2020) provide important insights but are typically limited to narrowly defined populations and geographic contexts. Moreover, the manual nature of these approaches limits their capacity to keep pace with evolving tourist behaviors in real time.
Recent advances in data-driven tourism research have introduced text mining and machine learning techniques, yet few have fully leveraged the capabilities of Transformer-based language models for motivational analysis. Existing applications often rely on generic sentiment analysis or topic modeling, which fall short of capturing psychological constructs such as self-expansion or cultural curiosity.
This study addresses these limitations by applying pre-trained Transformer models, specifically RoBERTa, to extract fine-grained motivational signals from large-scale, user-generated content in Chinese. In doing so, it not only circumvents the limitations of manual coding and fixed survey instruments but also introduces a scalable and replicable framework for understanding tourist motivations. The integration of principal component analysis and clustering techniques further enhances interpretability, enabling the identification of latent motivational structures across destinations. We further validate the method by conducting a temporal analysis, examining tourist motivation factors across different seasons and comparing human and model-predicted scores.
By aligning computational precision with theoretical constructs from motivation theory, this research offers a methodological advancement over previous studies and establishes a foundation for applying AI-driven tools to broader cross-cultural tourism analysis.

3. Method and Experiment

This study adopts a hybrid methodology that integrates advanced natural language processing (NLP) techniques with statistical modeling to identify and interpret the underlying motivations of Chinese tourists visiting Hokkaido. The approach builds on prior work in computational tourism research and aims to overcome the limitations of traditional survey-based methods by leveraging large-scale user-generated content (UGC). The overall workflow is depicted in Figure 1 and comprises four major phases: (1) data collection, (2) motivation score prediction using a fine-tuned Transformer model, (3) dimensionality reduction through principal component analysis (PCA), and (4) validation and analysis of motivational patterns. Each step is described in detail in Sections 3.1–3.4.

3.1. Data Collection

The first step involved the collection of large-scale textual data from Chinese tourists who had visited Hokkaido, following the approach of Z. Liu et al. (2023). Data were sourced from https://ctrip.com/ (accessed on 8 May 2025), one of the most prominent Chinese travel platforms, known for its rich repository of user-generated reviews and travel narratives. We selected Hokkaido due to its seasonal diversity, cultural distinctiveness, and popularity among Chinese tourists, making it an ideal case study for motivation analysis.
A total of 500 review entries were curated to ensure diversity in content, length, and tourist destinations. The dataset included reviews spanning different cities (e.g., Sapporo, Otaru, Furano), attractions (e.g., Shiroi Koibito Park, Noboribetsu Hell Valley, Hokkaido Shrine), and seasons. This ensured that both temporal (seasonal) and spatial (location-specific) variations could be explored.
For model training and evaluation purposes, the collected reviews were manually annotated according to seven motivation dimensions based on the framework proposed by Hayashi and Fujihara (2008). These dimensions (stimulation, cultural observation, local communication, health recovery, experiencing nature, unexpectedness, and self-expansion) were chosen due to their grounding in the tourism psychology literature and their prior validation in the context of Japanese tourism. The definitions of the motivation factors are presented in Table 1, and the scoring criteria are detailed in Table 2. Five native Chinese annotators independently scored each review based on their own judgment, using the provided motivation definitions and the perceived relationship between the review content and each factor. To assess inter-rater reliability, we calculated agreement coefficients between the annotators’ scores. The results confirmed a good level of agreement, and the final score for each review was obtained by averaging the scores from all five annotators.
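The agreement check and score averaging described above can be sketched as follows. This is a minimal illustration and not the procedure used in the study: the text does not name the coefficient, so the mean pairwise Pearson correlation is assumed here, and the ratings are hypothetical values for a single motivation factor.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_pairwise_agreement(ratings):
    """Average Pearson r over all annotator pairs (assumed agreement measure).

    ratings[a][i] is annotator a's score for review i on one factor.
    """
    pairs = list(combinations(ratings, 2))
    return sum(pearson(x, y) for x, y in pairs) / len(pairs)


def average_scores(ratings):
    """Final score per review: mean over all annotators."""
    return [sum(column) / len(column) for column in zip(*ratings)]


# Hypothetical data: 5 annotators x 4 reviews, one factor, 1-5 scale.
ratings = [
    [5, 2, 4, 1],
    [4, 2, 5, 1],
    [5, 3, 4, 2],
    [4, 1, 4, 1],
    [5, 2, 3, 1],
]
agreement = mean_pairwise_agreement(ratings)
final_scores = average_scores(ratings)
```

In practice the same two functions would be applied once per motivation factor, yielding seven agreement values and a seven-dimensional averaged score vector per review.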

3.2. Motivation Score Prediction

The second phase aimed to automatically infer the motivational content of each review using Transformer-based NLP models. We employed RoBERTa, a robustly optimized variant of BERT, known for its superior performance in semantic understanding tasks. RoBERTa was pre-trained on a large corpus of Chinese text and subsequently fine-tuned on our manually annotated dataset.
RoBERTa (Y. Liu et al., 2019) is an improved variant of BERT designed to strengthen language representation through modifications to its pre-training strategy. It achieves this by eliminating the Next Sentence Prediction objective, applying a dynamic masking scheme during masked language modeling, and training on a significantly larger and more diverse corpus. These enhancements result in a more stable and effective model for capturing contextual information in downstream NLP tasks.
In contrast to conventional machine learning techniques, which often depend on manual extraction of features and are limited by fixed input lengths, Transformer models like RoBERTa utilize self-attention mechanisms that evaluate the significance of every token in the sequence relative to others. This structure allows them to effectively model complex, long-range dependencies that traditional models like SVMs or RNNs struggle to handle (Vaswani et al., 2017). Additionally, because Transformers are pre-trained on massive text datasets and later adapted through fine-tuning, they offer enhanced flexibility and performance across a wide range of language-related tasks. Their adaptability and power have made them the core architecture for leading models in tasks such as text generation, translation, and summarization.
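For reference, the self-attention operation mentioned above is the scaled dot-product attention of Vaswani et al. (2017). The symbols follow that paper's standard notation rather than anything defined in this article: Q, K, and V are the query, key, and value matrices computed from the token representations, and d_k is the key dimension.

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```

Each row of the softmax output is a set of weights over all tokens in the sequence, which is what lets every token draw on every other token regardless of distance.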
Our decision to use a fine-tuned RoBERTa model is grounded in both theoretical and empirical considerations. RoBERTa outperforms earlier transformer architectures such as BERT in multiple NLP benchmarks due to its optimized pretraining strategy, including dynamic masking, longer training, and the removal of Next Sentence Prediction (Y. Liu et al., 2019). In our previous research (Z. Liu et al., 2025), RoBERTa consistently outperformed alternative models on our dataset, reinforcing its suitability for fine-grained motivation modeling.
The fine-tuning process treated the task as a multi-output regression problem, following the approach of Z. Liu et al. (2025), in which the model was trained to predict seven continuous scores, each representing the intensity of a specific motivational dimension, on a 1 to 5 scale. This setup enabled the model to capture the relative salience of multiple motivations within a single review, rather than constraining the task to single-label classification. Input reviews were tokenized using RoBERTa’s byte-pair encoding (BPE) tokenizer. All models were initialized from Hugging Face’s Transformers library and fine-tuned using the PyTorch (2.6.0) framework. Given the limited size of the labeled dataset, we adopted a conservative training strategy that included a small learning rate (e.g., 4 × 10⁻⁶), a larger number of epochs (typically more than 10), and early stopping based on validation loss.
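The early-stopping rule and a multi-output regression loss can be sketched independently of any deep learning framework. The sketch below is illustrative only: the article does not state the loss function or the patience value, so mean squared error and a patience of three epochs are assumptions, and the validation losses are hypothetical numbers, not results from the study.

```python
class EarlyStopping:
    """Stop training when the validation loss stops improving."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience    # epochs to wait without improvement (assumed)
        self.min_delta = min_delta  # smallest decrease that counts as improvement
        self.best_loss = float("inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def mse_multi_output(predicted, target):
    """Mean squared error over all reviews and all motivation dimensions."""
    total = sum(
        (p - t) ** 2
        for pred_row, target_row in zip(predicted, target)
        for p, t in zip(pred_row, target_row)
    )
    count = sum(len(row) for row in target)
    return total / count


# Hypothetical per-epoch validation losses.
val_losses = [0.90, 0.70, 0.62, 0.60, 0.61, 0.63, 0.60, 0.65]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.step(epoch, loss):
        stopped_at = epoch
        break
```

With these numbers the best checkpoint is the one from epoch 4, and training halts after three further epochs without a strict improvement. In a real run, the model weights saved at `stopper.best_epoch` would be restored before inference.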
The model was applied to a total of 500 reviews not seen during training, generating a seven-dimensional motivation score vector for each entry. These vectors represent the semantic alignment of each review with the target motivational constructs, enabling quantitative analysis of otherwise subjective tourist sentiments.

3.3. Principal Component Analysis

To synthesize the high-dimensional motivational data and extract key patterns, we applied principal component analysis (PCA) to the predicted score vectors. PCA is a widely used unsupervised method for identifying latent structure in multivariate data and has been employed in tourism analytics for purposes such as segmentation, trend detection, and behavior clustering. It simplifies high-dimensional datasets by identifying a smaller set of new variables, known as principal components, that capture the most significant patterns of variation (Abdi & Williams, 2010). Put simply, PCA combines overlapping variables into a smaller number of summary dimensions while retaining most of the information in the data.
In this study, PCA served two primary purposes. First, it reduced redundancy among the correlated motivation scores, providing a compressed representation that emphasizes the most salient variance in the dataset. Second, it enabled visualization and interpretation of dominant motivational trends across different reviews and destinations.
We focus on the first principal component (PC1) because it captures the largest share of total variance in the multidimensional motivation-score space. In PCA, each principal component is an orthogonal linear combination of the original variables, and the components are ordered by the amount of variance they explain; PC1 corresponds to the eigenvector of the covariance matrix with the largest eigenvalue, thereby maximizing the projected variability. In geometric terms, it defines the direction in the data space along which the observations vary most, which makes it the most informative single axis for summarizing the dominant patterns in the motivation data. The loadings of PC1, that is, the coefficients that associate each original motivation factor with this component, indicate how strongly each factor contributes to the dominant pattern of variation. Consequently, the motivation factor with the highest absolute loading on PC1 is interpreted as the primary driver of tourist behavior in our dataset. By extracting and examining PC1 loadings from the RoBERTa-scored motivation vectors, we thus identify the single most salient motivation dimension (e.g., Nature, Culture) underlying the greatest amount of variation across all reviews. These PC1 loadings were interpreted as composite motivation scores for each tourist destination. For instance, if the “health recovery” factor had the highest loading on PC1 for a given site, this would suggest that visitors primarily associated that location with relaxation, well-being, or restorative experiences in natural or calming environments. This approach provides a clear, statistically grounded method for isolating the main tourist motivation without arbitrary thresholds or subjective weighting.
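The selection rule described in this paragraph can be written compactly. The notation is introduced here for illustration and does not appear in the article: x_i is the seven-dimensional motivation score vector of review i, n is the number of reviews, S is the sample covariance matrix, and w_1 is the vector of PC1 loadings. Whether the scores were standardized before PCA is not stated, so the unscaled covariance form is an assumption.

```latex
\begin{aligned}
\bar{\mathbf{x}} &= \frac{1}{n}\sum_{i=1}^{n}\mathbf{x}_i, \qquad
S = \frac{1}{n-1}\sum_{i=1}^{n}(\mathbf{x}_i-\bar{\mathbf{x}})(\mathbf{x}_i-\bar{\mathbf{x}})^{\top},\\
\mathbf{w}_1 &= \arg\max_{\lVert\mathbf{w}\rVert=1}\ \mathbf{w}^{\top} S\,\mathbf{w},
\qquad S\,\mathbf{w}_1=\lambda_1\mathbf{w}_1,\quad
\lambda_1\ge\lambda_2\ge\dots\ge\lambda_7\ge 0,\\
j^{*} &= \arg\max_{j\in\{1,\dots,7\}}\ \lvert w_{1j}\rvert,
\qquad \text{share of variance explained by PC1}=\frac{\lambda_1}{\sum_{k=1}^{7}\lambda_k}.
\end{aligned}
```

The absolute value in the definition of j* matters because the sign of an eigenvector is arbitrary: w_1 and -w_1 describe the same axis.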
While non-linear dimensionality reduction techniques such as t-SNE (Van der Maaten & Hinton, 2008) and UMAP (McInnes et al., 2018) are valuable for visualizing clusters, they lack the interpretability of PCA, particularly in terms of loadings that quantify how each original motivation dimension contributes to a principal component. By analyzing PCA loadings, we are able to identify dominant motivational combinations and assess how these patterns shift across seasons. Our use of PCA builds on our prior work in tourism analytics (Z. Liu et al., 2023), which applied PCA to expert-annotated review data. The present study extends this methodology by applying PCA to outputs generated by a fine-tuned large language model (RoBERTa), enabling scalable, automated motivation profiling while retaining psychological interpretability and seasonal resolution.
We conducted principal component analysis on the motivation scores of 500 reviews to identify the main tourism motivations in Hokkaido. In addition, the reviews were grouped by season to compare variations in tourist motivations across different times of the year.
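The seasonal grouping can be sketched as follows. This is an illustration under stated assumptions: the month boundaries in `season_of` (March to May as spring, and so on) are not specified in the text and are chosen here only for the example, and `pc1_by_season` and the synthetic dates and scores are hypothetical.

```python
from datetime import date

import numpy as np


def season_of(d):
    """Map a posting date to a season. Month boundaries are an assumption."""
    if d.month in (3, 4, 5):
        return "spring"
    if d.month in (6, 7, 8):
        return "summer"
    if d.month in (9, 10, 11):
        return "fall"
    return "winter"


def pc1_by_season(dates, scores):
    """Run a separate PCA per season and return the PC1 loadings of each subset."""
    X = np.asarray(scores, dtype=float)
    labels = np.array([season_of(d) for d in dates])
    result = {}
    for season in ("spring", "summer", "fall", "winter"):
        subset = X[labels == season]
        if len(subset) < 2:                 # covariance needs at least two reviews
            continue
        cov = np.cov(subset, rowvar=False)  # np.cov centers the data internally
        _, eigvecs = np.linalg.eigh(cov)    # ascending eigenvalue order
        v = eigvecs[:, -1]
        if v[np.argmax(np.abs(v))] < 0:     # fix the arbitrary sign
            v = -v
        result[season] = v
    return result


# Synthetic example (not study data): ten reviews per month, seven factors each.
rng = np.random.default_rng(1)
demo_dates = [date(2024, m, 10) for m in range(1, 13) for _ in range(10)]
demo_scores = rng.normal(size=(len(demo_dates), 7))
per_season = pc1_by_season(demo_dates, demo_scores)
```

Comparing the entries of `per_season` side by side corresponds to reading the seasonal PC1 loading charts against each other.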

3.4. Tourist Motivation Analysis

In the final phase of the study, we analyzed the motivational patterns derived from the PCA results to gain insights into the dominant factors influencing Chinese tourists’ visits to Hokkaido. This analysis aimed to assess the real-world applicability of the automated framework by identifying seasonal trends and spatial variations in tourist motivations across different attractions.
To evaluate the performance of the RoBERTa-based motivation scoring system, we compared PCA results derived from the model-generated scores with those based on human-annotated scores. This comparison enabled us to assess the extent to which the model could replicate expert judgment in identifying key motivational dimensions. The reviews were segmented by season—spring, summer, fall, and winter—to explore how tourist motivations vary throughout the year. By analyzing principal component loadings for each seasonal subset, we identified temporal patterns in motivational salience, thereby validating the model’s ability to capture context-sensitive shifts in tourist behavior.
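One possible way to quantify how closely model-based and manual PC1 loadings agree is sketched below. The text does not state a specific agreement metric for the loadings, so the cosine similarity and the top-factor check used here, as well as the function name `loading_agreement` and the example vectors, are assumptions made purely for illustration.

```python
import numpy as np


def loading_agreement(model_loadings, manual_loadings):
    """Return (cosine similarity, same_top_factor) for two PC1 loading vectors."""
    a = np.asarray(model_loadings, dtype=float)
    b = np.asarray(manual_loadings, dtype=float)
    # The absolute value removes the arbitrary sign of principal components.
    cosine = abs(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    same_top = int(np.argmax(np.abs(a))) == int(np.argmax(np.abs(b)))
    return float(cosine), same_top


# Hypothetical loading vectors over seven factors (not results from the study).
model_pc1 = [0.10, 0.35, 0.05, 0.20, 0.85, 0.15, 0.25]
manual_pc1 = [0.15, 0.40, 0.10, 0.25, 0.80, 0.10, 0.30]
cosine, same_top = loading_agreement(model_pc1, manual_pc1)
```

A cosine close to 1 together with a matching top factor would indicate that both score sources point to the same dominant motivation.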

4. Results

4.1. Principal Motivational Factors

We analyzed a total of 500 Chinese-language reviews of major tourist attractions in Hokkaido. Using the RoBERTa model, each review was scored across seven defined tourist motivation factors. Principal component analysis (PCA) was then conducted on the scored dataset to identify the dominant motivations underlying the reviews.
The PCA results indicate that the first principal component (PC1) consistently captures the Nature dimension as the most influential motivational factor for overall tourism in Hokkaido. This trend is illustrated in Figure 2, where Nature displays the highest loading on PC1.
To further explore seasonal differences, the reviews were grouped into four seasonal subsets based on their posting dates: spring, summer, fall, and winter. A separate PCA was performed for each season. As shown in Figures 3–6, the Nature motivation remained the most dominant factor across all seasons. However, the PCA result for the fall season (Figure 5) reveals a relatively elevated loading for the Culture dimension, indicating a seasonal shift in tourist focus toward cultural experiences during autumn. These results suggest that while natural scenery is the primary attraction for tourists visiting Hokkaido throughout the year, cultural motivation also becomes salient during the fall season.

4.2. Model Performance

To evaluate the reliability and consistency of the RoBERTa-based automated scoring system for assessing tourist motivation factors in online reviews, we compared the model-generated scores with human-annotated scores across ten major tourist spots in Hokkaido. The comparison was quantified using the Intraclass Correlation Coefficient (ICC) for seven key motivation dimensions: stimulation, culture, local communication, health recovery, nature, unexpectedness, and self-expansion. The results are summarized in Table 3.
The average ICC values across the motivation factors ranged from 0.48 to 0.76. The highest ICC was observed for “Culture” (ICC = 0.76), followed by “Nature” (ICC = 0.63) and “Health Recovery” (ICC = 0.51). These values indicate moderate to good agreement between the RoBERTa model and human raters, with the strongest agreement for culturally and environmentally related motivations, which are central to the tourism experience in Hokkaido.
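For readers who wish to reproduce this kind of agreement check, the sketch below computes one common ICC variant for a matrix of reviews by raters. The text does not state which ICC form was used; the two-way random-effects, absolute-agreement, single-rater form ICC(2,1) is assumed here, and the function name `icc2_1` and the example scores are hypothetical.

```python
import numpy as np


def icc2_1(ratings):
    """ICC(2,1) for an (n_targets, k_raters) array of scores."""
    Y = np.asarray(ratings, dtype=float)
    n, k = Y.shape
    grand = Y.mean()
    row_means = Y.mean(axis=1)   # mean score per review
    col_means = Y.mean(axis=0)   # mean score per rater
    ss_rows = k * ((row_means - grand) ** 2).sum()
    ss_cols = n * ((col_means - grand) ** 2).sum()
    ss_error = ((Y - grand) ** 2).sum() - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                  # between-review mean square
    msc = ss_cols / (k - 1)                  # between-rater mean square
    mse = ss_error / ((n - 1) * (k - 1))     # residual mean square
    return float((msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n))


# Hypothetical scores for one motivation factor on six reviews (not study data).
human = [3, 1, 4, 2, 5, 3]
model = [3, 2, 4, 2, 4, 3]
icc_example = icc2_1(np.column_stack([human, model]))
```

In practice the same computation would be repeated per motivation factor, treating the human annotation and the model prediction as the two raters.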
The consistency across different tourist spots further suggests that the model can effectively generalize its scoring across various types of attractions, including natural landmarks like Noboribetsu Hell Valley and cultural venues such as the Otaru Music Box Hall. For instance, spots with strong cultural or natural themes showed close alignment between human and machine scores, suggesting that RoBERTa is particularly sensitive to the textual features related to these dimensions. These results are in line with our previous research (Z. Liu et al., 2025).
We also compared the seasonal PCA results derived from manual scores with those obtained from RoBERTa-predicted scores. The PC1 loadings for Hokkaido overall and for each season, based on manual scores, are shown in Figures 7–11. The results are generally consistent with those based on the RoBERTa-predicted scores, although some differences remain. This is likely because the RoBERTa model performs well in identifying clear, keyword-based motivations but may be less accurate in capturing subjective feelings or subtle human judgments.
Overall, these findings support the validity of using the RoBERTa model for large-scale annotation of tourist motivation from user-generated content. It significantly reduces the time and labor cost involved in manual content analysis while maintaining a high level of interpretive accuracy. This approach not only streamlines the research workflow but also enables the processing of much larger datasets that would be impractical to score manually, thus offering a scalable solution for tourism motivation analysis in both academic and industry contexts.
To investigate seasonal variations in tourist motivations, we conducted a principal component analysis (PCA) on Chinese-language reviews for tourist attractions in Hokkaido. Reviews were grouped by season, and we analyzed PC1 loadings to identify the dominant motivation factors in each period. The PC1 (Comp.1) versus PC2 (Comp.2) score plots and the corresponding review contents were also examined to understand how specific motivational dimensions relate to individual reviews (Figures 12–15). The numbers in each plot represent individual review IDs, and the arrows indicate motivation factors. Reviews located closer to a particular arrow exhibit a stronger association with the corresponding motivation factor.
These plots provide a two-dimensional visualization of the distribution of tourist reviews across the motivational landscape for each season. In each plot, motivation factors such as nature, culture, and self-expansion are represented as directional vectors, and the relative proximity of each review to these vectors indicates the strength of its association with the corresponding factor.
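The coordinates behind such a score plot can be sketched as follows. This is an illustrative reconstruction, assuming a standard PCA biplot in which review positions are PC1 and PC2 scores and arrows are the factor loadings on those two components; the helper names `biplot_coordinates` and `nearest_arrow` and the synthetic data are hypothetical.

```python
import numpy as np


def biplot_coordinates(scores):
    """Return (review_xy, arrow_xy): PC1/PC2 scores and factor loading arrows."""
    X = np.asarray(scores, dtype=float)
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    review_xy = Xc @ Vt[:2].T   # position of each review in the PC1-PC2 plane
    arrow_xy = Vt[:2].T         # one arrow per motivation factor
    return review_xy, arrow_xy


def nearest_arrow(point_xy, arrow_xy):
    """Index of the factor arrow forming the smallest angle with a review point."""
    norms = np.linalg.norm(arrow_xy, axis=1) * np.linalg.norm(point_xy) + 1e-12
    cosines = (arrow_xy @ point_xy) / norms
    return int(np.argmax(cosines))


# Synthetic example (not study data): 50 reviews scored on seven factors.
rng = np.random.default_rng(2)
demo_scores = rng.normal(size=(50, 7))
review_xy, arrow_xy = biplot_coordinates(demo_scores)
closest = [nearest_arrow(p, arrow_xy) for p in review_xy]
```

Here "closer to an arrow" is operationalized as the smallest angle between the review point and the arrow, which is one reasonable reading of proximity in a biplot.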
The PCA score plots clearly reveal seasonal variations in dominant motivations. For instance, in the spring plot (Figure 12), reviews cluster along the nature vector. In summer (Figure 13), review points are more widely dispersed, indicating a broader range of motivations, including stimulation, unexpectedness, and self-expansion. The fall season (Figure 14) shows a notable shift toward culture. Finally, winter (Figure 15) returns to a strong emphasis on nature.

5. Discussion

Our results show that the RoBERTa-based scoring system successfully approximates human annotations for key motivation dimensions. The Intraclass Correlation Coefficients (ICCs) indicate particularly strong alignment in concrete categories such as Nature and Culture, supporting the reliability of our automated pipeline. These results affirm that LLM-based scoring can replicate expert judgment across large datasets, offering a scalable alternative for tourism motivation analysis.
However, the model demonstrated more limited performance for abstract constructs such as self-expansion and excitement, which are often implicit or emotionally nuanced. This reflects a known challenge for LLMs in interpreting introspective or affective states. Further refinement may improve accuracy in these areas, for example by integrating multimodal data (such as images or geotagged itineraries from SNS) or by using fine-grained contextual modeling with sequence-aware architectures (e.g., hierarchical transformers).
One of the study’s key contributions lies in its exploration of seasonal shifts in tourist motivations. By grouping reviews according to season and applying PCA to both human-annotated and model-predicted scores, we identified distinct patterns in what motivates Chinese tourists across different times of the year.
The key findings are discussed below:
1. Spring Season: Nature Motivation with Seasonal Transition
Spring is characterized by a high loading on nature motivation, as reflected in many reviews. Tourists visiting in March and early April often encounter lingering snow in areas like Asahiyama Zoo and Noboribetsu, while those visiting in late April to May experience blooming sakura, particularly in Sapporo and Odori Park.
For example, Review No. 62 (Figure 12) states, “Hokkaido Jingu is a world of snow in winter. The snowy area is vast and highly recommended”. In contrast, Review No. 100 notes, “In spring, many Japanese people would have a picnic in Enoyama Park, and it was very crowded. Visitors can also enjoy the cherry blossoms planted next to the shrine”. These reviews support the conclusion that nature is the core draw in spring, but with seasonally segmented appeal depending on the timing of the trip.
2. Summer Season: Diverse Motivations with Emphasis on Nature, Stimulation, and Self-Expansion
Summer reviews show high PC1 loadings for nature, as well as strong secondary loadings on stimulation, unexpectedness, health, and self-expansion. Tourists are exposed to Hokkaido’s vibrant natural beauty and outdoor activities, including lavender fields in Furano, flower gardens in Shiroi Koibito Park, and festivals like the Sapporo Summer Festival.
For instance, Review No. 69 (Figure 13) expresses, “You can visit the Chocolate Museum at the factory where the famous White Lover sweets are made. In addition to learning about the history of chocolate, there is also a nostalgic toy museum, a dessert buffet, and a workshop where you can try your hand at making Kotatsu biscuits. In summer, the rose garden in the park is filled with more than 1000 roses of 200 varieties, including English roses and old roses”. Likewise, Review No. 65 notes, “I thought it was a small park, but after walking around, I realized that it is a big park, and it actually takes about 30 min to walk from the beginning to the end. Along the way, there are not only beautiful trees, but also tulips and other unknown flowers for decoration, making it a very comfortable environment”. These comments explain the PCA results: visitors feel stimulated, mentally renewed, and enriched by their summer travels, contributing to the high loadings on both emotional engagement and self-growth.
3. Fall Season: Strong Cultural and Natural Motivations
In the fall season, both culture and nature emerged as the dominant motivational factors, with cultural motivation noticeably higher compared to other seasons. This result reflects a shift in focus toward cultural appreciation, likely due to the seasonal decline in iconic natural elements such as flower blooms or deep snow.
From the PCA score plot, for example, reviews No. 96 and 35 (Figure 14) are located in the high-load region of cultural motivation. Review No. 96 mentions the following: “On the west side of Otaru Canal, crossing the street is Izubakoji, which is really small and narrow, as it was when it was built about a hundred years ago, and it is full of restaurants and small stores. Further west is an abandoned railroad, and you can imagine how the Hokkaido pioneers lived and worked here”. Similarly, Review No. 35 notes, “The kerosene street lamps and vintage masonry warehouses along the canals emphasize the city’s literacy, and there are many artists painting and taking pictures along the banks”. These examples show that in fall, tourists are drawn to culturally enriching experiences.
While nature is still valued in fall—e.g., the autumn foliage around Otaru lakes is frequently praised—tourists tend to focus more on human heritage and cultural context, possibly due to fewer seasonal natural spectacles compared to spring or winter.
4. Winter Season: Strong Nature Motivation Dominated by Snow Experience
Winter in Hokkaido is widely known for its snowy landscapes, and PCA results confirm that nature is the strongest motivational factor. The reviews consistently highlight attractions like the Sapporo Snow Festival, snow-covered Otaru Canal, and hot springs near Noboribetsu Hell Valley.
From the PCA plot, Review No. 184 (Figure 15) says, “In the snow-covered, the shrine has a touch of romance, look at the snow under the shrine pedestrians, hurried footsteps, more than a leisurely. Here, I also met, a cute little squirrel, because the winter, its hair grows especially much, especially fluffy, fat and round with the cartoon almost, really cute”. This example supports the interpretation that Hokkaido’s winter identity is deeply rooted in natural aesthetics, and that tourists primarily visit during this season to experience snow in a clean, beautiful, and photogenic environment.
These seasonal findings not only demonstrate the framework’s capacity to detect temporal variations but also provide guidance for seasonally targeted marketing and experience design. For example, tourism boards could emphasize cultural heritage tours in the fall, wellness and stimulation experiences in the summer, and natural beauty throughout the year, so that promotional materials, tour packages, and cultural programming align with the most salient motivations of each season.

Theoretical Implications and Integration with Prior Literature

The findings of this study provide a contemporary lens through which classical tourism motivation theories can be revisited. For example, the consistent prominence of “Nature” and seasonal shifts toward “Culture” reflect Dann’s push–pull model (Dann, 1981), where external attributes (e.g., snow, festivals) serve as strong pull factors depending on temporal context.
The relevance of Pearce’s Travel Career Ladder model (Pearce, 2012) is more tentative in this context. While some higher-order motivations such as self-expansion appear seasonally (e.g., in summer), the absence of demographic or travel experience data limits its direct application. Therefore, we treat this alignment as suggestive rather than confirmatory. Our model’s capacity to detect such abstract motivations may reflect latent stages of travel interest, but it cannot directly infer tourist maturity or experience.
Notably, our PCA-based analysis of review-derived motivation scores reveals granular distinctions in tourist behavior that go beyond the capabilities of traditional survey-based segmentation. This supports Crompton’s notion of socio-psychological motives (Crompton, 1979) while providing a scalable way to track their prevalence across destinations and timeframes.
These findings also extend the work of Hayashi and Fujihara (2008) by showing how their seven motivation categories manifest not just statically, but dynamically across seasons and locations. The hybrid NLP + PCA framework thus serves as a bridge between motivational theory and scalable behavioral analytics, addressing long-standing challenges in generalizability and context sensitivity within motivation research.

6. Conclusions

This study explored the motivations of Chinese tourists visiting Hokkaido by combining Transformer-based natural language processing techniques with statistical analysis. Specifically, a RoBERTa model was fine-tuned to automatically score user-generated reviews according to predefined tourism motivation categories. The results were further analyzed using principal component analysis (PCA) to extract key motivational dimensions. By analyzing 500 Chinese-language reviews across ten popular destinations and four seasons, we identified how specific motivational dimensions vary temporally and spatially. Our findings demonstrate that the Transformer-based automated scoring method can effectively replicate manual annotation results, significantly reducing the need for labor-intensive human coding. This integrated approach proves to be both efficient and reliable for large-scale tourism motivation analysis.
We examined seasonal differences by dividing the reviews into four time periods: spring, summer, fall, and winter. The PCA results revealed that Hokkaido’s appeal to Chinese tourists is primarily driven by its natural beauty and Japanese culture. Our findings show that “Nature” is a dominant year-round draw, while “Culture” emerges more prominently during the fall. The hybrid approach revealed distinct clusters of motivation profiles, demonstrating how large-scale user-generated content can be used to extract latent motivation structures with greater precision than manual surveys. These results validate the usefulness of combining RoBERTa-based scoring with PCA as a methodologically rigorous and scalable alternative to traditional motivation analysis.
Together, these findings highlight the dual utility of our approach: not only can it reliably replicate expert annotation, but it can also uncover nuanced temporal patterns that are crucial for destination management, marketing, and experience design. This research underscores the value of AI-driven tools in both academic and applied tourism contexts.

6.1. Implications

Theoretically, this work strengthens the empirical applicability of classical motivation frameworks by grounding them in large-scale behavioral data. The seasonally segmented PCA results provide empirical support for the push–pull model, while only tentatively aligning with the Travel Career Ladder due to lack of experiential data. This nuanced treatment ensures theoretical alignment without overreaching claims.
The findings provide valuable implications for tourism practitioners and destination marketers in Hokkaido. First, the automated method enables scalable analysis of tourist motivations, supporting more timely and data-driven decision-making. Second, understanding the seasonal variations in motivation allows tourism authorities to tailor promotional strategies accordingly. For instance, nature-oriented marketing may be effective throughout the year, and particularly in spring and winter, while cultural content should be emphasized in fall campaigns. Additionally, the demonstrated success of the Transformer-PCA framework supports its application to other destinations seeking to analyze visitor motivations using digital text data.
For data-driven decision-making, the study demonstrates a replicable, low-cost approach to generating strategic insights. By reducing reliance on manual content analysis, tourism boards and travel companies can evaluate trends at scale, increasing responsiveness to market demands. For broader regional policy and branding, these results affirm that Hokkaido’s strength lies in the interplay between its natural and cultural assets. A balanced promotional strategy that shifts emphasis with the seasons could help sustain year-round tourism while distributing tourist flow more evenly across time and location.

6.2. Future Work

Future research can build upon this study in several ways. First, incorporating additional data sources such as images, social media check-ins, or travel itineraries may enrich the understanding of tourist behavior. Second, extending the analysis to other linguistic or demographic groups could provide comparative insights across nationalities or cultural backgrounds. Third, a longitudinal approach would help track how motivations evolve over time in response to external factors such as pandemics or economic shifts. Finally, closer collaboration with tourism stakeholders could facilitate the practical integration of these findings into experience design and policy planning.
To elaborate, integrating multimodal data refers to analyzing not only textual reviews but also associated images and geospatial check-ins. For example, images can be processed using vision–language models to infer motivational cues like awe, relaxation, or novelty. Geotagged check-ins and itinerary sequences can reveal travel pacing, diversity of experiences, and implicit preferences. These can complement textual motivation scores and help identify hidden behavioral patterns. Fine-grained contextual modeling would involve applying more advanced NLP techniques such as hierarchical transformers or discourse-aware models to capture motivational cues expressed over multiple sentences or embedded in complex narrative structures. This would enhance the model’s ability to detect abstract motivations like self-expansion or unexpectedness, which may be conveyed more subtly or indirectly.
These advancements would deepen the granularity and contextual sensitivity of motivation analysis, enabling a more holistic understanding of the tourist experience.
In conclusion, this study demonstrates how AI-powered approaches can enhance both the scale and depth of motivation research, providing actionable intelligence for tourism management while contributing to the theoretical advancement of the field.

Author Contributions

Conceptualization, Z.L. and J.E.; methodology, Z.L. and J.E.; software, Z.L. and J.E.; validation, Z.L. and J.E.; formal analysis, Z.L.; investigation, Z.L.; data curation, Z.L. and J.E.; writing—original draft preparation, Z.L. and J.E.; writing—review and editing, Z.L. and J.E.; visualization, Z.L.; supervision, F.M. and M.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data analyzed during this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Abdi, H., & Williams, L. J. (2010). Principal component analysis. WIREs Computational Statistics, 2(4), 433–459.
  2. Alamsyah, A., Fajriananda, M. N., & Ramadhani, D. P. (2024, July 4–6). Digital traces in tourism: Leveraging NLP to evaluate tourist experiences across southeast Asian destinations. 2024 IEEE International Conference on Industry 4.0, Artificial Intelligence, and Communications Technology (IAICT) (pp. 86–92), Bali, Indonesia.
  3. Ana, M.-I., & Istudor, L.-G. (2019). The role of Social Media and user-generated-content in Millennials travel behavior. Management Dynamics in the Knowledge Economy, 7(23), 87–104.
  4. Annika Aebli, M. V., & Taplin, R. (2022). A two-dimensional approach to travel motivation in the context of the COVID-19 pandemic. Current Issues in Tourism, 25(1), 60–75.
  5. Carvache-Franco, M., Carvache-Franco, W., Carvache-Franco, O., Hernández-Lara, A. B., & Buele, C. V. (2020). Segmentation, motivation, and sociodemographic aspects of tourist demand in a coastal marine destination: A case study in Manta (Ecuador). Current Issues in Tourism, 23(10), 1234–1247.
  6. Chi, N. T. K., & Phuong, V. H. (2022). Studying tourist intention on city tourism: The role of travel motivation. International Journal of Tourism Cities, 8(2), 497–512.
  7. Chul Oh, H., Uysal, M., & Weaver, P. A. (1995). Product bundles and market segments based on travel motivations: A canonical correlation approach. International Journal of Hospitality Management, 14(2), 123–137.
  8. Cohen, E. (1972). Toward a sociology of international tourism. Social Research, 39(1), 164–182. Available online: http://www.jstor.org/stable/40970087 (accessed on 22 April 2025).
  9. Crompton, J. L. (1979). Motivations for pleasure vacation. Annals of Tourism Research, 6(4), 408–424.
  10. Dann, G. M. (1981). Tourist motivation an appraisal. Annals of Tourism Research, 8(2), 187–219.
  11. Devesa, M., Laguna, M., & Palacios, A. (2010). The role of motivation in visitor satisfaction: Empirical evidence in rural tourism. Tourism Management, 31(4), 547–552.
  12. Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019, June). BERT: Pre-training of deep bidirectional transformers for language understanding. In J. Burstein, C. Doran, & T. Solorio (Eds.), Proceedings of the 2019 conference of the north American chapter of the association for computational linguistics: Human language technologies, volume 1 (long and short papers) (pp. 4171–4186). Association for Computational Linguistics.
  13. Diao, Y., Sun, Z., Zhao, J., Qiu, W., & Liu, S. (2025). Enhancing tourism demand forecasting via virtual sample generation and improved transformer models. arXiv, arXiv:2503.19423.
  14. Dunn Ross, E. L., & Iso-Ahola, S. E. (1991). Sightseeing tourists’ motivation and satisfaction. Annals of Tourism Research, 18(2), 226–237.
  15. Gutman, J. (1982). A means-end chain model based on consumer categorization processes. Journal of Marketing, 46(2), 60–72.
  16. Hayashi, Y., & Fujihara, T. (2008). Sightseeing motives of Japanese overseas tourists as a function of destination, tour type and age. The Japanese Journal of Experimental Social Psychology, 48(1), 17–31.
  17. Japan Tourism Agency. (2024). White paper on tourism in Japan, 2024. Ministry of Land, Infrastructure, Transport and Tourism. Available online: https://www.mlit.go.jp/kankocho/news02_000517_00001.html (accessed on 1 May 2025).
  18. Jiang, S., Scott, N., Tao, L., & Ding, P. (2020). Chinese tourists’ motivation and their relationship to cultural values. In Culture and cultures in tourism (pp. 202–214). Routledge.
  19. JNTO. (2024). 2024 visitor arrivals and Japanese overseas travelers. Japan National Tourism Organization. Available online: https://www.jnto.go.jp/statistics/data/_files/20240821_1530-1.pdf (accessed on 1 May 2025).
  20. Lan, Z., Yang, X., Li, H., & Wang, R. (2025). Understanding tourist expectations using large language models: A dual-method framework. arXiv, arXiv:2505.16118.
  21. Lin, P. M., Qiu Zhang, H., Gu, Q., & Peng, K.-L. (2017). To go or not to go: Travel constraints and attractiveness of travel affecting outbound Chinese tourists to Japan. Journal of Travel & Tourism Marketing, 34(9), 1184–1197.
  22. Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv, arXiv:1907.11692.
  23. Liu, Z., Eronen, J., Masui, F., & Ptaszynski, M. (2025). Automated evaluation of tourism motivation from Chinese tourists in Japan using transformers. Jxiv Preprint.
  24. Liu, Z., Masui, F., Eronen, J., Terashita, S., & Ptaszynski, M. (2023). A New Approach to Extracting Tourism Focus Points from Chinese Inbound Tourist Reviews after COVID-19. Sustainability, 15(11), 8748.
  25. McInnes, L., Healy, J., & Melville, J. (2018). UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv, arXiv:1802.03426.
  26. Park, D.-B., & Yoon, Y.-S. (2009). Segmentation by motivation in rural tourism: A Korean case study. Tourism Management, 30(1), 99–108.
  27. Pearce, P. (2012). The ulysses factor: Evaluating visitors in tourist settings. Springer Science & Business Media.
  28. Ramadhani, D. P., Alamsyah, A., Febrianta, M. Y., Fajriananda, M. N., Nada, M. S., & Hasanah, F. (2025). Large-scale cross-cultural tourism analytics: Integrating transformer-based text mining and network analysis. Computers, 14(1), 27.
  29. Salim, E., & Ravanel, L. (2023). Last chance to see the ice: Visitor motivation at Montenvers-Mer-de-Glace, French Alps. Tourism Geographies, 25(1), 72–94.
  30. Shayegan, M. J., & Dastan, Y. (2024). A multilingual sentiment analysis model in tourism. Journal of Algorithms and Computation, 56(1), 1–14.
  31. Srinivasan, J., Niranjanee, M., Nandhana, C., & Azhagiri, M. (2025). A text classification in tourism reviews with the hybrid transformer-attention model in the information management of smart tourism. In International conference on cognitive computing and cyber physical systems (pp. 479–488). Springer.
  32. Su, D. N., Nguyen, N. A. N., Nguyen, Q. N. T., & Tran, T. P. (2020). The link between travel motivation and satisfaction towards a heritage destination: The role of visitor engagement, visitor experience and heritage destination image. Tourism Management Perspectives, 34, 100634.
  33. The Japan Times. (2024). Inbound tourism numbers hit record high, with Japan set to achieve 2025 goal. The Japan Times. Available online: https://www.japantimes.co.jp/news/2024/04/17/japan/society/record-high-inbound-travelers/ (accessed on 1 May 2025).
  34. UN Tourism. (2023). International tourism and COVID-19. Available online: https://www.unwto.org/tourism-data/global-and-regional-tourism-performance (accessed on 1 May 2023).
  35. Valverde-Roda, J., Viruel, M., Castaño Prieto, L., & Sánchez, M. (2022). Interests, motivations and gastronomic experiences in the world heritage site destination of Granada (Spain): Satisfaction analysis. British Food Journal, 125, 61–80.
  36. Van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9(11), 2579–2605.
  37. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30, 5998–6008.
  38. Viñán-Ludeña, M. S., & de Campos, L. M. (2022). Discovering a tourism destination with social media data: BERT-based sentiment analysis. Journal of Hospitality and Tourism Technology, 13(5), 907–921.
  39. Wen, J., Huang, S. S., & Ying, T. (2019). Relationships between Chinese cultural values and tourist motivations: A study of Chinese tourists visiting Israel. Journal of Destination Marketing & Management, 14, 100367.
  40. Wu, B., Wang, L., & Zeng, Y.-R. (2023). Interpretable tourism demand forecasting with temporal fusion transformers amid COVID-19. Applied Intelligence, 53(11), 14493–14514.
  41. Yi, X., Zhang, K., Zheng, Y., Xu, Y., & Qin, Z. (2021). Tsformer: Transformer for time-series forecasting. arXiv, arXiv:2107.10977.
  42. Yuan, S., & Mcdonald, C. (1990). Motivational determinates of international pleasure time. Journal of Travel Research, 29(1), 42–44.
  43. Zeng, B. (2021). Pattern of Chinese tourist flows in Japan: A social network analysis perspective. In Tourism spaces (pp. 42–64). Routledge.
  44. Zeng, B., & He, Y. (2019). Factors influencing Chinese tourist flow in Japan – A grounded theory approach. Asia Pacific Journal of Tourism Research, 24(1), 56–69.
Figure 1. Flowchart of the proposed method.
Figure 2. PC1 loadings of Hokkaido (model prediction).
Figure 3. PC1 loadings of spring season (model prediction).
Figure 4. PC1 loadings of summer season (model prediction).
Figure 5. PC1 loadings of fall season (model prediction).
Figure 6. PC1 loadings of winter season (model prediction).
Figure 7. PC1 loadings of Hokkaido (manual scores).
Figure 8. PC1 loadings of spring season (manual scores).
Figure 9. PC1 loadings of summer season (manual scores).
Figure 10. PC1 loadings of fall season (manual scores).
Figure 11. PC1 loadings of winter season (manual scores).
Figure 12. PCA score plot of spring season (S = Stimulation, C = Cultural Observation, L = Local Communication, H = Health Recovery, N = Experiencing Nature, U = Unexpectedness, and E = Self Expansion).
Figure 13. PCA score plot of summer season (S = Stimulation, C = Cultural Observation, L = Local Communication, H = Health Recovery, N = Experiencing Nature, U = Unexpectedness, and E = Self Expansion).
Figure 14. PCA score plot of fall season (S = Stimulation, C = Cultural Observation, L = Local Communication, H = Health Recovery, N = Experiencing Nature, U = Unexpectedness, and E = Self Expansion).
Figure 15. PCA score plot of winter season (S = Stimulation, C = Cultural Observation, L = Local Communication, H = Health Recovery, N = Experiencing Nature, U = Unexpectedness, and E = Self Expansion).
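The PC1 loadings shown in Figures 2–11 can be illustrated with a minimal sketch. It assumes a matrix with one row per review and one column per motivation factor, scored 1 to 5, and it standardizes each column before the decomposition; whether the authors standardized is not stated here, so treat that step as an assumption. The function name `pc1_loadings` and the random scores are hypothetical placeholders, not the study's data.

```python
import numpy as np

# Factor order follows the legend of Figures 12-15.
FACTORS = ["Stimulation", "Cultural Observation", "Local Communication",
           "Health Recovery", "Experiencing Nature", "Unexpectedness",
           "Self Expansion"]


def pc1_loadings(scores):
    """Return (PC1 loadings, share of variance explained by PC1).

    Assumption: columns are z-scored first, i.e. PCA on the correlation matrix.
    """
    x = np.asarray(scores, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    # Rows of vt are the principal axes; s holds the singular values.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return vt[0], float(explained[0])


# Hypothetical stand-in for the scored reviews: 50 reviews x 7 factors.
rng = np.random.default_rng(0)
scores = rng.integers(1, 6, size=(50, len(FACTORS)))

loadings, ratio = pc1_loadings(scores)
for name, weight in zip(FACTORS, loadings):
    print(f"{name:22s} {weight:+.3f}")
print(f"PC1 explains {ratio:.1%} of the variance")
```

The sign of a principal component is arbitrary, so only the relative sizes and relative signs of the loadings are meaningful when comparing the model-prediction and manual-score figures.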
Table 1. Explanation of Tourist Motivation Scale.

| Tourism Motivation Scale (abbreviation) | Explanation |
|---|---|
| Stimulation (Stimul.) | Experiencing novelty and change during travel |
| Cultural observation (Culture) | Interest in the local culture of the destination |
| Local communication (Local) | Engaging in communication with local residents |
| Health recovery (Health) | Relieving fatigue and stress through travel |
| Experiencing nature (Nature) | Direct interaction with natural environments |
| Unexpectedness (Unexpect.) | Encountering surprising or unplanned experiences |
| Self-expansion (Self-exp.) | Personal growth or transformation through travel |
Table 2. The scoring criteria.

| Degree of Judgement | Score |
|---|---|
| Strongly related | 5 |
| Closely related | 4 |
| Moderately related | 3 |
| Somewhat related | 2 |
| Unrelated | 1 |
Table 3. ICC scores of each motivation factor for each spot.

| Spot | Stimul. | Culture | Local | Health | Nature | Unexpect. | Self-Exp. |
|---|---|---|---|---|---|---|---|
| Asahiyama Zoo | 0.12 | 0.47 | 0.22 | 0.04 | 0.43 | 0.25 | 0.29 |
| Tanukikoji Shopping Street | 0.34 | 0.83 | 0.62 | 0.38 | 0.16 | 0.62 | 0.79 |
| Hokkaido Shrine | 0.51 | 0.81 | 0.81 | 0.46 | 0.89 | 0.73 | 0.48 |
| Former Hokkaido Govt. Office | 0.28 | 0.69 | 0.47 | 0.31 | 0.55 | 0.14 | 0.13 |
| Noboribetsu Hell Valley | 0.51 | 0.71 | 0.47 | 0.67 | 0.72 | 0.56 | 0.63 |
| Odori Park | 0.81 | 0.77 | 0.60 | 0.81 | 0.95 | 0.87 | 0.93 |
| Otaru Canal | 0.84 | 0.93 | 0.78 | 0.90 | 0.97 | 0.71 | 0.89 |
| Otaru Music Box Hall | 0.35 | 0.82 | 0.25 | 0.34 | 0.24 | 0.42 | 0.33 |
| Sapporo TV Tower | 0.10 | 0.71 | 0.45 | 0.30 | 0.54 | 0.41 | −0.03 |
| Shiroi Koibito Park | 0.89 | 0.90 | 0.76 | 0.84 | 0.89 | 0.72 | 0.94 |
| Average | 0.48 | 0.76 | 0.54 | 0.51 | 0.63 | 0.54 | 0.54 |
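As a quick check on Table 3, the sketch below recomputes the Average row from the ten per-spot ICC values and ranks the factors by mean agreement. The values are transcribed from the table; the helper name `factor_means` and the use of half-up rounding to two decimals are assumptions made for this illustration. `Decimal` is used so that means such as 0.475 round the same way as in the printed table rather than being distorted by binary floating point.

```python
from decimal import Decimal, ROUND_HALF_UP

FACTORS = ["Stimul.", "Culture", "Local", "Health", "Nature", "Unexpect.", "Self-Exp."]

# ICC values transcribed from Table 3, one row per spot, columns in FACTORS order.
ICC = {
    "Asahiyama Zoo":                "0.12 0.47 0.22 0.04 0.43 0.25 0.29",
    "Tanukikoji Shopping Street":   "0.34 0.83 0.62 0.38 0.16 0.62 0.79",
    "Hokkaido Shrine":              "0.51 0.81 0.81 0.46 0.89 0.73 0.48",
    "Former Hokkaido Govt. Office": "0.28 0.69 0.47 0.31 0.55 0.14 0.13",
    "Noboribetsu Hell Valley":      "0.51 0.71 0.47 0.67 0.72 0.56 0.63",
    "Odori Park":                   "0.81 0.77 0.60 0.81 0.95 0.87 0.93",
    "Otaru Canal":                  "0.84 0.93 0.78 0.90 0.97 0.71 0.89",
    "Otaru Music Box Hall":         "0.35 0.82 0.25 0.34 0.24 0.42 0.33",
    "Sapporo TV Tower":             "0.10 0.71 0.45 0.30 0.54 0.41 -0.03",
    "Shiroi Koibito Park":          "0.89 0.90 0.76 0.84 0.89 0.72 0.94",
}
PUBLISHED_AVERAGE = "0.48 0.76 0.54 0.51 0.63 0.54 0.54".split()


def factor_means(table):
    """Column means over all spots, rounded half-up to two decimals."""
    rows = [[Decimal(v) for v in line.split()] for line in table.values()]
    cent = Decimal("0.01")
    return [
        (sum(row[i] for row in rows) / len(rows)).quantize(cent, rounding=ROUND_HALF_UP)
        for i in range(len(FACTORS))
    ]


means = factor_means(ICC)
ranked = sorted(zip(FACTORS, means), key=lambda pair: pair[1], reverse=True)

for name, mean in ranked:
    print(f"{name:10s} {mean}")
print("matches published Average row:", [str(m) for m in means] == PUBLISHED_AVERAGE)
```

Under these assumptions the recomputed means reproduce the Average row, with cultural observation showing the highest mean agreement and stimulation the lowest.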

Share and Cite

MDPI and ACS Style

Liu, Z.; Eronen, J.; Masui, F.; Ptaszynski, M. Chinese Tourist Motivations for Hokkaido, Japan: A Hybrid Approach Using Transformer Models and Statistical Methods. Tour. Hosp. 2025, 6, 133. https://doi.org/10.3390/tourhosp6030133