1. Introduction
Tourism has become an important pillar of the global economy. It brings substantial economic benefits to many countries but also poses new challenges for environmental protection and sustainable social development. With the rapid growth of global tourism, achieving its sustainable development has become a focus of the international community. Against this background, the scientific evaluation of tourism eco-efficiency is particularly important: it is a key indicator of the quality of tourism development and an important basis for guiding its sustainable development.
Tourism eco-efficiency assessment is a complex, multi-dimensional task that requires considering the environmental, economic, and social impacts of tourism activities from multiple perspectives (J. Yang et al., 2021; L. Yang et al., 2021). However, traditional assessment methods are often limited to a single data source and lack an in-depth understanding of the integrity and dynamics of tourism ecosystems (Huang et al., 2021). The rapid development of information technology, especially the wide application of big data and artificial intelligence, provides unprecedented opportunities for tourism eco-efficiency assessment (Dolasinski et al., 2025).
In tourism, traditional methods that rely on single-source data struggle to depict the complex relationships of the tourism ecosystem comprehensively. Multi-source data fusion, which integrates geographic, economic, and environmental data, offers a richer perspective for evaluating tourism eco-efficiency, and graph neural networks can handle the complex topological structures of the tourism ecosystem. Domestic and international scholars have applied these techniques to tourism eco-efficiency evaluation and have made progress in data integration, network architectures, and evaluation index systems, but there is still room for improvement in deep data integration, model adaptability, and dynamic evaluation mechanisms (Farsari et al., 2025; Alrawadieh et al., 2025).
Classical theories, such as ecological footprint theory and life cycle assessment, provide a foundation for this research. Early applications of multi-source data fusion in tourism research improved data comprehensiveness, yet existing studies remain limited in deep data fusion and in exploring complex relationships, which calls for new techniques such as graph neural networks.
This study proposes a tourism eco-efficiency evaluation method based on multi-source data fusion and a graph neural network (GNN). Multi-source data fusion integrates data from different channels, and a GNN can process complex network data. We first introduce the importance and challenges of tourism eco-efficiency assessment and then discuss the application prospects of these techniques. Next, we detail the methodological framework, including data acquisition, preprocessing, model construction, and evaluation. Through case studies of typical tourism destinations, we verify the method’s effectiveness and explore its practical application value. The work not only offers a new evaluation method but also helps evaluators understand how the tourism ecosystem operates. It also explores how to translate the results into practical applications and provide a scientific basis for policymakers.
This study addresses three problems in current tourism eco-efficiency evaluation, namely reliance on a single data source, neglect of spatial correlation, and insufficient model adaptability, and aims to build a more accurate evaluation system through multi-source data fusion and graph neural networks. The conceptual framework is grounded in the theories of sustainable tourism development and ecological efficiency: multi-source data form the input layer, spatial correlation the hidden layer, and efficiency evaluation the output layer, closing a theoretical loop. Although input-output analysis can reflect industrial linkages, it fails to capture the eco-efficiency of micro-level tourism activities. The proposed method overcomes the limitation on data types through multi-source data fusion and captures spatial heterogeneity with a graph neural network. It thus remedies the shortcomings of traditional models in data dimensionality and spatial association, offers a more dynamic and accurate path for evaluating tourism eco-efficiency, and pushes evaluation models toward multi-dimensional, spatially explicit designs.
This study focuses on the application of multi-source data fusion and graph neural networks in tourism eco-efficiency evaluations. The core research question is: Can graph neural networks improve the accuracy and interpretability of tourism eco-efficiency assessments by capturing high-order spatial associations and behavioral interactions in tourism activities (such as tourists’ consumption preferences and the propagation of environmental protection behaviors)? Based on this, a testable hypothesis is proposed: compared with the traditional machine learning model, the graph neural network model integrating multi-source data can significantly reduce the prediction error in the evaluation of tourism eco-efficiency, and its attention mechanism can effectively identify the key spatial nodes and behavior propagation paths that affect eco-efficiency.
3. Tourism Eco-Efficiency Assessment Model Based on Graph Convolutional Neural Network
3.1. Algorithm Flow
The literature on tourism eco-efficiency evaluation has developed a multi-dimensional methodological framework. Data envelopment analysis (DEA) and stochastic frontier analysis (SFA), the classical non-parametric and parametric approaches respectively, quantify the efficiency relationship between tourism inputs (e.g., energy, water resources) and outputs (e.g., economic benefits, environmental losses), but neither captures the spatial correlations inherent in the tourism system. Life cycle assessment (LCA) focuses on the environmental load along the whole chain of tourism activities (such as transportation, accommodation, and sightseeing) and has been used to reveal the carbon-footprint transmission mechanism of air tourism, but it is limited by a single data dimension. The SDG framework embeds tourism eco-efficiency evaluation in the Sustainable Development Goals at the macro level and provides a systematic indicator system, but it cannot analyze the interaction between micro-level behavior and space. At the data level, multi-source data (such as IoT monitoring data from scenic spots, social media behavior data, and traffic flow data) are reshaping how tourism sustainability is assessed. For example, integrating geospatial data and tourist trajectory data improves the spatiotemporal accuracy of ecological impact assessments, while social media text can reveal tourists’ environmental awareness and behavioral preferences, addressing the “behavior black box” of traditional statistics.
In terms of technical methods, graph neural networks (GNNs) show distinct advantages in spatial and environmental analysis: by modeling entity associations as node-edge structures, they have demonstrated the ability to capture high-order interactions in urban ecological network analysis and regional collaborative environmental governance. How GNNs can integrate multi-source data to overcome the limitations of traditional methods in tourism eco-efficiency evaluation, however, remains a key open gap.
In tourism eco-efficiency evaluation, models based on graph convolutional neural networks stand out for their strengths in data processing and feature extraction (Zhu et al., 2024). From model construction to the final evaluation output, each step of the algorithm builds closely on the previous one (Coghlan & Prayag, 2025; Bertsatos et al., 2025).
Multi-source data fusion follows a hierarchical strategy. At the feature level, features from different sources are concatenated and weighted with attention mechanisms to highlight key information. At the model level, separate sub-networks process different types of data. At the decision level, the outputs of the sub-networks are integrated into a comprehensive evaluation result. This multi-level strategy exploits the complementarity of the data types and improves the accuracy and comprehensiveness of the evaluation. The graph structure uses a dual-graph model: the attraction association graph represents actual connections between tourist attractions, with edge weights computed from visitor flow and geographical distance, while the indicator association graph captures the intrinsic relationships among evaluation indicators, with edge strength determined by statistical correlation. Together, the two graphs model entity relationships and indicator dependencies simultaneously, reflecting the complexity of the tourism ecosystem more fully.
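The feature-level attention fusion and the dual-graph construction can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors’ implementation: the per-source query vectors stand in for learned attention parameters, the attraction edge weight (a mix of normalized visitor flow and an exponential distance decay controlled by the hypothetical parameters `alpha` and `scale_km`) is our own choice because the text gives no formula, and the indicator graph keeps a link only when the absolute Pearson correlation reaches a hypothetical `threshold`.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fuse(sources, queries):
    """Feature-level fusion: score each source with its (hypothetical) query
    vector, softmax the scores, scale each source by its weight, concatenate."""
    names = list(sources)
    scores = [sum(q * x for q, x in zip(queries[n], sources[n])) for n in names]
    weights = softmax(scores)
    fused = []
    for w, n in zip(weights, names):
        fused.extend(w * x for x in sources[n])
    return fused, dict(zip(names, weights))

def attraction_edge_weight(flow, distance_km, max_flow, alpha=0.5, scale_km=10.0):
    """Hypothetical attraction-graph weight: normalized visitor flow mixed with
    an exponential distance decay (assumed form, not stated in the text)."""
    return alpha * flow / max_flow + (1 - alpha) * math.exp(-distance_km / scale_km)

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def indicator_edges(series, threshold=0.5):
    """Indicator graph: link two indicators when |r| >= threshold and use |r|
    as the edge strength."""
    names = list(series)
    edges = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(series[a], series[b])
            if abs(r) >= threshold:
                edges[(a, b)] = abs(r)
    return edges
```

In a trained model the query vectors would be learned jointly with the rest of the network; here they are passed in so that the weighting logic is visible.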
The graph convolutional network (GCN) has a three-layer architecture, each layer combining a feature transformation with a graph convolution. Through message passing, the model aggregates both local node information and global structural patterns, which enables it to learn complex interactions between nodes (Shi et al., 2019). To handle multi-source heterogeneous data, each GCN layer includes processing units adapted to the different data types, so that the various kinds of information are integrated effectively. Training uses a multi-task learning framework that jointly optimizes two objectives: eco-efficiency prediction and indicator association learning. The main task predicts tourism eco-efficiency scores, while the auxiliary task learns the causal relationships between indicators, strengthening the model’s grasp of the system’s internal mechanisms through a graph-structure reconstruction constraint. Hyperparameters such as the learning rate, hidden-layer dimensions, and dropout rate are tuned by Bayesian optimization based on validation-set performance.
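The multi-task objective can be sketched as a weighted sum of two losses. The specific terms are assumptions, since the text names the tasks but not the loss functions: mean squared error for the main eco-efficiency prediction task, binary cross-entropy on a sigmoid inner-product reconstruction of the indicator graph for the auxiliary task, and a hypothetical weight `lam` balancing the two. The reconstruction term recovers observed indicator links from the embeddings; the Bayesian hyperparameter search is not shown.

```python
import math

def mse(pred, target):
    """Main task: mean squared error of predicted eco-efficiency scores."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def reconstruction_bce(embeddings, adjacency, eps=1e-12):
    """Auxiliary task: reconstruct the indicator graph from embeddings via
    sigmoid(z_i . z_j) and score every ordered off-diagonal pair with BCE."""
    n = len(embeddings)
    total, count = 0.0, 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            logit = sum(a * b for a, b in zip(embeddings[i], embeddings[j]))
            p = 1.0 / (1.0 + math.exp(-logit))
            y = adjacency[i][j]
            total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
            count += 1
    return total / count

def multitask_loss(pred, target, embeddings, adjacency, lam=0.1):
    """Joint objective: main regression loss plus lam times the structural loss."""
    return mse(pred, target) + lam * reconstruction_bce(embeddings, adjacency)
```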
Beyond model performance, this study explores what the model reveals for the tourism industry. Node importance analysis of the graph neural network can locate the “key pollution source scenic spots” with low eco-efficiency and their associated networks, and quantify how tourist behavior transmits ecological load. Based on these insights, targeted recommendations and policy directions can be proposed: establishing a “scenic spots–community–transportation” collaborative governance mechanism, applying differentiated environmental management and control to the high-impact nodes identified by the model, or using behavioral interaction analysis to design precise environmental incentive policies for tourists.
The core links of the algorithm, depicted in Figure 1, are multi-graph data input; relationship-based feature construction; GNN-based hierarchical embedding learning, in which multiple GNN layers extract the node and graph embeddings $h_c$ and $g$ through MLP-driven Inner–Inner, Cross–Inner, Inner–Cross, and Cross–Cross transformations; and eco-efficiency evaluation integration. The process starts by inputting multi-source, graph-structured tourism data and then constructs fine-grained inter-node relationships. The GNN learns hierarchical representations from these relationships, and the integrated embeddings support the downstream tourism eco-efficiency assessment, guiding the analysis of sustainable tourism ecosystem patterns.
The algorithm flow of the GCN-based tourism eco-efficiency evaluation model can be summarized as follows. First, user information, ecological information, and user–ecology interaction data are collected from tourism-related sources to construct the tourism ecological network graph. Next, the raw data are preprocessed (data cleaning, feature extraction, and heterogeneous graph construction) for input into the graph convolutional neural network. Then, a multi-layer GCN model is designed and trained; its node features are updated iteratively through message passing, so that the model learns embedded representations of users and ecological elements. Finally, the learned embeddings are used to evaluate tourism eco-efficiency, and the resulting eco-efficiency index provides decision support for tourism ecological management (X. Wang & Zhang, 2021). The model has three parts: an input feature layer, which receives user features, ecological features, and the heterogeneous graph; a graph convolutional layer, which updates the embedded representations of the feature vectors; and a multi-layer perceptron layer, which estimates link probabilities and generates an eco-efficiency list.
Figure 2 showcases the model architecture and processes for tourism eco-efficiency evaluation based on multi-source data fusion and graph neural networks (GNNs). It starts with multi-source data fusion, which integrates tourism, environmental monitoring, and socio-economic data. This fused data then supports two key branches. One is dual graph construction, representing network relationships within the tourism ecosystem. The other feeds into a data-driven design optimization loop. Within this framework, a Graph Neural Network (GNN) is employed, incorporating a 3-layer graph convolutional network (GCN) with multi-layer perceptrons (MLPs) and the ecological efficiency index for analyzing spatial and temporal patterns of eco-efficiency. Together, these components form a comprehensive framework to assess and optimize tourism’s eco-efficiency through advanced data-driven and neural network techniques.
The tourism eco-efficiency evaluation model based on multi-source data fusion and graph neural networks can provide accurate, actionable insights for different stakeholders. For tourism planners, the results help identify ecological weaknesses in tourism resource development; for example, if a scenic spot fails to balance tourist carrying capacity against vegetation protection, planners can adjust route design and facility layout accordingly. Based on the evaluation results, policymakers can formulate differentiated support or control policies, such as financial incentives for eco-efficient areas and strict access standards for inefficient ones. Sustainability managers can use the model to monitor changes in eco-efficiency dynamically and intervene in time to forestall environmental degradation.
The model’s findings align closely with the Sustainable Development Goals (e.g., SDG 14 “Life Below Water” and SDG 15 “Life on Land”) and with relevant UNWTO guidelines. In one use case, the evaluation of coastal tourism areas, the model integrates marine pollution data and tourist flow data, and the resulting eco-efficiency values are closely related to progress on the SDG 14 goal of “reducing marine pollution”; this gave the local government a scientific basis for limiting the number of offshore cruise ships and promoting environmentally friendly tourism. In mountain tourism development, combined with the UNWTO “Sustainable Mountain Tourism” guidelines, the model results guide developers to preserve core ecological areas and to plan low-impact tourism projects only in marginal areas, protecting biodiversity while sustaining growth in the tourism economy.
Table 1 summarizes this content.
The model combines high accuracy with decision-making value. With multi-source data, the graph neural network achieves an HR@10 of 0.82 and an NDCG of 0.79, outperforming the traditional model, and its results were verified by domain experts (error < 5%). Its core value lies in going beyond a single score: it pinpoints the key sources of inefficiency (for example, “carbon emissions of self-driving tours” accounted for 34% in a scenic spot scoring 58 points) and reveals the transmission effect along the “scenic spots–transportation–communities” chain (edge weight 0.76). This gives planners optimization models, gives policymakers targeted measures, and promotes the digital transformation of sustainable tourism decision-making.
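HR@10 and NDCG, the metrics reported here, can be computed as follows. The sketch assumes a leave-one-out protocol in which each user has a single held-out ecological element and the model returns a ranked candidate list; the text does not specify its evaluation protocol, so `ranked_lists` and `held_out` are hypothetical inputs.

```python
import math

def hit_ratio_at_k(ranked_lists, held_out, k=10):
    """Fraction of users whose held-out item appears in their top-k list."""
    hits = sum(1 for ranked, target in zip(ranked_lists, held_out) if target in ranked[:k])
    return hits / len(held_out)

def ndcg_at_k(ranked_lists, held_out, k=10):
    """Mean NDCG@k with one relevant item per user: 1 / log2(position + 2),
    where position is 0-based, or 0 if the item is outside the top k."""
    total = 0.0
    for ranked, target in zip(ranked_lists, held_out):
        top = ranked[:k]
        if target in top:
            total += 1.0 / math.log2(top.index(target) + 2)
    return total / len(held_out)
```

With a single relevant item, the ideal DCG is 1, so no separate normalization step is needed.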
3.2. Graph Convolutional Neural Network Representation Update Layer
Eco-efficiency refers to the optimal ratio of tourism economic output to ecological cost; it measures the ability to maximize the economic value of tourism development within the ecological carrying capacity. The concept balances economic benefits against ecological sustainability. The tourism ecosystem is a dynamic equilibrium system centered on tourism activities, in which the destination’s natural, human, social, and industrial-chain elements interact across space and among multiple agents; its health depends on the coordinated flow of material, energy, and information among these elements.
This study takes Lijiang City, Yunnan Province, as a case. Ticket revenue and visitor flow data for key attractions such as Lijiang Old Town and Yulong Snow Mountain were obtained from the attractions’ operating departments, along with tourism operational data such as hotel occupancy rates and travel agency sales. Environmental monitoring stations collect water quality indicators for rivers within Lijiang, urban air quality data, and noise pollution levels around tourist sites. The transportation department provides data on flight and rail passenger volumes into and out of Lijiang, as well as on the operation of public transit and taxis within the city. In addition, tourism platforms and social media are used to gather behavioral data on visitor stay duration, consumption preferences, and site ratings. After these multi-source datasets are cleaned, transformed, and integrated, a tourism ecological relationship graph is constructed, with nodes such as attractions, hotels, and transportation hubs and the connections among them. Graph neural networks then learn from the node attributes and edge weights to assess Lijiang’s tourism eco-efficiency. Based on the evaluation results, strategies such as dynamic control of tourist carrying capacity at attractions and the promotion of low-carbon travel routes are implemented. On the policy side, the study advocates establishing a tourism ecological compensation fund and formulating stricter environmental standards for attraction construction, translating the research findings into practical management measures that promote sustainable tourism development in Lijiang.
The data are highly diverse in both source and type, providing a solid foundation for assessing tourism eco-efficiency across multiple dimensions. By source, government departments provide public data, such as the statistical yearbooks published by the statistics bureau and the environmental monitoring reports of the ecological environment bureau, which offer macro-level data on regional GDP, industrial structure, total ecological resources, and pollutant emissions and outline the overall state of the regional economy and ecological environment. Operational data from tourism enterprises, including visitor flow to scenic spots, hotel occupancy rates, and travel agency operating income, reflect the real-time state of the tourism market. Third-party platform data, including visitor reviews on online travel agencies (OTAs) and travel check-ins on social media, contain detailed information on tourists’ behavioral preferences and satisfaction. Field research data, obtained through carefully designed questionnaires and in-depth interviews, provide firsthand information on local residents’ perceptions of the ecological environment and on the implementation of local ecological protection measures.
By data type, structured data such as economic statistics and environmental monitoring indicators have a clear, standardized structure and format, making them easy to store and analyze directly. Semi-structured data, such as tourism enterprise log files and some web data, lack a strict structure but still contain self-descriptive information. Unstructured data, such as visitor comments, social media texts, and satellite remote sensing images, require techniques such as natural language processing and image recognition for preprocessing and feature extraction before their hidden value can be used. These data types play complementary roles in evaluating tourism eco-efficiency. Environmental data such as forest coverage, water quality indicators, and air quality directly reflect the ecological baseline of tourist destinations. Socio-economic data such as regional GDP, industrial structure, and transportation infrastructure reflect the economic support and infrastructure available for tourism development. Tourism activity data, including visitor numbers, consumption structure, and types of tourism products, show the intensity of tourism’s impact on the ecological environment. Integrating these multi-source data yields a large dataset covering all elements of the tourism ecosystem and their interrelationships, which provides ample, high-quality training material for graph neural network models. This enables the model to characterize the complex network structure of the tourism ecosystem and to evaluate tourism eco-efficiency scientifically, comprehensively, and dynamically.
The preprocessed user features, ecological features, and heterogeneous graph are input into the GCN model. Through multiple layers of ‘convolution’, the network aggregates neighbor features, propagates information layer by layer to more distant neighbors, and updates the node embeddings (Pradhyumna & Shreya, 2021; Han et al., 2022). On this basis, the GCN–AR model updates its embeddings in three stages, namely message passing, aggregation, and update, making particular use of the connections between users and ecological elements to model user interests and ecological impacts. The message passed from a neighbor node to the target node is given by Equation (2).
Here, $E_U^{(0)}$, $E_A^{(0)}$, and $P_{UA}^{(0)}$ are the initial input feature and relationship data. $F_{U\leftarrow A}$ denotes the first-order transformation passed from A-related features to U-related features; its specific expression is given in Equation (3).
In Equation (3), W is the learnable weight matrix and D is the degree matrix. The message aggregation function combines the messages sent by all first-order neighbors of the user, as shown in Equation (4).
The node’s final representation is updated by combining its own information with the aggregated neighbor information, as defined in Equation (5).
The message passing update function is given in Equation (6).
For an ecological node, the first-order neighbors are user nodes; aggregating their messages through the same mechanism yields the graph-convolution embedding shown in Equation (7).
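Equations (2)–(7) are not reproduced in this version of the text, so the following is only a sketch of one common form of such a message-passing layer, not the GCN–AR model’s exact equations. It assumes that a neighbor’s message is its embedding transformed by a shared weight matrix `W` and scaled by the symmetric degree normalization $1/\sqrt{|N_i|\,|N_j|}$, that messages are summed, that the node’s own transformed embedding is added, and that ReLU is the activation. Node identifiers such as `"u1"` and `"a1"` are hypothetical.

```python
import math

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def gcn_layer(embeddings, edges, W):
    """One message-passing layer on a user-ecology bipartite graph.

    embeddings: {node_id: vector}; edges: list of (user_id, eco_id) pairs.
    Each neighbor j sends W e_j scaled by 1/sqrt(|N_i| |N_j|); messages are
    summed, the node's own W e_i is added, and ReLU is applied.
    """
    neighbours = {node: [] for node in embeddings}
    for u, a in edges:
        neighbours[u].append(a)
        neighbours[a].append(u)
    updated = {}
    for node, nbrs in neighbours.items():
        agg = matvec(W, embeddings[node])            # self-connection term
        for nb in nbrs:
            c = 1.0 / math.sqrt(len(nbrs) * len(neighbours[nb]))
            msg = matvec(W, embeddings[nb])          # transformed neighbor feature
            agg = [s + c * m for s, m in zip(agg, msg)]
        updated[node] = [max(0.0, v) for v in agg]   # ReLU
    return updated
```

Because user nodes only neighbor ecological nodes and vice versa, the same function updates both node types; stacking several calls propagates information to more distant neighbors.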
The matrix form of the user-specific message passing process is shown in Equation (8).
Equations (9) and (10) give the recursive form of the message passing formulas for user and ecological nodes across multiple graph convolution layers.
The corresponding recursive aggregation and update of node messages across layers are given in Equations (11) and (12).
The matrix form of node message passing across multiple graph convolution layers is given in Equations (13) and (14).
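In matrix form, such layers can be stacked over several graph convolutions. As a hedged sketch that may differ from Equations (8)–(14), it assumes the bipartite adjacency $A = \begin{pmatrix} 0 & R \\ R^\top & 0 \end{pmatrix}$ built from a user-by-ecology interaction matrix `R`, the propagation matrix $D^{-1/2} A D^{-1/2} + I$ with degrees taken from $A$, a ReLU activation, and one weight matrix per layer, so that each layer computes $E^{(l+1)} = \mathrm{ReLU}\big((D^{-1/2} A D^{-1/2} + I)\,E^{(l)} W^{(l)}\big)$. Rows of `E0` list user nodes first, then ecological nodes.

```python
import math

def matmul(A, B):
    """Dense matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def bipartite_adjacency(R):
    """Build A = [[0, R], [R^T, 0]] from a users x eco interaction matrix R."""
    n_u, n_a = len(R), len(R[0])
    n = n_u + n_a
    A = [[0.0] * n for _ in range(n)]
    for i in range(n_u):
        for j in range(n_a):
            A[i][n_u + j] = A[n_u + j][i] = float(R[i][j])
    return A

def normalised_with_self_loops(A):
    """Return D^{-1/2} A D^{-1/2} + I, with degrees computed from A."""
    deg = [sum(row) for row in A]
    inv = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    n = len(A)
    return [[A[i][j] * inv[i] * inv[j] + (1.0 if i == j else 0.0) for j in range(n)]
            for i in range(n)]

def propagate(R, E0, weights):
    """Apply E^{(l+1)} = ReLU(P E^{(l)} W^{(l)}) for each weight matrix and
    return the outputs of every layer, starting with E0."""
    P = normalised_with_self_loops(bipartite_adjacency(R))
    layers = [E0]
    for W in weights:
        H = matmul(matmul(P, layers[-1]), W)
        layers.append([[max(0.0, x) for x in row] for row in H])
    return layers
```

Row by row, this computes the same thing as summing degree-normalized neighbor messages and adding each node’s own transformed embedding; the matrix view simply batches all nodes at once.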
Data from Mafengwo are collected mainly with web-scraping programs written in Python (version 3.13) using the Scrapy framework. Scraping rules, set according to the page structure of the Mafengwo website, target tourist attraction pages and extract visitor reviews, travel notes, and Q&A content as well as basic information such as attraction names and geographical locations. For preprocessing, cleaning rules are set following methods designed for GPS trajectory data: duplicate records are removed by primary-key deduplication, and missing values are filled by mean imputation or regression-model prediction. Finally, a data distribution analysis is performed, providing a reliable data foundation for subsequent research.
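Two of the cleaning rules named above, primary-key deduplication and mean imputation, can be sketched on scraped review records. The field names (`review_id`, `attraction`, `rating`) and the toy records are hypothetical, and the Scrapy crawling step itself is not shown.

```python
def deduplicate(records, key):
    """Primary-key deduplication: keep the first record seen for each key."""
    seen, kept = set(), []
    for rec in records:
        if rec[key] not in seen:
            seen.add(rec[key])
            kept.append(rec)
    return kept

def mean_impute(records, field):
    """Return new records with missing (None) values of `field` replaced by
    the mean of the observed values; the input records are not modified."""
    observed = [r[field] for r in records if r[field] is not None]
    mean = sum(observed) / len(observed)
    return [{**r, field: mean if r[field] is None else r[field]} for r in records]

# Illustrative records only (hypothetical values)
raw = [
    {"review_id": 101, "attraction": "spot_A", "rating": 4.0},
    {"review_id": 101, "attraction": "spot_A", "rating": 4.0},  # duplicate key
    {"review_id": 102, "attraction": "spot_B", "rating": None},
    {"review_id": 103, "attraction": "spot_B", "rating": 5.0},
]
clean = mean_impute(deduplicate(raw, "review_id"), "rating")
```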
3.3. Multi-Layer Perceptron Layer
The tourism eco-efficiency score (0–100 points) in this study has a clear practical interpretation. A score of 80 or above indicates high-quality “eco-economy–society” synergy. For example, a nature reserve scoring 85 maintains a 92% ecological restoration rate in its core area while receiving 3 million visitors per year, reinvests 18% of tourism income in environmental protection, and sees indigenous residents’ income grow faster than tourist numbers, thereby achieving sustainable development. Scores of 60–79 leave room for structural optimization: a mountain scenic spot scoring 71 falls short mainly on carbon emissions from connecting transportation (29%) and delays in garbage removal and transport during peak seasons (23%). Scores below 60 indicate more serious problems: in a wetland scenic spot scoring 54, overexploitation of the wetlands has led to an annual decrease of 2.3%, and complaints about “ecological damage” account for 41%. These scores help planners replicate the “monitoring and reservation linkage mechanism” and “zoning control and benefit sharing” models of the 85-point protected area. The evaluations have also helped policymakers launch new-energy shuttle bus subsidies and smart garbage collection point planning for the 71-point scenic spot, and they provide a basis for the “ecological redline adjustment” of the 54-point wetland. In a provincial pilot that linked the score to environmental protection funds, the average score of 50 scenic spots rose by 8.7 points per year, highlighting the model’s value.
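The score bands described above can be encoded as a small lookup. The thresholds come from the text; the label for scores below 60 is our own placeholder, since the text illustrates that range only with the 54-point wetland example.

```python
def eco_efficiency_band(score):
    """Map a 0-100 eco-efficiency score to the bands described in the text."""
    if not 0 <= score <= 100:
        raise ValueError("score must lie between 0 and 100")
    if score >= 80:
        return "high-quality synergy"        # 80 and above
    if score >= 60:
        return "structural optimization"     # 60-79
    return "below 60"                        # lowest band (placeholder label)

for s in (85, 71, 54):
    print(s, eco_efficiency_band(s))
```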
A GCN model integrates the proximity information between users and ecological nodes and generates updated embedding vectors for both. To capture complex interactions, this paper uses an MLP, instead of the inner product of traditional matrix factorization, to learn the nonlinear relationships between user and ecological embeddings (Ding et al., 2021). Specifically, the updated user and ecological embedding vectors are concatenated and fed into the MLP, which outputs the link probability between the user and the ecological element. The MLP input is the concatenation of the two embeddings, as shown in Equation (15).
After concatenation, the output of the first MLP layer is given by Equation (16).
The output of the l-th layer is given by Equation (17).
The resulting link probability is given by Equation (18).
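A minimal sketch of the MLP link predictor in Equations (15)–(18), assuming ReLU hidden layers, a sigmoid output, and weights supplied as plain lists (in practice they would be learned). The exact activation functions are not stated in the text, and `hidden_layers`, `w_out`, and `b_out` are hypothetical names.

```python
import math

def dense(W, b, x):
    """Affine map W x + b with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def link_probability(e_user, e_eco, hidden_layers, w_out, b_out):
    """Concatenate embeddings, apply ReLU hidden layers, squash with a sigmoid."""
    h = list(e_user) + list(e_eco)                # concatenation (cf. Eq. 15)
    for W, b in hidden_layers:                    # hidden layers (cf. Eqs. 16-17)
        h = [max(0.0, z) for z in dense(W, b, h)]
    logit = sum(w * x for w, x in zip(w_out, h)) + b_out
    return 1.0 / (1.0 + math.exp(-logit))         # link probability (cf. Eq. 18)
```

Unlike an inner product, the hidden layers can model interactions between individual user and ecological features that are not simple pairwise products.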
3.4. Model Training
The dataset covers multi-source data from 50 key tourism cities nationwide from 2018 to 2022. It includes environmental input indicators; economic output indicators (total tourism income: mean 42 billion yuan, minimum 8.5 billion yuan, maximum 186 billion yuan, standard deviation 31 billion yuan); socio-ecological indicators (tourist satisfaction: mean 4.2 points, minimum 3.1 points, maximum 4.9 points, standard deviation 0.5 points; excellent air-quality rate: mean 82%, standard deviation 9%); and spatial correlation data (tourist flow intensity between scenic spots: mean 12,000 person-times/month, standard deviation 6000 person-times/month). Correlation analysis revealed a significant positive correlation between total tourism revenue and energy consumption (r = 0.68, p < 0.01), while tourist satisfaction was even more strongly correlated with the air-quality rating (r = 0.73, p < 0.001). In a principal component analysis (PCA), the first three principal components explained 85% of the cumulative variance, supporting the choice of variables. In terms of model performance, the GNN fused with multi-source data outperformed the traditional DEA model (HR@10 = 0.65, NDCG = 0.61) on both HR@10 (0.82) and NDCG (0.79), reduced the mean absolute error of the eco-efficiency score by 32%, and improved alignment with Goal 12 (responsible consumption and production) of the SDG framework by 27%. The model’s reliability was further confirmed by comparison with the scores of 20 tourism ecology experts (kappa coefficient = 0.81, p < 0.001) and by field verification in three typical cities (error rate < 5%).
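The agreement statistic can be illustrated with Cohen’s kappa. This is an assumption: the text reports a kappa coefficient against 20 experts without naming the variant, and the sketch treats the model’s score bands and the experts’ consensus bands as two raters (Fleiss’ kappa would be the usual choice for comparing many raters directly). The band labels are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical labels to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in labels) / (n * n)  # chance agreement
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single label")
    return (observed - expected) / (1 - expected)
```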
This study critically analyzes the quality and limitations of the multi-source data. Although the environmental monitoring data are highly accurate, their monthly coverage of remote scenic areas has gaps. Social media behavior data cover a wide range but are subject to user-profile bias, which needs to be corrected with qualitative methods such as interviews with local communities. The study found a close connection between tourism eco-efficiency and SDGs 11, 12, and 13: the scenic spot scoring 80 performed well on the “Community Inclusion” indicator (weight 0.32) under SDG 11 and on the “resource recycling rate” under SDG 12, supporting a “data-driven sustainable transformation pathway”. We also outline an implementation path for destinations. Scenic spots with complete basic data can connect directly to the model module and generate efficiency reports within one month, while small and medium-sized scenic spots with weak data can use the “core indicators supplementary survey” approach and complete the evaluation within six weeks. For destination management, the model not only provides quantitative scores but also locates key nodes for rectification through spatial correlation analysis, enabling managers to formulate targeted policies such as off-peak flow restrictions and clean-energy subsidies. In the pilot, its application advanced progress toward SDG 13 targets in three destinations by 18%, highlighting the practical value of the research for sustainable tourism management.
After completing the construction of the tourism ecological efficiency evaluation model based on graph convolutional neural networks, the model training phase becomes a key bridge connecting theoretical design and practical application. This stage not only requires careful adjustment of various training parameters but also necessitates repeated iterations and optimizations to enable the model to capture deep internal relationships within complex tourism ecological data.
In the model, tourism statistical data include the number of tourists, tourism revenue, and the reception capacity of scenic spots, which directly reflect the scale and efficiency of the tourism industry. Environmental monitoring data include air quality, water quality indicators, and vegetation coverage, reflecting the impact of tourism activities on the ecological environment. Socio-economic data cover regional GDP, residents’ income levels, and infrastructure development, showing the relationship between tourism and the local economy and society. Because these data come from different sources, their formats and statistical scopes are inconsistent. In the data preprocessing stage, these differences are eliminated by standardizing data formats, normalizing values, and converting units. Missing values are filled, depending on the characteristics of the data, by mean imputation, regression-model prediction, or reference to similar data from comparable scenic areas. The processed data are then structured into a graph in which nodes represent tourist regions, scenic spots, or data indicators, and edges represent their interrelationships, such as geographic proximity, economic ties, and ecological impact relationships. By learning from this graph-structured data, the graph neural network can automatically capture complex nonlinear relationships between nodes, uncover hidden patterns in the data, and jointly consider tourism, environmental, and socio-economic factors to provide a scientific and comprehensive evaluation of tourism ecological efficiency.
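The preprocessing and graph-construction steps described above can be sketched with NumPy. This is a minimal illustration, assuming column-mean imputation (one of the strategies named in the text), min-max normalization, and edges defined purely by geographic proximity within a distance threshold; the feature columns, coordinates, and 100 km threshold are hypothetical, and the economic-tie and ecological-impact edges mentioned in the text are not modeled.

```python
import numpy as np


def impute_column_means(x: np.ndarray) -> np.ndarray:
    """Fill each NaN with the mean of the observed values in its column."""
    x = x.astype(float).copy()
    col_means = np.nanmean(x, axis=0)
    rows, cols = np.where(np.isnan(x))
    x[rows, cols] = col_means[cols]
    return x


def min_max_normalize(x: np.ndarray) -> np.ndarray:
    """Rescale each column to [0, 1]; constant columns are mapped to 0."""
    lo = x.min(axis=0)
    span = x.max(axis=0) - lo
    span[span == 0] = 1.0
    return (x - lo) / span


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def proximity_adjacency(coords: np.ndarray, max_km: float) -> np.ndarray:
    """Undirected 0/1 adjacency matrix linking nodes that lie within max_km of each other."""
    lat, lon = coords[:, 0], coords[:, 1]
    dist = haversine_km(lat[:, None], lon[:, None], lat[None, :], lon[None, :])
    adj = (dist <= max_km).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-edges
    return adj


# Hypothetical node features: tourists (10k), revenue (100M yuan), air quality score
features = np.array([[120.0, 35.0, 80.0],
                     [np.nan, 20.0, 90.0],
                     [60.0, np.nan, 70.0]])
# Hypothetical node coordinates (latitude, longitude)
coords = np.array([[30.27, 120.15], [30.00, 120.58], [39.90, 116.40]])

x_norm = min_max_normalize(impute_column_means(features))
adj = proximity_adjacency(coords, max_km=100.0)
print(x_norm)
print(adj)
```

The first two nodes lie roughly 51 km apart and are linked, while the third is far from both and stays isolated; in practice, regression-based or similarity-based imputation would replace the column mean where the data allow.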
Model training consists of forward propagation and back propagation. Forward propagation computes predictions from the inputs using the current parameters and then evaluates the loss; back propagation uses the gradients of the loss to adjust the parameters and improve prediction accuracy (Tang et al., 2021; Li et al., 2022). The Bayesian personalized ranking (BPR) loss is employed to learn the model parameters; it requires observed user–ecology interactions to be ranked above unobserved ones. The loss calculation formula is shown in Equation (19).
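Because Equation (19) is not reproduced in this section, the sketch below uses the standard form of the BPR objective (the mean of -log sigmoid of the score gap between an observed and an unobserved interaction, plus L2 regularization) and makes the forward/backward split explicit with hand-derived gradients. It assumes plain dot-product scores on randomly initialized embeddings standing in for the GNN outputs; the batch size, embedding dimension, learning rate, and regularization weight are illustrative values, and Equation (19) may differ in its details.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def bpr_forward_backward(u, i_pos, i_neg, reg):
    """Forward pass: BPR loss for a batch of (user, observed item, unobserved item) triples.
    Backward pass: gradients of that loss with respect to each embedding matrix.

    loss = mean(softplus(-x)) + reg * (||u||^2 + ||i_pos||^2 + ||i_neg||^2) / batch,
    where x = u . i_pos - u . i_neg and softplus(-x) = -log(sigmoid(x)).
    """
    batch = u.shape[0]
    x = np.sum(u * (i_pos - i_neg), axis=1)                     # score gap per triple
    l2 = np.sum(u ** 2) + np.sum(i_pos ** 2) + np.sum(i_neg ** 2)
    loss = np.mean(np.logaddexp(0.0, -x)) + reg * l2 / batch    # stable -log(sigmoid(x))
    g = ((sigmoid(x) - 1.0) / batch)[:, None]                   # d loss / d x per triple
    grad_u = g * (i_pos - i_neg) + 2.0 * reg * u / batch
    grad_pos = g * u + 2.0 * reg * i_pos / batch
    grad_neg = -g * u + 2.0 * reg * i_neg / batch
    return loss, grad_u, grad_pos, grad_neg


rng = np.random.default_rng(0)
batch, dim, lr, reg = 64, 8, 0.5, 1e-4
u, i_pos, i_neg = (rng.normal(scale=0.1, size=(batch, dim)) for _ in range(3))
losses = []
for step in range(100):
    loss, gu, gp, gn = bpr_forward_backward(u, i_pos, i_neg, reg)  # forward + backward
    losses.append(loss)
    u -= lr * gu          # gradient-descent parameter update
    i_pos -= lr * gp
    i_neg -= lr * gn
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

In a full model the gradients would continue back into the GNN layers that produce the embeddings, usually through automatic differentiation rather than hand-written derivatives.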
The evaluation model, as shown in Figure 3, is built around the task of assessing tourism’s eco-efficiency by leveraging multi-source data fusion and graph neural networks (GNNs). It begins with multi-source data fusion, which aggregates diverse tourism-related datasets to form the basis for subsequent analysis. It then uses several components for graph-based operations: the Heterophilic Node Detector, paired with PGNN, processes the initial graph to identify heterophilic nodes, while the Probabilistic Anomaly Generator uses translation mechanisms and DDPM to generate anomaly-related elements that enrich the data for anomaly detection. For model training, the framework incorporates user ecological characteristics, constructs a heterogeneous graph from the fused data, builds a comprehensive training set, and initializes the parameters; it then selects the learning rate, batch size, and number of iterations and uses an optimization algorithm to update the model parameters, minimizing the loss until convergence. During training, the Counterfactual Data Augmentation for Graph Anomaly Detection module plays a pivotal role: it passes the graph through a Counterfactual GNN to extract embeddings and then uses an MLP to calculate anomaly probabilities, fully exploiting the structural and attribute information of the heterogeneous graph. After training, the optimized graph neural network model, which includes a multi-layer perceptron layer and result visualization, integrates the outputs of heterophilic node detection, probabilistic anomaly generation, and counterfactual data augmentation. By analyzing the structural patterns and anomalies of the tourism ecosystem in this way, it provides tourism decision-makers with precise, in-depth insights for tourism ecological efficiency assessment.
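The embedding-then-MLP step of the anomaly module can be illustrated with a plain graph convolution followed by a two-layer perceptron. This sketch assumes a standard symmetric-normalized GCN layer in place of the Counterfactual GNN and omits heterophilic node detection, DDPM-based anomaly generation, and counterfactual augmentation entirely; the ring graph, feature sizes, and random untrained weights are hypothetical.

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Add self-loops and apply symmetric normalization: D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm, h, w):
    """One graph-convolution layer: aggregate each node's own and neighbours' features
    with normalized weights, project with W, then apply ReLU."""
    return np.maximum(a_norm @ h @ w, 0.0)


def mlp_anomaly_prob(z, w1, b1, w2, b2):
    """Two-layer MLP mapping node embeddings to anomaly probabilities via a sigmoid."""
    hidden = np.maximum(z @ w1 + b1, 0.0)
    logits = (hidden @ w2 + b2)[:, 0]
    return 1.0 / (1.0 + np.exp(-logits))


rng = np.random.default_rng(42)
n_nodes, n_feat, n_emb, n_hidden = 4, 3, 8, 4
# Hypothetical ring graph: 0-1, 1-2, 2-3, 3-0
adj = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    j = (i + 1) % n_nodes
    adj[i, j] = adj[j, i] = 1.0

x = rng.normal(size=(n_nodes, n_feat))
a_norm = normalized_adjacency(adj)
z = gcn_layer(a_norm, x, rng.normal(scale=0.5, size=(n_feat, n_emb)))
probs = mlp_anomaly_prob(z,
                         rng.normal(scale=0.5, size=(n_emb, n_hidden)), np.zeros(n_hidden),
                         rng.normal(scale=0.5, size=(n_hidden, 1)), np.zeros(1))
print(np.round(probs, 3))
```

With trained weights, the per-node probabilities would be thresholded or ranked to flag anomalous nodes; here they only demonstrate the data flow from graph to embeddings to scores.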
The evaluation of tourism ecological efficiency based on multi-source data fusion and graph neural networks faces several potential problems in practical applications. First, data quality strongly affects the accuracy and reliability of the model: multi-source data may suffer from missing values, inconsistent formats, and semantic ambiguity, which complicate fusion and require extensive data cleaning and preprocessing. Second, the model’s computational demands are very high: the tourism ecosystem is complex, with a vast number of nodes and edges, so large-scale graph training incurs high memory consumption and long training times, and training can also suffer from vanishing or exploding gradients, all of which reduce training efficiency. Furthermore, the adaptability and interpretability of the model across different tourism scenarios, and the tension between dynamically changing tourism data and the need for real-time model updates, are key issues that urgently need to be resolved in practice.
The implementation of the model is divided into three stages. In the initial stage, 5–8 typical destinations are selected as pilots, data interfaces are opened, and personalized solutions are produced. In the medium term, the relevant departments will jointly establish a “score–policy” linkage mechanism, and the scores will be incorporated into scenic spot ratings and fund allocation. In the long term, lightweight tools will be developed for small and medium-sized scenic spots. In terms of policy, areas scoring more than 80 are designated as demonstration zones with full support, areas scoring 60–79 receive rectification policies targeting their shortcomings, and areas scoring less than 60 are subject to restricted development and repair. In this way, the assessment results are turned into an effective driving force for sustainable development.
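The score-based policy tiers can be written as a simple rule. The sketch assumes scores lie on a 0–100 scale and treats a score of exactly 80 as the top tier, because the middle band is given as 60–79; both are assumptions, and the tier labels paraphrase the text.

```python
def policy_tier(score: float) -> str:
    """Map an eco-efficiency score to the policy tier described in the text.

    Assumptions: scores lie on a 0-100 scale, and a score of exactly 80 falls in
    the top tier because the middle band is stated as 60-79.
    """
    if not 0.0 <= score <= 100.0:
        raise ValueError("score is assumed to lie on a 0-100 scale")
    if score >= 80.0:
        return "demonstration zone with full support"
    if score >= 60.0:
        return "rectification policies for shortcomings"
    return "restricted development and repair"


for s in (88, 72, 55):
    print(s, "->", policy_tier(s))
```

Non-integer scores such as 79.5 fall into the middle tier under this rule; a real linkage mechanism would need to fix such boundary cases explicitly.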
5. Discussion
5.1. Implications for Theory
A framework integrating multi-source data fusion with graph neural networks (GNNs) was constructed, combining tourism statistics, environmental monitoring, socio-economic, and other types of data and thereby overcoming the limitations of traditional single-source data. Using the GNN, the intrinsic correlations within the data were explored, revealing the complex nonlinear mechanisms of the tourism ecosystem (such as the nonlinear effects of tourism income structure and the environmental quality baseline on ecological efficiency).
Enhanced accuracy and scientific rigor of assessment: Compared with traditional methods (such as regression analysis on single-source data), the new method achieves higher assessment accuracy (in 2020, 72 points for traditional methods versus 85 points for the new method). The multi-layer architecture of the GNN can capture high-order spatial correlations and behavioral interactions, compensating for the weaknesses of traditional models such as DEA and SFA in capturing spatial correlations and dynamic features.
Verified the model’s alignment with reality: Through comparison with real indicators in tourist areas, the model’s identification of core influencing factors was highly consistent with the actual situation, and the prediction accuracy on the validation set reached 92%, an increase of approximately 15 percentage points compared to traditional methods.
5.2. Implications for Practice and Policy
Providing decision support for multiple stakeholders:
Tourism planners: Can identify ecological shortcomings in tourism resource development (such as the imbalance between scenic spot capacity and vegetation protection) and optimize route design and facility layout;
Policy makers: Can formulate differentiated policies based on the assessment results (such as providing financial incentives to efficient areas and setting strict entry standards for inefficient areas) and establish a “score–policy” linkage mechanism (such as linking scores to environmental protection fund allocation and scenic spot ratings);
Sustainability managers: Can dynamically monitor changes in ecological efficiency and intervene promptly to mitigate the risk of environmental degradation (such as by adjusting tourism routes and implementing green passes).
Facilitating the implementation of Sustainable Development Goals (SDGs):
The model results are highly consistent with goals such as SDG 14 (Life Below Water) and SDG 15 (Life on Land). For example, in coastal tourism areas, integrating data on marine pollution and tourist flows provides a scientific basis for limiting the number of near-shore cruise ships and promoting eco-tourism; in mountain tourism areas, the model can guide low-impact project planning that balances biodiversity protection and economic growth.
5.3. Limitations of the Study and Future Research Directions
Data level: The quality of multi-source data varies, temporal and spatial scales are difficult to unify, and semantic conflicts are significant. Unstructured data (such as tourist comments and remote sensing images) are costly to process, and their processing accuracy is only 65–72%. Data on landscape patterns and air pollutants cover only 63% of the study area, limiting the accuracy of the mechanism analysis.
Model application level: The GNN requires substantial computing resources (a single training round takes more than 48 h), its interpretability is poor, and static assessments have difficulty coping with the dynamic changes of the tourism ecosystem, with a prediction deviation rate of 22–28%.
Data Processing Optimization: Explore technologies such as federated learning to overcome barriers to data sharing, and prioritize integrating data on landscape patterns, air pollutants, and similar factors to raise coverage to over 80%.
Model Performance Enhancement: Develop lightweight GNN architectures to shorten training time to less than 12 h, and introduce causal reasoning modules to enhance model interpretability.
System Function Expansion: Build a dynamic, intelligent assessment system; strengthen interdisciplinary integration to improve the indicator system; and bind the assessment results closely to tourism planning and approval processes, promoting the tool’s use within a decision support system for the sustainable development of the tourism industry.
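Federated learning appears above only as a direction to explore. As a rough illustration of why it suits cross-destination data sharing, the sketch below implements one standard algorithm, federated averaging (FedAvg): each participant trains on its own data, and only model weights, weighted by local sample count, are sent to a server for averaging. The linear model, synthetic data, three hypothetical clients, learning rate, and round counts are all assumptions; federated training of a GNN over a graph partitioned across destinations would be considerably more involved.

```python
import numpy as np


def local_update(w, x, y, lr=0.1, epochs=20):
    """Gradient descent on mean squared error for a linear model, using only this client's data."""
    w = w.copy()
    for _ in range(epochs):
        grad = 2.0 * x.T @ (x @ w - y) / len(y)
        w -= lr * grad
    return w


def fedavg_round(w_global, clients, lr=0.1, epochs=20):
    """One FedAvg round: every client starts from the global weights and trains locally;
    the server averages the returned weights in proportion to client sample counts."""
    updates = [local_update(w_global, x, y, lr, epochs) for x, y in clients]
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    coeffs = sizes / sizes.sum()
    return sum(c * w for c, w in zip(coeffs, updates))


rng = np.random.default_rng(1)
true_w = np.array([2.0, -1.0])
# Three hypothetical destinations holding different amounts of private data
clients = []
for n in (30, 50, 80):
    x = rng.normal(size=(n, 2))
    clients.append((x, x @ true_w))

w = np.zeros(2)
for _ in range(10):
    w = fedavg_round(w, clients)
print(np.round(w, 3))
```

Raw records never leave a client in this scheme; only the weight vectors are exchanged, which is the property that makes federated learning relevant to data-sharing barriers between destinations.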
6. Conclusions
In this study, tourism eco-efficiency was assessed on the basis of multi-source data fusion and a graph neural network (GNN), and an evaluation framework was established by integrating tourism statistics, environmental monitoring, socio-economic, and other types of data. Using the GNN’s modeling ability, correlations in the data were mined and the underlying impact mechanisms analyzed; factors such as tourism revenue structure and the environmental quality baseline were found to act on ecological efficiency in a nonlinear manner. Comparison with real indicators from a tourism area verified that the model’s identification of core factors matched the actual situation. The model was trained on a dataset comprising over one million data points collected over the past five years. On the validation set, it achieved a prediction accuracy of 92%, about 15 percentage points higher than traditional methods. In its application to a typical tourism area, more than 20 influencing factors were identified; according to real tracking data from third-party institutions, the region’s ecological efficiency score increased from 75 points to 88 points over three years, a growth rate of 17.3%, which is higher than the national average and reflects the model’s ability to capture key factors.
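The reported growth rate is the relative increase between the starting and ending scores, which can be checked directly:

```latex
\text{growth rate} = \frac{88 - 75}{75} \times 100\% = \frac{13}{75} \times 100\% \approx 17.3\%
```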
This study has several significant limitations. In terms of data, the quality of multi-source data is uneven, spatiotemporal scales are difficult to unify, semantic conflicts are severe, and unstructured data are costly to process, with an accuracy of only 65–72%. Additionally, the integrated landscape pattern and air pollutant data cover only 63% of the study area, limiting the ability to analyze impact mechanisms accurately. In terms of model application, the GNN requires high-performance computing clusters, and a single training round can take over 48 h. The model also has poor interpretability, and its static evaluation struggles to handle dynamic scenarios, with a prediction bias rate of 22–28%. Despite these limitations, the model reveals important environmental impacts of tourism, such as an 8.7% disturbance rate in vegetation cover when the annual growth rate of tourists in the case area is 12%, and an average 11.3% increase in PM2.5 concentration during peak seasons. Based on these findings, short-term actions such as adjusting tour routes and implementing green passes can be adopted, while long-term policies should focus on improving assessment indicators and promoting the development of data-sharing platforms.
Further limitations remain: static assessment has difficulty capturing the dynamic changes in tourism eco-efficiency, the generalizability of the model to different types of tourist destinations has yet to be verified, and multi-source data quality (such as the incomplete environmental monitoring data for some scenic spots) affects the accuracy of the assessment. Future research could make breakthroughs in three areas. In data processing, federated learning and other technologies could be explored to address data-sharing barriers, with a focus on mining and integrating data such as landscape patterns and air pollutants to increase coverage to more than 80%. In model optimization, a lightweight graph neural network architecture should be developed to shorten the training time to 12 h, and a causal inference module should be introduced to enhance interpretability. In system improvement, a dynamic and intelligent evaluation system should be constructed, interdisciplinary integration should be strengthened to improve the index system, and the evaluation results should be closely bound to tourism planning and approval, so that the model evolves from an evaluation tool into a decision support system serving the sustainable development of the tourism industry.