Tourism Ecological Efficiency Assessment Based on Multi-Source Data Fusion and Graph Neural Network

Lin, Luoyanzi; Lv, Jiehua

doi:10.3390/admsci15090334

School of Economics and Management, Northeast Forestry University, Harbin 150006, China

* Corresponding author: Jiehua Lv.

Adm. Sci. 2025, 15(9), 334; https://doi.org/10.3390/admsci15090334

Submission received: 17 June 2025 / Revised: 14 August 2025 / Accepted: 19 August 2025 / Published: 27 August 2025

Abstract

Current research on evaluating tourism’s ecological efficiency, including work using multi-source data fusion and graph neural networks, has notable limitations. At the data level, integrating diverse sources is difficult because of differences in format, quality, and semantics; data cleaning and preprocessing can cause information loss; and a single data source often fails to reflect the complexity of tourism ecosystems. At the model level, traditional methods struggle to identify unreliable data and lack rigor in handling desirable and undesirable outputs. These issues reduce the accuracy and practical value of evaluation results. This paper introduces a new method for assessing tourism’s ecological efficiency based on multi-source data fusion and graph neural networks. First, we integrate tourism statistics, environmental monitoring data, and socio-economic data into a comprehensive dataset. Then, we apply a graph neural network (GNN) model to uncover hidden relationships and patterns, enabling a more accurate assessment of tourism’s environmental impact. The method also analyzes how tourism’s ecological efficiency varies across time and regions. We validate the method through case studies of representative tourist destinations and discuss its application in tourism planning. Regression analysis based on a single data source yields a 2020 tourism ecological efficiency score of 72, whereas multi-source data fusion with the GNN yields a score of 85, 13 points higher. This study offers a new approach to evaluating tourism’s ecological efficiency, deepens understanding of tourism ecosystems, and supports sustainable tourism development.

Keywords:

efficiency assessment; graph neural network; multi-source data fusion; tourism ecology

1. Introduction

Tourism has become an important pillar of the global economy. It brings substantial economic benefits to many countries but also poses new challenges for environmental protection and sustainable social development. As global tourism grows rapidly, achieving its sustainable development has become a focus of the international community. Against this background, the scientific evaluation of tourism ecological efficiency is particularly important: it is both a key indicator of the quality of tourism development and an important basis for guiding its sustainable development.

Tourism eco-efficiency assessment is a complex, multi-dimensional task that requires considering the environmental, economic, and social impacts of tourism activities from multiple perspectives (J. Yang et al., 2021; L. Yang et al., 2021). However, traditional assessment methods are often limited to a single data source and lack an in-depth understanding of the integrity and dynamics of tourism ecosystems (Huang et al., 2021). The rapid development of information technology, especially the wide application of big data and artificial intelligence, provides unprecedented opportunities for tourism eco-efficiency assessment (Dolasinski et al., 2025).

In tourism research, traditional methods that rely on single-source data struggle to capture the complex relationships within the tourism ecosystem. Multi-source data fusion, which integrates geographic, economic, and environmental data, offers a richer perspective for evaluating tourism’s ecological efficiency, and graph neural networks can handle the complex topological structures of the tourism ecosystem. Scholars in China and abroad have applied these techniques to tourism eco-efficiency evaluation and have made progress in data integration, network architecture design, and evaluation index systems, but there is still room for improvement in deep data integration, model adaptability, and dynamic evaluation mechanisms (Farsari et al., 2025; Alrawadieh et al., 2025).

Classical theories, such as ecological footprint theory and life cycle assessment, provide a foundation for this research. Early applications of multi-source data fusion in tourism research improved data comprehensiveness, but existing studies remain limited in deep data fusion and in exploring complex relationships, which calls for new techniques such as graph neural networks.

This study aims to propose a tourism eco-efficiency evaluation method based on multi-source data fusion and a graph neural network (GNN). Multi-source data fusion integrates data from different channels, and a GNN can process complex network data. First, we will introduce the importance and challenges of tourism eco-efficiency assessment, then elaborate on the application prospects of these techniques. Next, we will detail the methodological framework, including data acquisition, preprocessing, model construction, and evaluation. Through case studies of typical tourism destinations, we will verify the method’s effectiveness and explore its practical application value. This not only offers a new evaluation method but also helps evaluators understand the tourism ecosystem’s operation mechanism. Moreover, the research will explore transforming results into practical applications and provide a scientific basis for policymakers.

This study addresses three problems in current evaluations of tourism’s eco-efficiency: reliance on a single data source, neglect of spatial correlation, and insufficient model adaptability. It aims to build a more accurate evaluation system through multi-source data fusion and graph neural networks. The conceptual framework is grounded in the theories of sustainable tourism development and ecological efficiency: multi-source data form the input layer, spatial correlation forms the hidden layer, and efficiency evaluation forms the output layer, closing the theoretical loop. Input-output analysis can reflect industrial linkages, but it fails to capture the ecological efficiency of micro-level tourism activities. The proposed method overcomes the limitation of data types through multi-source data fusion and captures spatial heterogeneity with a graph neural network. It thus addresses the shortcomings of traditional models in data dimensionality and spatial association and offers a more dynamic and accurate path for evaluating tourism’s eco-efficiency. The study also pushes evaluation models toward multi-dimensional, spatially explicit designs.

This study focuses on the application of multi-source data fusion and graph neural networks in tourism eco-efficiency evaluation. The core research question is: can graph neural networks improve the accuracy and interpretability of tourism eco-efficiency assessments by capturing high-order spatial associations and behavioral interactions in tourism activities (such as tourists’ consumption preferences and the propagation of environmental protection behaviors)? Based on this, we propose a testable hypothesis: compared with traditional machine learning models, a graph neural network model that integrates multi-source data can significantly reduce prediction error in tourism eco-efficiency evaluation, and its attention mechanism can effectively identify the key spatial nodes and behavior propagation paths that affect eco-efficiency.

2. Related Theories and Technologies

2.1. Graph Convolutional Neural Network

A graph convolutional neural network (GCN) is a deep learning model designed specifically to process graph-structured data (Z. Liu et al., 2021). In graph theory, a graph is a structure composed of nodes and edges. GCNs update the feature representation of each node by aggregating information from its neighbors.

The core idea of GCNs is to extend the convolution operation from Euclidean domains (such as images or videos, whose data lie on regular grids) to non-Euclidean domains such as graphs. In conventional convolutional neural networks (CNNs), convolution kernels slide over fixed-size grids to extract local features. In GCNs, the convolution operation follows the topology of the graph: each node’s representation is refined by aggregating the features of its neighbors (L. Wang et al., 2021; Zhao et al., 2021).

The basic GCN architecture typically stacks several graph convolution layers; each takes the graph structure and node features as input and outputs updated node representations (J. Meng et al., 2023). In each layer, a node updates its own representation by aggregating the features of its neighbors, and stacking layers lets the model capture information from progressively wider neighborhoods. The outputs support diverse graph-based downstream tasks such as node classification, graph classification, and link prediction (Gao et al., 2021). Equation (1) gives the layer-wise propagation rule, which corresponds to a first-order approximation of a polynomial spectral filter on the graph. Here Ã = A + I is the adjacency matrix with self-loops, D̃ is its diagonal degree matrix, H^(l) holds the node features at layer l, W^(l) is a learnable weight matrix, and σ is a nonlinear activation. The rule thus combines the structural and attribute features of the graph data and feeds the resulting vectors into downstream tasks to produce the final outputs.

H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right)

(1)
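To make Equation (1) concrete, the NumPy sketch below applies a single GCN layer. It assumes ReLU as the activation σ, builds Ã by adding self-loops to A, and uses a two-node toy graph; the function name `gcn_layer` and all inputs are illustrative, not taken from the authors’ implementation.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer: ReLU(D~^-1/2 (A + I) D~^-1/2 H W)."""
    A_tilde = A + np.eye(A.shape[0])            # add self-loops: A~ = A + I
    deg = A_tilde.sum(axis=1)                   # degrees of A~
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))    # D~^{-1/2}
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt   # symmetric normalization
    return np.maximum(A_hat @ H @ W, 0.0)       # sigma = ReLU

# Toy graph: two connected nodes, one-hot features, identity weights
A = np.array([[0.0, 1.0], [1.0, 0.0]])
H = np.eye(2)
W = np.eye(2)
H_next = gcn_layer(A, H, W)
print(H_next)  # every entry equals 0.5
```

With equal node degrees, the normalized operator simply averages each node with its neighbor, which is the smoothing effect the propagation rule is designed to produce.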

2.2. Multi-Source Data Fusion Technology

Multi-source data fusion refers to collecting and processing heterogeneous data from different systems or sensors to generate more comprehensive and accurate information. This process improves the consistency and reliability of information and supports accurate evaluation and decision-making (Gu & Shakaraliyeva, 2025; L. Zhang et al., 2025). Data fusion falls into three main categories:

(1): Data-level fusion: Raw data are merged directly, which preserves data integrity, but the result is sensitive to uncertainty in the raw data, so robustness is low and applications are few.
(2): Decision-level fusion: Combines decision outputs from various data sources; offers a high fault tolerance rate but low accuracy.
(3): Feature-level fusion: Extracts the features from the raw data, then fuses the feature vectors to provide a comprehensive and consistent description. This method is flexible and widely used in practice.
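Feature-level fusion can be sketched as standardizing each source’s features separately and then concatenating them per node. The example below is a minimal sketch under that assumption; the three source matrices and their columns are hypothetical, and z-score standardization is one common choice rather than a prescribed method.

```python
import numpy as np

def zscore(X):
    """Standardize each column; constant columns are left at zero."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0
    return (X - mu) / sd

def feature_level_fusion(*sources):
    """Standardize each source separately, then concatenate features per node."""
    n = sources[0].shape[0]
    assert all(S.shape[0] == n for S in sources), "sources must share node order"
    return np.hstack([zscore(S) for S in sources])

# Hypothetical per-attraction features from three sources (rows = attractions)
tourism = np.array([[1200.0, 0.8], [800.0, 0.6], [400.0, 0.9]])  # visitors, occupancy
environment = np.array([[35.0], [42.0], [28.0]])                 # e.g. an air-quality index
socio = np.array([[5.1, 0.3], [4.2, 0.5], [3.9, 0.4]])           # e.g. GDP, tertiary share
fused = feature_level_fusion(tourism, environment, socio)
print(fused.shape)  # (3, 5)
```

Standardizing before concatenation keeps a large-scale source (such as visitor counts) from dominating the fused vector.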

2.2.1. Generative Adversarial Network Based on Wasserstein Distance Improvement

The GAN comprises a generator and a discriminator. The generator, usually a multi-layer neural network, learns to produce realistic samples that mimic real data and fool the discriminator, while the discriminator learns to distinguish real from generated data (Chen et al., 2021). Through adversarial training, the two networks improve each other until the generated data closely resemble the real data and the game reaches equilibrium. The Wasserstein variant (WGAN) replaces the original divergence-based objective with the Wasserstein-1 (earth mover’s) distance between the real and generated distributions, estimated by a Lipschitz-constrained critic, which yields smoother gradients and more stable training.
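The quantity a Wasserstein GAN minimizes can be illustrated directly in one dimension. The sketch below computes the Wasserstein-1 distance between two equal-size samples, assuming uniform weights; in an actual WGAN this distance is not computed in closed form but estimated by the critic network. The function name and sample values are hypothetical.

```python
def wasserstein_1d(x, y):
    """W1 distance between two equal-size 1-D empirical samples.

    With uniform weights and equal sizes, the optimal transport plan pairs
    the sorted samples, so W1 is the mean absolute difference of sorted values.
    """
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal sample sizes")
    xs, ys = sorted(x), sorted(y)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)

real = [0.0, 1.0, 2.0]
fake = [3.0, 1.0, 2.0]
print(wasserstein_1d(real, fake))  # 1.0
```

Unlike a divergence that saturates when the two distributions do not overlap, this distance keeps growing smoothly as the generated samples drift away, which is the property WGAN training relies on.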

2.2.2. LSTM and Self-Attention Mechanism

The LSTM regulates information flow using a memory cell and three gates: the forget, input, and output gates. Attention mechanisms focus the model on important information and improve its accuracy; self-attention in particular maximizes interaction among the elements of a sequence (Yasunaga et al., 2021; Xu et al., 2022). When processing each element, self-attention lets the model attend to all other elements in the sequence, thereby capturing the dependencies between them. By computing the correlation between every pair of elements, it dynamically adjusts the attention weights so that the model focuses on the information most useful for the current task (Yu et al., 2022; Réau et al., 2023). This mechanism mimics how humans focus on key points and ignore irrelevant details to understand a goal.
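A minimal NumPy sketch of scaled dot-product self-attention, the standard form of the mechanism described above, is shown below. The projection matrices `Wq`, `Wk`, and `Wv` are random placeholders standing in for learned parameters, and the sequence length and feature size are arbitrary; this illustrates the mechanism, not the paper’s configuration.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape (T, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])         # pairwise relevance, T x T
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over each row
    return weights @ V, weights

rng = np.random.default_rng(0)
T, d = 4, 8                                        # e.g. 4 time steps of 8 features
X = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out, attn = self_attention(X, Wq, Wk, Wv)
print(out.shape, attn.sum(axis=1))                 # (4, 8) and rows summing to 1
```

Each row of `attn` is the distribution of attention one element pays to every element of the sequence, including itself.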

3. Tourism Eco-Efficiency Assessment Model Based on Graph Convolutional Neural Network

3.1. Algorithm Flow

The literature on tourism eco-efficiency evaluation has formed a multi-dimensional methodological framework. Data envelopment analysis (DEA) and stochastic frontier analysis (SFA) are the classical non-parametric and parametric methods, respectively, for quantifying the efficiency relationship between tourism inputs (e.g., energy and water resources) and outputs (e.g., economic benefits and environmental losses). However, both struggle to capture the spatial correlations inherent in the tourism system. Life cycle assessment (LCA) focuses on the environmental load across the whole chain of tourism activities (such as transportation, accommodation, and sightseeing) and has revealed how the carbon footprint of air tourism is transmitted along that chain, but it is limited to a single data dimension. The SDG framework integrates tourism eco-efficiency evaluation into the Sustainable Development Goals at the macro level and provides a systematic indicator system, but it cannot analyze the interaction between micro-level behavior and space. At the data level, multi-source data (such as IoT monitoring data from scenic spots, social media behavior data, and traffic flow data) are reshaping how tourism sustainability is assessed. For example, integrating geospatial data with tourist trajectory data improves the spatiotemporal accuracy of ecological impact assessments, while social media text can reveal tourists’ environmental awareness and behavioral preferences, addressing the “behavior black box” of traditional statistics.
In terms of technical methods, graph neural networks (GNNs) show distinct advantages in spatial and environmental analysis: by modeling entity associations through node-edge structures, they have proven able to capture high-order interactions in urban ecological network analysis and regional collaborative environmental governance. How GNNs can integrate multi-source data to overcome the limitations of traditional methods in tourism eco-efficiency evaluation, however, remains a key gap.
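As a small illustration of how DEA quantifies the input-output relationship, the sketch below handles the special case of one input and one output under constant returns to scale (the CCR model), where the DEA linear program reduces to each unit’s output/input ratio divided by the best ratio observed. The destination data are hypothetical; realistic eco-efficiency studies use several inputs and outputs (including undesirable ones) and require a linear-programming solver.

```python
def ccr_efficiency_single(inputs, outputs):
    """CCR (constant returns to scale) DEA efficiency for one input and one output.

    In this special case the DEA linear program reduces to each unit's
    output/input ratio divided by the best ratio in the sample.
    """
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]

# Hypothetical destinations: ecological input (e.g. energy use) and tourism revenue
energy = [100.0, 80.0, 120.0]
revenue = [50.0, 48.0, 30.0]
scores = ccr_efficiency_single(energy, revenue)
print(scores)  # approx. [0.833, 1.0, 0.417]
```

The best-performing unit defines the frontier (score 1.0), and every other unit is measured relative to it, which is why DEA scores are comparative rather than absolute.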

In tourism eco-efficiency evaluation, models based on graph convolutional neural networks stand out because of their advantages in data processing and feature extraction (Zhu et al., 2024). From model construction to the final evaluation output, each step of the algorithm builds directly on the previous one (Coghlan & Prayag, 2025; Bertsatos et al., 2025).

Multi-source data fusion follows a hierarchical strategy. At the feature level, features from different sources are concatenated and weighted with attention mechanisms to highlight key information. At the model level, separate sub-networks process different types of data. At the decision level, the outputs of the sub-networks are integrated into a comprehensive evaluation result. This multi-level strategy exploits the complementarity of the data types and improves the accuracy and comprehensiveness of the evaluation. The graph structure follows a dual-graph design: the attraction association graph represents actual connections between tourist attractions, with edge weights computed from visitor flow and geographic distance; the indicator association graph captures the intrinsic relationships among evaluation indicators, with edge strength determined by statistical correlation. Modeling entity relationships and indicator dependencies simultaneously gives a more comprehensive picture of the complexity of the tourism ecosystem.
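The dual-graph construction can be sketched in NumPy as follows. The text states only that attraction edges depend on visitor flow and geographic distance and that indicator edges depend on statistical correlation, so the specific formulas are assumptions: flow damped by an exponential distance decay with a hypothetical `scale` parameter, and absolute Pearson correlation above a hypothetical `threshold`.

```python
import numpy as np

def attraction_graph(flow, dist, scale=10.0):
    """Attraction association graph (hypothetical weighting).

    flow[i, j]: visitors moving from attraction i to j
    dist[i, j]: geographic distance in km; `scale` is an assumed decay length (km)
    """
    sym_flow = (flow + flow.T) / 2.0            # treat flows as undirected
    W = sym_flow * np.exp(-dist / scale)        # more flow, shorter distance -> heavier edge
    np.fill_diagonal(W, 0.0)                    # no self-edges
    return W / W.max() if W.max() > 0 else W    # rescale to [0, 1]

def indicator_graph(X, threshold=0.5):
    """Indicator association graph: |Pearson r| kept where it reaches `threshold`."""
    R = np.abs(np.corrcoef(X, rowvar=False))    # indicators x indicators
    np.fill_diagonal(R, 0.0)
    return np.where(R >= threshold, R, 0.0)

flow = np.array([[0, 300, 50], [250, 0, 20], [40, 10, 0]], dtype=float)
dist = np.array([[0, 5, 30], [5, 0, 25], [30, 25, 0]], dtype=float)
A_attr = attraction_graph(flow, dist)

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))                              # 50 observations, 4 indicators
X[:, 1] = 0.9 * X[:, 0] + rng.normal(scale=0.1, size=50)  # indicators 0 and 1 co-move
A_ind = indicator_graph(X)
print(np.round(A_attr, 3))
print(A_ind[0, 1] > 0.5)  # True: the correlated pair is linked
```

Both matrices are symmetric, weighted adjacency matrices, so either can be passed to a GCN layer of the form in Equation (1).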

The GCN has a three-layer architecture, each layer combining a feature transformation with a graph convolution. Through message passing, the model aggregates both local node information and global structural patterns, enabling it to learn complex interactions between nodes (Shi et al., 2019). To handle multi-source heterogeneous data, each GCN layer contains processing units adapted to the different data types, so that all types of information are integrated effectively. Training uses a multi-task learning framework that jointly optimizes two objectives: ecological efficiency prediction and indicator association learning. The main task predicts tourism eco-efficiency scores, while the auxiliary task learns the causal relationships between indicators through a graph-structure reconstruction constraint, strengthening the model’s understanding of the system’s internal mechanisms. Key hyperparameters such as the learning rate, hidden-layer dimensions, and dropout rate are tuned by Bayesian optimization based on validation-set performance.
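The multi-task objective can be sketched as a weighted sum of a regression loss for the main task and a graph-reconstruction loss for the auxiliary task. The sketch below assumes mean squared error for eco-efficiency prediction, an inner-product decoder with binary cross-entropy for reconstruction (as in graph autoencoders), and a hypothetical weight `lam`; the paper does not specify these choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def multitask_loss(y_pred, y_true, Z, A, lam=0.5, eps=1e-9):
    """Main-task MSE plus lam times an auxiliary graph-reconstruction BCE.

    y_pred, y_true: predicted and observed eco-efficiency scores (one per node)
    Z: node embeddings (n x d); A: observed 0/1 adjacency matrix (n x n)
    lam: hypothetical weight of the auxiliary task
    """
    mse = np.mean((y_pred - y_true) ** 2)          # main task: efficiency prediction
    A_rec = sigmoid(Z @ Z.T)                       # inner-product decoder
    bce = -np.mean(A * np.log(A_rec + eps)
                   + (1 - A) * np.log(1 - A_rec + eps))
    return mse + lam * bce, mse, bce

A = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)
y = np.array([0.7, 0.8, 0.6])
Z_good = np.array([[2.0, 0.0], [2.0, 0.0], [0.0, -2.0]])  # linked nodes 0, 1 are similar
Z_bad = np.array([[2.0, 0.0], [0.0, 2.0], [2.0, 0.0]])    # similarity ignores the edge
total, mse, bce_good = multitask_loss(y, y, Z_good, A)
_, _, bce_bad = multitask_loss(y, y, Z_bad, A)
print(round(total, 3), bce_good < bce_bad)  # reconstruction loss prefers Z_good
```

The auxiliary term rewards embeddings whose pairwise similarities reproduce the observed graph, which is one way a structure-reconstruction constraint can shape what the main task learns.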

Beyond evaluating model performance, this study also examines what the model reveals for the tourism industry. Node importance analysis of the graph neural network can locate the “key pollution source” scenic spots with low ecological efficiency, together with their associated networks, and quantify how tourist behavior transmits ecological load. Based on these insights, targeted recommendations and policy directions can be proposed, such as establishing a “scenic spots–community–transportation” collaborative governance mechanism, implementing differentiated environmental management for the high-impact nodes identified by the model, or using behavioral interaction analysis to design precise environmental incentive policies for tourists.

As Figure 1 shows, the algorithm’s core steps are multi-graph data input, relationship-based feature construction, GNN-based hierarchical embedding learning, and eco-efficiency evaluation. In the embedding stage, multiple GNN layers extract the node embeddings h_c and the graph embedding g through MLP-driven Inner–Inner, Cross–Inner, Inner–Cross, and Cross–Cross transformations. The process starts with multi-source, graph-structured tourism data, from which fine-grained relationships between nodes are constructed. The GNN learns hierarchical representations from these relationships, and the integrated embeddings support the downstream eco-efficiency assessment and the analysis of sustainable tourism ecosystem patterns.

The algorithm flow of the GCN-based tourism eco-efficiency evaluation model can be summarized as follows. First, user information, ecological information, and user-ecology interaction data are collected from tourism-related sources to construct the tourism ecological network graph. Next, the raw data are preprocessed (data cleaning, feature extraction, and heterogeneous graph construction) to prepare the input for the graph convolutional neural network. Then a multi-layer GCN is designed and trained; its node features are updated iteratively through message passing, so that the model learns embedded representations of users and ecological elements. Finally, the learned embeddings are used to compute the eco-efficiency index, which provides decision support for tourism eco-management (X. Wang & Zhang, 2021). The model has three parts: an input feature layer, which receives user features, ecological features, and the heterogeneous graph; a graph convolutional layer, which updates the embedded representations of the feature vectors; and a multi-layer perceptron layer, which estimates link probabilities and generates an eco-efficiency list.
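The final MLP layer can be sketched as a scorer that takes a concatenated user embedding and ecological-node embedding and returns a link probability, after which ecological nodes are ranked to form the list. The weights below are random and untrained, the layer sizes are arbitrary, and the function name `mlp_link_score` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

def mlp_link_score(e_user, E_eco, W1, b1, w2, b2):
    """Score every (user, ecological node) pair with a one-hidden-layer MLP.

    The MLP input is the concatenation [e_user ; e_eco]; a sigmoid maps the
    output logit to a probability in (0, 1).
    """
    n = E_eco.shape[0]
    pairs = np.hstack([np.repeat(e_user[None, :], n, axis=0), E_eco])  # n x 2d
    hidden = np.maximum(pairs @ W1 + b1, 0.0)                           # ReLU
    logits = hidden @ w2 + b2
    return 1.0 / (1.0 + np.exp(-logits))

d, h, n_eco = 8, 16, 6
e_user = rng.normal(size=d)             # stands in for a learned user embedding
E_eco = rng.normal(size=(n_eco, d))     # stands in for learned ecological-node embeddings
W1, b1 = 0.1 * rng.normal(size=(2 * d, h)), np.zeros(h)
w2, b2 = 0.1 * rng.normal(size=h), 0.0
probs = mlp_link_score(e_user, E_eco, W1, b1, w2, b2)
ranking = np.argsort(-probs)            # ecological nodes, most probable link first
print(probs.round(3), ranking[:3])
```

In a trained model, the embeddings would come from the final graph convolutional layer and the MLP weights would be fitted jointly with the rest of the network.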

Figure 2 showcases the model architecture and processes for tourism eco-efficiency evaluation based on multi-source data fusion and graph neural networks (GNNs). It starts with multi-source data fusion, which integrates tourism, environmental monitoring, and socio-economic data. This fused data then supports two key branches. One is dual graph construction, representing network relationships within the tourism ecosystem. The other feeds into a data-driven design optimization loop. Within this framework, a Graph Neural Network (GNN) is employed, incorporating a 3-layer graph convolutional network (GCN) with multi-layer perceptrons (MLPs) and the ecological efficiency index for analyzing spatial and temporal patterns of eco-efficiency. Together, these components form a comprehensive framework to assess and optimize tourism’s eco-efficiency through advanced data-driven and neural network techniques.

The tourism eco-efficiency evaluation model based on multi-source data fusion and graph neural network can provide accurate and feasible insights for different subjects. For tourism planners, the model results can help identify ecological shortcomings in the development of tourism resources. For example, if a scenic spot has insufficient balance between tourist carrying capacity and vegetation protection, planners can adjust the route design and facility layout accordingly. Based on the evaluation results, policymakers can formulate differentiated support or control policies, such as giving financial preferences to eco-efficient areas and setting strict access standards for inefficient areas. Sustainability managers can dynamically monitor eco-efficiency changes through models and intervene in a timely manner to forestall possible environmental degradation.

The model’s findings align closely with the Sustainable Development Goals (e.g., SDG 14 “Life Below Water” and SDG 15 “Life on Land”) and relevant UNWTO guidelines. In one use case, the evaluation of coastal tourism areas, the model integrates marine pollution data with tourist flow data; the resulting eco-efficiency values are closely related to progress toward the SDG 14 goal of “reducing marine pollution”, giving the local government a scientific basis for limiting the number of offshore cruise ships and promoting environmentally friendly tourism. In mountain tourism development, combined with the UNWTO “Sustainable Mountain Tourism” guidelines, the model results guide developers to retain core ecological areas and to plan low-impact tourism projects only in marginal areas, which protects biodiversity while sustaining growth in the tourism economy. Table 1 summarizes these cases.

The model offers both high accuracy and decision-making value. With multi-source data, the graph neural network achieves an HR@10 of 0.82 and an NDCG of 0.79, outperforming traditional models, and its results have been verified by domain experts (error < 5%). Its core value lies in going beyond a single score: it pinpoints the key links of inefficiency (for example, “carbon emissions of self-driving tours” accounted for 34% at a scenic spot scoring 58 points) and reveals the transmission effect along the “scenic spots–transportation–communities” chain (edge weight 0.76). This gives planners optimization targets and policymakers targeted measures, and it promotes the digital transformation of sustainable tourism decision-making.
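HR@10 and NDCG@10 are standard top-k ranking metrics. The sketch below computes both for a single user with binary relevance; reported values are typically averages over all test users. The evaluation protocol behind the figures above (candidate set, held-out items) is not stated, so this shows only how the metrics are defined.

```python
import math

def hit_ratio_at_k(ranked, relevant, k=10):
    """1.0 if any relevant item appears in the top-k of the ranked list, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0

def ndcg_at_k(ranked, relevant, k=10):
    """Binary-relevance NDCG@k: discounted gain of hits over the ideal ordering."""
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, item in enumerate(ranked[:k]) if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# One hypothetical user whose held-out relevant item "B" is ranked second
ranked = ["A", "B", "C", "D"]
print(hit_ratio_at_k(ranked, {"B"}), ndcg_at_k(ranked, {"B"}))  # 1.0 and about 0.631
```

HR@10 only asks whether a relevant item appears in the top ten, while NDCG@10 also rewards placing it near the top, which is why the two scores usually differ.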

3.2. Graph Convolutional Neural Network Representation Update Layer

Eco-efficiency refers to the optimal ratio of tourism economic output to ecological cost; it measures the ability to maximize the economic value of tourism development within the ecological carrying capacity, balancing economic benefits against ecological sustainability. The tourism ecosystem is a dynamic equilibrium system centered on tourism activities. Its natural-ecological, human, social, and industrial-chain elements interact across space and among multiple agents, and its health depends on the coordinated transmission of material, energy, and information among these elements.

Lijiang City in Yunnan Province serves as the case study. Ticket revenue and visitor flow data for key attractions such as Lijiang Old Town and Yulong Snow Mountain were obtained from the attractions’ operating departments, along with tourism operating data such as hotel occupancy rates and travel agency sales. Environmental monitoring stations supplied water quality indicators for rivers in Lijiang, urban air quality data, and noise levels around tourist sites. The transportation department provided flight and rail passenger volumes into and out of Lijiang, as well as public transit and taxi operations within the city. In addition, tourism platforms and social media were used to gather behavioral data on visitor length of stay, consumption preferences, and site ratings. After these multi-source datasets were cleaned, transformed, and integrated, a tourism ecological relationship graph was constructed with nodes such as attractions, hotels, and transportation hubs, together with their interconnections. Graph neural networks then learn from node attributes and edge weights to assess the tourism ecological efficiency of Lijiang. Based on the evaluation results, strategies such as dynamic control of tourist carrying capacity at attractions and promotion of low-carbon travel routes are implemented. On the policy side, the study advocates establishing a tourism ecological compensation fund and stricter environmental standards for attraction construction, translating research findings into practical management measures that promote sustainable tourism development in Lijiang.

The data sources are diverse and varied in type, providing a solid foundation for assessing tourism ecological efficiency from multiple dimensions. Government departments provide public data, such as statistical yearbooks published by the statistics bureau and environmental monitoring reports from the ecological environment bureau; these offer macro-level data on regional GDP, industrial structure, total ecological resources, and pollutant emissions, outlining the regional economy and ecological environment. Operational data from tourism enterprises, including visitor flows to scenic spots, hotel occupancy rates, and travel agency operating income, reflect the real-time state of the tourism market. Third-party platform data include visitor reviews on online travel agencies (OTAs) and travel check-ins on social media, which contain detailed information on tourist behavioral preferences and satisfaction. Field research data, obtained through carefully designed questionnaires and in-depth interviews, provide firsthand information on local residents’ perceptions of the ecological environment and on the implementation of local ecological protection measures.

By data type, structured data such as economic statistics and environmental monitoring indicators have a clear, standardized structure and format, which makes them easy to store and analyze directly. Semi-structured data, such as tourism enterprise log files and some web data, lack a strict structure but still contain some self-descriptive information. Unstructured data, such as visitor comments, social media texts, and satellite remote sensing images, require techniques such as natural language processing and image recognition for preprocessing and feature extraction before their hidden value can be used. These data types play complementary roles in evaluating tourism’s ecological efficiency. Environmental data such as forest coverage, water quality indicators, and air quality directly reflect the ecological baseline of tourist destinations. Socio-economic data such as regional GDP, industrial structure, and transportation infrastructure reflect the economic support and infrastructure level for tourism development. Tourism activity data, including visitor numbers, consumption structure, and types of tourism products, show the intensity of tourism’s impact on the ecological environment. Integrating these multi-source data yields a large dataset covering the elements of the tourism ecosystem and their interrelationships, providing ample, high-quality training data for graph neural network models. This helps the model characterize the complex network structure of the tourism ecosystem and supports a scientific, comprehensive, and dynamic evaluation of tourism ecological efficiency.

The preprocessed user features, ecological features, and heterogeneous graph are fed into the GCN model. The network aggregates neighbor features through multiple “convolution” layers, propagating information layer by layer to more distant neighbors and updating the node embeddings (Pradhyumna & Shreya, 2021; Han et al., 2022). On this basis, the GCN–AR model updates its embeddings in three stages (message passing, message aggregation, and representation update), making particular use of the connections between users and ecological elements to model user interests and ecological impacts. The message function that propagates information from a neighbor node to the target node is given in Equation (2).

E_{U \leftarrow A}^{(1)} = F_{1}^{(1)} (E_{U}^{(0)}, E_{A}^{(0)}, P_{U A}^{(0)})

(2)

Here, E_{U}^{(0)} and E_{A}^{(0)} are the initial embeddings of user node U and ecological node A, and P_{UA}^{(0)} is the initial relationship data between them. F_{1}^{(1)} is the first-layer message function, and its output E_{U←A}^{(1)} is the message passed from ecological node A to user node U. The specific form of the function is given in Equation (3).

E_{U \leftarrow A}^{(1)} = W_{1}^{(1)} D^{-\frac{1}{2}} E_{A}^{(0)} + W_{2}^{(1)} D^{-\frac{1}{2}} \left(E_{A}^{(0)} \odot E_{U}^{(0)}\right)

(3)

Here, W_{1}^{(1)} and W_{2}^{(1)} are learnable weight matrices, D is the degree matrix, and ⊙ denotes the element-wise (Hadamard) product. The message aggregation function integrates the messages from all first-order neighbors of the user, as shown in Equation (4).

E_{N_{U}}^{(1)} = F_{2}^{(1)}\left(\{E_{U \leftarrow A}^{(1)} \mid A \in N_{U}\}\right)

(4)

The final representation of the node is updated by combining its own information with the aggregated neighbor information, as defined in Equation (5).

E_{U}^{(1)} = F_{3}^{(1)}\left(E_{U}^{(0)}, E_{N_{U}}^{(1)}\right)

(5)

Concretely, the update function is given in Equation (6), where E_{U←U}^{(1)} denotes the message the user node passes to itself.

E_{U}^{(1)} = \mathrm{LeakyReLU}\left(E_{U \leftarrow U}^{(1)} + \sum_{A \in N_{U}} E_{U \leftarrow A}^{(1)}\right)

(6)
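Equations (3) and (6) can be sketched for a single user node as follows. This is one interpretation, assuming that the D^{-1/2} normalization is applied per edge as 1/√(|N_U||N_A|), that the self-message E_{U←U} equals W_1 applied to the user’s own embedding, and that LeakyReLU uses slope 0.2; these details follow common practice for this style of message passing and are not confirmed by the text. Variable and function names are hypothetical.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def message(e_src, e_dst, W1, W2, norm):
    """Eq. (3)-style message from e_src to e_dst with a per-edge normalization."""
    return norm * (W1 @ e_src + W2 @ (e_src * e_dst))

def update_user(u, E_U, E_A, neighbors_U, neighbors_A, W1, W2):
    """Eq. (6): LeakyReLU(self-message + sum of messages from ecological neighbors)."""
    e_u = E_U[u]
    total = W1 @ e_u                          # self-message E_{U<-U} (assumed W1 e_u)
    for a in neighbors_U[u]:
        norm = 1.0 / np.sqrt(len(neighbors_U[u]) * len(neighbors_A[a]))
        total = total + message(E_A[a], e_u, W1, W2, norm)
    return leaky_relu(total)

# Tiny bipartite example: user 0 links to ecological nodes 0 and 1
E_U = np.array([[1.0, 0.0], [0.0, 1.0]])
E_A = np.array([[0.0, 1.0], [1.0, 1.0]])
neighbors_U = {0: [0, 1], 1: [1]}
neighbors_A = {0: [0], 1: [0, 1]}
W1, W2 = np.eye(2), np.zeros((2, 2))
out = update_user(0, E_U, E_A, neighbors_U, neighbors_A, W1, W2)
print(out)  # approx. [1.5, 1.207]
```

With identity W_1 and zero W_2, the output is the user’s own embedding plus the normalized embeddings of its neighbors, which makes the computation easy to check by hand.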

For an ecological node, the first-order neighbors are user nodes; aggregating their messages through the same message-passing mechanism yields the graph convolution embedding in Equation (7).

E_{A}^{(1)} = \mathrm{LeakyReLU}\left(E_{A \leftarrow A}^{(1)} + \sum_{U \in N_{A}} E_{A \leftarrow U}^{(1)}\right)

(7)

The matrix form of the user-specific message passing process is shown in Equation (8).

E_{U}^{(1)} = \mathrm{LeakyReLU} (W_{1}^{(1)} D^{- \frac{1}{2}} A D^{- \frac{1}{2}} E_{A}^{(0)} + W_{2}^{(1)} D^{- \frac{1}{2}} A D^{- \frac{1}{2}} (E_{A}^{(0)} ⊙ E_{U}^{(0)}))

(8)

For multi-layer graph convolution, the messages passed to user and ecological nodes at layer l are defined recursively in Equations (9) and (10).

E_{U \leftarrow A}^{(l)} = W_{1}^{(l)} D^{- \frac{1}{2}} E_{A}^{(l - 1)} + W_{2}^{(l)} D^{- \frac{1}{2}} (E_{A}^{(l - 1)} ⊙ E_{U}^{(l - 1)})

(9)

E_{A \leftarrow U}^{(l)} = W_{1}^{(l)} D^{- \frac{1}{2}} E_{U}^{(l - 1)} + W_{2}^{(l)} D^{- \frac{1}{2}} (E_{U}^{(l - 1)} ⊙ E_{A}^{(l - 1)})

(10)

The corresponding layer-l aggregation and update steps are given in Equations (11) and (12).

E_{U}^{(l)} = \mathrm{LeakyReLU} (E_{U \leftarrow U}^{(l)} + \sum_{A \in N_{U}} E_{U \leftarrow A}^{(l)})

(11)

E_{A}^{(l)} = \mathrm{LeakyReLU} (E_{A \leftarrow A}^{(l)} + \sum_{U \in N_{A}} E_{A \leftarrow U}^{(l)})

(12)

The matrix representation of the node message passing process on the multi-layer graph convolution is shown in Equations (13) and (14).

E_{U}^{(l)} = \mathrm{LeakyReLU} (W_{1}^{(l)} {\bar{D}}^{- \frac{1}{2}} \bar{A} {\bar{D}}^{- \frac{1}{2}} E_{A}^{(l - 1)} + W_{2}^{(l)} D^{- \frac{1}{2}} A D^{- \frac{1}{2}} (E_{A}^{(l - 1)} ⊙ E_{U}^{(l - 1)}))

(13)

E_{A}^{(l)} = \mathrm{LeakyReLU} (W_{1}^{(l)} {\bar{D}}^{- \frac{1}{2}} \bar{A} {\bar{D}}^{- \frac{1}{2}} E_{U}^{(l - 1)} + W_{2}^{(l)} D^{- \frac{1}{2}} A D^{- \frac{1}{2}} (E_{U}^{(l - 1)} ⊙ E_{A}^{(l - 1)}))

(14)
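Equations (9) to (14) repeat the same update over l layers. The NumPy sketch below stacks users and ecological nodes into one embedding matrix so that both updates are computed with a single normalized adjacency product, in the matrix form popularized by NGCF. The stacking, the row-vector convention E·W (the paper writes W on the left), the self term added to the neighbor sum, and the LeakyReLU slope of 0.2 are assumptions for illustration, and the function names are hypothetical.

```python
import numpy as np


def normalized_adjacency(R):
    """Return D^{-1/2} A D^{-1/2} for the bipartite user-ecology graph.

    R: (n_users, n_eco) 0/1 interaction matrix; users are indexed first.
    """
    n_u, n_a = R.shape
    A = np.zeros((n_u + n_a, n_u + n_a))
    A[:n_u, n_u:] = R
    A[n_u:, :n_u] = R.T
    deg = A.sum(axis=1)
    # Isolated nodes get a zero scale factor instead of a division by zero.
    d_inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    d_inv_sqrt[nonzero] = 1.0 / np.sqrt(deg[nonzero])
    return d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]


def propagate(R, E0, weights, slope=0.2):
    """Stack GCN layers in an NGCF-style matrix form (cf. Eqs. 9-14).

    E0:      (n_users + n_eco, d) initial embeddings, users first
    weights: list of (W1, W2) pairs of shape (d_in, d_out), applied as E @ W
    Returns [E^(0), E^(1), ..., E^(L)].
    """
    L_hat = normalized_adjacency(R)
    outputs = [E0]
    E = E0
    for W1, W2 in weights:
        side = L_hat @ E  # normalised sum of neighbour embeddings
        # (side + E) W1: neighbour plus self message; (side * E) W2: interaction term.
        pre = (side + E) @ W1 + (side * E) @ W2
        E = np.where(pre > 0, pre, slope * pre)  # LeakyReLU
        outputs.append(E)
    return outputs
```

For example, calling `propagate(R, E0, weights)` with three `(W1, W2)` pairs, the depth Section 4.2.1 reports as best, returns the four matrices E⁽⁰⁾ to E⁽³⁾; how these layer outputs are combined into the final embeddings is not specified in the text.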

Data were collected from Mafengwo with web-scraping programs written in Python (version 3.13) using the Scrapy framework. Scraping rules based on the page structure of the Mafengwo website target tourist attraction pages and extract visitor reviews, travel notes, and Q&A, together with basic information such as attraction names and geographic locations. For preprocessing, cleaning rules are defined following methods designed for GPS trajectory data: duplicate records are removed by primary-key deduplication, and missing values are filled by mean imputation or regression-model prediction. Finally, the data distribution is analyzed, providing a reliable foundation for subsequent research.
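The cleaning steps just described (primary-key deduplication, mean imputation, and regression imputation) can be sketched in plain Python. The field names `record_id`, `x`, and `y` are hypothetical placeholders: the paper does not give its record schema or say which variables serve as regression predictors, and the single-predictor least-squares fit is an assumption.

```python
from statistics import mean


def deduplicate(records, key="record_id"):
    """Keep the first record for each primary-key value."""
    seen = set()
    unique = []
    for rec in records:
        if rec[key] not in seen:
            seen.add(rec[key])
            unique.append(rec)
    return unique


def mean_impute(records, field):
    """Replace missing (None) values of `field` with the mean of observed values."""
    fill = mean(r[field] for r in records if r[field] is not None)
    return [{**r, field: fill if r[field] is None else r[field]} for r in records]


def regression_impute(records, x_field, y_field):
    """Fill missing `y_field` values from a least-squares line y = a + b*x."""
    pairs = [(r[x_field], r[y_field]) for r in records
             if r[x_field] is not None and r[y_field] is not None]
    mx = mean(x for x, _ in pairs)
    my = mean(y for _, y in pairs)
    b = sum((x - mx) * (y - my) for x, y in pairs) / sum((x - mx) ** 2 for x, _ in pairs)
    a = my - b * mx
    filled = []
    for r in records:
        if r[y_field] is None and r[x_field] is not None:
            r = {**r, y_field: a + b * r[x_field]}  # predicted value
        filled.append(r)
    return filled
```

Mean imputation is cheap but ignores relationships between fields; regression imputation uses a correlated predictor when one is available.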

3.3. Multi-Layer Perceptron Layer

The tourism eco-efficiency score (0–100 points) in this study has a clear practical interpretation. A score of 80 or above indicates a high-quality state of “eco-economy–society” synergy. For example, a nature reserve scoring 85 maintains a 92% ecological restoration rate in its core area while receiving 3 million visitors per year; reinvestment in environmental protection accounts for 18% of its tourism income, and the income growth of indigenous residents exceeds the growth in tourist numbers, achieving sustainable development. Scores of 60–79 indicate room for structural optimization: a mountain scenic spot scoring 71, for instance, falls short on carbon emissions from connecting transportation (29%) and on delayed garbage collection and removal during peak seasons (23%). Scores below 60 signal ecological imbalance: in a wetland scenic spot scoring 54, overexploitation has led to an annual wetland decrease of 2.3%, and complaints about “ecological damage” account for 41% of complaints. These scores help planners replicate the “monitoring and reservation linkage mechanism” and “zoning control and benefit sharing” models of the 85-point protected area. The evaluations have also helped policymakers launch new-energy shuttle bus subsidies and smart garbage collection point planning for the 71-point scenic spot, and they provide a basis for the “ecological redline adjustment” of the 54-point wetland. In a provincial pilot that linked the score to environmental protection funds, the average score of 50 scenic spots rose by 8.7 points per year, highlighting the model’s practical value.
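The three score ranges above amount to a simple classification rule. The sketch below encodes it; the band labels are hypothetical, and treating non-integer scores such as 79.5 as belonging to the 60–79 band is an assumption, since the text states integer ranges.

```python
def efficiency_band(score):
    """Map a 0-100 eco-efficiency score to the three ranges described in the text."""
    if not 0 <= score <= 100:
        raise ValueError("score must lie in [0, 100]")
    if score >= 80:
        return "high"       # eco-economy-society synergy
    if score >= 60:
        return "optimize"   # room for structural optimisation
    return "restore"        # below 60: ecological imbalance, restoration priority
```

Applied to the three examples, the nature reserve (85), the mountain scenic spot (71), and the wetland (54) fall into the three bands in that order.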

The GCN integrates proximity information between user and ecological nodes and generates updated embedding vectors for both. To capture complex interactions, this paper uses a multi-layer perceptron (MLP) instead of the inner product of traditional matrix factorization to learn the nonlinear relationships between user and ecological embeddings (Ding et al., 2021). Specifically, the updated user and ecological embedding vectors are concatenated and fed into the MLP, which outputs the link probability between the user and the ecological node. The MLP input is the concatenation of the two embeddings, as shown in Equation (15).

x_{0} = [E_{u_{i}}^{(l)} \oplus E_{a_{j}}^{(l)}]

(15)

After concatenation, the output of the first MLP layer is given by Equation (16).

x_{1} = h (W_{1} x_{0} + b_{1}^{'})

(16)

The output of the l-th layer is given by Equation (17), where h(·) is the activation function and W_l and b_l' are the weight matrix and bias of layer l.

x_{l} = h (W_{l} x_{l - 1} + b_{l}^{'})

(17)

The link probability is obtained by applying the sigmoid function σ to the output x_out of the final MLP layer, as shown in Equation (18).

{\hat{y}}_{u_{i} a_{j}} = σ (x_{\mathrm{out}})

(18)
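Equations (15) to (18) can be sketched as a small NumPy forward pass. Two choices here are assumptions rather than details from the paper: ReLU as the hidden activation h, and a final linear output unit before the sigmoid. Applying a ReLU to the final layer would clamp the logit at zero or above and force every probability to at least 0.5. The names are hypothetical.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def link_probability(e_user, e_eco, hidden, w_out, b_out):
    """Score one user-ecology pair following Eqs. (15)-(18).

    hidden: list of (W_l, b_l) pairs for the hidden layers, with h = ReLU (assumed)
    w_out:  (d_last,) weight vector of an assumed linear output unit
    b_out:  scalar bias of that output unit
    """
    x = np.concatenate([e_user, e_eco])  # Eq. (15): concatenated embeddings x_0
    for W, b in hidden:
        x = relu(W @ x + b)              # Eqs. (16)-(17): x_l = h(W_l x_{l-1} + b_l)
    logit = float(w_out @ x + b_out)
    return 1.0 / (1.0 + np.exp(-logit))  # Eq. (18): sigmoid gives the link probability
```

Unlike an inner product, the hidden layers can learn interactions between individual user and ecological embedding dimensions.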

3.4. Model Training

The dataset used in this study covers multi-source data from 50 key tourism cities across China from 2018 to 2022. It includes environmental input indicators; economic output indicators (total tourism income: mean 42 billion yuan, minimum 8.5 billion yuan, maximum 186 billion yuan, standard deviation 31 billion yuan); socio-ecological indicators (tourist satisfaction: mean 4.2 points, minimum 3.1, maximum 4.9, standard deviation 0.5; excellent air quality rate: mean 82%, standard deviation 9%); and spatial correlation data (tourist flow intensity between scenic spots: mean 12,000 person-times/month, standard deviation 6000 person-times/month). Correlation analysis revealed a significant positive correlation between total tourism revenue and energy consumption (r = 0.68, p < 0.01), while tourist satisfaction was more strongly correlated with the rate of good air quality (r = 0.73, p < 0.001). Principal component analysis (PCA) showed that the first three principal components explained 85% of the cumulative variance, supporting the choice of variables. In terms of model performance, the GNN model with multi-source data fusion achieved HR@10 = 0.82 and NDCG = 0.79, compared with HR@10 = 0.65 and NDCG = 0.61 for the traditional DEA model; it also reduced the mean absolute error of the eco-efficiency score by 32% and improved the degree of alignment with Goal 12 (responsible consumption and production) of the SDG framework by 27%. The reliability of the model was further confirmed by comparison with the scores of 20 tourism ecology experts (Kappa coefficient = 0.81, p < 0.001) and by field verification in three typical cities (error rate < 5%).
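The cumulative explained variance reported for PCA can be computed from the eigenvalues of the indicators' correlation matrix. The sketch below is a generic NumPy version; standardizing the columns first is an assumption (reasonable here because the indicators are on very different scales), and the function name is hypothetical. It does not reproduce the paper's 85% figure, which depends on data not given in the text.

```python
import numpy as np


def cumulative_explained_variance(X):
    """Cumulative share of total variance captured by the leading principal components.

    X: (n_samples, n_features) indicator matrix; columns are standardised first.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Eigenvalues of the covariance of standardised data = variances of the components,
    # returned in ascending order by eigvalsh, so reverse them.
    eigvals = np.linalg.eigvalsh(np.cov(Z, rowvar=False))[::-1]
    eigvals = np.clip(eigvals, 0.0, None)  # remove tiny negative round-off
    return np.cumsum(eigvals) / eigvals.sum()
```

For the paper's indicator matrix, the third entry of the returned array corresponds to the cumulative variance of the first three components.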

This study critically analyzes the quality and limitations of multi-source data. Although the environmental monitoring data are highly accurate, monthly coverage has gaps for remote scenic areas. Social media behavior data cover a wide range but are subject to user-profile bias, which needs to be corrected with qualitative methods such as local community interviews. The study found a close connection between tourism eco-efficiency and SDGs 11, 12, and 13. The scenic spot with a score of 80 performed well on the “Community Inclusion” indicator (weight 0.32) under SDG 11 and on the “resource recycling rate” under SDG 12, supporting the “data-driven sustainable transformation pathway”. We propose a clear implementation path for destinations: scenic spots with complete basic data can connect directly to the model module and generate efficiency reports within one month, while small and medium-sized scenic spots with weak data can use the “core indicators supplementary survey” model and complete the evaluation within six weeks. For destination management, the model not only provides quantitative scores but also locates key nodes for rectification through spatial correlation analysis, enabling managers to formulate targeted policies such as off-peak flow restriction and clean energy subsidies. In the pilot, its application advanced progress toward SDG 13 targets in three destinations by 18%, highlighting the practical value of the research for sustainable tourism management.

After completing the construction of the tourism ecological efficiency evaluation model based on graph convolutional neural networks, the model training phase becomes a key bridge connecting theoretical design and practical application. This stage not only requires careful adjustment of various training parameters but also necessitates repeated iterations and optimizations to enable the model to capture deep internal relationships within complex tourism ecological data.

In the model, tourism statistical data include the number of tourists, tourism revenue, and the reception capacity of scenic spots, which directly reflect the scale and efficiency of the tourism industry. Environmental monitoring data include air quality, water quality indicators, and vegetation coverage, reflecting the impact of tourism activities on the ecological environment. Socio-economic data cover regional GDP, residents’ income levels, and infrastructure development, showing the relationship between tourism and the local economy and society. Because these data come from different sources, their formats and statistical scopes are inconsistent. In preprocessing, these differences are eliminated by standardizing data formats, normalizing values, and converting units. Missing values are filled, depending on the data characteristics, by mean imputation, regression model prediction, or values from similar scenic areas. The processed data are then structured into a graph whose nodes represent tourist regions, scenic spots, or data indicators and whose edges represent their interrelationships, such as geographic proximity, economic ties, and ecological impact. By learning from this graph-structured data, graph neural networks automatically capture complex nonlinear relationships between nodes, uncover hidden patterns in the data, and jointly consider tourism, environmental, and socio-economic factors to provide a scientific and comprehensive evaluation of tourism ecological efficiency.
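Two of the preprocessing steps above, normalization and building geographic-proximity edges, can be sketched in plain Python. The paper does not say how proximity edges are defined, so the great-circle (haversine) distance, the distance threshold `max_km`, and the place names in the usage example are assumptions; min-max scaling is one common normalization choice, not necessarily the one used.

```python
import math


def min_max(values):
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres (mean Earth radius 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def proximity_edges(coords, max_km):
    """Undirected edges between nodes closer than `max_km` (hypothetical rule).

    coords: dict node_name -> (latitude, longitude)
    """
    names = list(coords)
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            a, b = names[i], names[j]
            if haversine_km(*coords[a], *coords[b]) <= max_km:
                edges.append((a, b))
    return edges
```

For instance, two scenic spots about 1 km apart would be linked under a 5 km threshold, while a spot in another province would not. Economic and ecological-impact edges would need their own rules.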

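Model training consists of forward propagation and back propagation. Forward propagation computes predictions from the inputs using the current parameters and evaluates the loss; back propagation uses the gradient of the loss to adjust the parameters and improve prediction accuracy (Tang et al., 2021; Li et al., 2022). The model parameters are learned with the Bayesian personalized ranking (BPR) loss, which requires observed user–ecology interactions to be ranked above unobserved ones. The loss is given in Equation (19), where R is the set of training triples (u_i, a_j, a_m) in which user u_i has an observed interaction with ecological node a_j but not with a_m, σ is the sigmoid function, λ is the regularization coefficient, and Θ denotes the model parameters.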

L = \sum_{(u_{i}, a_{j}, a_{m}) \in R} - \ln \sigma ({\hat{y}}_{u_{i} a_{j}} - {\hat{y}}_{u_{i} a_{m}}) + \lambda \| \Theta \|_{2}^{2}

(19)
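The BPR loss in Equation (19) can be written directly in NumPy. The sketch assumes the predicted scores for the positive and negative items of each triple have already been computed. It uses the identity −ln σ(x) = ln(1 + e^(−x)), evaluated with `np.logaddexp` to avoid overflow for large score gaps. The function name and the default λ are hypothetical.

```python
import numpy as np


def bpr_loss(pos_scores, neg_scores, params, lam=1e-4):
    """Eq. (19): sum over triples of -ln sigma(y_pos - y_neg) plus lam * ||Theta||_2^2.

    pos_scores[k], neg_scores[k]: predicted scores y_hat(u_i, a_j) and y_hat(u_i, a_m)
    for the k-th training triple; params: list of parameter arrays (Theta).
    """
    diff = np.asarray(pos_scores, dtype=float) - np.asarray(neg_scores, dtype=float)
    # -ln(sigmoid(d)) = log(1 + exp(-d)) = logaddexp(0, -d), numerically stable.
    ranking_term = np.sum(np.logaddexp(0.0, -diff))
    reg_term = lam * sum(np.sum(p ** 2) for p in params)
    return ranking_term + reg_term
```

The loss falls as observed items are scored further above unobserved ones, which is exactly the pairwise preference the text describes.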

The evaluation model, shown in Figure 3, is built around the task of assessing tourism eco-efficiency with multi-source data fusion and graph neural networks (GNNs). It begins with multi-source data fusion, which aggregates diverse tourism-related datasets as the basis for subsequent analysis. It then uses several components for graph-based operations: the Heterophilic Node Detector, paired with PGNN, processes the initial graph to identify heterophilic nodes, while the Probabilistic Anomaly Generator uses translation mechanisms and DDPM to generate anomaly-related elements that enrich the data for anomaly detection. For model training, the model incorporates user and ecological characteristics, constructs a heterogeneous graph from the fused data, builds a training set, initializes the parameters, selects the learning rate, batch size, and number of iterations, and uses an optimization algorithm to update the parameters and minimize the loss until convergence. During training, the Counterfactual Data Augmentation for Graph Anomaly Detection module plays a central role: it processes the graph with a Counterfactual GNN to extract embeddings and then uses an MLP to compute anomaly probabilities, exploiting both the structural and attribute information of the heterogeneous graph. After training, the optimized GNN model, with its multi-layer perceptron layer and result visualization, combines the outputs of heterophilic node detection, probabilistic anomaly generation, and counterfactual data augmentation to analyze the structural patterns and anomalies of the tourism ecosystem, giving tourism decision-makers precise and in-depth insights for assessing tourism ecological efficiency.

Evaluating tourism ecological efficiency with multi-source data fusion and graph neural networks faces several potential problems in practice. First, data quality strongly affects the accuracy and reliability of the model: multi-source data may contain missing values, inconsistent formats, and semantic ambiguity, which complicate fusion and require extensive data cleaning and preprocessing. Second, the model is computationally demanding: the tourism ecosystem is complex, with a vast number of nodes and edges, so large-scale graph training incurs high memory consumption and long training times, and deep propagation can suffer from vanishing or exploding gradients, all of which reduce training efficiency. Furthermore, the adaptability and interpretability of the model across different tourism scenarios, and the tension between dynamically changing tourism data and the need to update the model in real time, are key issues that require resolution in practical applications.

The model is implemented in three stages. In the initial stage, 5–8 typical destinations are selected as pilots, data interfaces are opened, and personalized solutions are produced. In the medium term, the relevant departments will jointly establish a “score–policy” linkage mechanism in which scores are incorporated into scenic spot ratings and fund allocation. In the long term, lightweight tools will be developed for small and medium-sized scenic spots. On the policy side, areas scoring above 80 are designated as demonstration zones with full support, areas scoring 60–79 receive targeted rectification policies for their shortcomings, and areas scoring below 60 face development restrictions and begin restoration, turning the assessment into an effective driver of sustainable development.

4. Experimental Results and Analysis

4.1. Experimental Setup

To ensure statistical rigor, this experiment uses classical methods such as two-tailed t-tests and analysis of variance (ANOVA) to examine differences in tourism ecological efficiency evaluation results between regions and time spans. The p-values quantify the significance of each variable’s effect and indicate whether the observed differences are unlikely to arise by chance. In addition, Bootstrap resampling draws repeated samples with replacement to construct confidence intervals for the results, showing the stability of the evaluation. For error analysis, we first quantify the deviation between predicted and actual values with standard metrics such as mean squared error (MSE) and mean absolute error (MAE), and analyze misclassification of efficiency levels with the confusion matrix of the graph neural network. Second, sensitivity analysis explores the influence of different variables in the multi-source data on the evaluation results, identifying errors caused by data noise and fluctuations in model parameters. Finally, cross-validation compares error performance between the training and test sets to detect overfitting and to assess the reliability and generalization ability of the experimental results.
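The Bootstrap confidence interval mentioned above can be sketched with the standard library. This is a percentile bootstrap for the mean; the number of resamples, the 95% level, the seed, and the function name are assumptions, since the paper does not report its Bootstrap settings.

```python
import random
from statistics import mean


def bootstrap_ci(scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `scores`."""
    rng = random.Random(seed)  # seeded for reproducibility
    n = len(scores)
    # Resample with replacement and record the mean of each bootstrap sample.
    boot_means = sorted(mean(rng.choices(scores, k=n)) for _ in range(n_resamples))
    lower = boot_means[int((alpha / 2) * n_resamples)]
    upper = boot_means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper
```

A narrow interval across resamples indicates that the evaluation result is stable with respect to which regions or years happen to be sampled.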

The experiment uses Mafengwo tourism data from 2015 to 2020. After preprocessing, including the removal of low-frequency users and ecological items, the final dataset contains 14,517 users, 19,800 ecological items, and 1,035,173 interaction records (Zhou et al., 2022). Because the task is to rank a set of ecological items by eco-efficiency, the experiment adopts Top-K evaluation. As a Top-K metric, the hit rate measures the proportion of test items that appear in the top-K evaluation list, as shown in Equation (20), where Hits@K is the number of test items ranked within the top K and |GT| is the total number of ground-truth test items.

\mathrm{HR}@K = \frac{\mathrm{Hits}@K}{| GT |}

(20)
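Equation (20) can be computed from ranked lists and held-out test items as follows. Pooling hits and ground-truth items across all users, rather than averaging a per-user hit rate, is one common reading of the formula and is an assumption here; the dictionary layout and names are hypothetical.

```python
def hit_rate_at_k(ranked_lists, ground_truth, k):
    """HR@K = Hits@K / |GT|, pooled over all users.

    ranked_lists: dict user -> list of ecological items ranked by predicted score
    ground_truth: dict user -> set of held-out test items for that user
    """
    hits = 0
    total = 0
    for user, gt_items in ground_truth.items():
        top_k = set(ranked_lists.get(user, [])[:k])
        hits += len(top_k & gt_items)  # test items that made it into the top K
        total += len(gt_items)
    return hits / total if total else 0.0
```

For example, if one user's test item is ranked second and another user's test item is missing from the list, HR@2 is 1/2.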

Although the hit rate reflects evaluation accuracy, it does not reflect ranking quality. In practical applications, ecological items with high relevance to the user should be displayed first. To address this, the normalized discounted cumulative gain (NDCG) is used as a ranking quality metric that emphasizes the correctness of the top-ranked results; it is calculated as shown in Equation (21). The discounted cumulative gain (DCG) weights the relevance rel_i of the item at position i by its position, as shown in Equation (22), and the ideal DCG (IDCG), the maximum DCG achievable by a perfect ranking, provides the upper bound used for normalization, as shown in Equation (23).

\mathrm{NDCG}@K = \frac{\mathrm{DCG}@K}{\mathrm{IDCG}@K}

(21)

\mathrm{DCG}@K = \sum_{i = 1}^{K} \frac{2^{rel_{i}} - 1}{\log_{2} (i + 1)}

(22)

\mathrm{IDCG}@K = \sum_{i = 1}^{| REL |} \frac{2^{rel_{i}} - 1}{\log_{2} (i + 1)}

(23)
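Equations (21) to (23) translate directly into code. In the sketch below, the ideal ordering is obtained by sorting the same relevance grades in descending order and truncating at K, which is the usual convention. It assumes the grades passed in cover all relevant items for the user. The function names are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """Eq. (22): graded gain (2^rel - 1) discounted by log2(i + 1) at 1-based position i."""
    return sum((2 ** rel - 1) / math.log2(i + 1)
               for i, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(relevances, k):
    """Eq. (21): DCG of the predicted order divided by the ideal DCG (Eq. 23).

    relevances: relevance grade of each item, in the order the model ranked them.
    """
    idcg = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / idcg if idcg > 0 else 0.0
```

For instance, placing the only relevant item second instead of first lowers NDCG@2 from 1 to 1/log₂3, about 0.63, while HR@2 is unchanged. This position sensitivity is why the text uses NDCG alongside HR.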

4.2. Analysis of Influencing Factors

4.2.1. Effect of Number of Layers on Performance of GCN Model

Embeddings of user and ecological nodes in the heterogeneous graph are learned with graph convolutional neural networks. The first layer aggregates first-order neighbor information, and each additional GCN layer extends message passing by one hop, capturing and updating higher-order neighbor information to produce rich node embeddings (Y. Meng et al., 2021; Y. Liu et al., 2021). The number of GCN layers therefore directly affects the quality of user and ecological embeddings and, in turn, the algorithm’s performance. The experiments examine how GCN depth (0 to 5 layers, in steps of 1) affects the hit rate and normalized discounted cumulative gain of the evaluation algorithm.

As shown in Figure 4, as the number of GCN layers increases, both HR and NDCG in the eco-tourism eco-efficiency assessment first rise and then fall, peaking at three layers. This shows that adding layers, within limits, aggregates more neighbor information and improves node embedding quality and hence evaluation performance, confirming the value of modeling high-order connectivity. However, too many layers over-aggregate (over-smooth) information, reducing the differences between node embeddings and degrading evaluation performance. The experiment therefore shows that three GCN layers give the best results.

In this study, the number of GCN layers had a significant impact on model performance, with the best results obtained at three layers. A single GCN layer captures only first-order neighbor information, which limits its ability to represent the complex relationships within the tourism ecosystem. With two layers, the model can integrate second-order neighbor information and explore more complex relationships among factors such as tourism resources, the ecological environment, and economic development. However, too many layers can lead to overfitting and vanishing gradients. The three-layer GCN balances information aggregation against model complexity: it merges multi-hop neighbor information to characterize the interactions among elements of the tourism ecosystem while avoiding overfitting and gradient problems, enabling accurate evaluation of tourism ecological efficiency.

Hyperparameters also influence performance in nuanced and interdependent ways. A learning rate that is too high makes training unstable, producing overly large parameter updates that overshoot the optimal solution, while a learning rate that is too low makes training excessively slow. Batch size affects the stability and convergence speed of training: smaller batches introduce more gradient noise and larger fluctuations but can adapt to local changes in the data distribution, whereas larger batches stabilize training but may become trapped in local optima. The regularization coefficient controls model complexity: an excessively large coefficient leads to underfitting and insufficient learning of data features, while a very small coefficient may fail to prevent overfitting, reducing the model’s ability to generalize to unseen data. These hyperparameters interact, jointly determining the model’s final performance in evaluating tourism ecological efficiency.

4.2.2. Effects of Tourism Eco-Efficiency Assessment List Length on Tourism Eco-Efficiency Assessment Performance

The performance of a tourism eco-efficiency assessment model is affected by the length of the assessment list. In this experiment, other parameters were fixed and the list length was adjusted from 0 to 30 (interval 5). Figure 5 shows the trend of HR and NDCG as the list length increases.

Figure 6 shows that HR first rises and then declines as the list length increases, peaking at 10, which indicates that beyond a length of 10 most of the ecological items of interest to users have already been included. NDCG is lowest at a list length of 1, indicating that a single top-ranked ecological item often does not match the user’s interests; as the list grows, the evaluation comes closer to the user’s interests and performance improves. In practice, overly short lists may fail to satisfy diverse user needs, while excessively long lists may degrade the user experience. Balancing performance and usability, the list length was set to 10 in this experiment.

4.2.3. Algorithm Convergence

To verify the convergence of the model, the impact of the number of training iterations on the loss function and evaluation metrics was examined (Zhou et al., 2021; Sun et al., 2022). Other parameters were fixed, and the number of iterations was increased from 0 to 100 (interval 10). Figure 7 shows the convergence trend of the model with the increase of the number of iterations.

Figure 8 shows that the loss decreases markedly as the number of iterations increases and stabilizes after about 90 iterations, indicating that the model has converged. Meanwhile, HR and NDCG increase with the number of iterations, especially during the first 30 iterations, and then stabilize. This shows that the proposed algorithm efficiently captures the interactions between users and ecological items, improving evaluation accuracy. In summary, the GCN–AR algorithm converges stably and effectively improves HR and NDCG.

4.2.4. Comparative Analysis of Algorithms

To validate the proposed tourism eco-efficiency evaluation algorithm, we compared it with traditional ranking methods, deep learning models, and other graph neural network approaches, using HR and NDCG as the key metrics. As summarized in Table 2, our algorithm outperforms the benchmark algorithms on both metrics, mainly because it captures complex relationships and exploits the graph-structured nature of the data. By integrating both structural and attribute information, the GNN-based approach achieves substantial improvements in HR and NDCG. These findings highlight the importance of graph topology in eco-efficiency evaluation and the potential of the proposed algorithm to provide more accurate and insightful evaluations for decision-makers in the tourism sector.

In the context of tourism ecological efficiency evaluation, matrix factorization (MF) decomposes high-dimensional tourism data matrices into low-dimensional latent factor matrices, mining potential associations among tourist consumption behavior, the resource inputs and outputs of scenic spots, and other related data. It is suited to structured data and can quantify the impact of different factors on ecological efficiency. Graph Convolutional Matrix Completion (GC–MC) uses the representational power of graph neural networks: it builds a graph of the entities in the tourism ecosystem (such as scenic spots, tourists, and local communities) and their interrelationships, and combines multi-source data (such as geospatial data, environmental monitoring data, and tourism economic data) for node feature learning and information propagation, capturing the network characteristics of the tourism ecosystem and the interactions between data sources. Among other traditional methods, Data Envelopment Analysis (DEA) builds an input-output index system to assess the relative ecological efficiency of tourism regions or scenic spots, which suits static efficiency analysis of tourism systems with a clear input-output structure. Machine learning approaches such as Random Forest perform feature selection and model training on large amounts of historical data to predict trends in tourism ecological efficiency.

Figure 9 shows the analysis of the relationship between NMI and ARI. The experimental results indicate that both HR and NDCG of the proposed algorithm increase as the data scale grows, suggesting that its use of high-order connectivity effectively alleviates data sparsity. MF performs worst because its simple inner product struggles to capture the complex relationships between users and ecological items (Hang et al., 2021; Wan et al., 2023). NeuMF extends MF with a multi-layer perceptron and performs better, demonstrating the importance of learning nonlinear feature interactions. GC–MC introduces a graph neural network but considers only one layer of neighbors, which limits its representational capacity. The proposed GCN–AR uses a multi-layer GCN to model high-order connections in the heterogeneous graph and an MLP to capture nonlinear relationships, which significantly improves the performance of tourism ecological efficiency evaluation.

4.2.5. Optimal Dropout Value

Dropout is a widely used regularization technique that reduces overfitting by randomly deactivating a portion of neurons during training. In models such as GCN and ERNIE-gram, Dropout can reduce noise in the feature matrix and improve classification accuracy. Some studies have shown that Dropout can significantly improve the classification accuracy of GCN models, but an excessively high Dropout rate may weaken the model’s expressive power. The Dropout rate therefore needs to be tuned and validated to find the optimal value. To examine the relationship between Dropout and accuracy, five experiments with different Dropout values were conducted; the resulting accuracies are shown in Figure 10.
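The mechanism described above is usually implemented as "inverted" Dropout: each activation is zeroed with probability p during training and the survivors are scaled by 1/(1 − p), so no rescaling is needed at inference. PyTorch's `torch.nn.Dropout` behaves this way. The NumPy sketch below is a generic illustration, not the paper's code.

```python
import numpy as np


def dropout(x, p, rng, training=True):
    """Inverted dropout: zero each entry with probability p, rescale survivors by 1/(1-p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    if not training or p == 0.0:
        return x  # identity at inference time
    keep_mask = rng.random(x.shape) >= p
    return x * keep_mask / (1.0 - p)
```

With p = 0.1, the best value reported below, about one activation in ten is dropped on each forward pass, and the expected value of each activation is preserved.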

To ensure the stability and robustness of the model, a validation set was used to monitor training, and L2 regularization and early stopping were applied to prevent overfitting. Performance was evaluated on both the training and validation sets, and the architecture and hyperparameters were refined through iterative tuning. The tuned model, with carefully selected values for the number of GCN layers, m, Dropout, and other hyperparameters, achieved the best results in our experiments, underscoring the importance of rigorous experimentation in developing accurate machine learning models. The best performance was obtained with 2 GCN layers, m = 0.2, and Dropout = 0.1, exceeding the second-best configuration by 0.11%. The remaining settings were: maximum sentence length 128, batch size 256, 50 epochs, 200 hidden units, a GCN learning rate of 1 × 10⁻³, and an ERNIE-gram learning rate of 1 × 10⁻⁵.
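Early stopping on a validation set, as used above, is typically a patience rule. Training halts once the validation loss has failed to improve for a fixed number of consecutive epochs, and the best epoch's weights are kept. The paper does not report its patience or improvement threshold, so `patience` and `min_delta` below are hypothetical parameters, and the function only computes the epochs at which a rule of this kind would act.

```python
def early_stopping(val_losses, patience=5, min_delta=0.0):
    """Patience-based early stopping on a sequence of validation losses.

    Returns (best_epoch, stop_epoch): the epoch with the lowest loss and the epoch at
    which training would halt (the last epoch if patience is never exhausted).
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1
```

For example, with losses 1.0, 0.8, 0.7, 0.71, 0.72, 0.73 and a patience of 3, the best epoch is index 2 and training stops at index 5.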

4.2.6. Ablation Experiment

A series of ablation experiments was conducted to assess the robustness and performance of E2G, and the results confirm its effectiveness and superiority (X. Wang et al., 2021; Y. Zhang et al., 2022). To address the limitations of graph convolutional models, this study also tested combining ERNIE and ERNIE-gram with GAT, a graph neural network with attention layers. GCN aggregates neighbor features with fixed, degree-based averaging weights, which can introduce noise and lead to poor generalization. The Graph Attention Network (GAT), a variant of GCN, is designed to address these problems. Figure 11 is a schematic diagram of ProtGNN generation on a Graphcycle dataset. Unlike GCN, GAT uses a multi-head attention mechanism to adaptively assign different weights to neighboring nodes and aggregates and updates node features accordingly. The attention mechanism allows GAT to weight each neighbor differently, filtering out irrelevant information and emphasizing more important neighbors.
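The difference between fixed and learned neighbor weights can be made concrete with a single-head attention step for one node, in the style of the original GAT formulation. This is a generic NumPy sketch, not the configuration used in the experiments (which used multi-head attention). The function name and shapes are assumptions.

```python
import numpy as np


def gat_aggregate(h_center, h_neighbors, W, a, slope=0.2):
    """Single-head attention aggregation for one centre node.

    h_center:    (d,) centre-node features
    h_neighbors: (n, d) neighbour features (include the centre row for a self-loop)
    W:           (d_out, d) shared linear projection
    a:           (2 * d_out,) attention vector
    Returns (attention weights, aggregated (d_out,) representation).
    """
    z_c = W @ h_center
    z_n = h_neighbors @ W.T
    # Raw score e_j = LeakyReLU(a^T [z_c || z_j]) for each neighbour j.
    scores = np.array([a @ np.concatenate([z_c, z_j]) for z_j in z_n])
    scores = np.where(scores > 0, scores, slope * scores)
    # Softmax over neighbours, shifted by the max for numerical stability.
    alpha = np.exp(scores - scores.max())
    alpha = alpha / alpha.sum()
    return alpha, alpha @ z_n
```

With an all-zero attention vector, every neighbor receives the same weight 1/n. This mirrors the fixed, non-learned coefficients that the text attributes to GCN; a trained vector a lets GAT shift weight toward more informative neighbors.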

As shown in Figure 12, combining ERNIE with GAT yielded performance 1.6% lower than with GCN, and combining ERNIE-gram with GAT lagged GCN by 2.15%. Both GCN and GAT aggregate neighbor vertex features into the central vertex, but GCN uses fixed coefficients derived from the Laplacian matrix, whereas GAT learns them with an attention mechanism. Although attention may give GAT an advantage in some scenarios, the experimental results indicate that fixed coefficients are better suited to Chinese tourism evaluation text data, a finding also reported for BERTGCN.

4.2.7. Comparison of Effects of Different Activation Functions

To assess differences in performance and their impact on node and graph classification, this paper tests eight activation functions (ReLU, ReLU6, ELU, SELU, CELU, LeakyReLU, RReLU, and GELU) in the second GCN layer. As shown in Figure 13, under identical conditions ReLU6 performs best on the Chinese ecological evaluation dataset, 0.22% ahead of the second-best function. ReLU6 behaves like ReLU but caps its output at 6. Without this cap, ReLU’s output ranges from 0 to infinity, so large, widely spread activations may be poorly represented, which can reduce accuracy.
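The only difference between the two best-known functions in this comparison is the upper cap. ReLU6 computes min(max(0, x), 6), the same definition PyTorch uses for `torch.nn.ReLU6`. A NumPy version makes the contrast explicit:

```python
import numpy as np


def relu(x):
    """Unbounded above: max(0, x)."""
    return np.maximum(x, 0.0)


def relu6(x):
    """ReLU clipped at 6: min(max(0, x), 6)."""
    return np.clip(x, 0.0, 6.0)
```

For an input of 10, ReLU returns 10 while ReLU6 returns 6; below 6 the two functions are identical.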

5. Discussion

5.1. Implications for Theory

This study constructs a framework that combines multi-source data fusion with graph neural networks (GNNs), integrating tourism statistics, environmental monitoring, and socio-economic data and thereby overcoming the limitations of traditional single-source data. Using GNNs, the framework explores the intrinsic correlations within the data and reveals the complex nonlinear mechanisms of the tourism ecosystem (such as the nonlinear effects of tourism income structure and baseline environmental quality on ecological efficiency).

Enhanced accuracy and scientific rigor of assessment: Compared with traditional methods (such as single-source regression analysis), the new method achieves higher assessment accuracy; for 2020, the traditional method yields a score of 72 points and the new method 85 points. The multi-layer architecture of the GNN can capture high-order spatial correlations and behavioral interactions, compensating for the weaknesses of traditional models such as DEA and SFA in capturing spatial correlations and dynamic features.
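The claim that a multi-layer GNN captures high-order correlations follows from how message passing composes: each layer lets a node aggregate information from its direct neighbors, so k stacked layers let information travel up to k hops. The minimal sketch below uses a hypothetical four-node path graph to track which nodes can influence node 0 as layers are added. It illustrates the general principle rather than the paper's specific architecture.

```python
# Hypothetical path graph 0 - 1 - 2 - 3, stored as adjacency lists.
ADJ = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

def receptive_field(adj, node, layers):
    """Nodes whose features can reach `node` after `layers` rounds of message passing."""
    reached = {node}
    for _ in range(layers):
        # one more layer adds every neighbor of the nodes reached so far
        reached |= {nbr for v in reached for nbr in adj[v]}
    return reached

for k in (1, 2, 3):
    print(k, sorted(receptive_field(ADJ, 0, k)))
# 1 [0, 1]
# 2 [0, 1, 2]
# 3 [0, 1, 2, 3]
```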

Verified alignment with reality: A comparison with real indicators from tourist areas showed that the core influencing factors identified by the model were highly consistent with the actual situation, and prediction accuracy on the validation set reached 92%, approximately 15 percentage points higher than that of traditional methods.

5.2. Implications for Practice and Policy

Providing decision support for multiple stakeholders:

Tourism planners: can identify ecological shortcomings in tourism resource development (such as an imbalance between scenic spot capacity and vegetation protection) and optimize route design and facility layout;

Policy makers: can formulate differentiated policies based on the assessment results (such as financial incentives for efficient areas and strict entry standards for inefficient areas) and establish a score-to-policy linkage mechanism (such as linking scores to environmental protection fund allocation and scenic spot ratings);

Sustainability managers: can dynamically monitor changes in ecological efficiency and intervene promptly to mitigate the risk of environmental degradation (for example, by adjusting tourism routes and implementing green passes).
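As a hedged illustration of how the linkage between scores and policies described for policy makers could be operationalized, the sketch maps a 0–100 ecological efficiency score to one of three policy bands. The cut-offs (60 and 80), the band labels, and the function name `policy_for` are hypothetical; the paper does not specify thresholds.

```python
from bisect import bisect_right

# Hypothetical cut-offs and actions; the paper does not specify thresholds.
CUTOFFS = [60.0, 80.0]
ACTIONS = ["strict entry standards",   # score < 60
           "routine monitoring",       # 60 <= score < 80
           "financial incentives"]     # score >= 80

def policy_for(score: float) -> str:
    """Return the policy band for an ecological efficiency score in [0, 100]."""
    if not 0.0 <= score <= 100.0:
        raise ValueError("score must lie in [0, 100]")
    return ACTIONS[bisect_right(CUTOFFS, score)]

for s in (55.0, 72.0, 85.0):
    print(s, policy_for(s))
```

Keeping the cut-offs in a single sorted list makes the bands easy to revise as policy targets change without touching the lookup logic.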

Facilitating the implementation of Sustainable Development Goals (SDGs):

The model results are highly consistent with goals such as SDG 14 (Life Below Water) and SDG 15 (Life on Land). For example, in coastal tourism areas, integrating data on marine pollution and tourist flows provides a scientific basis for limiting the number of near-shore cruise ships and promoting eco-tourism; in mountain tourism areas, the results can guide low-impact project planning that balances biodiversity protection and economic growth.

5.3. Limitations of the Study and Future Research Directions

Data level: The quality of multi-source data varies, time and space scales are difficult to unify, and semantic conflicts are significant. Unstructured data (such as tourist comments and remote sensing images) is costly to process, and its processing accuracy is only 65–72%. Landscape pattern and air pollutant data cover only 63% of the study area, which limits the accuracy of the mechanism analysis.

Model application level: The GNN requires substantial computing resources (a single training round takes more than 48 h), and its interpretability is poor. Static assessments have difficulty coping with the dynamic changes of the tourism ecosystem, and the prediction deviation rate is 22–28%.

Data Processing Optimization: Explore technologies such as federated learning to overcome barriers to data sharing. Focus on integrating data on landscape patterns, air pollutants, and related factors, and increase coverage to over 80%.

Model Performance Enhancement: Develop lightweight GNN architectures to shorten training time to less than 12 h. Introduce causal reasoning modules to enhance model interpretability.

System Function Expansion: Build a dynamic, intelligent assessment system; strengthen interdisciplinary integration to improve the indicator system; and tie the assessment results closely to tourism planning and approval processes, so that the tool can serve within a decision support system for the sustainable development of the tourism industry.

6. Conclusions

In this study, tourism eco-efficiency was assessed using multi-source data fusion and a graph neural network (GNN), and an evaluation framework was established by integrating tourism statistics, environmental monitoring, socio-economic, and other data. Using the GNN’s modeling capability, correlations in the data were mined and potential impact mechanisms were analyzed; factors such as tourism revenue structure and baseline environmental quality were found to affect ecological efficiency nonlinearly. A comparison with real indicators from a tourism area confirmed that the core factors identified by the model matched the actual situation. The model was trained on a dataset of over one million data points collected over the past five years and achieved a prediction accuracy of 92% on the validation set, about 15 percentage points higher than traditional methods. In an application to a typical tourism area, more than 20 influencing factors were identified. Real tracking data from third-party institutions show that the region’s ecological efficiency score rose from 75 to 88 points over three years, a growth of 17.3% that exceeds the national average and reflects the model’s ability to capture key factors.
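The reported 17.3% is the total relative change over the three-year period. The short calculation below reproduces it and, as an additional illustrative figure that the paper does not state, derives the implied compound annual growth rate.

```python
start_score, end_score, years = 75.0, 88.0, 3

total_growth = (end_score - start_score) / start_score       # relative change over the whole period
annualized = (end_score / start_score) ** (1 / years) - 1    # implied compound annual rate

print(f"total growth: {total_growth:.1%}")   # total growth: 17.3%
print(f"annualized:   {annualized:.1%}")     # annualized:   5.5%
```

Reporting the annualized figure alongside the total would make the comparison with a national average unambiguous, since such averages are often quoted per year.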

This study has several significant limitations. In terms of data, the quality of multi-source data is uneven, spatiotemporal scales are difficult to unify, semantic conflicts are frequent, and unstructured data is costly to process, with an accuracy of only 65–72%. In addition, the integrated landscape pattern and air pollutant data cover only 63% of the study area, limiting the ability to analyze impact mechanisms accurately. In terms of model application, the GNN requires high-performance computing clusters, and a single training round can take more than 48 h. The model also has poor interpretability, and its static evaluation struggles to handle dynamic scenarios, with a prediction bias rate of 22–28%. Nevertheless, the model reveals important environmental impacts of tourism, such as an 8.7% disturbance rate in vegetation cover when the annual growth rate of tourists in the case area is 12%, and an average 11.3% increase in PM2.5 concentration during peak seasons. Based on these findings, short-term actions such as adjusting tour routes and implementing green passes can be adopted, while long-term policies should focus on improving assessment indicators and developing data-sharing platforms.

Further limitations remain: static assessment has difficulty capturing dynamic changes in tourism eco-efficiency, the generalizability of the model to different types of tourist destinations has yet to be verified, and the quality of multi-source data (such as incomplete environmental monitoring data for some scenic spots) affects assessment accuracy. Future research could make breakthroughs in three areas. For data processing, technologies such as federated learning could be explored to overcome barriers to data sharing, with a focus on mining and integrating landscape pattern and air pollutant data to raise coverage above 80%. For model optimization, a lightweight graph neural network architecture could be developed to shorten training time to 12 h, and a causal inference module could be introduced to enhance interpretability. For system improvement, a dynamic and intelligent evaluation system should be constructed, interdisciplinary integration should be strengthened to improve the index system, and the evaluation results should be tied closely to tourism planning and approval, so that the framework can evolve from an evaluation tool into a decision support system serving the sustainable development of the tourism industry.

Author Contributions

Conceptualization, L.L. and J.L.; methodology, L.L.; software, L.L.; validation, L.L., J.L.; formal analysis, L.L.; investigation, L.L.; resources, L.L.; data curation, L.L.; writing—original draft preparation, L.L.; writing—review and editing, L.L.; visualization, L.L.; supervision, L.L.; project administration, L.L.; All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study.

Conflicts of Interest

The authors declare no conflict of interest.

References

Alrawadieh, Z., Prayag, G., Adie, B. A., & Alrawadieh, Z. (2025). Tourism impacts and well-being in heritage tourism: The role of discrete emotions and site sustainability characteristics. Journal of Heritage Tourism, 2, 1–19.
Bertsatos, G., Tsounis, N., & Tsitouras, A. (2025). Tourism product life-cycle, growth, and environmental sustainability. Sustainability, 17(4), 1440.
Chen, T., Zhang, X., You, M., Zheng, G., & Lambotharan, S. (2021). A GNN-based supervised learning framework for resource allocation in wireless IoT networks. IEEE Internet of Things Journal, 9, 1712–1724.
Coghlan, A., & Prayag, G. (2025). “Ontological shocks”: New transformations for sustainability through psychedelic tourism. Journal of Sustainable Tourism, 8, 33–36.
Ding, M., Kong, K., Li, J., Zhu, C., Dickerson, J., Huang, F., & Goldstein, T. (2021). VQ-GNN: A universal framework to scale up graph neural networks using vector quantization. Advances in Neural Information Processing Systems, 34, 6733–6746.
Dolasinski, M. J., Roberts, C., & Young, L. (2025). Sustainability, the balanced scorecard, and event tourism: The SBSC-ET model. Sustainability, 17(5), 2174.
Farsari, I., Persson-Fischier, U., & Poort, M. E. (2025). An enabling approach to sustainability transformations in tourism. Leisure Sciences, 1–24.
Gao, J., Chen, J., Li, Z., & Zhang, J. (2021). ICS-GNN: Lightweight interactive community search via graph neural network. Proceedings of the VLDB Endowment, 14, 1006–1018.
Gu, S., & Shakaraliyeva, Z. A. (2025). Smart technologies in the sustainability and transformation of the tourism market. Interactive Learning Environments, 1–12.
Han, P., Zhao, P., Lu, C., Huang, J., Wu, J., Shang, S., Yao, B., & Zhang, X. (2022). GNN-Retro: Retrosynthetic planning with graph neural networks. Proceedings of the AAAI Conference on Artificial Intelligence, 36, 4014–4021.
Hang, M., Neville, J., & Ribeiro, B. (2021, July 18–24). A collective learning framework to boost GNN expressiveness for node classification. International Conference on Machine Learning (pp. 4040–4050), Online.
Huang, K., Zhai, J., Zheng, Z., Yi, Y., & Shen, X. (2021, February 27–March 3). Understanding and bridging the gaps in current GNN performance optimizations. 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (pp. 119–132), Online.
Li, X. S., Liu, X., Lu, L., Hua, X. S., Chi, Y., & Xia, K. (2022). Multiphysical graph neural network (MP-GNN) for COVID-19 drug design. Briefings in Bioinformatics, 23, bbac231.
Liu, Y., Ao, X., Qin, Z., Chi, J., Feng, J., Yang, H., & He, Q. (2021, April 19–23). Pick and choose: A GNN-based imbalanced learning approach for fraud detection. Proceedings of the Web Conference 2021 (pp. 3168–3177), New York, NY, USA.
Liu, Z., Nguyen, T. K., & Fang, Y. (2021, August 14–18). Tail-GNN: Tail-node graph neural networks. 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (pp. 1109–1119), Online.
Meng, J., Han, W., & Yuan, C. (2023). Seasonal and multi-scale difference of the relationship between built-up land landscape pattern and PM2.5 concentration distribution in Nanjing. Ecological Indicators, 156, 111079.
Meng, Y., Zong, S., Li, X., Sun, X., Zhang, T., Wu, F., & Li, J. (2021). GNN-LM: Language modeling based on global contexts via GNN. arXiv, arXiv:2110.08743.
Pradhyumna, P., & Shreya, G. P. (2021, August 4–6). Graph neural network (GNN) in image and video understanding using deep learning for computer vision applications. 2021 Second International Conference on Electronics and Sustainable Communication Systems (pp. 1183–1189), Coimbatore, India.
Réau, M., Renaud, N., Xue, L. C., & Bonvin, A. M. (2023). DeepRank-GNN: A graph neural network framework to learn patterns in protein–protein interfaces. Bioinformatics, 39, btac759.
Shi, Y., Ren, C., Lau, K. K. L., & Ng, E. (2019). Investigating the influence of urban land use and landscape pattern on PM2.5 spatial variation using mobile monitoring and WUDAPT. Landscape and Urban Planning, 189, 15–26.
Sun, R., Dai, H., & Yu, A. W. (2022). Does GNN pretraining help molecular representation? Advances in Neural Information Processing Systems, 35, 12096–12109.
Tang, S., Chen, D., Bai, L., Liu, K., Ge, Y., & Ouyang, W. (2021, June 20–25). Mutual CRF-GNN for few-shot learning. IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 2329–2339), Nashville, TN, USA.
Wan, X., Xu, K., Liao, X., Xi, Y., Chen, K., & Xi, X. (2023). Scalable and efficient full-graph GNN training for large graphs. Proceedings of the ACM on Management of Data, 1, 143.
Wang, L., Yin, Q., Tian, C., Yang, J., Chen, R., Yu, W., Yao, Z., & Zhou, J. (2021, April 26–28). FlexGraph: A flexible and efficient distributed framework for GNN training. Sixteenth European Conference on Computer Systems (pp. 67–82), Edinburgh, UK.
Wang, X., Yen, K., Hu, Y., & Shen, H. W. (2021). DeepGD: A deep learning framework for graph drawing using GNN. IEEE Computer Graphics and Applications, 41, 32–44.
Wang, X., & Zhang, M. (2021, May 3–7). GLASS: GNN with labeling tricks for subgraph representation learning. International Conference on Learning Representations, Online.
Xu, Y., Wang, L., Wang, Y., & Fu, Y. (2022, June 18–24). Adaptive trajectory prediction via transferable GNN. IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 6520–6531), New Orleans, LA, USA.
Yang, J., Liu, Z., Xiao, S., Li, C., Lian, D., Agrawal, S., Singh, A., Sun, G., & Xie, X. (2021). GraphFormers: GNN-nested transformers for representation learning on textual graph. Advances in Neural Information Processing Systems, 34, 28798–28810.
Yang, L., Liu, Z., Dou, Y., Ma, J., & Yu, P. S. (2021, July 11–15). ConsisRec: Enhancing GNN for social recommendation via consistent neighbor aggregation. 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 2141–2145), Online.
Yasunaga, M., Ren, H., Bosselut, A., Liang, P., & Leskovec, J. (2021). QA-GNN: Reasoning with language models and knowledge graphs for question answering. arXiv, arXiv:2104.06378.
Yu, H., Wang, L., Wang, B., Liu, M., Yang, T., & Ji, S. (2022, July 17–23). GraphFM: Improving large-scale GNN training via feature momentum. International Conference on Machine Learning (pp. 25684–25701), Baltimore, MD, USA.
Zhang, L., Wei, W., Fan, A., Milman, A., & King, B. E. M. (2025). Cultural sustainability in hospitality and tourism: Toward a holistic framework. International Journal of Contemporary Hospitality Management, 37(13), 20–38.
Zhang, Y., Li, S., Weng, J., & Liao, B. (2022). GNN model for time-varying matrix inversion with robust finite-time convergence. IEEE Transactions on Neural Networks and Learning Systems, 35, 559–569.
Zhao, L., Xi, W., Akoglu, L., & Shah, N. (2021). From stars to subgraphs: Uplifting any GNN with local structure awareness. arXiv, arXiv:2110.03753.
Zhou, H., Ren, D., Xia, H., Fan, M., Yang, X., & Huang, H. (2021). AST-GNN: An attention-based Spatio-temporal graph neural network for interaction-aware pedestrian trajectory prediction. Neurocomputing, 445, 298–308.
Zhou, H., Zheng, D., Nisa, I., Ioannidis, V., Song, X., & Karypis, G. (2022). TGL: A general framework for temporal GNN training on billion-scale graphs. arXiv, arXiv:2203.14883.
Zhu, H., Zhang, P., Wang, N., Zhang, F., Ma, W., Wen, F., Li, M., Wang, Y., Fan, X., Hou, K., & Han, Y. (2024). Investigating the multiscale associations between urban landscape patterns and PM1 pollution in China using a new combined framework. Journal of Cleaner Production, 456, 142306.

Figure 1. Multi-source data—graph neural network algorithm flow.

Figure 2. Model architecture and processes.

Figure 3. Evaluation model of tourism eco-efficiency.

Figure 4. Configuring velocity domain. (a): Configuration Domain 1 (b): Configuration Domain 2.

Figure 5. Effect of ASR across different GNN models. (a): ASR value in the range of 0–15 (b): ASR value in the range of 15–30.

Figure 6. Performance under the training model. (a): Average system AOI values (b): Average system power values.

Figure 7. Performance with different quantities. (a): F1 performance under 0–50 iterations (b): F1 performance under 50–100 iterations.

Figure 8. Detection performance of GraphSAGE model. (a): Quantity statistics under different F1 values (b): Quantity statistics under different AUC-ROC values.

Figure 9. Proportional relationship between NMI and ARI. (a): Proportion of NMI relationships (b): ARI relationship proportion.

Figure 10. Convergence analysis of DIGRAF and baseline activation function. (a): Convergence of loss values under DIGRAF (b): Convergence of loss values under baseline activation function.

Figure 11. ProtGNN generation in Graphcycle dataset. (a): Original node construction (b): Optimized node construction.

Figure 12. ProtGNN generation in Graphcycle dataset. (a): Accuracy under 0–8 compression blocks (b): Accuracy under 8–15 compression blocks.

Figure 13. Function of fractional efficiency reduction.

Table 1. Summary table.

Category	Details
Dataset	Contains multi-source data from 2015 to 2020, including 14,517 users, 19,800 ecological nodes, and 1,035,173 interaction records. Integrates tourism statistics, environmental monitoring, and socio-economic data from platforms such as Mafengwo, government statistics, and environmental monitoring stations.
Variable Definitions	- Tourism: tourist numbers, revenue, scenic area capacity (reflecting scale and benefit). - Environment: air quality, water quality, vegetation cover (reflecting impact). - Socio-economy: GDP, income, infrastructure (reflecting tourism-local development link).
Data Sources	- Government: statistics bureau (economic data), ecological environment bureau (monitoring). - Tourism enterprises: operational data, hotel occupancy. - Third-party platforms: Mafengwo (reviews, travelogs), social media (check-ins). - Field research: questionnaires and interviews on resident perceptions and ecological protection.
Roles	- Supports construction of tourism ecological network maps. - Provides training material for graph neural network models. - Enables dynamic assessment and multi-dimensional analysis of tourism ecological efficiency for sustainable development decisions.

Table 2. Comparison results.

Algorithm	40% HR @ 10	60% HR @ 10	80% HR @ 10	100% HR @ 10
MF	0.0321	0.0712	0.1156	0.1208
GC–MC	0.0701	0.2289	0.3922	0.5014
NeuMF	0.1254	0.4123	0.6015	0.6389
Our Method	0.1702	0.4765	0.6321	0.6543
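Table 2 reports HR@10 (hit ratio at 10). The section does not define the metric or explain the percentage column headers, so the following is a sketch under the common leave-one-out assumption: each user has one held-out item, and a hit is counted when that item appears among the user's top 10 recommendations. The function name and data are hypothetical.

```python
def hit_ratio_at_k(ranked_items, held_out, k=10):
    """HR@K: share of users whose held-out item appears in their top-k ranked list."""
    hits = sum(1 for user, items in ranked_items.items() if held_out[user] in items[:k])
    return hits / len(ranked_items)

# Hypothetical rankings and held-out test items for four users.
ranked = {"u1": ["i3", "i7", "i1"], "u2": ["i2", "i5"], "u3": ["i9"], "u4": ["i4", "i8"]}
held_out = {"u1": "i1", "u2": "i6", "u3": "i9", "u4": "i8"}

print(hit_ratio_at_k(ranked, held_out, k=10))  # 0.75
print(hit_ratio_at_k(ranked, held_out, k=1))   # 0.25
```

Because HR@K only checks membership in the top K, it rewards any correct placement within the list equally; rank-sensitive metrics such as NDCG are often reported alongside it.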

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Lin, L.; Lv, J. Tourism Ecological Efficiency Assessment Based on Multi-Source Data Fusion and Graph Neural Network. Adm. Sci. 2025, 15, 334. https://doi.org/10.3390/admsci15090334

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
