Article

Landscape Drivers of Trail Formation in Peri-Urban Mountains: Insights from an Explainable Machine Learning Approach

1
School of Design, Inner Mongolia Normal University, Hohhot 010022, China
2
School of Landscape Architecture, Beijing Forestry University, Beijing 100083, China
*
Author to whom correspondence should be addressed.
Land 2026, 15(5), 715; https://doi.org/10.3390/land15050715
Submission received: 13 March 2026 / Revised: 16 April 2026 / Accepted: 19 April 2026 / Published: 24 April 2026

Abstract

The rapid growth of hiking tourism presents a critical challenge for balancing visitor safety with the sustainable management of ecologically fragile mountain environments. Traditional models developed in urban settings struggle to capture the highly non-linear, heterogeneous, and zero-inflated characteristics of wilderness trekking behavior. In order to quantify the nonlinear and threshold-based effects of environmental variables on hikers’ spatial decisions in unstructured wilderness and to identify distinct behavioral regimes for segmented management, this study introduces an explainable machine learning framework to reconstruct hikers’ spatial decision-making in a complex mountainous system in Inner Mongolia, China. Random Forest (RF), XGBoost, and LightGBM were compared in predicting trail density and the Euclidean distance to the nearest trail. Results show that transforming behavioral traces into continuous proximity surfaces dramatically improves model performance, with XGBoost achieving the highest predictive accuracy for Trail_Dist. By integrating the SHapley Additive exPlanations framework, this study moves beyond black-box prediction to reveal the nonlinear mechanisms driving hiker behavior. Key findings include: (1) Nighttime light range exhibits a U-shaped threshold effect as the primary anthropogenic attractor. (2) Elevation shows an exponential inhibitory trend above 1238 m. (3) Strong spatial coupling exists between elevation and slope, alongside a landscape compensation effect where high Normalized Difference Vegetation Index (NDVI) areas attract off-trail movements. This research provides a robust methodological pathway for predicting behavior in unstructured outdoor environments. It offers a scientific foundation for smart scenic area management, including optimized route planning, precise ecological protection zoning, and targeted emergency rescue preparedness.

1. Introduction

With the global shift from mass tourism toward Special Interest Tourism, hiking has emerged as one of the fastest-growing and most resilient forms of outdoor recreation [1,2], valued for its low-carbon footprint, immersive experience, and restorative benefits. The surging demand for open natural spaces has positioned hiking as a catalyst for economic revitalization in peripheral regions and a critical pathway for reconnecting urban populations with nature [3,4]. Yet this rapid expansion has intensified tensions between humans and the environment [5]. Unregulated and concentrated hiking activities can lead to soil compaction [6,7], vegetation fragmentation [8,9], the spread of invasive species [10], and the disruption of wildlife habitats [11,12], resulting in potentially irreversible ecological impacts. Simultaneously, the inherent complexity of outdoor environments, including rugged topography, microclimatic variability [13], and limited communication coverage, elevates risks such as disorientation, hypothermia [14,15,16], and accidental falls [17]. Accurately predicting and decoding hikers’ spatial behavior patterns is therefore not merely an ecological imperative for safeguarding fragile ecosystems, but also a foundational requirement for proactive risk governance and intelligent emergency management in nature-based recreational systems.
Although human mobility has been extensively modeled using multi-source big data such as mobile signaling records [18,19], GPS trajectories [20], and geotagged social media [21,22], most applications remain grounded in urban contexts [23,24]. Existing trajectory prediction models are developed within structured spatial frameworks [25], focusing on commuting flows, interactions between commercial centers [26], or public transport efficiency [27,28]. In contrast, studies addressing hiker behavior in unstructured wilderness environments remain limited. Unlike urban mobility, which is strongly constrained by road networks, wilderness movement unfolds in path-undefined and weakly regulated spaces [29]. Hikers’ trajectories thus exhibit greater spontaneity, stochasticity, and spatial deviation [30]. This transition from network-constrained space to continuous and open terrain fundamentally challenges conventional urban mobility models, which often lack the flexibility to capture how fine-scale environmental heterogeneity shapes individual decision-making in complex natural settings.
Early quantitative research in tourism geography largely relied on traditional statistical techniques such as ordinary least squares (OLS) regression [31,32]. However, these traditional models are inherently limited in their capacity to address multicollinearity, nonlinear relationships [33,34], and non-normally distributed variables. Environmental factors such as elevation, slope, light conditions, and resource accessibility rarely exert linear influences on hiking behavior; instead, their effects often manifest through threshold effects and nonlinear fluctuations [35,36]. Moreover, when dependent variables exhibit strong spatial dispersion, zero inflation, or heavy-tailed distributions, the predictive accuracy of linear models declines markedly, rendering them incapable of capturing the underlying structure of recreational activity patterns [37]. From the perspective of Landscape Perception and Behavior Theory, these limitations ignore the reality that human spatial decision-making in wilderness environments is a complex process of cognition and response to environmental stimuli. Such landscape perception is inherently nonlinear, characterized by distinct threshold effects and landscape compensation mechanisms. Consequently, traditional models based on linear assumptions oversimplify this complex environmental cognitive process and cannot accurately predict hikers’ route choices in wilderness settings [38].
More recently, ensemble learning algorithms have provided an alternative paradigm for geographic behavioral modeling [39,40], owing to their capacity for nonlinear approximation and flexibility in handling high-dimensional heterogeneous data [41,42]. However, gains in predictive performance often come at the cost of interpretability. The black box nature of many machine learning models obscures the underlying contribution logic of explanatory variables [43], hindering interpretation of the actual behavioral mechanisms. Moreover, the measurement of the dependent variable itself critically shapes model performance [44]. Reliance on discrete density counts alone may overlook spatial continuity, constraining the model’s ability to learn the underlying spatial structure of recreational behavior.
Existing algorithms cannot effectively decode how key environmental variables shape wilderness spatial decision-making, and therefore fail to predict hikers’ routes accurately. To address this practical dilemma, the primary objective of this research is to reconstruct the environmental perception mechanisms of hikers. A novel explainable machine learning framework (XGBoost-SHAP) and a spatial transformation approach were therefore introduced, aiming to provide scientific support for mitigating the human–environment conflicts between hikers and fragile wilderness ecosystems, as well as reducing safety hazards. This study takes the Daqing Mountains in Inner Mongolia, China, as the study area and reconceptualizes hiker movement as an emergent spatial pattern generated by nonlinear environmental response functions within a heterogeneous and weakly structured landscape system. Rather than treating hiking distribution as a purely recreational phenomenon, it is framed as a complex decision surface shaped by interacting topographic constraints, environmental gradients, and anthropogenic signals.
Accordingly, this study addresses three interrelated research questions:
(1)
How do key environmental variables nonlinearly structure spatial decision-making in wilderness contexts? Specifically, do identifiable threshold effects give rise to physical exclusion zones or landscape-mediated compensation mechanisms?
(2)
How does the integration of continuous spatial representations with ensemble learning algorithms alter the model’s capacity to approximate underlying response functions and improve generalization across heterogeneous terrains?
(3)
Under varying environmental pressures, how do distinct behavioral regimes emerge, and what structural differences characterize topography-dominated versus resource-dependent decision logics?

2. Materials and Methods

2.1. Study Area

The study area is in central Inner Mongolia, China, encompassing the Daqing Mountain range and its adjacent piedmont plains (Figure 1), with Wuchuan County at its core and extending toward the urban fringes of Hohhot and Baotou. As the central segment of the Yinshan Mountains, Daqing Mountain stretches approximately 240 km from east to west and exhibits pronounced topographic gradients, with elevations ranging from 1800 to 2338 m. Designated as a National Nature Reserve in 2008, it serves as a critical ecological barrier in northern China, protecting vital water sources and rich biodiversity within a climatically fragile semi-arid zone.
Ecologically, Daqing Mountain represents one of the most intact forest–shrub–grassland mosaics in the Yinshan Range. Despite its ecological significance, the region’s low forest coverage (11.5%) underscores its acute ecological vulnerability [45]. This coexistence of extreme ecological fragility and dense urban proximity makes it an ideal geographic laboratory to examine the spatial tensions between human mobility and environmental carrying capacity. From a human-use perspective, the area has rapidly transformed into a peri-urban recreational belt, anchored by a 100-km national hiking trail system and supporting approximately 300,000 visitors annually. Following the government-led mining withdrawal policy implemented between 2018 and 2020, the region has undergone a functional transition from resource extraction to conservation-oriented recreation, intensifying spatial interactions between ecological restoration zones and emerging tourism flows. The resulting conservation-oriented recreation dictates that emerging tourism flows must be rigorously regulated to prioritize the restoration of fragile ecosystems over mass commercial development.

2.2. Variable Construction

2.2.1. Explanatory Variables

Outdoor hiking behavior and trail formation are widely recognized as outcomes of complex and multidimensional human–environment interactions [46]. Integrating geographic context with behavioral considerations of hiker decision-making, this study organizes the explanatory variables into four coherent environmental dimensions (Table 1).
First, topographic factors, including elevation (DEM) and slope, represent the fundamental physical constraints shaping hiking difficulty, energy expenditure, and route accessibility. As primary components of the terrain gradient, these variables structure the baseline feasibility of movement and are expected to exert strong nonlinear influences on spatial choice.
Second, hydrological factors, operationalized as distance to water bodies (Water_Dist), capture both functional and experiential dimensions of hiking. Water sources serve as critical resupply points in mountainous environments while simultaneously enhancing landscape attractiveness, thereby potentially generating localized clustering effects in spatial behavior.
Third, anthropogenic influence factors, measured through nighttime light intensity and its variability [47], reflect the gradient of human presence at the urban–wilderness interface. Nighttime illumination functions as a proxy for infrastructural accessibility and perceived safety, potentially exerting directional or threshold-based attraction effects on movement patterns.
Fourth, ecological landscape factors, represented by vegetation indicators derived from NDVI, characterize habitat quality and scenic appeal. Areas with higher vegetation productivity and seasonal variability often correspond to visually attractive, thermally moderated environments, which may serve as preferred corridors for recreational activity.

2.2.2. Dependent Variables

The analysis utilizes crowdsourced geospatial trajectories retrieved from 2bulu (https://www.2bulu.com/), one of China’s most prominent digital platforms for outdoor hiking activities [48]. To quantify the spatial characteristics of hiker behavior, the raw trajectory data were processed into two dependent variables: Network Density and Euclidean Distance to the Network.
Unlike conventional cartographic road networks that primarily reflect officially designated trails, the 2bulu dataset captures both formal hiking routes and a large number of informally established paths formed through repeated on-the-ground exploration. These user-generated traces, therefore, record spontaneous spatial decisions made under environmental constraints, rather than pre-defined planning schemes. By preserving fine-scale deviations, detours, and off-trail movements, the dataset provides a more faithful representation of how hikers navigate complex wilderness terrain. This characteristic makes it particularly suitable for modeling nonlinear spatial responses and examining emergent patterns of human–environment interaction in weakly structured outdoor systems.
Conventional approaches typically rasterize trail data and calculate the total trail length or density within individual grid cells. However, hiking trajectories are inherently uneven in space, often exhibiting strong clustering alongside large areas with no recorded activity. When the dependent variable is defined as a simple discrete density measure, the resulting distribution tends to display substantial zero inflation and high dispersion.
Such extreme discreteness not only weakens statistical robustness but also impairs machine learning algorithms’ ability to approximate nonlinear response functions [49]. In practice, zero-dominated targets may slow convergence, reduce predictive stability [50], and obscure the continuous spatial gradients that underlie recreational potential across heterogeneous terrain. To address these limitations, this study reformulates the dependent variable using a continuous spatial metric based on Euclidean distance. Specifically, Trail_Dist is defined as the Euclidean distance from any given location within the study area to the nearest recorded hiking trail. This transformation converts originally discrete behavioral traces into a spatially continuous surface with clear statistical properties. By encoding proximity rather than localized counts, the metric reduces zero inflation and enhances sensitivity to fine-scale environmental variation.
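The proximity transformation described above can be illustrated with a minimal sketch. It assumes the trails have already been rasterized into a 0/1 mask; the function name `trail_distance_surface`, the toy mask, and the 30 m cell size are hypothetical and not taken from the study, and the brute-force search is used only for readability.

```python
import math


def trail_distance_surface(trail_mask, cell_size=1.0):
    """Return a grid of Euclidean distances from each cell to the nearest trail cell.

    trail_mask: list of rows of 0/1 flags (1 = a trail passes through the cell).
    cell_size:  hypothetical cell edge length in map units (e.g. metres).
    """
    trail_cells = [(r, c)
                   for r, row in enumerate(trail_mask)
                   for c, flag in enumerate(row) if flag]
    if not trail_cells:
        raise ValueError("trail_mask contains no trail cells")

    surface = []
    for r, row in enumerate(trail_mask):
        out_row = []
        for c in range(len(row)):
            # Brute-force nearest neighbour: fine for a toy grid, too slow for real rasters.
            nearest = min(math.hypot(r - tr, c - tc) for tr, tc in trail_cells)
            out_row.append(nearest * cell_size)
        surface.append(out_row)
    return surface


# Toy 4 x 4 raster with a short diagonal trail segment (illustrative data only).
mask = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
dist = trail_distance_surface(mask, cell_size=30.0)
zero_share_density = sum(v == 0 for row in mask for v in row) / 16
zero_share_distance = sum(v == 0.0 for row in dist for v in row) / 16
```

In this toy grid, 14 of 16 cells have a zero in the density-style mask, whereas only the two trail cells have a zero distance; every other cell receives a graded value. For real rasters, an exact Euclidean distance transform (for example `scipy.ndimage.distance_transform_edt`) would be the more practical choice.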

2.3. Analytical Framework

2.3.1. Research Workflow

This study develops an integrated framework that progresses from data-driven modeling to mechanism-oriented interpretation (Figure 2). The workflow consists of four sequential stages.
First, hiking network data from the 2bulu platform were transformed into continuous spatial behavioral variables using Euclidean distance metrics. Corresponding environmental attributes were extracted for each spatial unit to construct a multidimensional feature space.
Before model analysis, violin plots were generated to visualize the distribution of dependent variables across categorical environmental strata. The normality of the continuous variables was assessed using D’Agostino’s $K^2$ test, which combines skewness and kurtosis to evaluate deviations from normality. Given the large sample size (n_total > 5000), this test provides a reliable diagnosis. The Kruskal–Wallis H test was then applied to assess whether significant differences in trail distance existed across categories of environmental factors. Subsequently, Spearman’s rank correlation coefficient was calculated to quantify bivariate associations among pairs of explanatory variables and between each explanatory variable and the dependent variable.
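For readers unfamiliar with the two rank-based statistics named above, the following sketch computes them from scratch. It is an illustration under stated assumptions rather than the study's code: the helper names and the toy samples are hypothetical, and a statistics package would normally be used instead.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0  # positions i..j are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * weighted - 3.0 * (n + 1)
    tie_sizes = Counter(pooled).values()
    correction = 1.0 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined for a constant variable")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical trail distances (m) in three elevation strata; not data from the study.
strata = [[120.0, 80.0, 150.0], [600.0, 450.0, 900.0], [2500.0, 1800.0, 3200.0]]
h_stat = kruskal_wallis_h(strata)

# Hypothetical paired observations of elevation (m) and trail distance (m).
rho = spearman_rho([1000, 1200, 1500, 1800, 2100], [50, 120, 900, 2500, 4000])
```

With k groups, H is compared against a chi-square distribution with k − 1 degrees of freedom. Because both statistics operate on ranks, neither requires the normality that the $K^2$ test is used to check.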
Next, the processed dataset was randomly split into training and test sets at an 8:2 ratio. Three ensemble learning models, Random Forest (RF), LightGBM, and XGBoost, were implemented for comparative evaluation. Hyperparameters were optimized via grid search, and model performance was assessed using standard metrics, including $R^2$ and RMSE, to determine their effectiveness in approximating spatial behavioral patterns.
Lastly, the best-performing model (XGBoost) was further analyzed using the SHAP (SHapley Additive exPlanations) framework. Environmental drivers were examined from three complementary perspectives: global feature importance, nonlinear dependence structures, and local instance-level contribution paths, enabling a detailed reconstruction of the environmental logic underlying hiker spatial decisions.

2.3.2. Predictive Model and Interpretability Framework

Three mainstream tree-based models, Random Forest (RF), LightGBM, and XGBoost, were employed for training. XGBoost was selected as the core predictive engine due to its strong nonlinear approximation capabilities and robust handling of heterogeneous geographic data [51]. As an advanced implementation of Gradient Boosted Decision Trees (GBDTs), XGBoost iteratively constructs additive tree models to minimize a regularized objective function [52]:
$$\mathcal{L}(\phi) = \sum_{i} l(\hat{y}_i, y_i) + \sum_{k} \Omega(f_k)$$
where $l(\hat{y}_i, y_i)$ denotes the loss function measuring the discrepancy between predicted and observed values, and $\Omega(f_k)$ represents the regularization term controlling model complexity to prevent overfitting.
By employing a second-order Taylor expansion of the loss function, XGBoost achieves more precise gradient estimation, enhancing its ability to capture subtle nonlinear perturbations in environmental response surfaces. In addition, its optimized tree-splitting and built-in handling of missing values make it particularly suitable for modeling spatial data in complex mountainous environments.
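The second-order approximation mentioned above can be written out explicitly. The expressions below follow the general XGBoost formulation rather than anything specific to this study; the iteration index $t$, the gradient terms $g_i$ and $h_i$, the leaf index set $I_j$, the leaf weights $w_j$, and the particular form of $\Omega$ (with penalties $\gamma$ and $\lambda$ over $T$ leaves) are assumptions introduced here for illustration.

```latex
% Objective at boosting round t, expanded to second order around the previous prediction
\mathcal{L}^{(t)} \approx \sum_{i} \left[ l\left(\hat{y}_i^{(t-1)}, y_i\right)
    + g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^{2}(x_i) \right] + \Omega(f_t)

% First- and second-order derivatives of the loss with respect to the prediction
g_i = \frac{\partial\, l\left(\hat{y}_i^{(t-1)}, y_i\right)}{\partial\, \hat{y}_i^{(t-1)}},
\qquad
h_i = \frac{\partial^{2} l\left(\hat{y}_i^{(t-1)}, y_i\right)}{\partial \left(\hat{y}_i^{(t-1)}\right)^{2}}

% Assumed regularizer over T leaves with weights w_j
\Omega(f_t) = \gamma T + \tfrac{1}{2} \lambda \sum_{j=1}^{T} w_j^{2}

% Resulting closed-form optimal weight for leaf j
w_j^{*} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}
```

For a squared-error loss $l = \tfrac{1}{2}(\hat{y}_i - y_i)^2$, these reduce to $g_i = \hat{y}_i^{(t-1)} - y_i$ and $h_i = 1$, so each leaf weight becomes a shrunken mean of the current residuals.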
To address the interpretability challenges inherent in machine learning models, this study employs the SHAP framework, grounded in cooperative game theory. SHAP quantifies the marginal contribution of each feature to a model prediction using Shapley values. For any observation, the prediction can be expressed as:
$$f(x) = \phi_0 + \sum_{i=1}^{M} \phi_i$$
where $\phi_0$ represents the baseline expectation across all samples, and $\phi_i$ denotes the contribution of the i-th feature (e.g., elevation or distance to water) to the predicted value.
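The additive decomposition can be checked on a small example by computing Shapley values exactly. The sketch below is not the tree-based estimator used with XGBoost; it enumerates all feature coalitions for a hypothetical three-feature model and treats "absent" features by averaging over a small background sample. The model, the feature values, and the function names are all illustrative assumptions.

```python
from itertools import combinations
from math import factorial


def exact_shapley(model, x, background):
    """Exact Shapley values for one instance x.

    Features outside a coalition take their values from each background row in turn,
    and the model output is averaged over those rows.
    """
    m = len(x)

    def value(subset):
        total = 0.0
        for row in background:
            mixed = [x[j] if j in subset else row[j] for j in range(m)]
            total += model(mixed)
        return total / len(background)

    base = value(frozenset())  # phi_0: mean prediction over the background sample
    phi = [0.0] * m
    for i in range(m):
        others = [j for j in range(m) if j != i]
        for size in range(m):
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for combo in combinations(others, size):
                s = frozenset(combo)
                phi[i] += weight * (value(s | {i}) - value(s))
    return base, phi


def toy_model(features):
    """Hypothetical response with an elevation-slope interaction (not the fitted model)."""
    elevation, slope, water_dist = features
    return 2.0 * elevation + elevation * slope + 0.5 * water_dist


# Illustrative standardized values only.
background = [[0.0, 0.0, 0.0], [1.0, 1.0, 2.0], [2.0, 0.5, 4.0]]
x = [3.0, 1.0, 1.0]
base, phi = exact_shapley(toy_model, x, background)
reconstructed = base + sum(phi)
```

Here `reconstructed` equals `toy_model(x)` up to floating-point error, which is the additivity property stated in the equation. Exhaustive enumeration scales exponentially with the number of features, which is why tree-specific algorithms are used in practice.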
At the global level, the sign and magnitude of SHAP values reveal the strength and direction of environmental influences, overcoming the limitations of traditional importance rankings, which lack directional interpretation [53]. At the local level, visualization tools such as waterfall and decision plots reconstruct the contribution pathway for individual spatial units, thereby translating algorithmic outputs into geographically interpretable decision logic [54]. Through this integration of predictive modeling and interpretable analysis, the study establishes a coherent link between nonlinear algorithmic learning and mechanism-based spatial explanation.

3. Results

3.1. Correlation

The study first divided the samples into five groups at equal intervals based on altitude, and then created violin scatter plots for inter-group comparison to obtain a preliminary data distribution (Figure 3 and Figure 4). These findings suggest that hiker activity is strongly concentrated in lower-elevation zones, whereas higher-altitude areas constrained by steeper terrain and reduced accessibility support only sparse trail development. Low-elevation areas (Group 1: 973.2–1238.4 m) exhibit the highest mean trail density ($\mu = 0.02$), with greater dispersion than higher-elevation groups and a noticeable concentration of high-density outliers. As elevation increases, trail density declines sharply and stabilizes at near-zero levels, with mean values for Groups 2–5 approaching 0.00.
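The equal-interval grouping used for this comparison can be sketched as follows. The function name and the elevation values are hypothetical, and assigning the maximum value to the last group is an assumption about how the closing boundary is handled.

```python
def equal_interval_groups(values, n_groups=5):
    """Split the observed range into n_groups equal-width bins.

    Returns the bin edges and a 1-based group label for each value.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_groups
    edges = [lo + k * width for k in range(n_groups + 1)]
    labels = []
    for v in values:
        idx = int((v - lo) / width) if width > 0 else 0
        # The maximum falls exactly on the last edge, so clamp it into the final bin.
        labels.append(min(idx, n_groups - 1) + 1)
    return edges, labels


# Illustrative elevations in metres; not values from the study area.
elevations = [1000.0, 1100.0, 1450.0, 1900.0, 2000.0]
edges, labels = equal_interval_groups(elevations)
```

Equal-width bins keep the elevation bands comparable but usually leave the groups very unequal in size, which is one reason a rank-based test is a natural choice for the subsequent comparison.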
The Kruskal–Wallis test confirms that differences in trail density across elevation groups are statistically significant ($\chi^2(4) = 3322.82$, $p < 0.001$). Post hoc comparisons further indicate that, except for the contrast between the highest elevation groups (e.g., Group 4 vs. Group 5, $p = 0.172$), all pairwise differences involving the lowest elevation group are highly significant ($p < 0.001$).
Given the non-normal distribution of most variables, Spearman’s rank correlation was employed to assess bivariate associations [55] (Figure 5).
Trail_Dens is significantly negatively correlated with elevation (DEM) and with distance to water sources (Water_Dist). This pattern aligns with conventional outdoor planning logic, in which trails tend to develop in lower, water-accessible areas to reduce risk and enhance comfort. Trail_Dens exhibits a positive correlation with mean temperature (Temp), suggesting that warmer conditions support higher recreational intensity. Moreover, moderate positive correlations with nighttime light indicators imply that dense trail networks tend to cluster near settlement edges or peri-urban zones, reflecting an urban-proximate spatial configuration.
Trail_Dens shows robust positive associations with NDVI_Mean and NDVI_Max, indicating that high-quality vegetation cover constitutes a core landscape attractor. Hiker movement thus demonstrates a clear greenness-seeking tendency. Strong spatial coherence is observed between the two trail-related variables. Trail_Dist is strongly negatively correlated with density-based indicators, confirming that areas with dense trail networks correspond to minimal spatial resistance and represent mature recreational nodes.
Overall, the trail system in the study area is strongly constrained by topography and exhibits a distinct concentration toward low-elevation, well-vegetated, water-accessible, and human-adjacent zones.

3.2. Models Comparison

Table 2 presents the comparative performance of Random Forest, XGBoost, and LightGBM in predicting Trail_Dens and Trail_Dist under optimized hyperparameter settings. Clear performance differentiation emerges across the two target variables, reflecting their distinct spatial characteristics.
Model performance was evaluated using three standard regression metrics: Coefficient of Determination ($R^2$), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE). The following criteria were adopted to interpret reliability: $R^2$ > 0.80 indicates excellent predictive power, 0.60–0.80 indicates good, 0.40–0.60 indicates moderate, and <0.40 indicates limited explanatory power. RMSE and MAE were assessed relative to the scale of the target variable (Trail_Dist: 0–12,000; Trail_Dens: 0–0.12), with lower absolute values indicating greater reliability.
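The three metrics and the interpretation bands can be expressed compactly as below. This is a sketch with hypothetical function names and toy predictions; how values falling exactly on a band boundary (0.40, 0.60, 0.80) are classified is an assumption, since the criteria above do not specify it.

```python
def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE and MAE for paired observed and predicted values."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": (ss_res / n) ** 0.5,
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }


def interpret_r2(r2):
    """Map R^2 onto the reliability bands stated in the text (boundary handling assumed)."""
    if r2 > 0.80:
        return "excellent"
    if r2 >= 0.60:
        return "good"
    if r2 >= 0.40:
        return "moderate"
    return "limited"


# Toy observed and predicted trail distances (m); not results from the study.
observed = [0.0, 1000.0, 2000.0, 3000.0]
predicted = [100.0, 900.0, 2100.0, 2900.0]
scores = regression_metrics(observed, predicted)
band = interpret_r2(scores["r2"])
```

Because RMSE and MAE carry the units of the target, an error of a given size means very different things on the Trail_Dist and Trail_Dens scales, which is why the text reads them relative to each variable's range.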
For Trail_Dens, all three models exhibit limited explanatory power under the criteria above, with every $R^2$ falling below 0.40. The Random Forest model achieves the highest $R^2$ (0.373), followed by XGBoost (0.327) and LightGBM (0.279). The low $R^2$ values across models indicate the inherent difficulty of predicting highly discrete, spatially heterogeneous density patterns. Although differences in RMSE and MAE among models are small, Random Forest demonstrates slightly greater stability, due to its robustness in handling high-variance and zero-inflated data distributions through deep tree structures (max_depth = 30). Overall, the limited predictability of Trail_Dens suggests that localized trail concentration is influenced by complex micro-scale factors that the selected environmental variables may not fully capture.
In contrast, model performance improves for Trail_Dist, a continuous spatial indicator. XGBoost achieves the best overall results, with an $R^2$ of 0.869 and the lowest RMSE (858.15) and MAE (568.79). Random Forest and LightGBM also show strong predictive capacity, indicating that trail accessibility exhibits more stable, learnable spatial gradients than density-based measures. The superior performance of XGBoost with the parameter configuration (n_estimators = 200, max_depth = 12, learning_rate = 0.05) suggests that gradient boosting frameworks are particularly effective at capturing nonlinear interactions embedded in environmental constraints.
To further assess the robustness of the XGBoost model’s Trail_Dist predictions, residual diagnostics were conducted (Figure 6). The residual distribution is centered close to zero (mean = 2.96, small relative to the 0–12,000 range of Trail_Dist), suggesting no substantial systematic bias in the predictions. The overall dispersion pattern appears symmetric, indicating that prediction errors are not strongly skewed toward overestimation or underestimation across the study area.

3.3. SHAP

To further interpret the model’s internal decision logic, SHAP-based analysis was employed. To ensure the robustness of the identification of driving factors, two complementary metrics were employed: model-intrinsic weight-based importance and mean absolute SHAP values (Figure 7). As shown in Figure 7a, the model relies heavily on topographic variables for its structural decision-making. DEM and Slope are the top two contributors, indicating that elevation and steepness are the most frequently utilized split nodes in the tree-based model. Figure 7b presents the relative impact of each feature on the final prediction. Interestingly, NTL_Range and DEM emerge as the two most influential variables. While DEM remains high in both metrics, NTL_Range shows a disproportionately significant impact on the output magnitude despite its lower split frequency. Collectively, DEM, Slope, Water_Dist, and NTL_Range are identified as the core drivers. The high ranking of Prec and Temp across both metrics further underscores the influence of hydrometeorological conditions on hikers’ spatial behavior.
The SHAP decision plots (Figure 8) reveal that NTL_Range, DEM, and Water_Dist contribute most substantially to prediction variability. Specifically, variations in elevation and distance to water sources exert pronounced marginal effects on predicted trail distance, while nighttime light variability reflects the influence of anthropogenic proximity. Importantly, the directionality of SHAP contributions is consistent with prior Spearman correlation results, reinforcing the internal coherence of the analytical framework. Furthermore, the correlation analysis between predicted values and total SHAP contributions shows that all data points lie precisely on the y = x line, validating the mathematical rigor and consistency of the additive explanation framework.
The core drivers of hikers’ travel trajectories were identified through a SHAP feature summary analysis (Figure 9). Elevation (DEM) is the most prominent positive driver of trajectory distance. The SHAP summary plot reveals that high DEM values correspond to a wide span of SHAP contributions, positively shifting the predicted values by over 4000 units in extreme cases. Slope also exerts a significant influence, characterized by complex nonlinear features. The Maximum Normalized Difference Vegetation Index (NDVI_Max) exhibits a pronounced negative inhibitory effect, indicating that higher vegetation coverage tends to decrease the predicted Trail_Dist. Conversely, Distance to Water (Water_Dist) primarily acts as a positive driver. Nighttime Light Range (NTL_Range) contributes to the variance in model predictions, serving as a key socio-economic indicator influencing trajectory distribution.
To further elucidate the complex impact logic of the primary variables, this study analyzed the feature dependence and density distributions of the three most influential predictors (Figure 10, Figure 11, Figure 12 and Figure A1). Despite a global correlation of −0.264, NTL_Range demonstrates a parabolic U-shaped influence. For the majority of samples clustered at low range values, the impact is marginally negative. However, as the range increases beyond a standardized threshold of 15, the contribution becomes intensely positive, indicating that extreme variations in human activity levels function as a powerful driver for longer trajectories.
DEM shows a strong positive correlation (0.627) and follows a distinct exponential growth pattern. For standardized elevation values below 0.5, the SHAP contribution remains flat and close to zero. However, as elevation crosses this threshold, the contribution rises sharply, with SHAP values exceeding 4000 in high-altitude regions.
With a correlation of 0.632, Water_Dist exhibits a nonlinear trend characterized by an initial rapid increase in positive SHAP contribution as the distance increases from its minimum. The impact reaches a peak at a standardized value of approximately 1.5 before slightly tapering off, suggesting a saturation effect of water distance on trajectory length. The dependence plot reveals an interaction with Slope, where higher slope values (blue points) tend to amplify the positive contribution of water distance.
Using waterfall plots to interpret localized logic for representative samples (Figure 13 and Figure A2), this study reveals the spatial heterogeneity of factors influencing trajectory distribution. In regions where predicted values significantly exceed the mean, physical geographic factors dominate. DEM (+2357.06), NDVI_Max (+728.95), and Temperature (Temp, +602.66) collectively form the primary positive drivers for long-distance trajectories. In regions with low predicted values, anthropogenic interference and environmental constraints become the primary limiters. For instance, in Sample 25, the low NTL_Range (1177.37) and close Water_Dist (−430.41) are the core reasons for the extremely low predicted distance.
The association analysis between feature contributions (Absolute SHAP) and prediction error (Absolute Error) (Figure 14) shows that the model maintains a stable Mean Absolute Error (MAE) even when DEM and NDVI_Max contribute strongly. Although a slight upward trend in error is observed in extreme samples where Slope and Water_Dist have a strong influence, the overall trend lines remain flat. This indicates that the model retains high predictive stability and robustness.

4. Discussion

4.1. Theoretical Implications with Previous Similar Studies

This study was designed to answer three interrelated research questions. First, regarding how environmental variables nonlinearly shape wilderness spatial decisions, SHAP dependence analyses identified clear threshold effects. Elevation exhibited an exponential growth pattern with a physical exclusion threshold at 1238.4 m, above which SHAP contributions increased sharply; nighttime light range showed a parabolic U-shaped influence, turning strongly positive beyond a standardized value of 15; distance to water followed a rapid-rise-then-saturation curve, with high slopes amplifying its positive contribution; and maximum NDVI acted as a negative suppressor yet also provided a landscape compensation effect that induced off-trail shifts in extreme terrain. Second, the benefit of combining continuous spatial representation with ensemble learning was evident: transforming discrete trajectories into a continuous distance-based dependent variable (Trail_Dist) dramatically improved predictive performance. Third, regarding the emergence of distinct behavioral regimes, SHAP clustering revealed three data-driven hiker types: terrain-dominant, human–landscape coupled, and water-resource-dependent.
Existing research on urban mobility behavior indicates that residents’ spatial movement depends heavily on well-connected paved surfaces and public facility grids [56]. However, empirical evidence from Hohhot and its surrounding areas shows that, although hikers exhibit a certain centripetal pull towards civilization, such as a positive response to nighttime lights, their core routes are not attached to mature infrastructure. Instead, they are highly concentrated in the valleys of low-mountain regions. A study of hiking destinations in the Alps found that hikers value the similarity of mountain wilderness experiences more than geographical proximity: they tend to select and substitute between mountains that offer the same type of experience, such as all being wilderness or all being leisure-oriented [57]. That study seems to suggest that leisure-oriented hikers are more numerous. However, its findings are mostly derived from interviews, and the voices of light hikers may not be represented in the data.
A study from Poland found that when tourists in the Sudetes Mountains choose hiking routes, they tend to prefer mountain paths that are covered with soil or gravel, allow them to avoid cars, connect various scenic and interesting attractions, are not easy to get lost on, and enable them to see the scenery without having to retrace their steps [58]. A study from Germany suggests that hikers are more inclined to seek closeness to nature, distance from daily life, and a sense of peace and relaxation while simultaneously exhibiting varying degrees of aversion to overly dense trail networks [59]. In contrast, hikers in Hohhot, while pursuing high landscape quality as indicated by NDVI, still demonstrate a high dependence on resource supply and the edges of civilization. This may be because Hohhot has not yet established a comprehensive shelter system, making long-distance traverses particularly hazardous.
Research in the Alps indicates that people’s perceptions of wilderness are not uniform. Some view it as uninhabited areas, while others see it as places where nature grows freely. These differing perceptions determine whether they support or oppose the expansion of wilderness [60]. However, some people also regard wilderness as the antithesis of civilization [61]. In Hohhot, hikers concentrate near the city and are only sparsely distributed across the extensive mountain area, a pattern that stems from differing individual beliefs and needs regarding wilderness or nature experiences. For instance, most tourists near the city seek a simple, relaxed experience of nature; thus, the wilderness in their eyes is closer to an accessible area where nature can develop on its own. In contrast, the minority of trekkers who truly venture deep into the mountains pursue remote, uninhabited, and challenging wilderness. This difference in beliefs leads to their distinct spatial distribution patterns. According to Xu et al., the local area has long been dominated by animal husbandry [62], which partly explains why there are fewer trekkers than hikers in Hohhot.
Some studies suggest that building trails is a better option [63,64,65]. This seems to contradict the preference hikers have for wilderness. Hiking is considered to be a dangerous activity, especially in high-mountain areas. However, there are quite a few hikers whose purpose is leisure. Constructing trails may better support these recreational users, helping to steer them away from overly hazardous areas.

4.2. Methodological Contribution of the Research

This study makes three methodological contributions. First, by transforming discrete GPS trajectories into a continuous distance-based dependent variable (Trail_Dist), the zero-inflation and spatial discontinuity problems inherent in traditional density-based measures are overcome. This continuous representation enables ensemble learning models to capture smooth spatial gradients rather than fragmented counts, which is particularly critical for unstructured wilderness environments where trail distribution is highly heterogeneous. Second, the systematic comparison of three ensemble algorithms (Random Forest, XGBoost, LightGBM) under identical hyperparameter optimization demonstrates that gradient boosting frameworks, especially XGBoost, are superior in approximating nonlinear response functions of environmental variables. The R² values of 0.869 for Trail_Dist and 0.373 for Trail_Dens quantify the value of dependent variable transformation, a rarely tested yet crucial design choice in behavioral modeling. Third, the integration of SHAP with dependence and waterfall plots provides a replicable analytical pipeline for decoding not only global feature importance but also local threshold effects (e.g., 1238.4 m elevation), U-shaped influences (NTL_Range), and saturation patterns (Water_Dist). This pipeline moves beyond black-box prediction to offer mechanism-level interpretability, setting a template for future studies that aim to explain, not merely predict, human–environment interactions in weakly structured spaces.
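The first contribution, replacing a sparse presence or density surface with a continuous distance surface, can be sketched on a toy grid. The example assumes a regular grid with 500 m spacing (the sampling scale mentioned in the limitations) and uses a brute-force search for clarity; the grid values and the `trail_distance` helper are hypothetical. For real rasters, a dedicated Euclidean distance transform such as `scipy.ndimage.distance_transform_edt` would be the more practical choice.

```python
import math

CELL_SIZE_M = 500.0  # grid spacing; the 500-m sampling scale is taken from the text


def trail_distance(grid):
    """Euclidean distance (m) from each cell centre to the nearest trail cell.

    grid: list of rows, 1 = cell contains a trail, 0 = no trail.
    Brute force for clarity; large rasters need a proper distance transform.
    """
    trail_cells = [(r, c) for r, row in enumerate(grid)
                   for c, v in enumerate(row) if v == 1]
    if not trail_cells:
        raise ValueError("grid contains no trail cells")
    return [[CELL_SIZE_M * min(math.hypot(r - tr, c - tc) for tr, tc in trail_cells)
             for c in range(len(row))]
            for r, row in enumerate(grid)]


# Hypothetical 3 x 4 presence grid: mostly zeros, like a sparse density surface.
presence = [
    [0, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
dist = trail_distance(presence)

n_cells = sum(len(row) for row in presence)
zero_share_presence = sum(v == 0 for row in presence for v in row) / n_cells
zero_share_dist = sum(d == 0.0 for row in dist for d in row) / n_cells
print(f"zeros in presence grid: {zero_share_presence:.2f}")
print(f"zeros in distance grid: {zero_share_dist:.2f}")
```

In the toy grid most presence cells are zero, while the distance surface is zero only on the trail itself and varies smoothly elsewhere, which is the property the text credits for the improved model fit.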

4.3. Planning Implications

Based on the driving mechanisms decoded through machine learning, this study provides the following decision-making support for outdoor recreation management in Hohhot and similar inland mountain cities.
Given the clear urban edge preference displayed by hikers, management authorities should avoid indiscriminate closures and instead fully explore the recreational potential of suburban low-mountain areas. By planning semi-developed city greenways, with mountain trails connecting loops, high-intensity adventure pressure can be diverted from vulnerable core ecological zones to edge areas with regulatory capacity, achieving a balance between ecological protection and recreational demand.
Hikers exhibit a high degree of spatial attachment to low-mountain areas. These areas often represent high-risk zones for ecological degradation. Management authorities should integrate the hotspot maps predicted by the model and enhance the signage systems in vulnerable vegetation zones. By guiding hikers to follow designated trails, such measures can prevent trampling behaviors induced by the pursuit of optimal landscapes from causing soil patch degradation.
The study reveals hikers’ strong preference for valleys and low-lying water source areas. However, mountainous valleys are prone to sudden disasters such as floods and mudslides during the rainy season, posing significant safety risks. It is recommended that smart scenic area management systems install monitoring devices or emergency shelters at key valley nodes with high preference. A targeted real-time risk warning and dynamic control mechanism should be established.

5. Conclusions

By smoothing discrete trajectories into continuous spatial fields, ensemble learning (especially XGBoost, R² = 0.869) revealed that night-time light drives civic cohesion, elevation and slope impose physical constraints above 1238.4 m, and vegetation causes path deviation via landscape compensation. The main conclusions are as follows:
The global interpretation based on SHAP revealed the differentiated driving characteristics of environmental factors. Night-time light range emerged as the primary anthropogenic factor guiding spatial decision-making, showing a clear tendency towards civic cohesion. In contrast, elevation and slope were identified as core physical constraint factors, with a significant suppression effect above 1238.4 m, defining the physical boundary for hiking activities.
The waterfall plot analysis of individual samples quantified the contribution paths of environmental variables to specific decisions. The study found strong spatial coupling between elevation and slope, and in extreme terrain, hikers’ dependence on water resources significantly increased. Meanwhile, the landscape compensation effect exhibited by a high vegetation index was a key factor in causing hikers to deviate from established paths and in generating spatial shifts.
SHAP clustering identified three distinct hiker behavioral patterns: terrain-dominant, human–landscape coupled, and water-resource-dependent. This group heterogeneity demonstrates essential differences in hikers’ environmental perception and spatial selection logic, and it provides empirical evidence for implementing segmented strategies and precise guidance in intelligent scenic area management systems.
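The idea of clustering samples by their SHAP profiles can be sketched as follows. Each sample is represented by its vector of SHAP values, the vectors are grouped, and each group is characterised by the feature with the largest mean absolute contribution. The clustering algorithm is not specified in this section, so plain k-means is used here as an assumption; the SHAP rows, the three-feature subset, and the `kmeans` helper are hypothetical.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means with deterministic initialisation (first k points)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: mean of the members; keep the old centroid if empty.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


FEATURES = ["DEM", "NTL_Range", "Water_Dist"]
# Hypothetical SHAP vectors, one row per sample (not the study's data).
shap_rows = [
    [1800.0, 100.0, 50.0],    # contribution dominated by elevation
    [150.0, 1500.0, 100.0],   # contribution dominated by nighttime light
    [100.0, 50.0, 1200.0],    # contribution dominated by distance to water
    [2100.0, -50.0, 120.0],
    [200.0, 1700.0, -80.0],
    [-60.0, 90.0, 1400.0],
]

labels, centroids = kmeans(shap_rows, k=3)
dominant = [FEATURES[max(range(len(FEATURES)), key=lambda i: abs(c[i]))]
            for c in centroids]
for j, name in enumerate(dominant):
    print(f"cluster {j}: dominant feature = {name}, size = {labels.count(j)}")
```

In this toy setting the three groups loosely mirror the terrain-dominant, human–landscape coupled, and water-resource-dependent patterns, although the actual regimes in the study rest on the full set of explanatory variables.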
Although this study has improved model accuracy and multidimensional interpretability, optimization is still required for complex environmental modeling. Reliance on quantitative data limits the inclusion of hikers’ subjective perceptions and motivations. Trajectory data alone cannot distinguish between distinct subgroups such as adventure seekers and recreationists, whose behavioral logic differs and may show spatial non-stationarity. The Trail_Dist metric captures spatial potential but not actual usage intensity or stopping frequency. The 500-m sampling scale, while computationally balanced, may oversimplify fragmented micro-landforms and vegetation zones.
Future research should integrate qualitative data to capture perceptual and emotional dimensions, apply geographically weighted regression to address spatial heterogeneity, and incorporate dynamic kernel density estimation with temporal and frequency weights to model both spatial potential and actual trail use. Multiscale experiments with finer spatial resolution are needed to determine the optimal decision granularity for capturing hiker behavior.

Author Contributions

Conceptualization, Q.G. and Y.Z.; methodology, Q.G.; software, Y.Z.; validation, X.B. and Y.Z.; formal analysis, Q.G.; investigation, S.C. and X.B.; resources, S.C.; data curation, S.C.; writing—original draft preparation, Q.G. and Y.Z.; writing—review and editing, Y.Z.; visualization, S.C.; supervision, Y.Z.; project administration, Y.Z.; funding acquisition, Q.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 52408072.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data will be available on request.

Acknowledgments

The authors would like to thank the 2bulu platform for providing the hiking trajectory data. We are also grateful to the anonymous reviewers for their constructive comments that helped improve this manuscript.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
| Abbreviation | Full Form/Description |
| --- | --- |
| RF | Random Forest |
| XGBoost | eXtreme Gradient Boosting |
| LightGBM | Light Gradient Boosting Machine |
| GBDT | Gradient Boosted Decision Trees |
| SHAP | SHapley Additive exPlanations |
| DEM | Digital Elevation Model |
| Slope | Slope |
| Prec | Annual Precipitation |
| Temp | Annual Mean Temperature |
| Water_Dist | Distance to Water Systems |
| NDVI | Normalized Difference Vegetation Index |
| NDVI_Max | NDVI Annual Maximum |
| NDVI_Min | NDVI Annual Minimum |
| NDVI_Mean | NDVI Annual Mean |
| NDVI_Range | NDVI Annual Range |
| NTL | Nighttime Light |
| NTL_Max | NTL Annual Maximum |
| NTL_Min | NTL Annual Minimum |
| NTL_Mean | NTL Annual Mean |
| NTL_Range | NTL Annual Range |
| Trail_Dens | Hiking Road Density |
| Trail_Dist | Distance to Hiking Trails |
| R² | R-squared (Coefficient of Determination) |
| RMSE | Root Mean Square Error |
| MAE | Mean Absolute Error |
| OLS | Ordinary Least Squares |
| GPS | Global Positioning System |
| GIS | Geographic Information System |
| 2bulu | Two-step Road (Chinese outdoor hiking platform) |

Appendix A

Figure A1. (a) SHAP dependence and density analysis for NDVI_Max. (b) SHAP dependence and density analysis for Slope.
Figure A2. (a) Waterfall plot for an individual sample (Sample 25) reconstructing the specific environmental logic for long-distance hiking trajectories. (b) Waterfall plot for an individual sample (Sample 75) reconstructing the specific environmental logic for long-distance hiking trajectories.

References

  1. Bichler, B.; Peters, M. Soft adventure motivation: An exploratory study of hiking tourism. Tour. Rev. 2020, 76, 473–488.
  2. Ahmed, Z.; Nihei, T. Assessing the environmental impacts of adventure tourism in the world’s highest mountains: A comprehensive review for promoting sustainable tourism in high-altitude areas. J. Adv. Res. Soc. Sci. Humanit. 2024, 9, 1–15.
  3. Winter, P.L.; Selin, S.; Cerveny, L.; Bricker, K. Outdoor recreation, nature-based tourism, and sustainability. Sustainability 2019, 12, 81.
  4. Nordin, M.R.; Jamal, S.A. Hiking tourism in Malaysia: Origins, benefits and post COVID-19 transformations. Int. J. Acad. Res. Bus. Soc. Sci. 2021, 11, 88–100.
  5. Duc, M.D.; Duy, L.T.; Thi Thanh, T.N.; Le Minh, T. Human–nature relations in the Anthropocene: Responsible hiking and ecological balance in Thiềng Liềng. J. Outdoor Recreat. Tour. 2025, 51, 100926.
  6. Cooke, M.T.; Xia, L. Impacts of land-based recreation on water quality. Nat. Areas J. 2020, 40, 179–188.
  7. Cole, D.N. Impacts of hiking and camping on soils and vegetation: A review. In Environmental Impacts of Ecotourism; CABI: Wallingford, UK, 2004; pp. 41–60.
  8. Ballantyne, M.; Gudes, O.; Pickering, C.M. Recreational trails are an important cause of fragmentation in endangered urban forests: A case-study from Australia. Landsc. Urban Plan. 2014, 130, 112–124.
  9. Wolf, I.D.; Croft, D.B. Impacts of tourism hotspots on vegetation communities show a higher potential for self-propagation along roads than hiking trails. J. Environ. Manag. 2014, 143, 173–185.
  10. Liedtke, R.; Barros, A.; Essl, F.; Lembrechts, J.J.; Wedegärtner, R.E.; Pauchard, A.; Dullinger, S. Hiking trails as conduits for the spread of non-native species in mountain areas. Biol. Invasions 2020, 22, 1121–1134.
  11. Marzano, M.; Dandy, N. Recreationist behaviour in forests and the disturbance of wildlife. Biodivers. Conserv. 2012, 21, 2967–2986.
  12. Dertien, J.S.; Larson, C.L.; Reed, S.E. Recreation effects on wildlife: A review of potential quantitative thresholds. Nat. Conserv. 2021, 44, 51–68.
  13. Maté-González, M.Á.; Sáez Blázquez, C.; Herranz Herranz, D.; Camargo Vargas, S.A.; Martín Nieto, I. GIS-Based Assessment of Shaded Road Segments for Enhanced Winter Risk Management. Remote Sens. 2026, 18, 476.
  14. Yulu, A.; Şekertekin, A. Why Climbers Die on Mount Ağrı (Ararat)?: Risks, gaps, and safety strategies. J. Geogr. 2025, 184–200.
  15. Scanlon, R. Surviving the Trail: Five Essential Skills to Prepare Every Hiker for Adventure’s Most Common Perils; Simon and Schuster: New York, NY, USA, 2025.
  16. Wu, K.; Xing, A.; Zhou, J.; Su, L.; Zhang, S.; Yang, S. Influencing factors of sports tourism safety accidents in Tibet, China: fsQCA analysis based on the SCM. PLoS ONE 2025, 20, e0334226.
  17. Rausch, L.; Limmer, M.; Pocecco, E.; Ruedl, G.; Posch, M.; Faulhaber, M. Sex-specific analysis of hiking accidents in the Austrian Alps: A follow-up from 2015 to 2021. AIMS Public Health 2024, 11, 160–175.
  18. Fu, X.; Zhang, Y.; Ortúzar, J.d.D.; Lü, G. Activity-travel pattern inference based on multi-source big data. Transp. Rev. 2025, 45, 26–48.
  19. Šauer, M.; Pařil, V.; Jandová, M.; Paleta, T.; Farbiak, M. How Much Does Mobile Phone Data Reveal Mobility and Tourist Behaviour? Available online: https://ssrn.com/abstract=5197715 (accessed on 10 April 2026).
  20. Connolly, C.; Steinbach, S.; Vo, M.; Wan, X. Harnessing Human Mobility Data for Applied Economic Research: Current Knowledge, Challenges, and Emerging Opportunities. J. Econ. Surv. 2025, early view.
  21. Zhang, C.; Zhang, K.; Yuan, Q.; Zhang, L.; Hanratty, T.; Han, J. Gmove: Group-level mobility modeling using geo-tagged social media. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Association for Computing Machinery: New York, NY, USA, 2016; pp. 1305–1314.
  22. Liu, L.; Wang, R.; Guan, W.W.; Bao, S.; Yu, H.; Fu, X.; Liu, H. Assessing reliability of Chinese geotagged social media data for spatiotemporal representation of human mobility. ISPRS Int. J. Geo-Inf. 2022, 11, 145.
  23. Noviansyah, A.; Uno, N.; Matsunaka, R.; Nishigaki, T. Analyzing temporal mobility pattern shifts: Dynamic origin–destination estimation from aggregated mobile phone location data. Transp. Res. Interdiscip. Perspect. 2026, 36, 101806.
  24. Liu, E.; Wang, X.; Wang, Y.; Zhao, D.; Niu, X.; Lu, X. A hybrid deep learning model for human mobility prediction. J. Phys. Complex. 2026, 7, 015009.
  25. Xian, Y.; Chen, M.; Hu, M.; Chen, L. Unveiling pattern and structure of inter-urban mobility: Integrating flow space and geospatial information. Cities 2026, 171, 106789.
  26. Yang, C.; Huang, H.; Meng, S.; Zhang, Y.; Lo, J.T.; Ma, R. Typology-based spatial modeling of urban block commercial vitality: Evidence from Shenzhen for land use planning. Land Use Policy 2026, 164, 107973.
  27. Liu, R.; Xia, H.; Li, L.; Li, Q.; Liu, S. Transit oriented development under the influence of urban air mobility: A public transit-based vertiport siting method. J. Air Transp. Manag. 2026, 133, 102962.
  28. Liu, W.; Cai, H.; Liu, B.; Xing, H.; Gong, L. Rethinking spatial community detection in human mobility: A random walk-based method. Cities 2026, 170, 106674.
  29. Tweed, W.C. Uncertain Path: A Search for the Future of National Parks; University of California Press: Berkeley, CA, USA, 2010.
  30. Peng, J.; Tang, J.; Deng, M.; Liu, H.; Hu, Z.; Jiang, X.; Xiang, J.; Ning, X.; Zhao, W. Discovering spatiotemporal patterns of human outdoor activities with crowdsourced trajectory data. Geo-Spat. Inf. Sci. 2025, 28, 1214–1236.
  31. Chen, C.; Zhao, W.; Zhao, B. Mechanisms Underlying the Effects of Recreation Services on Tourist Satisfaction in Forest Parks: A Case Study of the Yangtze River Delta, China. Sustainability 2026, 18, 1936.
  32. Wang, X.; Mei, J.; Mei, Z.; Cheng, H.; Li, W.; Wang, L.; Chen, D.; Wang, Y.; Gao, Z. Spatio-Temporal Dynamics, Driving Forces, and Location–Distance Attenuation Mechanisms of Beautiful Leisure Tourism Villages in China. Land 2026, 15, 250.
  33. Kosamkar, P.; Kulkarni, V. Forecasting Influencing Parameter for CO2 and CH4 Emission from Agriculture Using Lasso and Ridge Regression. In Application of Machine Learning in Earth Sciences: A Practical Approach; Springer Nature: Cham, Switzerland, 2026; pp. 461–474.
  34. Schroeder, M.A.; Lander, J.; Levine-Silverman, S. Diagnosing and Dealing with Multicollinearity. West. J. Nurs. Res. 1990, 12, 175–187.
  35. Du, Z.; Wang, Z.; Wu, S.; Zhang, F.; Liu, R. Geographically neural network weighted regression for the accurate estimation of spatial non-stationarity. Int. J. Geogr. Inf. Sci. 2020, 34, 1353–1377.
  36. Tang, G.; Du, X.; Wang, S. Impact mechanisms of 2D and 3D spatial morphologies on urban thermal environment in high-density urban blocks: A case study of Beijing’s Core Area. Sustain. Cities Soc. 2025, 123, 106285.
  37. Vogel, R.M.; Papalexiou, S.M.; Lamontagne, J.R.; Dolan, F.C. When Heavy Tails Disrupt Statistical Inference. Am. Stat. 2025, 79, 221–235.
  38. Zube, E.H.; Sell, J.L.; Taylor, J.G. Landscape perception: Research, application and theory. Landsc. Plan. 1982, 9, 1–33.
  39. Chen, J.; Cao, X.; Yang, D.; Huang, R.; Song, H.; Huang, L.; Hu, S. Deciphering socio-spatial drivers of visitor sentiment in cultural heritage tourism: A text mining and XGBoost-based empirical study of Go Seigen in Fuzhou, China. Asia Pac. J. Tour. Res. 2026, 1–24.
  40. Liu, X.; Wang, S.; Tang, G. Understanding nonlinear and spatially heterogeneous effects of urban residential morphology on land surface temperature: Integrating SOM, XGBoost-SHAP, and GWR models. Sustain. Cities Soc. 2026, 136, 107100.
  41. Zhang, Y.; Ge, J.; Wang, S.; Dong, C. Optimizing urban green space configurations for enhanced heat island mitigation: A geographically weighted machine learning approach. Sustain. Cities Soc. 2025, 119, 106087.
  42. Zhou, K.; Zheng, X.; Huang, S.; Li, H.; Yin, H. Quantifying the combined and individual impacts of climate and human activity on the urban green space carbon sink capacity in Beijing. Sustain. Cities Soc. 2025, 122, 106253.
  43. Zhang, J.; Hong, S.; Shi, S.; Chen, B.; Wu, S. Synergistic impacts of 2D/3D grey-green spatial morphology on land surface temperature across local climate zones in the Guangdong-Hong Kong-Macao Greater Bay Area. Sustain. Cities Soc. 2026, 139, 107231.
  44. Gao, F.; Liu, H.; Guo, Q.; Xin, Y. Explainable Machine Learning Reveals Nonlinear Human-Environment Relationships in Mid-Holocene Settlement Patterns Across China. Available online: https://ssrn.com/abstract=6149458 (accessed on 10 April 2026).
  45. Yuan, B.; Guo, S.; Mu, H.; Pan, X.; Li, C.; Xia, Z.; Zhang, X.; Du, P. Assessment of land surface vulnerability using time-series geospatial datasets. Ecol. Inform. 2025, 89, 103178.
  46. Helbing, D.; Keltsch, J.; Molnár, P. Modelling the evolution of human trail systems. Nature 1997, 388, 47–50.
  47. Chen, Z.; Yu, B.; Yang, C.; Zhou, Y.; Yao, S.; Qian, X.; Wang, C.; Wu, B.; Wu, J. An extended time series (2000–2018) of global NPP-VIIRS-like nighttime light data from a cross-sensor calibration. Earth Syst. Sci. Data 2021, 13, 889–906.
  48. Xiao, Y.; Lin, J.; Zhang, X.; Zhang, M.; Chen, W.; Li, J. Designing outdoor emergency rescue stations based on the spatiotemporal behavior of outdoor adventure tourists using GPS trajectory data. Saf. Sci. 2024, 175, 106497.
  49. Belkin, M.; Hsu, D.; Ma, S.; Mandal, S. Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proc. Natl. Acad. Sci. USA 2019, 116, 15849–15854.
  50. Lambert, D. Zero-Inflated Poisson Regression, With an Application to Defects in Manufacturing. Technometrics 1992, 34, 1–14.
  51. Lou, L.; Ma, W.; Cheng, P.; Liu, H.; Huang, Y. Climatic and Fuel Drivers of Lightning-Induced Forest Fire Burned Area in the Da Hinggan Ling Region, Northeast China. Remote Sens. 2026, 18, 657.
  52. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16); Association for Computing Machinery: New York, NY, USA, 2016; pp. 785–794.
  53. Zhang, J.; Hong, S.; Chen, B.; Wu, S. Multiscale synergistic effects of urban green space morphology on heat-pollution: A case study of Guangdong-Hong Kong-Macao Greater Bay Area, China. Ecol. Indic. 2025, 173, 113390.
  54. Seraj, H.; Abbaspour, A.; Bahadori-Jahromi, A. Interpretable Data-Driven Models for Energy Performance Assessment in Residential Buildings. Sustainability 2026, 18, 457.
  55. Mauldin, R.L.; Mattingly, S.P.; Jang, S.; Handique, S.; Haque, M.; Parekh, R. Quantifying the Spatial Burden of Informal Ride Provision for Older Adults Using Activity Space Analysis and GIS. ISPRS Int. J. Geo-Inf. 2026, 15, 86.
  56. Zhong, Y.; Guo, M.; Zhang, M.; Tan, L. Identifying Street Environmental Factors That Attract Public Attention from the Jogger’s Perspective: A Multiscale Spatial Exploration. Buildings 2024, 14, 1935.
  57. Thiene, M.; Scarpa, R. Hiking in the Alps: Exploring Substitution Patterns of Hiking Destinations. Tour. Econ. 2008, 14, 263–282.
  58. Kołodziejczyk, K. Factors determining changes in the network of marked hiking trails in the Sudetes. J. Mt. Sci. 2024, 21, 1075–1099.
  59. Bachinger, M.; Hafner, M.; Harprecht, P. Cultural and space-based factors influencing recreational conflicts in forests. The example of cyclists and other forest visitors in Freiburg (Germany). Eur. J. Cult. Manag. Policy 2024, 13, 12494.
  60. Zoderer, B.M.; Tasser, E. The plurality of wilderness beliefs and their mediating role in shaping attitudes towards wilderness. J. Environ. Manag. 2021, 277, 111392.
  61. Buijs, A.E. Lay People’s Images of Nature: Comprehensive Frameworks of Values, Beliefs, and Value Orientations. Soc. Nat. Resour. 2009, 22, 417–432.
  62. Xu, H.; Guo, Q.; Siqin, C.; Li, Y.; Gao, F. Study of Settlement Patterns in Farming–Pastoral Zones in Eastern Inner Mongolia Using Planar Quantization and Cluster Analysis. Sustainability 2023, 15, 15077.
  63. Ravinsky Raichel, N.; Yahel, H. Planning Challenges and Opportunities in the Conservation of National Trails: The Case of the Israel National Trail. Land 2024, 13, 1449.
  64. Schweinsberg, S. Tourism and trails: Cultural, ecological and management issues. Ann. Leis. Res. 2017, 20, 123–124.
  65. Upadhayaya, P.K. Sustainable Management of Trekking Trails for the Adventure Tourism in Mountains: A Study of Nepal’s Great Himalaya Trails. J. Tour. Adventure 2018, 1, 1–31.
Figure 1. Study area.
Figure 2. Integrated research workflow: From multi-source data processing to explainable machine learning modeling.
Figure 3. Distribution of Trail_Dens across different elevation (DEM) groups with Kruskal–Wallis test results. *** p < 0.001. The p-value is the probability of observing a difference at least this large if the groups did not actually differ.
Figure 4. Distribution of Trail_Dist across different elevation (DEM) groups with Kruskal–Wallis test results. *** p < 0.001. The p-value is the probability of observing a difference at least this large if the groups did not actually differ.
Figure 5. Spearman rank correlation heatmap between environmental variables and hiking trail metrics. * p < 0.05, ** p < 0.01, *** p < 0.001. The p-value is the probability of observing a correlation at least this strong if the variables were actually uncorrelated.
Figure 6. Predictive diagnostics of the XGBoost model for Trail_Dist: Target variable distribution (left) and residual distribution analysis (right).
Figure 7. (a) Feature importance based on split weights. (b) Feature importance based on mean absolute SHAP values.
Figure 8. SHAP decision plots (left) and consistency verification between predicted values and additive SHAP contributions (right).
Figure 9. SHAP summary plot (beeswarm) illustrating the distribution and directionality of feature impacts on Trail_Dist.
Figure 10. SHAP dependence and density analysis for NTL_Range.
Figure 11. SHAP dependence and density analysis for DEM.
Figure 12. SHAP dependence and density analysis for Water_Dist.
Figure 13. Waterfall plot for individual sample (Sample 0) reconstructing the specific environmental logic for long-distance hiking trajectories.
Figure 14. Scatter plots showing the relationship between absolute SHAP contributions and prediction errors for key features.
Table 1. Definition and descriptive statistics of explanatory variables for hiking behavior modeling.
| Category | Variable Name | Abbreviation/Code |
| --- | --- | --- |
| Topography | Elevation | DEM |
| Topography | Slope | Slope |
| Climate | Annual Precipitation | Prec |
| Climate | Annual Mean Temperature | Temp |
| Hydrology | Distance to Water Systems | Water_Dist |
| Vegetation (NDVI) | NDVI Annual Maximum | NDVI_Max |
| Vegetation (NDVI) | NDVI Annual Minimum | NDVI_Min |
| Vegetation (NDVI) | NDVI Annual Mean | NDVI_Mean |
| Vegetation (NDVI) | NDVI Annual Range | NDVI_Range |
| Nighttime Light (NTL) | NTL Annual Maximum | NTL_Max |
| Nighttime Light (NTL) | NTL Annual Minimum | NTL_Min |
| Nighttime Light (NTL) | NTL Annual Mean | NTL_Mean |
| Nighttime Light (NTL) | NTL Annual Range | NTL_Range |
| Hiking Network | Hiking Road Density | Trail_Dens |
| Hiking Network | Distance to Hiking Trails | Trail_Dist |
Table 2. Comparative performance of machine learning models for Trail_Dens and Trail_Dist prediction.
| Target Variable | Model | R² | RMSE | MAE | n_Estimators | Max_Depth | Learning_Rate |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Trail_Dens | Random Forest | 0.373 | 0.011 | 0.003 | 200 | 30 | |
| Trail_Dens | XGBoost | 0.327 | 0.012 | 0.003 | 200 | 12 | 0.05 |
| Trail_Dens | LightGBM | 0.279 | 0.012 | 0.003 | 100 | 9 | 0.1 |
| Trail_Dist | Random Forest | 0.828 | 980.751 | 681.712 | 200 | 30 | |
| Trail_Dist | XGBoost | 0.869 | 858.155 | 568.787 | 200 | 12 | 0.05 |
| Trail_Dist | LightGBM | 0.824 | 992.651 | 699.637 | 200 | −1 | 0.1 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Guo, Q.; Chen, S.; Bai, X.; Zhang, Y. Landscape Drivers of Trail Formation in Peri-Urban Mountains: Insights from an Explainable Machine Learning Approach. Land 2026, 15, 715. https://doi.org/10.3390/land15050715

AMA Style

Guo Q, Chen S, Bai X, Zhang Y. Landscape Drivers of Trail Formation in Peri-Urban Mountains: Insights from an Explainable Machine Learning Approach. Land. 2026; 15(5):715. https://doi.org/10.3390/land15050715

Chicago/Turabian Style

Guo, Qin, Shili Chen, Xueyue Bai, and Yue Zhang. 2026. "Landscape Drivers of Trail Formation in Peri-Urban Mountains: Insights from an Explainable Machine Learning Approach" Land 15, no. 5: 715. https://doi.org/10.3390/land15050715

APA Style

Guo, Q., Chen, S., Bai, X., & Zhang, Y. (2026). Landscape Drivers of Trail Formation in Peri-Urban Mountains: Insights from an Explainable Machine Learning Approach. Land, 15(5), 715. https://doi.org/10.3390/land15050715
