Identifying and Mapping Groups of Protected Area Visitors by Environmental Awareness

Abstract: Protected areas worldwide receive billions of visitors annually. The positive impact of nature on health and wellbeing, in addition to providing opportunities for cultural activities such as recreation and aesthetic appreciation, is well documented. Management to reduce negative impacts to biodiversity and conservation aims whilst providing amenities and access to visitors is important. Understanding environmental awareness of visitors and their on-site spatial patterns can assist in making effective management decisions within often constrained resources. However, there is a lack of strategies for site-specific identification and predictive mapping of visitors by environmental awareness. Here, we demonstrate a method to map on-site visitation by latent groups of visitors based on their environmental awareness of on-site issues. On-site surveys and participatory mapping were used to collect data on environmental awareness of bird nesting and spatial visitation patterns in an upland moor in northern England. Latent class analysis (LCA), a structural equation model, was used to discover underlying groups of environmental awareness, with random forest (RF) modelling, a machine learning technique, using a range of on-site predictors (bioclimatic, land cover, elevation, viewshed, and proximity to paths and freshwater) to predict and map visitation across the site by each group. Visitors were segmented into 'aware' and 'ambiguous' groups and their potential spatial visitation patterns mapped. Our results demonstrate the ability to uncover groups of users by environmental awareness and map their potential visitation across a site using a variety of on-site predictors. Spatial understanding of the movement patterns of differently environmentally aware groups of visitors can assist in efficient targeting of conservation education endeavours (e.g., signage, positioning of staff, and monitoring programmes), therefore maximising their efficacy. Furthermore, we anticipate this method will be of importance to environmental managers and educators when deploying limited resources.


Introduction
Much of the environmental deterioration evidenced around the world, including the alarming rate of biodiversity loss, can be attributed to humans [1,2]. Indeed, Schultz [2] argues that only through changing human behaviour can conservation have a chance of success and that identifying those behaviours that need to change, rather than broad education and awareness-raising campaigns, should be the priority. With visitation to protected areas expected to increase [3], already stretched resources for improving visitors' awareness of key site objectives, such as protecting bird species, will only need to go further.
The impact of human disturbance on birds can manifest in multiple ways, through a combination of changing distributions, behaviour, demography or population size [4]. For example, during sensitive periods of the year, such as territory establishment, even low levels of human disturbance can alter bird behaviour [5]. Bird disturbance caused by humans has been studied in various habitats, including forests [6,7], coastal areas [8], and upland habitats such as moorland [9,10]. The risk of negative impacts to birds is considered serious enough to be written into international law prohibiting deliberate disturbance (EU Birds Directive 2009).
Separating nature area visitors into groups has been explored for several scenarios in literature, including segmentation by motivation for visiting a site, to best respond to visitor needs [11,12]. Several studies have used a market segmentation approach to group visitors [13,14], with Halpenny [15] investigating place attachments as a predictor of pro-environmental behaviours. Conservation social sciences can be very useful in guiding conservation actions and outcomes that are effective and robust [16]. For example, Booth et al. [17] found substantial variation in protected status awareness in site visitors, with only conservation organisation membership being of predictive significance for being more informed.
Context-specific maps can improve and/or enable replication of strategic environmental assessment mapping to aid conservation decision-making [18], and knowledge of visitor preferences offers insights to inform destination development and promotion of ecotourism within a site [19]. In particular, mapping to support environmental decision-making provides an approach to understand where policy interventions could be most effective [20] and is a valid mechanism to translate scientific findings into tangible products that can be used by local on-site practitioners. Conservation education is the mechanism through which awareness and concern for the environment are raised and can take several forms, from free-choice learning to structured initiatives [16,21,22]. Understanding environmental awareness on-site can help in personal delivery of on-site minimum impact education strategies, as messages to park visitors have been found to minimise off-trail behaviour, whilst physical signs were found to be ineffective in preventing this behaviour [23].
Identifying visitation patterns of nature site users lacking in awareness of on-site issues is important. As lower awareness may result in unwanted behaviour, such as disturbance to birds or other wildlife, predicting areas these users may visit on-site is vital for both management and mitigation strategies (e.g., limiting access to sensitive areas or targeted education campaigns). In this study, we aim to provide a replicable method that can be used to both define and spatially map different types of protected area visitors, based on their environmental awareness of on-site issues. We use latent class analysis (LCA), a type of structural equation model, to group visitors based on environmental awareness and random forest (RF) modelling, a machine learning method, with participatory mapping data to produce predictive maps of visitation. This is demonstrated for a protected English upland moor that has importance for protected bird populations.

Study Site and Context
Ilkley Moor, an upland moor in northern England, UK (53°54′11.1″ N, 1°49′36.9″ W; Figure 1), is dominated by heather moorland over acidic soils from the Millstone Grit underlying it [22]. It is 402 m above sea level, which exposes it to harsh winds in the winter. Its nature conservation importance is recognised through international and national designations, including Site of Special Scientific Interest, Special Area of Protection, and Special Area of Conservation. In addition to being of high ecological importance, the site provides cultural ecosystem services such as recreation and sense of place, as well as providing regulating and supporting services including carbon storage in peat, water and air purification, and floodwater retention.
Figure 1. The study site in multiple dimensions: (a) aerial view using Sentinel 2 natural colour (bands 432) mosaic imagery [23], (b) location of site in relation to the British Isles, (c) example photographs taken on-site, with surrounding landscape views, and (d) Red List Species of Conservation Concern [24,25] recorded on the site between 2015 and 2019 during the British Trust for Ornithology (BTO) British Breeding Bird Surveys [26]. The BTO/JNCC/RSPB Breeding Bird Survey is a partnership jointly funded by the BTO, RSPB, and JNCC, with fieldwork conducted by volunteers. All bird images are published on www.wikimedia.org and licensed under Creative Commons; see Table S1 for full attributions. © contains Copernicus data (2019) and JNCC/NE/NRW/SNH/NIEA data, © copyright and database right 2019.
In this study, we specifically consider environmental and ecological awareness regarding nesting birds on Ilkley Moor, which experiences large numbers of visitors. During the bird breeding season, the implications of disturbance for outcomes of nesting attempts range across multiple impacts, including nest failure [24], impaired nestling growth [7], reduction in the areas suitable for breeding [25], and immunosuppression of fledglings [26], which all put pressure on individual birds and future recruitment into the local population [27]. The strength of these impacts can vary widely with species and type of disturbance; they may only have mild effects on fitness or cause total breeding season failure [28,29]. Hence, it is important for disturbance at this time to be kept to a minimum.
The BTO's Breeding Bird Survey [30] has recorded 70 bird species on Ilkley Moor in the last 10 years, declining to 63 in the last 5 years. The seven lost species include the chiffchaff (Phylloscopus collybita), coal tit (Periparus ater), collared dove (Streptopelia decaocto), and hobby (Falco subbuteo); two Red listed species, the grasshopper warbler (Locustella naevia) and tree sparrow (Passer montanus); and one Amber listed species, the house martin (Delichon urbicum) [31]. In the last five years, 16 Amber listed species have been recorded (see Table S2 for full lists of bird species that can be found on-site).

Independent Variables; Survey and Participatory Mapping
A total of 124 surveys were conducted in the summer of 2019 with visitors to the study site. The survey contained questions covering demographics and questions related to participants' experiences on-site. Here we focus on five environmental awareness questions (Table 1) structured in the five-point Likert format [32,33] and a participatory mapping exercise, in which participants were asked to mark on a map the areas of the site they had visited on that specific visit. With permission of the site owner (Bradford Council), 11 access points were used for surveying, 10 of which had car parks. Stratified sampling was used to identify a minimum of one access point per grid square by draping a 4 × 4 grid over an Ordnance Survey map of the site. Surveys took place between 09:00 and 17:00 BST, with a surveyor interviewing visitors and transcribing their responses to the questionnaire. The responses to the environmental questions (Table 1) were dichotomously recoded (Likert ratings 1, 2, and 3 as "unaware"; 4 and 5 as "aware"). As a rating of 3 could be perceived as a neutral response, it was assumed to indicate a lack of awareness of the issue rather than awareness. During the participatory mapping exercise, participants marked points, lines, or polygons with a pen on a map to indicate the areas they visited. Maps were digitised in ESRI ArcMap v10.6. Points and lines were converted to polygons using a 250 m buffer before all polygons for all participants were converted to a binary raster (25 m per pixel) to indicate visitation per participant. These rasters were later summed by allocated LCA group to produce visitation maps.
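The dichotomous recoding described above can be illustrated with a minimal sketch. It is written in Python purely for illustration (the study's analysis was conducted in R), and the function name and example responses are hypothetical rather than taken from the survey data.

```python
def recode_likert(rating: int) -> str:
    """Collapse a five-point Likert rating into the two categories used in the text."""
    if rating not in (1, 2, 3, 4, 5):
        raise ValueError(f"expected a rating from 1 to 5, got {rating!r}")
    # Ratings 1-3 (including the neutral midpoint) count as "unaware"; 4-5 as "aware".
    return "aware" if rating >= 4 else "unaware"


# Hypothetical responses from one participant to the five awareness questions.
responses = [5, 3, 4, 2, 1]
recoded = [recode_likert(r) for r in responses]
print(recoded)
```

Treating the midpoint as "unaware" is the conservative choice stated in the text; a different cut-off would change the inputs to the latent class analysis.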

Secondary Variables; Independent Secondary Data
To upscale from the participatory survey data to a map covering the entire region, we used an RF algorithm with a set of climatic and physical factors as predictors. Bioclimatic variables were sourced from WorldClim version 2.1 [34], in the form of 19 variables derived from monthly rainfall and temperature at the finest available resolution of 30 arc-seconds, as GeoTIFF files. Variables were extracted as values at the centroid of each 25 m pixel across the site.
Land cover (such as forest) was sourced from the CEH LCM2019 [35] raster map, which classifies 21 land cover types at 25 m × 25 m resolution; this set the 25 m pixel size used throughout this study. Dummy binary variables were created for each pixel for the classes present over the site (deciduous woodland, coniferous woodland, improved grassland, acid grassland, heather, heather grassland, bog, freshwater, and suburban). Elevation was calculated from the Ordnance Survey 50 m digital terrain model [36]. Proximity to paths and freshwater was calculated per pixel in metres in QGIS v3.16.0 [37] using OpenStreetMap data retrieved from Geofabrik.de [38].
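As a hedged illustration of the dummy coding of land cover, the following Python sketch turns a pixel's class label into one 0/1 indicator per class. The class names come from the text; the function name and the example pixel labels are assumptions made for illustration only.

```python
# Land cover classes present over the site, as listed in the text.
CLASSES = [
    "deciduous woodland", "coniferous woodland", "improved grassland",
    "acid grassland", "heather", "heather grassland", "bog",
    "freshwater", "suburban",
]


def dummy_encode(land_cover: str) -> dict[str, int]:
    """Return one 0/1 indicator per land cover class for a single pixel."""
    if land_cover not in CLASSES:
        raise ValueError(f"unknown land cover class: {land_cover!r}")
    return {name: int(name == land_cover) for name in CLASSES}


# Hypothetical pixel labels; real values would come from the land cover raster.
pixels = ["heather", "bog", "heather grassland"]
encoded = [dummy_encode(p) for p in pixels]
print(encoded[0]["heather"], encoded[0]["heather grassland"])
```

Each pixel receives exactly one indicator set to 1, so the nine columns can be used directly as predictors alongside the continuous variables.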
Viewshed analysis (delineation of the area visible from any given location on a map) was calculated in QGIS v3.16.0 [37] using the "Advanced Viewshed Analysis" v1.4 plugin [39]. The site visibility index was calculated as a cumulative binary viewshed utilising only visibility points from within the site and the Ordnance Survey 50 m digital terrain model [36]. Visibility points were created at the centroid location of each 25 m pixel, with the default values of observer height set at 1.6 m and a 5 km radius used. This allowed consideration of whether on-site visibility of an area contributed to visitation by differently environmentally aware users.

Statistical Analysis
All statistical analysis was conducted in R (R Core Team, 2020). See Figure 2 for a flowchart of the study methodology.

Variable Selection
While RF models suffer less from collinearity than traditional statistical methods, multiple predictors that are weakly correlated with the response and strongly correlated with each other can cause unstable results, and the averaging that occurs across all trees in an RF is unlikely to overcome this [40,41]. We dealt with collinearity in three steps. Firstly, the caret R package [42] was used to identify variables that were correlated at 0.75 or higher with each other, and the variable with the largest mean absolute correlation amongst all variables was removed. Secondly, variance inflation factors (VIFs) were calculated, and the variable with the highest VIF was removed iteratively until all remaining variables had values under 10, resulting in fifteen predictors. Thirdly, the Boruta R package [43] was used to check "variable importance" (the extent to which the RF model uses a given variable to make accurate predictions; higher usage implies greater importance for the model). Boruta does this through iterative removal of variables found to be statistically less relevant than random probes using RF [43].
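The first of these steps, the pairwise correlation filter, can be sketched as follows. This is a simplified Python illustration of the idea (drop, from each highly correlated pair, the variable with the larger mean absolute correlation), not a reproduction of the caret implementation used in the study; the variable names and values are hypothetical, and the VIF and Boruta steps are not covered.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def drop_correlated(data, cutoff=0.75):
    """Remove variables until no pair has |r| >= cutoff; return the names kept."""
    keep = list(data)
    while len(keep) > 1:
        corr = {}
        for a, b in combinations(keep, 2):
            corr[(a, b)] = corr[(b, a)] = abs(pearson(data[a], data[b]))
        high = [(a, b) for a, b in combinations(keep, 2) if corr[(a, b)] >= cutoff]
        if not high:
            break

        def mean_abs(name):
            # Mean absolute correlation of `name` with every other remaining variable.
            return sum(corr[(name, o)] for o in keep if o != name) / (len(keep) - 1)

        # Take the most strongly correlated pair and drop its more "entangled" member.
        a, b = max(high, key=lambda pair: corr[pair])
        keep.remove(a if mean_abs(a) >= mean_abs(b) else b)
    return keep


# Hypothetical predictor values for six pixels (not the study data).
predictors = {
    "elevation": [1, 2, 3, 4, 5, 6],
    "mean_temp": [6.1, 5.0, 3.9, 3.1, 2.0, 0.9],  # almost perfectly tied to elevation
    "path_distance": [2, 7, 1, 8, 2, 8],
}
kept = drop_correlated(predictors)
print(kept)
```

In this toy example one of the two near-duplicate predictors is removed, while the weakly correlated one is retained.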

Latent Class Analysis
Latent class analysis (LCA) has been used in the study of environmental issues, including environmentally sustainable food choices [44], environmental concern of the UK population [45], and recreational demand based on attitudes towards water resources [46]. Here we use LCA to segment visitors into different environmental awareness groups. The poLCA R package [47] was used for the LCA. Models with differing numbers of classes were explored, with the lowest BIC indicating a two-class solution (see Figure S1). LCA posterior probabilities were used to segment the surveyed individuals into the two classes ('aware' and 'ambiguous'). Classes were named according to the resultant probability of binary response to each environmental question.
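To make the use of posterior probabilities and BIC more concrete, the sketch below shows how a respondent's class membership probabilities follow from a fitted two-class model with binary items, and how BIC is computed from a log-likelihood. It is a Python illustration under stated assumptions: the class weights and item-response probabilities are invented for the example and are not the fitted values from this study, and the function names are hypothetical.

```python
from math import log, prod


def posterior(response, class_weights, item_probs):
    """Posterior class probabilities for one respondent.

    response      : list of 0/1 answers (1 = "aware")
    class_weights : prior probability of each latent class
    item_probs    : item_probs[c][j] = P(item j answered 1 | class c)
    """
    joint = []
    for weight, probs in zip(class_weights, item_probs):
        # Local independence: items are independent given the class.
        likelihood = prod(p if y == 1 else 1.0 - p for y, p in zip(response, probs))
        joint.append(weight * likelihood)
    total = sum(joint)
    return [j / total for j in joint]


def bic(log_likelihood, n_classes, n_items, n_respondents):
    """BIC for a latent class model with binary items (lower is better)."""
    n_params = (n_classes - 1) + n_classes * n_items
    return -2.0 * log_likelihood + n_params * log(n_respondents)


# Illustrative parameters only (class 0 = 'aware', class 1 = 'ambiguous').
weights = [0.6, 0.4]
probs = [
    [0.9, 0.9, 0.8, 0.9, 0.5],
    [0.4, 0.5, 0.3, 0.5, 0.5],
]
post = posterior([1, 1, 1, 1, 0], weights, probs)
assigned = post.index(max(post))
print(post, assigned)
```

Each respondent is assigned to the class with the highest posterior probability, which is the segmentation step described in the text.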

Random Forest Modelling
Since its introduction by Breiman in 2001, random forest (RF) modelling, a machine learning technique, has been used extensively in the literature, e.g., in using semantic information to classify urban buildings [48], forecasting power consumption using hybrid models including RF [49], and modelling travel mode choice behaviour [50]. Braun, Cottrell, and Dierkes [51] used RF to investigate the effect of outdoor education programmes in school children across multiple countries. The predictive accuracy of RF lends the method to producing potentiality or susceptibility maps. Naghibi, Pourghasemi, and Dixon [52] used RF with boosted, classification, and regression trees to produce groundwater potential maps. Elsewhere, RF has been used for landslide susceptibility mapping [53–55], soil parent material and carbon mapping [56,57], and vegetation mapping and land cover classification [58–60]. In this study, we use RF to produce potential maps of visitation for different environmental awareness groups.
The ranger R package [61] was used to run separate RF models for 'all visitors' and both LCA classes, using the sum of the binary visitation maps per pixel as the response variable. Only pixels visited by at least one survey respondent were included. Each model was tuned to find the hyperparameters with the best predictive accuracy by running a grid search over the number of variables sampled at each split (1-15), node size (3, 5, 7, and 9), and sample size (0.550, 0.632, 0.700, and 0.800), resulting in 240 variants for each model. All models were run with 500 trees and showed stabilisation of out-of-bag (OOB) error within this number (see Table S3). The models with the lowest OOB error were chosen.
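The size of the tuning grid follows directly from the three hyperparameter ranges (15 × 4 × 4 = 240). The Python sketch below enumerates the grid and picks the configuration with the lowest error. The dictionary keys are illustrative names, and the error function is a hypothetical stand-in for fitting a forest and reading off its OOB error, so no real model is trained here.

```python
from itertools import product

mtry_values = range(1, 16)                          # variables sampled at each split
node_sizes = (3, 5, 7, 9)                           # minimum node size
sample_fractions = (0.550, 0.632, 0.700, 0.800)     # fraction of data per tree

grid = [
    {"mtry": m, "min_node_size": n, "sample_fraction": s}
    for m, n, s in product(mtry_values, node_sizes, sample_fractions)
]
print(len(grid))  # 240


def select_best(configs, oob_error):
    """Return the configuration with the lowest out-of-bag error."""
    return min(configs, key=oob_error)


def fake_oob_error(cfg):
    # Hypothetical stand-in: a real workflow would fit an RF with 500 trees
    # using `cfg` and return its OOB prediction error.
    return (
        abs(cfg["mtry"] - 5)
        + abs(cfg["min_node_size"] - 5)
        + abs(cfg["sample_fraction"] - 0.632)
    )


best = select_best(grid, fake_oob_error)
print(best)
```

An exhaustive grid is affordable here because each of the 240 variants is a single forest of 500 trees; larger grids might call for random or adaptive search instead.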

Validation and Mapping
RF models provide a robust out-of-bag estimate of error; as an additional layer of validation, data for the RF models were randomly split into training (80%) and testing (20%) datasets. Predicted values from the RF models were mapped at 25 m pixel resolution using ESRI ArcGIS Pro 2.7.0. Due to the heavy-tailed distribution of the data, an m-out-of-n bootstrap was used to discern statistical difference between the maps using the R package distillery [62], with Pearson correlation tests for a random set of 1000 subsampled points, bootstrapped over 1000 iterations with resampling.
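The m-out-of-n bootstrap of a Pearson correlation can be sketched as below. This Python illustration is not the distillery implementation used in the study: the function names are hypothetical, the two "maps" are synthetic Gaussian stand-ins rather than the heavy-tailed visitation predictions, and smaller values of m and the iteration count are used to keep the example quick.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def m_out_of_n_bootstrap(x, y, m, iterations, seed=0):
    """Resample m of the n paired points with replacement and correlate each draw."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(iterations):
        idx = [rng.randrange(n) for _ in range(m)]
        estimates.append(pearson([x[i] for i in idx], [y[i] for i in idx]))
    return estimates


# Synthetic stand-ins for two predicted visitation maps (not the study data).
rng = random.Random(42)
map_a = [rng.gauss(0.0, 1.0) for _ in range(2000)]
map_b = [a + rng.gauss(0.0, 1.0) for a in map_a]

estimates = m_out_of_n_bootstrap(map_a, map_b, m=500, iterations=200)
mean_r = sum(estimates) / len(estimates)
print(f"bootstrap mean correlation: {mean_r:.3f}")
```

Drawing fewer points than the full sample (m < n) is what distinguishes this from the ordinary bootstrap and is the property that makes it better suited to heavy-tailed data.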

Results
The LCA resulted in two classes, as this presented the lowest AIC/BIC (Figure S1); these were named 'aware' and 'ambiguous', as the first group was likely to be aware across all questions, whereas the latter showed more ambiguity in its probability of awareness (Figure 3). Posterior probabilities showed a split of 63.5% 'aware' and 35.5% 'ambiguous' amongst the surveyed visitors. Demographic and behavioural responses in full can be found in Table 2.
RF models were run for all visitors and for the 'aware' and 'ambiguous' groups identified from the LCA. All three models had high accuracy (the 'all visitors' and 'ambiguous' models had an OOB R² of 0.97, with 'aware' having an OOB R² of 0.96). OOB prediction error (MSE) was lowest for the 'ambiguous' group (0.36), followed by 'aware' (1.20) and the 'all visitors' group (2.56), with the error showing stabilisation within 500 RF trees (see Figure S2 and Table S3). Root mean square error (RMSE) was calculated on the 20% holdback validation dataset for all models: 'all visitors' (1.61), 'aware' (1.10), and 'ambiguous' (0.60).
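Because RMSE is the square root of MSE, the OOB errors and the holdback RMSE values reported above can be compared on the same scale. The short Python sketch below does this arithmetic using only the figures quoted in the text; the variable names are illustrative.

```python
from math import sqrt

# Values as reported in the text.
oob_mse = {"all visitors": 2.56, "aware": 1.20, "ambiguous": 0.36}
holdout_rmse = {"all visitors": 1.61, "aware": 1.10, "ambiguous": 0.60}

for group, mse in oob_mse.items():
    oob_rmse = sqrt(mse)  # put the OOB error on the RMSE scale
    print(f"{group}: OOB RMSE {oob_rmse:.2f} vs holdback RMSE {holdout_rmse[group]:.2f}")
```

The two sets of values agree closely, which suggests that the OOB estimates and the independent 20% holdback tell a consistent story about model error.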
Variable importance was computed for all RF models; variables with higher importance levels contribute most to explaining the outcome (Figure 4). Mean temperature in the driest quarter was the most important for all models. 'All visitors' and 'aware' visitors shared similar levels and rank of variable importance, with temperature seasonality being second most important, whereas this was moderately important for the other group. For the 'ambiguous' group the second most important variable was isothermality. Isothermality quantifies the extent to which day-to-night temperatures oscillate relative to summer-to-winter annual oscillations [63]. Proximity to water, elevation, and viewshed were also moderately important for all groups. Heather grassland and proximity to paths were important to a lesser extent, and the remaining variables (deciduous woodland, coniferous woodland, suburban, improved grassland, freshwater, bog, and acid grassland) showed very low importance.
Mapped values showed distinct spatial patterns across all groups (Figure 5). Potential visitation was predicted, at least for low levels, across the entirety of the site for the 'all visitors' and 'aware' groups, with activity concentrated in the north to north-east areas of the site, where popular access points are situated, with north-south and east-west areas of higher visitation clearly visible. In the 'ambiguous' group, there are distinct areas where no visitation was predicted, with activity once again highest in the north-east of the site, though only clear north-south channels could be identified, rather than the east-west channels also visible in the 'aware' group map.
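Isothermality, the second most important predictor for the 'ambiguous' group, is conventionally derived in bioclimatic datasets as the mean diurnal temperature range divided by the annual temperature range, multiplied by 100. The Python sketch below shows that calculation; the function name and the monthly temperatures are hypothetical and are not values for Ilkley Moor.

```python
def isothermality(monthly_tmax, monthly_tmin):
    """Mean diurnal range as a percentage of the annual temperature range."""
    if len(monthly_tmax) != 12 or len(monthly_tmin) != 12:
        raise ValueError("expected 12 monthly values for each series")
    # Mean of the monthly (max - min) temperature differences.
    diurnal_range = sum(hi - lo for hi, lo in zip(monthly_tmax, monthly_tmin)) / 12
    # Maximum temperature of the warmest month minus minimum of the coldest month.
    annual_range = max(monthly_tmax) - min(monthly_tmin)
    return 100.0 * diurnal_range / annual_range


# Hypothetical monthly mean maximum and minimum temperatures in degrees Celsius.
tmax = [6, 6, 9, 12, 15, 18, 20, 20, 17, 13, 9, 7]
tmin = [1, 1, 2, 4, 7, 10, 12, 12, 10, 7, 4, 2]
value = isothermality(tmax, tmin)
print(round(value, 2))
```

Higher values indicate that day-to-night swings are large relative to the seasonal swing, which is the sense in which the text describes temperatures as more or less constant through the year.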

Discussion
Visitors to the protected area studied here can be segmented into 'aware' and 'ambiguous' in their on-site environmental awareness from survey data using LCA. Information provided by each group on their use of the site can be upscaled using RF models and mapped as shown in Figure 5. Over two-thirds of the visitors surveyed could be classed as 'aware', similarly to the study of Beh and Bruyere [11], where visitors were segmented by their motivation for site visitation, and most were found to have a high awareness of the environment. Our study spatially maps visitation patterns and provides a means to target 'ambiguous' groups.
The LCA showed that awareness levels were higher in the 'aware' group, apart from awareness of site designations being similar for both groups. Surprisingly, awareness of site designations had the lowest probability of all the questions in the 'aware' group and the highest in the 'ambiguous' group (Figure 3). This may imply that being generally 'aware' does not necessarily translate to being educated on all environmental aspects of a site; for example, Booth et al. [17] showed a variation of 8-43% in understanding what an SSSI was amongst site visitors.
Variable importance in the random forest models showed that land cover predictors were of lower importance than other factors. The 'aware' group had a high importance related to mean temperature and seasonality, which could be attributed to people potentially visiting the site when the weather is favourable. The 'ambiguous' group had a relatively low importance for all factors apart from mean temperature and isothermality. Outdoor sensory experiences attract all types of visitors [64], which helps to explain the importance of these factors; hence, when the weather has been constant, visitors could potentially feel more confident in outside pursuits.
A larger proportion of the members of the 'aware' group were over 50 years old. Older generations have grown up in the increased presence of nature, and it has been suggested that they have been sensitised by those experiences they had when they were younger [65], whereas technological advancements may have distanced younger generations from nature [66]. More 'aware' visitors identified as visiting daily. This is supported by Halpenny [15] and Maguire et al. [67], who suggest that a sense of place promotes pro-environmental behaviour and stewardship. We found that within the 'ambiguous' group most visited at a frequency of 2-3/week, but altogether visited more at the lower frequencies. Ballantyne et al. [64] found that infrequent visitors were more likely to be motivated by learning, and thus the more infrequent visitors in the 'ambiguous' group could be more receptive to conservation education delivery. In the 'aware' group, 31.25% travelled from less than one mile away, compared with 25% in the 'ambiguous' group, yet the highest proportion for both groups travelled from more than five miles away (40% and 43.18%, respectively). When nature is nearby, it has been suggested that visitation increases, which fosters place dependence [68]. Nevertheless, close proximity can be linked with convenience, being part of an individual's residential environment, which they regularly use [69]. Lower visitation from those nearer the site could be attributed to the timing of the survey periods in this study, with locals visiting earlier or later in the day. Alcock et al. [70] found that frequency of visitation was lower in individuals living in green-space-abundant areas compared to those living in areas that lack green spaces, who were potentially compensating for the lack of nature, which may also explain the higher number of visitors from further away in this study.
Visitors from the 'ambiguous' group were found to venture less far from the northern access points on-site, and then in a linear north-south pattern, as opposed to 'aware' visitors, who used these channels and also adopted east-west patterns of movement. This may be because the 'aware' group had a larger proportion of daily visitors (indicating familiarity with the site) and included many dog walkers, who tend to spend more time on-site and therefore take longer and more exploratory routes [71]. Bootstrapped Pearson correlation revealed relatively high similarity between the 'aware' and 'ambiguous' potential visitation maps when comparing the entire site (correlation mean: 0.893), which highlights the overlap in visitation across the site between both groups. However, as can be seen from Figure 5, specific areas where visitors from each group are more likely to visit can readily be discerned.
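For readers wishing to reproduce a comparison of this kind, the following is a minimal sketch of a bootstrapped Pearson correlation between two predicted visitation maps. It is illustrative only and is not the code used in this study: it assumes that each map has been flattened to a list of values over the same grid cells, that paired cells are resampled with replacement, and that a percentile interval is reported. The function names (`pearson`, `bootstrap_pearson`), the number of resamples, and the toy data are all hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Note: fails with a division by zero if either sequence is constant.
    return cov / sqrt(var_x * var_y)


def bootstrap_pearson(map_a, map_b, n_boot=1000, seed=0):
    """Resample paired cells with replacement.

    Returns the mean correlation across resamples and an approximate
    95% percentile interval (lower, upper).
    """
    if len(map_a) != len(map_b):
        raise ValueError("maps must cover the same cells")
    rng = random.Random(seed)
    n = len(map_a)
    estimates = []
    for _ in range(n_boot):
        # Draw cell indices so that each pair of map values stays together.
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(
            pearson([map_a[i] for i in idx], [map_b[i] for i in idx])
        )
    estimates.sort()
    mean_r = sum(estimates) / n_boot
    lower = estimates[int(0.025 * n_boot)]
    upper = estimates[int(0.975 * n_boot) - 1]
    return mean_r, lower, upper


# Toy data (assumption): two strongly related synthetic "maps" of 200 cells.
toy_rng = random.Random(42)
aware = [toy_rng.random() for _ in range(200)]
ambiguous = [0.8 * v + 0.2 * toy_rng.random() for v in aware]

mean_r, lower, upper = bootstrap_pearson(aware, ambiguous, n_boot=500)
print(f"mean r = {mean_r:.3f}, 95% interval = ({lower:.3f}, {upper:.3f})")
```

A high site-wide correlation from such a procedure summarises overall similarity only. As noted above, local differences between the two maps can still be substantial, so a cell-by-cell difference map remains a useful complement.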
By understanding which areas are most visited by different groups, site management can be implemented in association with the results of areas of conservation priority within the site, for example alongside bird survey maps to reduce potentially negative impacts on breeding birds. Kim and Weiler [14] recommend differing on-site communication strategies for visitors with low and high environmental awareness (in relation to fossil collecting). This can be taken further by differentiating strategies for different age cohorts. Personal delivery has been found to be more successful in educating and changing behaviours [23], and hence different events and inclusive educational activities can be targeted at different ages. Habitat management could be focussed on areas with fewer visitors and/or areas with better-informed visitors to attract sensitive species into areas where they are less likely to experience the negative impacts of disturbance. Signage has been shown to work in multiple conservation scenarios [72], though the style of such signs needs to be carefully considered for greatest impact in terms of "behaviour change" outcomes [8,73]. An understanding of where visitors are most concentrated, especially those that may require more conservation education, will allow more targeted education efforts. This could be in the form of increased signage or targeted personal delivery, as demonstrated by Kidd et al. [23]. Weaver and Lawton [74] suggest that adhering to a completely biocentric approach that sees visitors as an inherent threat to protected areas can lead to suboptimal sustainability outcomes, whereas seeing visitors as an opportunity rather than a threat for the park can allow visitor mobilisation towards park enhancement, such as pro-environmental activities and citizen science (e.g., bird surveys). Therefore, conservation education combined with these activities could provide a multitude of benefits.
To further assess the robustness of this study, the behaviour of the 'ambiguous' visitors would need validation through replication of this study in other protected areas. Other sites may have site-specific characteristics, e.g., coastal sites, woodland, or different biomes, that will need to be assessed for relevant predictor variables, though the same approach elucidated in this study could be used. The variables chosen in this study were limited to bioclimatic and physical predictors, and they could be expanded to include a wider range, both within and outside these categories. Utilisation of machine learning technologies requires specialised skills; thus, the techniques demonstrated in this study are reliant on access to these resources. However, the resources saved in other areas, e.g., wider-scale conservation education, may provide greater savings in the long term. This study used stated responses as part of the survey and participatory GIS exercise. This could be improved using revealed behaviour methodologies, such as voluntary GPS tracking using independent sensors and data loggers or mobile telephones, as seen in Wolf et al. [75]. Alternatively, the accuracy of the participatory mapping could be improved, for example, by using online participatory exercises, which can capture additional information such as the aesthetic appeal of visited areas, as demonstrated by Gosal et al. [76].

Conclusions
The methodology elucidated in this study can be readily applied to other areas for which suitable spatial data are available, allowing the development of highly site-specific maps of visitors with differing environmental awareness. Predictors used in this study, e.g., the digital terrain model, calculated viewshed analysis, and bioclimatic variables, are easily found for many regions of the world, albeit at varying resolutions. There is a paramount need for conservation resources to be channelled into the most effective management strategies, and this study demonstrates a method to spatially define those areas that attract visitors with lower environmental and ecological awareness so that on-site resources can be efficiently targeted to where they are needed the most.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/land10060560/s1, Table S1: Full attribution for bird images in Figure 1 under Creative Commons licenses; Table S2: Birds recorded on Ilkley Moor with Birds of Conservation Concern (BoCC) designations; Table S3: Parameters giving the best out-of-bag (OOB) root mean square error (RMSE) for all RF models using hyper-tuning; Figure S1: Latent class analysis (LCA) AIC and BIC for multiple group (class) sizes; Figure S2: Random forest tree out-of-bag (OOB) stabilisation for random forest models.

Table S1) and Kulvinder Kaur for photographs of Ilkley Moor used in Figure 1c. The authors are grateful to the visitors to the study site that participated in the surveys.

Conflicts of Interest:
The authors declare no conflict of interest.