Ecological Risks from Atmospheric Deposition of Nitrogen and Sulphur in Jack Pine forests of Northwestern Canada

: Chronic elevated nitrogen (N) deposition can have adverse effects on terrestrial ecosystems. For large areas of northern Canada distant from emissions sources, long-range atmospheric transport of N may impact plant species diversity, even at low deposition levels. The objective of this study was to establish plant species community thresholds for N deposition under multiple environmental gradients using gradient forest analysis. Plant species abundance data for 297 Jack pine ( Pinus banksiana Lamb.)-dominant forest plots across Alberta and Saskatchewan, Canada, were evaluated against 43 bioclimatic and deposition variables. Bioclimatic variables were overwhelmingly the most important drivers of community thresholds. Nonetheless, dry N oxide (DNO) and dry N dioxide deposition inferred a total deposited N (TDN) community threshold of 1.4–2.1 kg N ha − 1 yr − 1 . This range was predominantly associated with changes in several lichen species, including Cladina mitis , Vulpicida pinastri , Evernia mesomorpha and Lecanora circumborealis , some of which are known bioindicators of N deposition. A secondary DNO threshold appeared to be driving changes in several vascular species and was equivalent to 2.45–3.15 kg N ha − 1 yr − 1 on the TDN gradient. These results suggest that in low deposition ‘background’ regions a biodiversity-based empirical critical load of 1.4–3.15 kg N ha − 1 yr − 1 will protect lichen communities and other N-sensitive species in Jack pine forests across Northwestern Canada. Nitrogen deposition above the critical load may lead to adverse effects on plant species biodiversity within these forests.


Introduction
The intensification of anthropogenic activity during the last century has led to a rise in atmospheric emissions and the deposition of many pollutants including nitrogen (N), the effects of which have been studied across local, regional, and global scales [1][2][3]. In Canada, agriculture, transportation, and industry, including oil and gas and other utilities, are the dominant sources of N emissions, with transportation (via road, rail, air, and marine) and the oil and gas industry accounting for almost 70% of all emissions [4]. In eastern Canada, nitrogen oxide (NO × ) emissions were reduced by 26% from 1990 to 2017 due to international agreements and national efforts [4,5]; however, emissions in the west have remained high, with Alberta alone accounting for 27% of all national ammonia (NH 3 ) emissions and 36% of all NO × emissions in 2017 [4].
Increases in N emissions (both reduced and oxidized N) can affect terrestrial ecosystems and has been widely recognized as a key driver of changing plant species diversity [6][7][8]. Even at low levels, chronic N deposition can have adverse effects through direct foliar changes and increased susceptibility to secondary stressors. It can also lead to soil acidification, eutrophication, and shift entire ecosystems to a point of N saturation, favouring more Ntolerant species, and ultimately reducing biodiversity [1,3,[9][10][11][12]. Jack pine (Pinus banksiana Lamb.) coniferous forests are a predominant stand across almost 80% of boreal forests in Canada (preceded only by spruce and poplar stands, [13]). Moreover, Jack pine stands tend to have nutrient-poor, well-drained soils that range from coarse to fine sands or gravel [14] Nitrogen 2023, 4 103 and are generally accompanied by an N-sensitive understory composed of a variety of lichen, bryophytes, and dwarf shrubs [8,15].
Assessing the effects of N deposition on sensitive terrestrial environments is widely carried out using critical loads [16][17][18][19], which have been used to underpin emissions reductions policies. Critical loads can be established through a variety of methods: empirically using observational data; a steady-state mass balance approach that estimates the loss or accumulation of N inputs and outputs over the long term; or dynamic modelling that includes time-dependent processes [20,21]. Empirical critical loads of nutrient N (CL emp N) are based on environmental responses to N and infer a level below which no adverse effects occur [22][23][24].
In Europe, CL emp N have been published according to the European Nature Information System (EUNIS), which defines different ecosystems based on dominant tree species, soil hydrology, and management practices [23,24]. In a review of gradient and experimental N-addition studies, coniferous woodland forests (EUNIS, class T3) had a recommended range of 3-15 kg N ha −1 yr −1 , with exceedance defined by changes in soil processes, nutrient imbalances, altered composition mycorrhiza, and ground vegetation [24]. In many studies, CL emp N from the 2010 UNECE workshop [23] are extrapolated to similar ecosystems [20]; however, some European forests have been known to experience as much as 100 kg N ha −1 yr −1 , levels much greater than have historically been seen across large regions of Alberta and Saskatchewan [15]. A similar review in northern United States [20] recommended a range of 3-8 kg N ha −1 yr −1 for nutrient N deposition in hardwood and coniferous northern forests [20]. In Canada, recommended critical loads for coniferous forests have been set between 5-10 kg N ha −1 yr −1 , with exceedance marked by changes in ground cover and increases in foliar and soil N concentration [15]. Despite differences in published CL emp N for coniferous and mixedwood forests, changing ground species biodiversity has remained a constant indication of exceedance.
Thresholds for N deposition can be identified using several methods based on the data availability [22][23][24]. Long-term field additions and targeted N experiments have been used to inform CL emp N [25][26][27] but gradient studies have also been effective in quantifying CL emp N [24]. Studies have assessed CL emp N using survey data and applied general additive models (GAMs) to identify important community thresholds across a given habitat [28,29], while others have relied on regressions to establish terrestrial N deposition thresholds [30]. Regressions techniques have also been used to derive critical loads on a European scale through a vegetation-based model called Probability of Plant Species (PROPS), where plant species observational data were evaluated as a function of multiple abiotic environmental variables [31]. Gradient studies using Threshold Indicator Taxon Analysis (TITAN) have been used to evaluate changes in individual plant species abundance in European grasslands along an N gradient [32], and habitats in Ireland classified as high conservation importance (Annex I under the Habitats Directive), with changepoints inferring CL emp N but only when they included positive indicator species previously identified by the National Parks and Wildlife Service [33]. Similar ecological thresholds have been determined for vegetation communities (i.e., alliances) across the United States [19].
In 2010, a novel approach was developed to examine community thresholds that sought to address several deficiencies in existing methods. Gradient forest analysis [34], is an extension of random forest analysis [35], a method that fits multiple regression tree models between individual species abundances and environmental variables [36]. However, gradient forest analysis was developed to be a more flexible, non-linear, multivariate approach, addressing multi-species threshold responses and establishing where in the gradient range compositional changes occur [36][37][38]. In this respect, gradient forest analysis is like TITAN, but it also accommodates multiple environmental variables and identifies their importance with respect to changes in plant species composition [34,36,38].
The objective of this study was to evaluate whether N deposition is a significant driver of changing plant species composition across Jack pine-dominant forests in northwestern Canada. Plant species abundance data for 297 plots across Alberta and Saskatchewan were evaluated against 43 environmental variables using gradient forest analysis to determine the importance of N as a driver of community level changes. Furthermore, gradient forest analysis was used to identify where community thresholds occurred across the gradient of environmental variables and the species that were particularly vulnerable. These results were used to infer an CL emp N for Jack pine dominant forests.

Study Area
Study sites were selected across the province of Alberta and Saskatchewan, covering an area of approximately 270,000 km 2 . Study sites were predominantly located across the Boreal Plains and Boreal Shields ecozones, with additional sites located across the Taiga Shield ecozone. The Boreal Shield ecozone, stretching from Newfoundland to Alberta and covering roughly 1.8 million km 2 and 2/3 of the province of Alberta, is defined by long cold winters and short warm summers [39,40]. The Boreal Plains ecozone covers 650,000 km 2 , is representative of a moist climate, and is defined by cold winters and moderately warm summers [39,40]. The Taiga shield is characterized by short, cool summers and long cold winters, and covers approximately 1.3 million km 2 across Canada with some sections of northern Alberta and northern Saskatchewan falling into this ecozone [39,40]. Across both the Boreal Plains and the Boreal Shield, yearly precipitation averages around 400 mm, while the Taiga Shield, which has characteristically low yearly precipitation, ranges between 175 to 200 mm [39,40]. Jack pine is identified as a main conifer species across all three ecozones; however, stands are known to be stunted across the Taiga shield.

Data Sources
Plant species, including vascular, bryophyte, and lichen abundance data (recorded as percent cover) were obtained from three separate surveys with established semi-permanent and permanent monitoring plots across Alberta and Saskatchewan. Data from Alberta were obtained primarily from the Wood Buffalo Environmental Association (WBEA, [41]) while the Forest Ecosystem Classification Guide (SK-FEC, [42]) was used for most of the Saskatchewan data. In addition, the National Forest Inventory (NFI, [43]) provided data for 10 sites across Alberta and Saskatchewan. Initial WBEA data collection began in 1998 but over time, both sampling plots and methods were updated to accommodate both natural and anthropogenic changes to the landscape, including fires and construction [41]. The WBEA survey consisted of 25 Jack pine dominant, enhanced forest health network plots measuring 10 m by 40 m. Plant species and soil data were collected within small, medium, and large subplots across each site [41]. Across Saskatchewan, SK-FEC data were collected from 1700 semi-permanent, 10 m by 10 m plots and included species coverabundance data as well as soil and site characteristics [42]. Survey data collection for the NFI began in 2001 with subsequent re-sampling efforts to occur over 10-year periods. The NFI has 1115 permanent plots set in a grid system across the country, each measuring 20 km × 20 km and within each permanent plot, a ground plot measuring 10 m by 10 m was established to identify ground vegetation and estimate percent cover [44]. For additional plot information, see the Supplementary Materials SI.
Across all three surveys, sites were only selected for analysis if the canopy coverage was classified as Jack pine-dominant, which was determined by evaluating the percentage of Jack pine coverage against total canopy coverage within a site. A site was only considered for the study if total canopy coverage was 10% or more of the site and of that, Jack pine comprised 50% or more of the stand. In total, 297 sites were selected across Alberta and Saskatchewan (Figure 1). Although disturbance data were not consistent across surveys, sampling criteria ensured that the selected sites were not recently subjected to forest fire disturbance, with SK-FEC ensuring sites were at least 40 years post-fire, WBEA ensured approximately 60 years post-fire disturbance in their sampling criteria, and the NFI reporting a minimum of 30 years post-fire disturbance. ensured approximately 60 years post-fire disturbance in their sampling criteria, and the NFI reporting a minimum of 30 years post-fire disturbance.

Environmental Data
Modelled estimates of N and sulphur (S) deposition data (in eq ha −1 yr −1 ) were obtained from GEM-MACH (Global Environmental Multi-Scale Modelling Air Quality and Chemistry); data were obtained at a resolution of 2.5 km × 2.5 km and based on emissions inventories from 2013 and 2014 [45]. Nitrogen deposition data, which included total deposited nitrogen (TDN), accounted for 12 different forms of N (Table 1) while S deposition accounted for four different S species (Table 1) and included total deposited sulphur (TDS). In addition, 23 bioclimatic variables (predictors) based on long-term climate normals were included in the analysis, as was elevation, longitude, and latitude ( Table 2, [46]). In total, 43 different environmental variables were analyzed using gradient forests. Table 1. List of nitrogen and sulphur species (n = 17) and their associated summary statistics across all 297 study sites (see Figure 1). Data source: GEM-MACH [45].

Environmental Data
Modelled estimates of N and sulphur (S) deposition data (in eq ha −1 yr −1 ) were obtained from GEM-MACH (Global Environmental Multi-Scale Modelling Air Quality and Chemistry); data were obtained at a resolution of 2.5 km × 2.5 km and based on emissions inventories from 2013 and 2014 [45]. Nitrogen deposition data, which included total deposited nitrogen (TDN), accounted for 12 different forms of N (Table 1) while S deposition accounted for four different S species (Table 1) and included total deposited sulphur (TDS). In addition, 23 bioclimatic variables (predictors) based on long-term climate normals were included in the analysis, as was elevation, longitude, and latitude (Table 2, [46]). In total, 43 different environmental variables were analyzed using gradient forests. Table 1. List of nitrogen and sulphur species (n = 17) and their associated summary statistics across all 297 study sites (see Figure 1). Data source: GEM-MACH [45].

Statistical Analysis
Gradient forest analysis was performed in R studio using two packages: extended-Forest and gradientForest. In the extendedForest package, a forest of regression trees (or classification trees if the data is binary) is constructed. During construction, a random sample of observations (termed in-bag observations) is fitted for each tree and response data is split into two groups along a gradient, with each split forming two branches and eventually a full tree [36,38]. A forest is created by repeating the construction of a single tree many times over and is done to reduce the instability of a single tree [36]. The number of trees in a forest is a call that can be set in the gradientForest package.
A branch in a tree is created by partitioning data according to a certain split value (s) for a given predictor (p), where sites to the left of the split value s have a predictor value p, below or equal to s, while those to the right have a predictor value above the split value [34,36,38]. Partitions in the data are done to maximize homogeneity in groupings, with respect to the response variable (e.g., one species' abundance) and are repeated until only a minimum number of sites in the partition is reached, eventually forming a terminal node, where the predicted value is equivalent to the mean response of the node [34]. The split value s is selected so as to minimize the sum-of-squares deviation of the species abundance data (impurity) and in-turn maximizing the fit improvement (also known as reducing impurity). The fit improvement is a measure of how much the overall model variance has been explained by that partition and determines the importance of a split [34,36,47]. Each split can be interpreted as a response data threshold or changepoint, where a particular species (response data) may be present above this threshold but not below and vice versa for other species [34]. This process results in raw importance values that are then given to each predictor at a split value s in a particular tree for species f [36,38].
Observations not used in the construction of a tree are termed out-of-bag observations (OOB) and provide an estimate of the generalization error, the expected variance of the residuals for the new observation [34]. For each observation, the OOB prediction is the average of all predictions on all trees where the observation was OOB [34]. OOB predictions are used to cross validate the performance of a single tree, while the mean cross validated performance of all trees results in the goodness-of-fit measure [34,38]. The goodness-of-fit measure (also known as the predictive performance R 2 S values) produced in extended-Forest represent, for each indicator (species), the proportion of variance explained by the random forest model [34,48]. Predictive performance values are calculated by comparing the generalization error in the model with the variance of observations (in other words, comparing a model without a given predictor to the full model). While it is possible for this value to be 0 (where generalization error exceeds the observed variance), the end result is the associated variable is determined to have no predictive power [34,36,48].
Accuracy importance values are also defined in extendedForest and are a measure of prediction accuracy in the model. Accuracy importance values are produced by assessing the importance values of each variable and their respective contribution to the overall model's goodness-of-fit and are analogous to an increase in mean square error of OOB observations [48]. Accuracy importance informs on whether a variable has true predictive power or not and is measured by how much removing a variable will decrease (or increase) model accuracy. Relatively large accuracy importance values represent true predictive power for a variable, while small or negative values indicate that the environmental variable explain very little or nothing [34,48].
The 'extendedForest' package adopts similar functions from Breiman's random forest models [35] and while random forest is known to perform well with high-dimensional data [49], additional measures such as the use of conditional permutations further increase model effectiveness when processing a large quantity of predictors by attempting to reduce collinearity between them. Conditional permutations allow for the assessment of the effects on the model when a given predictor is not considered, and as such the comparison of the prediction accuracy before and after the permutation of a conditional variable can be used as an importance measure [36,49]. Unlike marginal permutations, conditional permutations are permuted within bins of data defined by tree splits for predictors that are correlated above a specific threshold value that can be set in the gradientForest call [36,38,47].
GradientForest will then combine data from extendedForest, produce additional measures, and provide graphical outputs for specific measures to help evaluate community thresholds. Overall R 2 -weighted importance values are one such measure provided by gra-dientForest. R 2 -weighted importance is a measure of conditional importance and classifies the importance of each environmental variable with respect to community composition [38]. This is produced when gradientForest partitions the predictive performance R 2 S values (from extendedForest) into contributions (R 2 SP ) from each predictor in proportion to the predictor importance. These contributions (R 2 SP ) are then averaged across all species to provide the R 2 -weighted importance values for each predictor [48]. The mean R 2 -weighted importance value is the average of all R 2 -weighted importance values and is a measure of how much species variation in the entire model was explained by all predictors [38]. Compositional turnover functions for each predictor are then produced when gradientForest distributes the predictive performance values (R 2 S , calculated in extendedForest) for all species attributed to a given predictor gradient (environmental variables, [34,48]).

GradientForest Output
The output from gradient forest analysis includes several important statistical values and plots that can be used to identify the response for both individual plant species and the community across environmental gradients. To evaluate individual species variance, gradientForest produces R 2 performance values for each environmental variable specific to each species (R 2 S ). The sum of all specific R 2 S importance values within each species provides the R 2 overall importance values, which is an indication of how well overall variance within each individual species was predicted by all environmental variables. Although values that are negative or zero are numerically possible for R 2 overall performance, they are not included for further threshold analysis [50].
In addition to information at the individual species level, gradientForest produces several graphical outputs that provide information relating to the overall community composition. A plot depicting accuracy importance and R 2 -weighted importance visually classifies the importance of all environmental variables in relation to overall changing community composition as well as the model predictive power for each predictor (accuracy importance). Additional plots, including the densities plot and cumulative importance plots, serve to identify potential community-level thresholds across predictor gradients, and which species were particularly responsive within a given range [34,36,38]. Results for both the densities plot and the cumulative community plot are then plotted as non-linear, smoothed curves as gradientForest re-expresses split improvements from the random forest and extendedForest components with respect to both their contribution to total variance explained by the predictors and the contribution of each species in quantifying compositional change proportional to the variance explained by the predictors [36].
A densities plot identifies regions across an environmental gradient of higher importance for plant species compositional change and is estimated as the ratio of density of split importance to the density of observed predictor values along the given predictor gradient [48]. The black curve represents the density of splits, the red curve represents the density of data, while the blue curve represents the ratio of split-over data densities and grey histogram bars (binned split importance values) represent both split locations and relative importance on the gradient [36,38]. Peaks in the ratio of densities curve (where the blue line is greater than 1) indicate threshold values for that environmental variable [36] where relatively large changes in community composition occur. The distribution shown in each densities plot integrates to variable importance [38]. In cumulative importance plots, the cumulative compositional change along each gradient (in R 2 units) is calculated by aggregating the normalized splits as cumulative distributions and are scaled by R 2weighted importance, standardized by the density of observations, and averaged across all species [36,48]. These plots display the same community thresholds seen in the densities plot but will display them in relation to all species data, identifying species that might be associated to large community shifts across the gradient [34,36,38].
Plant species abundance data for Jack pine dominant sites was evaluated against 43 environmental variables (Tables 1 and 2). Initially, abundance data for 357 sites across all three surveys was used in gradientForest, however results revealed that longitude and latitude were the top environmental predictors and predictor importance for all other variables was considerably lower. In this initial analysis, several important thresholds were identified, representing a possible geographic bias in the data, with 2 thresholds for latitude at 60 • and 54 • and a single threshold for longitude at −115 • . To increase uniformity, following previous studies [38,50] data were truncated to exclude any sites north of 60 • , west of −115 • and south of 54 • , and a second analysis was performed, leaving 297 sites (n SK = 271 and n AB = 26, Figure 1) across Alberta and Saskatchewan. Species data for those sites included percent cover abundance, and species occurring at fewer than 3 sites were considered rare and removed from analysis. Overall, plant species abundance data for 218 species (n lichen = 96, n bryophyte = 41, n vascular = 79 and n fungi = 2) was included in the gradient forest analysis. Given the inherent importance some geographic and bioclimatic variables play in species composition [38,51,52], the focus of gradient forest analysis was placed on variables that were both classified as having high importance with respect to changing plant species composition and that are influenced by human activity. For additional information, see Supplementary Materials SII and SIII.
Of the 218 species evaluated in gradient forest analysis, 96 species had a positive R 2 overall importance value, and of these, 13 species had a greater R 2 measure of fit of their associated random forest model (R 2 overall importance > 0.4, Figure 2, Table 3), meaning these species were particularly well predicted by the 43 environmental variables. Amongst the top-predicted species (R 2 > 0.4), recurring environmental variables with highest R 2 S performance values (Table 3) were: longitude, total precipitation for period 1 (sg_04), total precipitation for period 3 (sg_06), wet hydrogen sulfite (HSO 3 ), dry nitrogen oxide (DNO), TDS, and total precipitation for period 4 (sg_07).     Mean accuracy importance was greatest for wet hydrogen sulfite (HSO 3 : 39.53, Figure 3) but rapidly reduced for all variables after dry nitrogen dioxide (DNO 2 at 10.417, Figure 3). This suggests that variables with accuracy importance below DNO 2 have very little predictive power, indicating that model accuracy would be altered very little or not at all if any of these variables were removed. Individual R 2 -weighted importance and mean R 2 -weighted importance (R 2 = 0.00467, Table 4) were relatively low, indicating that the environmental variables explained a relatively small portion of the out-of-bag variation in the abundance data [48,53].  In addition to longitude, which remained the most important environmental variable (R 2 = 0.0161, Table 4), both latitude and elevation were also identified as important predictors ( Figure 3 and Table 4). The densities plot produced in gradient forest revealed that important splits (changes in abundance) occurring around -112 and around -106 ( Figure  4) appeared to be consistent with different sampling efforts outlined across the WBEA, SK-FEC, and NFI survey protocols. The longitude densities plot also revealed an important split between -109 and -110 (Figure 4), consistent with the provincial boundary of Alberta and Saskatchewan, again outlining the variations in sampling efforts per survey and province.  In addition to longitude, which remained the most important environmental variable (R 2 = 0.0161, Table 4), both latitude and elevation were also identified as important predictors ( Figure 3 and Table 4). The densities plot produced in gradient forest revealed that important splits (changes in abundance) occurring around −112 and around −106 ( Figure 4) appeared to be consistent with different sampling efforts outlined across the WBEA, SK-FEC, and NFI survey protocols. The longitude densities plot also revealed an important split between −109 and −110 (Figure 4), consistent with the provincial boundary of Alberta and Saskatchewan, again outlining the variations in sampling efforts per survey and province.
Six deposition variables (in weighting order: HSO 3 , DNO, TDS, DSO 2 , DNO 2 and DHNO, Figure 3 and Table 4) were among the top 15 variables with greatest R 2 -weighted importance. Among these variables, five were also identified as having high accuracy importance values, with only dry nitroxyl having a much lower comparative value (Figure 3), indicating very low predictive power. The remaining variables among the top 15 with greatest R 2 -weighted importance values were longitude, latitude, elevation and six bioclimatic variables. Only longitude, latitude and two bioclimatic variables, including Gdd above base temperature for period 2 (sg_09) and annual mean temperature (sg_12), had a relatively high associated accuracy importance value (Figure 3). Although these variables had high overall conditional importance (high values for both accuracy importance and R 2weighted importance), they are widely documented abiotic drivers for species distribution and were excluded from further analysis [51]. The remaining four bioclimatic variables had relatively low accuracy importance values, including total precipitation for period 1 (sg_04), precipitation of the driest quarter (bio_17), precipitation of the coldest quarter (bio_19), and annual minimum temperature (sg_13), and were excluded from further review as they were identified as having very little predictive power across the model (Figure 3).  Six deposition variables (in weighting order: HSO3, DNO, TDS, DSO2, DNO2 and DHNO, Figure 3 and Table 4) were among the top 15 variables with greatest R 2 -weighted importance. Among these variables, five were also identified as having high accuracy importance values, with only dry nitroxyl having a much lower comparative value ( Figure  3), indicating very low predictive power. The remaining variables among the top 15 with greatest R 2 -weighted importance values were longitude, latitude, elevation and six bioclimatic variables. Only longitude, latitude and two bioclimatic variables, including Gdd above base temperature for period 2 (sg_09) and annual mean temperature (sg_12), had a relatively high associated accuracy importance value (Figure 3). Although these variables had high overall conditional importance (high values for both accuracy importance and R 2 -weighted importance), they are widely documented abiotic drivers for species distribution and were excluded from further analysis [51]. The remaining four bioclimatic variables had relatively low accuracy importance values, including total precipitation for period 1 (sg_04), precipitation of the driest quarter (bio_17), precipitation of the coldest quarter (bio_19), and annual minimum temperature (sg_13), and were excluded from further review as they were identified as having very little predictive power across the model (Figure 3).
Wet hydrogen sulfite was an important environmental variable with respect to changes in community-level composition. The associated densities plot revealed two important community thresholds (where the ratio of densities > 1, Figure 5, left column), with the first threshold occurring between 70 and 80 eq ha −1 yr −1 and the second at 550 eq ha −1 yr −1 ( Figure 5). Both peaks were relatively small but the first was representative of a slightly more significant community threshold. In addition, the secondary peak not only had a small ratio of densities ( Figure 5), indicating that very little is occurring at the Wet hydrogen sulfite was an important environmental variable with respect to changes in community-level composition. The associated densities plot revealed two important community thresholds (where the ratio of densities > 1, Figure 5, left column), with the first threshold occurring between 70 and 80 eq ha −1 yr −1 and the second at 550 eq ha −1 yr −1 ( Figure 5). Both peaks were relatively small but the first was representative of a slightly more significant community threshold. In addition, the secondary peak not only had a small ratio of densities ( Figure 5), indicating that very little is occurring at the community level, but this peak was also located at the extremity of the gradient, representing a possible bias in the data [50]. The cumulative importance plots revealed several species experiencing smaller changes across the first range, including Stellaria longipes and Arctoparmelia centrifuga, but the species experiencing the most substantial change was Cladina mitis ( Figure 5). Across the wet hydrogen sulfite gradient, the specific R 2 performance value (R 2 S , Table 3) was greatest for Cladina mitis, indicating that wet hydrogen sulfite explained the greatest amount of variation within this species.
As the 8th most important predictor (R 2 -weighted importance) and high accuracy importance (Figure 3), TDS was also an important variable in the gradient forest model. The densities plot for TDS identified several community thresholds across the gradient, with the greatest ratio of densities peak (ratio of densities > 1) occurring between 150 and 200 eq ha −1 yr −1 (Figure 5, right column). Additional smaller peaks occurred around 650 eq ha −1 yr −1 and between 900-950 eq ha −1 yr −1 ( Figure 5). Similar to wet hydrogen sulfite, the secondary, smaller thresholds have low density of data (red line) and low density of splits, indicating that very little is occurring at the community level ( Figure 5). The cumulative importance plot for TDS identified Cladina mitis and Peltigera neopolydactyl as species experiencing the most change across this threshold ( Figure 5). Several species, including Bryoria glabra, Stellaria longipes and Dicranum flagellare experienced substantial changes on an individual species level but these did not contribute to any important community level thresholds ( Figure 5). However, across the full gradient, TDS explained the greatest amount of variation for Bryoria glabra (highest specific R 2 S performance value returned by gradient forest; Table 3). When wet hydrogen sulfite and TDS were compared, there was a high correlation factor found between both gradients (0.94), with the first threshold across the wet hydrogen sulfite gradient (70-80 eq ha −1 yr −1 ) correlating well with the 150-200 eq ha −1 yr −1 threshold across the TDS gradient.
Nitrogen 2023, 4, FOR PEER REVIEW 13 community level, but this peak was also located at the extremity of the gradient, representing a possible bias in the data [50]. The cumulative importance plots revealed several species experiencing smaller changes across the first range, including Stellaria longipes and Arctoparmelia centrifuga, but the species experiencing the most substantial change was Cladina mitis ( Figure 5). Across the wet hydrogen sulfite gradient, the specific R 2 performance value (R 2 S, Table 3) was greatest for Cladina mitis, indicating that wet hydrogen sulfite explained the greatest amount of variation within this species.  Dry nitrogen oxide and DNO 2 had relatively high R 2 -weighted importance and accuracy importance (4th and 13th most important variable, respectively, Figure 3). The associated densities plots revealed that a large proportion of species' abundance breakpoints (denoted by grey histogram bars) were relegated to low levels of deposition ( Figure 6, left column). The densities plot for DNO identified two thresholds, the first located between 1 and 2 eq ha −1 yr −1 and the second located between 3 and 4 eq ha −1 yr −1 , with greater binned split importance and a much greater ratio of densities associated to the first peak ( Figure 6). The densities plot for DNO 2 revealed a single threshold (where the ratio of densities was > 1 and binned split importance values peaked, Figure 6, right column) that ranged between 10 and 40 eq ha −1 yr −1 and peaked around 25 eq ha −1 yr −1 .
Nitrogen 2023, 4, FOR PEER REVIEW 15 importance and mean accuracy importance values were very low for TDN, as such TDN breakpoints might not represent a reliable community threshold. Instead, a TDN threshold was inferred by evaluating the correlation between N deposition variables. Variables with relatively high correlation coefficients were considered and a TDN threshold was estimated by evaluating the equivalent TDN value or range from the line of best fit and the corresponding gradient forest threshold value or range of the correlated variable.  Although these are quite low levels of deposition, the cumulative importance plots for DNO and DNO 2 revealed several reoccurring species that experienced substantial changes across identified thresholds including Cladina mitis, Bryoria glabra, Cladonia gracillis, Vulpicida pinastri, and Lecanora circumborealis. For DNO, individual species changes were greatest for Cladina mitis, Tuckermanopsis americana, and Evernia mesomorpha across the first, more pronounced community level threshold ( Figure 6). Species associated with the secondary threshold, including Melampyrum lineare, Bryoria glabra, and Vaccinium uliginosum, experienced large cumulative changes at the individual species level but did not result in a large community threshold ( Figure 6). It should be noted that across the entire DNO gradient, Cladina mitis had the greatest specific R 2 performance value (R 2 S , Table 3), indicating that DNO explained the most amount of variation across this individual species. The cumulative importance plot for DNO 2 revealed individual species changes that were more pronounced for Cladina mitis and Vulpicida pinastri at the peak of the ratio of densities curve ( Figure 6). Several other species experienced large cumulative importance changes across this gradient but did not contribute to an overall community level threshold. The greatest specific R 2 performance value across the DNO 2 gradient belonged to Parmeliopsis hyperopta, indicating that variance for the entire species was best explained by DNO 2 .
The densities plot for TDN revealed a high density of data between 0 and 200 eq ha −1 yr −1 (red lines, Figure 7). Binned split importance values also peaked in this range and the ratio of densities showed two minor thresholds in this region (around 80 eq ha −1 yr −1 and again around 150 eq ha −1 yr −1 , Figure 7), indicating that compositional change is occurring, but the resulting overall community-level change is minor. A more pronounced community threshold was present around 400 eq ha −1 yr −1 . However, both R 2 -weighted importance and mean accuracy importance values were very low for TDN, as such TDN breakpoints might not represent a reliable community threshold. Instead, a TDN threshold was inferred by evaluating the correlation between N deposition variables. Variables with relatively high correlation coefficients were considered and a TDN threshold was estimated by evaluating the equivalent TDN value or range from the line of best fit and the corresponding gradient forest threshold value or range of the correlated variable.  The correlation coefficients for DNO and DNO2 (both identified as important predictors in gradient forest) were 0.586 and 0.735, respectively. Community thresholds for DNO (between 1 and 2 eq ha −1 yr −1 ) and DNO2 (around 25 eq ha −1 yr −1 ) were evaluated and resulted in TDN equivalents around 100 eq ha −1 yr −1 and 150 eq ha −1 yr −1 , respectively. While DNH3 had the greatest correlation coefficient (0.904, Table 5) and relatively high mean accuracy importance, it was identified as having only intermediate importance (24th for variable importance, Figure 3). Therefore, TDN threshold values or ranges inferred from DNH3 were not considered as they might not be truly representative of community level changes. Based on results from gradient forest and correlation evaluations, it is rea- The correlation coefficients for DNO and DNO 2 (both identified as important predictors in gradient forest) were 0.586 and 0.735, respectively. Community thresholds for DNO (between 1 and 2 eq ha −1 yr −1 ) and DNO 2 (around 25 eq ha −1 yr −1 ) were evaluated and resulted in TDN equivalents around 100 eq ha −1 yr −1 and 150 eq ha −1 yr −1 , respectively. While DNH 3 had the greatest correlation coefficient (0.904, Table 5) and relatively high mean accuracy importance, it was identified as having only intermediate importance (24th for variable importance, Figure 3). Therefore, TDN threshold values or ranges inferred from DNH 3 were not considered as they might not be truly representative of community level changes. Based on results from gradient forest and correlation evaluations, it is reasonable to infer that community thresholds in the study area were between 100 to 150 eq ha −1 yr −1 for TDN, and between 150 and 200 eq ha −1 yr −1 for TDS.

Drivers of Community Thresholds
Overall, gradient forest analysis revealed that the variables with greatest predictor importance were predominantly bioclimatic variables (Figure 3). Given the large area covered by our study sites, and therefore the wide range in environmental variables, it would make sense that community thresholds seen in gradient forest were driven more by regional and climatic variables. This has been seen in other studies, where there was less of an effect of deposition on plant species composition when spatial scale was increased [51,52,54,55].
Gradient forest identified several deposition variables with high R 2 -weighted importance and accuracy importance, indicating that despite the broad spatial scale of our study, atmospheric deposition remained an important factor with respect to changing community composition. Similarly, in Switzerland, elevated N deposition has been shown to contribute, albeit minimally, to changes in plant species community composition across a large spatial scale [56]. Total sulphur deposition and wet hydrogen sulfite were identified as deposition variables with high predictor importance ( Figure 3). While wet hydrogen sulfite had the greatest predictor importance of all deposition variables, it also accounted for approximately 30% of TDS. Gradient forest returned community thresholds for wet hydrogen sulfite between 70-80 eq ha −1 yr −1 and around 200 eq ha −1 yr −1 for TDS (equivalent to approximately 3.2 kg S ha −1 yr −1 ). While TDS was an important predictor with respect to plant species community composition, only 4% of all 297 study sites received TDS levels greater than the estimated community threshold, indicating that TDS induced changes occurring at the plant species community level might be relatively localized across only a small portion of sites.
In contrast, 15% of all 297 sites received TDN greater than the estimated community threshold. While modelled DNO was not a large overall contributor to TDN, gradient forest results still identified DNO and DNO 2 , both oxidized forms of N, as important predictors with respect to plant species community composition. Several studies have shown that both individual plant species and entire ecosystems can have varying sensitivity to different forms of N deposition, documenting in some cases greater sensitivity to reduced N deposition [54,[57][58][59] and in other cases to oxidized N [60,61]. However, as seen in our study, DNO is known to contribute a relatively small amount to total dry deposition in North America [62] and the thresholds identified by gradient forest for DNO and DNO 2 independently are very low. Rather, the importance of DNO and DNO 2 seen in gradient forest could be the result of variability in the modelled deposition data. Although modelled deposition estimates were reasonable compared to actual deposition data, there was a low bias in N deposition estimates that required model correction [45]. While our results could be an ecosystem response indicating a preference to oxidized forms in Jack pine dominant forests, the interacting effects of deposition with other factors, including light availability or soil fertility [63] may also be partly responsible for the increased importance of oxidized over reduced N in our study area. Alternatively, it could also indicate that modelled TDN may not be fully reliable or that variables contributing to TDN, including DNO or DNO 2 , were better represented in the model.
Thresholds identified for the top deposition predictors were highly variable. Dry nitrogen dioxide had a single community threshold around 25 eq ha −1 yr −1 (Figure 6), and it represented almost 30% of TDN. In contrast, important community thresholds for DNO were identified at relatively low deposition levels (between 1-2 eq ha −1 yr −1 and 2.5-4 eq ha −1 yr −1 , Figure 6), and DNO only accounted for 1.3% of TDN. Although gradient forest analysis determined that at a regional scale, total modelled N deposition was a smaller contributor to changes in plant species community composition than either DNO or DNO 2 , these predictors were relatively well correlated to TDN (Table 5) and inferred a threshold range for TDN between 100-150 eq ha −1 yr −1 , equivalent to approximately 1.4 to 2.1 kg N ha −1 yr −1 .
Our study revealed a lower N threshold compared to some studies; however, these studies often used a variety of species metrics, including species richness or species diversity to assess community changes [30,56,64,65]. In other studies, changes in ecosystem function or mechanisms were used to derive empirical critical loads of N. For instance, across similar European woodlands and forested habitats, a range of 5-15 kg N ha −1 yr −1 was recommended based on several indicators of exceedance, including changes in soil processes, nutrient imbalances, and changes in species composition across ground vegetation [23]. In this same study, they also found that declines in lichen, specifically, were associated with N deposition as low as 5-10 kg ha −1 yr −1 [23]. More recently, there has been a notable shift in deriving critical loads with greater emphasis placed on observed changes across specific taxa or individual species rather than using broad species metrics. For lichen communities across northern hardwood and coniferous forests in the United States, a range of 1-9 kg N ha −1 yr −1 was recommended [20], while a critical load of 1.5 kg N ha −1 yr −1 was estimated specifically for lichen communities in the Northwestern US [66]. These biodiversity-based critical loads, specific to lichen communities, are similar to this study.

Species & Community Level Response
Species cumulative importance curves for N-related predictors with high importance (DNO and DNO 2 , Figure 6) revealed several similarities in species response. Species that were most highly associated with the only threshold across the DNO 2 gradient (25 eq ha −1 yr −1 , Figure 6), were predominantly lichen, including Cladina mitis, Vulpicida pinastri, Evernia mesomorpha, and Lecanora circumborealis ( Figure 6). These same lichen species were also identified as highly responsive across the primary threshold (1-2 eq ha −1 yr −1 ) along the DNO gradient ( Figure 6). In addition, all but Vulpicida pinastri were found to be particularly well predicted by all environmental variables used in the model (R 2 > 0.4, Figure 2). This is indicative that not only were DNO and DNO 2 important drivers of changepoints for these species, but that these changepoints were also significant with respect to community-level change. In addition to being among the most responsive species in the model, our study also found that for N-related predictors, several lichens, including Cladina mitis and Evernia mesomoprha experienced changepoints at relatively low deposition levels. Other studies evaluating species-specific responses have identified similar lichen as bioindicators to changing environmental conditions [20,67,68]. For instance, in Sweden, three years of fertilization (at 150 kg N ha −1 or more) resulted in a marked reduction in cover for Cladina species in boreal forests [69]. While in a separate study, the effects of a five-year liquid fertilization treatment found that high abundance lichen, particularly Cladina species, disappeared entirely from the bottom layer vegetation [70]. Although these studies were carried out in boreal forests across Europe, the habitats are analogous to Jack pine stands in the boreal forests of Canada [15] and previous studies and reviews have identified similar N sensitivities for lichen communities [68,71,72].
It has been well documented that lichen respond primarily to air concentrations for N [73], which could explain why both DNO and DNO 2 were among the top drivers of changes in lichen composition across the gradient forest model, out-performing TDN as a driver of species change. Increased lichen sensitivity to increased N emissions in aerosol form is predicated on their physiology. While vascular plants have specialized tissues that regulate the entry of gases, lichen do not, and instead will rapidly absorb gases, water, and dissolved nutrients, including increased N in the region [20,23,74]. Overall, increased atmospheric deposition can result in a shift in species composition when existing species in the area absorb more N than nutritionally required [73]. When they can no longer process any excess N, more N-tolerant, nitrophilous species begin to replace them, a well-documented response to increased N deposition [20,23,73,75,76].
The importance of lichen was well captured in gradient forest analysis with nine of 13 species that were particularly well predicted (by all 43 environmental variables) being lichen (R 2 > 0.4, Figure 2, Table 3). However, vascular species, which represented the remaining four of 13 species that were well predicted by gradient forest, were also relevant to the model (R 2 > 0.4, Figure 2). A closer look at the individual predictors responsible for change across top responding vascular species revealed that, apart from bioclimatic predictors, DNO was a recurring driver (Table 3). In addition, a secondary threshold across the DNO gradient (between 2.5-4 eq ha −1 yr −1 , Figure 6) revealed several vascular species that exhibited important changes in this range. In response to N deposition, changes in competitive relationships can alter vascular communities [77]. While fast-growing grasses, sedges, and herbs will often experience an increase in cover, slow-growing, ericaceous shrubs often experience a decline in cover [10,77,78]. Vascular plants experiencing significant change across the secondary threshold for DNO included Arctostaphylos uva-ursi, Vaccinium uliginosum, Vaccinium myrtilloides, Melampyrum lineare, and Elymus innovatus, a grass ( Figure 6). Several of these species are commonly found across the boreal forest and the response, particularly for Arctostaphylos uva-ursi or Vaccinium species, is consistent with findings from previous studies noting a decline in abundance in response to N addition [77][78][79][80]. In a study of heathlands across Britain, a negative relationship was observed between Arctostaphylos uva-ursi and modelled N deposition greater than 15 kg N [28]. However, in a study of boreal forests in Sweden, abundance of several Vaccinium species was associated to N deposition greater than 6 kg N ha −1 yr −1 [77]. These results are similar to our study, where the secondary DNO threshold, which correlated to approximately 2.45-3.15 kg N ha −1 yr −1 on the TDN gradient, revealed that change for many vascular species was occurring at the low end of (or lower than) the recommend critical load range of 3-15 kg N ha −1 yr −1 [24].

Uncertainties & Limitations
In our study, one of the more evident limitations was combining data from multiple surveys, as sampling efforts were not consistent among them. We attempted to reduce background variability by specifically selecting Jack pine dominant forests for analysis, however given the broad spatial scale in addition to data being compiled from observations by different surveyors across different surveys, we were not able to reduce variability indefinitely. Furthermore, the mean R 2 -weighted importance identified in gradient forest was relatively low (Table 4), indicating that the combination of all environmental variables explained a considerably small amount of overall variation. While this could be associated to uncertainties regarding how well the scale of our modelled deposition data were able to capture variation in the model, a low mean R 2 is not necessarily out of the ordinary for gradient forest analysis. In a review of previous studies using gradient forest methodology [47], overall explainable variation was often relatively low, but this was attributed to the selection of predictor variables used in each analysis. In this study, it is possible that including more defined site descriptors, including soil data, could increase the amount of total explained variation [50]. However, it has been recognized that environmental variables alone cannot explain all changes associated with plant species composition or community thresholds across a gradient and a truly comprehensive dataset would also include variables such as historical events, including fire history, species interactions, and even temporal variability [47].
Gradient forest does not indicate a direction associated with the identified community or species thresholds. As such, careful analysis of both community and independent species thresholds is often required, along with additional information relating to a given ecosystem or study area to determine whether a threshold is presently associated with ecological degradation or restoration [50]. In addition, because gradient forest relies on a minimum number of species occurrences across a survey to determine a community threshold, rare species can be overlooked. In this study, a minimum of three occurrences was required per species, but even increasing this to seven [50] may still result in overlooking rare species that might be ecologically relevant.
Despite these limitations, gradient forest proved to be beneficial for several reasons with respect to this dataset. Similar to TITAN, gradient forest has a multivariate analysis approach that can account for most species in a survey and assumes that each species could represent a possible indicator for one of the environmental variables [38]. In addition, gradient forest, which is predisposed to produce a stair-step-like response rather than a linear response, is adept at modelling abundance data, which can often be sparse and discontinuous across a large survey [38]. However, the greatest asset for gradient forest is its ability to account for multiple environmental variables at once and in turn produce community threshold information using a singular unit (R 2 ) that can be compared across all gradients [36,38,48,50].

Conclusions
In this study, community thresholds were predominantly associated with specific plant species that were well predicted by the gradient forest model and previously identified as indicators of N addition, including Cladina mitis, Evernia mesomoprha, and Arctostaphylos uva-ursi. Our analysis inferred two distinct community thresholds across the total nitrogen deposition gradient. The primary threshold ranged from 1.4 to 2.1 kg N ha −1 yr −1 and was predominantly associated with changes in lichen species. The secondary threshold, between 2.45 and 3.15 kg N ha −1 yr −1 , suggested changes occurring across vascular species that are known bioindicators of increased N deposition. Additional work emphasizing the species-specific response to N deposition, particularly for species such as Arctostaphylos uva-ursi and Vaccinium species in similar habitats, could be beneficial in further characterizing this relationship. Given the two thresholds, we recommend a biodiversity-based empirical critical load for nutrient N of 1.4 to 3.15 kg N ha −1 yr −1 across Jack pine forests in northwestern Canada.