Inﬂuence of Environmental Factors on Forest Understorey Species in Northern Mexico

: Background : Understorey plants are key to maintaining forest structure and functioning. They protect the soil, improve its structure and fertility, reduce water run-off and sustain the below-ground biota, amongst other ecological services. However, little is known about the environmental conditions that regulate the occurrence of these plants. This study focuses on determining how canopy cover inﬂuences the occurrence of understorey species and identifying the most important soil properties that affect these species. The study area was a pine-oak forest in the Sierra Madre Occidental, an important source of ecological services for northwestern Mexico. Methods : To assess the conditions inﬂuencing the presence of herbaceous and shrub species, 25 soil variables were examined in relation to the species occurring in forest gaps and under the canopy. Sampling was conducted in ﬁve plots, each of 100 × 100 m. In each plot, 4 subplots, each of 20 × 20 m, were each subdivided in a grid of 2 × 2 m units, in which the presence-absence of herbaceous and shrub species was recorded (2000 units in total). Soil samples were extracted for analysis from the central point in each subplot. Data were analyzed using a Binomial Logistic Model (BLM) and Random Forest (RF) classiﬁcation. Results : Understorey species were more strongly affected by soil variables than by their location in gaps or below canopy. The concentrations of Ca, P, K, Fe, Na, C, Zn, Mn, nitrates, organic matter, sand, silt, and percentage water saturation were statistically signiﬁcantly associated with the presence of some plant species, whilst no signiﬁcant differences were found in regard to preference for gaps or canopy, although several species were more frequent in open areas. Conclusions : Given the importance of the understorey cover in forest system functioning, we propose that understorey should be considered in integrated management and conservation practices for the temperate forests of northern Mexico.


Figure 1.
Location of the five 100 × 100 m (1 ha) sampling plots in the municipality of Madera, Chihuahua, Mexico. Source: Own elaboration using ArcGis software available at Esri, Digital-Globe, GeoEye, Earthstar Geographics, CNES/Airbus DS, USDA, USGS, AEX, Getmapping, Aerogrid, IGN, IGP, swisstopo, and the GIS User Community). Maps were created using ArcGIS® software by Esri. Copyright ©Esri. All rights reserved [45].  1. Location of the five 100 × 100 m (1 ha) sampling plots in the municipality of Madera, Chihuahua, Mexico. Source: Own elaboration using ArcGis software available at Esri, DigitalGlobe, GeoEye, Earthstar Geographics, CNES/Airbus DS, USDA, USGS, AEX, Getmapping, Aerogrid, IGN, IGP, swisstopo, and the GIS User Community). Maps were created using ArcGIS®software by Esri. Copyright ©Esri. All rights reserved [45]. Field sampling and data collection were carried out in two different periods. In August-September 2016, the soil samples were collected and the study sites were established [28], and in August-September 2018, the understory species were recorded in each plot. This time of the year was chosen since it is when most of the plants in the area bear reproductive structures, necessary for correct taxonomic identification. The presence of shrubs and herbaceous species was determined by direct observation, and those species not identified in situ were classified and recorded using digital photographs and collections. Samples and images were identified using a preliminary checklist and from taxonomic literature and databases. The records were included in the database of the CIIDIR herbarium in Durango, Mexico [46].
Five 100 × 100 m plots were established in five previously randomly selected forest communities in four "ejidos" and one "colonia", rural communities where the holders are entitled to share all the benefits [47]. Each plot included four 20 × 20 m subplots, each located in the centre of the quarter of the plot. To determine the variables influencing the occurrence of understorey plant species in the gaps and below the canopy, a grid of 100 units of 2 × 2 m was marked from the centre of each of the 20 subplots (2000 units in total) ( Figure 2). In each of these 2 × 2 m units the occurrence of herbaceous and shrub species was recorded, and the location of the species under the canopy or in a gap was also recorded. where: n = presence.

Determining the Presence of Understorey Species, Soil Variables and Canopy
To study the effect of soil properties on understory species, a sample of 1000 g was collected from the centre of each of the 20 subplots at 15 cm depth (four samples by plot); a total of 11 soil samples were obtained from the canopy cover and 9 from the gaps. In total, 25 soil variables were analyzed according to the methodology described by Dominguez-Guerrero et al. [48]: concentrations (ppm) of potassium (K), magnesium (Mg), sodium (Na), copper (Cu), iron (Fe), zinc (Zn), calcium (Ca), manganese (Mn) and phosphorus (P), concentration of nitrate (NO3, kg/ha), pH, cation exchange capacity (CEC, meq/100 g soil), organic matter (%) and texture (%), among others (Table S1). The soil did not differ significantly among sites across gaps or under the canopy. The texture ranged as follows: sand (35.42-63.42%), clay (9.3-23.3%) and silt (19.28-45.28%). The organic matter (OM) content ranged from 0.66-14.2% and the percentage water saturation from 29.5 to 80%.  To evaluate how the soil variables, forest canopy and gaps influence the occurrence of the understorey species, we selected the 22 most conspicuous species (those that were easily recognized in the field, in order to prevent mistakes in records) (see Table 2). The canopy coverage and gaps were delimited within each sampling plot by using ArcGIS version 10.0 [45].

Determining the Presence of Understorey Species, Soil Variables and Canopy
To study the effect of soil properties on understory species, a sample of 1000 g was collected from the centre of each of the 20 subplots at 15 cm depth (four samples by plot); a total of 11 soil samples were obtained from the canopy cover and 9 from the gaps. In total, 25 soil variables were analyzed according to the methodology described by Dominguez-Guerrero et al. [48]: concentrations (ppm) of potassium (K), magnesium (Mg), sodium (Na), copper (Cu), iron (Fe), zinc (Zn), calcium (Ca), manganese (Mn) and phosphorus (P), concentration of nitrate (NO 3 , kg/ha), pH, cation exchange capacity (CEC, meq/100 g soil), organic matter (%) and texture (%), among others (Table S1).

Data Analysis
The 22 plant species were recorded and categorized with values of "0" for absence and "1" for presence in each of the 2000 2 × 2 m units for correlation with the gaps and canopy. The relationship between species and soil variables was analyzed considering their frequency (which was directly obtained from the number of units/100 where each species was recorded). This grouping was established to avoid pseudoreplication, given that only one soil sample was obtained in each 20 × 20 m subplot. In order to detect collinearity between variables, the Spearman correlation coefficient (r) was determined using the "cor.test" function. When the r Value between two soil variables was greater than an absolute value of 0.7, one of the variables was excluded from the analysis in order to prevent collinearity. The variable with the lowest significance value obtained from the regression analysis was used for model fitting. The statistical significance in difference of median unit number with species presence per subplot between canopy and gaps was analyzed by the Kruskal-Wallis test (after Bonferroni correction α = 0.0025). In this case, only those 2 × 2 m units that represented a full gap or full canopy were considered (46.15% of the total 2000 units). However, for the subsequent analysis and modelling, all of the units were included (under canopy, partially under canopy and those in the gaps).
To determine the probability of association between the presence-absence of the understorey species with the edaphic variables and the canopy or gaps, we used a binomial logistic regression model (BLM) [49]. The accuracy of the BLM was determined by the Akaike Information Criterion (AIC) (within each species), residual deviation (RD) and null deviation (ND). The deviance is an important indicator for determining the goodness of fit of the models and is obtained by ANOVA, which compares the predicted and observed values, with greater precision in the model prediction capacity when the residual deviance is less than the null deviation [50]. Appropriate independent variables for the BLM were found using the stepAIC stepwise function in the "MASS" package, in which variables are added to create a reduced initial model that minimizes the AIC value of the models [51,52].
The probability of presence (PoP) of a given species with the studied variables was estimated with the BLM model [49], as follows: where a 1,2,3 are the model parameters, var 1,2,3 are the independent variables, and b is the model intercept.
In order to reduce the variation, we used the non-parametric random forest (RF) technique, with 5-fold cross-validation (CV), this term indicates that the 80% of the data are used by model and, the 20% of the data represent the model validation, which allows rapid, efficient identification of the most important variables for the model [53,54].
The relative variable importance was determined by the Partial Least Squares (PLS) based on the weighted sums of the absolute regression coefficients using the importance function in "Random Forest" packages. PLS is an analytical method applicable when the data include more predictions than observations, and it is used to estimate the importance of a predictor variable in a model regression [55][56][57]. In RF, the most commonly used method to analyze the variable importance is the mean decrease in Gini, where the importance value is expressed as a percentage; this corresponds to the mean variable total decrease in node impurity, weighted by the proportion of samples reaching that node in each decision tree in RF, i.e., normalization of importance values [49].
The RF model performance was estimated using Receiver Operating Characteristic (ROC) analysis, which indicates the quality of the model by the relationship between true and false positives obtained in the validation set, with specificity and sensitivity [53] and true skill statistics (TSS) [58]. The sensitivity measures the proportion of actual positives that are correctly identified, while the specificity measures the proportion of actual negatives that are correctly identified. TSS compares the number of correct forecasts taking into consideration both commission and omission errors and successes, as a result of random guessing, where +1 indicates perfect agreement between forecast and reality and values of zero or below indicates a performance no better than random (for more information, see Allouche et al. [58]).
The BLM and RF classification models were applied using the train, rf and glm (binomial family) functions of the "caret" package [59]. All analyses were performed with the free statistics application R [60].

Relationships between the Presence of Understorey Species and Soil Variables and Lack of Association with Forest Canopy and Gaps
Analysis of the occurrence of herbaceous and shrub species in the five sampling plots revealed that the species presence is influenced by 22 soil variables, but not by  (Table 2).
Soil variables strongly influenced (up to 36%) the presence-absence of understory species, as shown by the Spearman correlation (Table 3). The binomial logistic regression model indicated that some species were positively correlated, showing a weak but significant association (α = 0.0025) with some physical and chemical soil properties.  The BLM regression model also showed that some soil variables were statistically significant predictors for explaining the species presence under the canopy or in the gaps ( Table 4), given that the dependent variables yielded low p-Values. For the species not shown in Table 4, the model did not significantly predict occurrence.
Nevertheless, regarding which edaphic variables influenced the species presence, the model produced significant predictors (Tables 3 and 4). A positive correlation was found with the percent of water saturation (Sat) (r = 0.82; p ≤ 2.87 −6 ), while for some species the correlation was moderate or low, e.g., in Ratibida mexicana and Agastache pallida, Sat was a significant predictor with respectively moderate correlation, r. Ratibida mexicana and Stevia serrata yielded the highest number of significant predictors (Tables 3 and 4).
The probability of presence of each species was modelled with moderate accuracy by different physical and chemical soil properties. The relationships between selected soil traits and the probability of each species are shown in Figure 3. For Sand and Na, the fitted probability corresponds to the relationships between the two variables (blue line). Figure 3a shows a negative relationship between the probability and the predictor variables (percent of water saturation, (Sat)): as the predictor increases, the probability decreases; Figure 3c shows a positive relationship. The difference or reduction between output deviances (ND and RD) indicated that the model is useful for predicting species presence. The best model fits were found for Viola grahamii and Agastache pallida, indicated by the lowest p Values (Table 4).     Predictions for scarcely represented species, such as Chimaphila maculata, may be biased due to the small sample size.
The RF model indicated that the most important soil traits in relation to the probability of the species were Cu, Ca, K, Fe, Zn, Mg, P, CEC, OM, Clay, Silt, %K and %Mg, all of which had a large enough influence to appear in the RF models for different species (Figure 4, Table S2).    The RF models for Commelina dianthifolia, Acmispon wrightii and Ceanothus coeruleus performed best, according to moderate TSS, with values of 0.93, 0.73 and 0.73, respectively (Table 5, Figure 4b,c,f). A low error rate (OBB) indicates a strong classifier, related to the lowest values of the number of randomly chosen variables. For the best performance of the predictor, the sensitivity of the RF presence-absence models was generally high, but the specificity was low to moderate.

Relationships between the Presence of Understory Species and Soil Variables
The study findings revealed that several soil variables have a significant influence on understorey species. The relationships between soil characteristics and floristic composition, richness and distribution of understorey plants in temperate forests have been described in several studies [9,14,61,62]. In the present study, the occurrence of the understorey species appeared to be more closely related to the soil conditions than to the influence of canopy or gaps. A relationship between some edaphic components and properties (e.g., Mg, Ca, P, Zn, Cu, Mn, Fe, Mg, Na, K, Zn, %K, %Mg, OM, Sat, Sand, Silt, and CEC) and the presence of certain understorey species was detected by both models (BLM and RF) ( Tables 4 and 5; Figures 3 and 4).
Soil pH is often recognized as having a strong influence on vegetation (e.g., [63]; A. pungens and C. coeruleus are shrubs related to pH (Table S2). However, we did not find a significant association between pH and the occurrence of the other species, which may be explained by the scarce pH differences among the studied sites, all of which are moderately acidic (Table S1). A forest with slightly acidic to neutral soil (pH 6.2-7) in the Sierra Madre Oriental, northeastern Mexico, has a high richness of shrub species [64]; however, this is related to the floristic affinities with chaparral and xerophytic scrub from lower elevations, rather than to pH.
The nitrate content in the soil varied widely in the study area, ranging from 9.73 kg ha −1 to 68.99 kg ha −1 . Understorey plants are generally important in the nitrogen deposition process in forests [65]; the essential concentrations in forest ecosystems are realized by plants, incorporate available nitrogen released by microbial decomposition into carbon compounds that are difficult to decompose [66]. The presence of Lupinus diehlii, Agastache pallida, Arctostaphylos pungens, and Ceanothus coeruleus in the area appears to be related to nitrate content, according to the RF model. Species of Ceanothus are among the major actinorhizal plants in the world, fixing nitrogen through actinomycetes of the genus Frankia in their roots, as does Alnus [67,68]. Legumes (Fabaceae) are well-known nitrogen fixers that act through symbiotic bacteria in their root nodules; one species of Lupinus has been recorded as a particularly efficient fixer of atmospheric N [69]. Although no significant relation between the nitrate content and the presence of legumes was found in this study, the relationship with soil fertility is worth investigating in this region, given their abundance, e.g., Cologania obovata, C. angustifolia, Phaseolus pauciflorus, and Lupinus spp., among others.
The effect of forest management on soil nitrification and its relationship with forest species was not considered here; however, our findings are consistent with those of Dominguez-Guerrero et al. [48], who recorded an even broader range of soil nitrate content in a study determining the habitat of Picea chihuahuana across the Sierra Madre Occidental. Otherwise, nitrate concentrations have been attributed to air pollution and disturbances caused by forest harvesting in temperate forests in Europe and North America [70] and in Mexican mountain pine (Pinus hartwegii) forests [71].
The influence of factors such as livestock farming, logging and other human activities on the herbaceous community was not considered. However, visual evidence indicates that some species (e.g., Penstemon miniatus, Agastache pallida) occur in areas impacted by logging, while others (Ceanothus spp., Arctostaphylos pungens, and Ratibida mexicana) prosper in areas affected by other types of degradation, although with similar soil affinities (e.g., low organic matter and sandy loam soils). These species have been associated with deteriorated areas of temperate forests [40,72]. The impact of anthropogenic activities on forest species, decreasing or enhancing their presence and distribution, has been widely recorded [73][74][75], but it has not yet been explored in this region.
The study findings show that the occurrence of most understory species considered appears to be linked to soil properties (Tables 3 and 4). Previous findings also indicate that understorey species are governed by different factors other than those affecting the overstorey [9,19]. Understorey species richness, mainly determined by soil conditions, and canopy richness, which is correlated with climate, are attributable to differences in the life-form and life strategy of the plants in the upper and lower strata of forests [61].

Relationships between Presence of Understorey Species, Forest Canopy and Gaps
Most of the herb and shrub species considered were found both under the canopy and in exposed areas, and no significant differences were found in regard to their preference for one or another type of area. Nevertheless, frequency data indicate some trends (Tables 2 and 4, Figure 4). The lack of clear preferences of the understorey species for growing either in openings or under the canopy can be partly explained by the intrinsic biological attributes of some of the local species. For example, of those species with the highest occurrence values that develop both in gaps and below the canopy, two are generalists with a broad ecological and geographical distribution (Cologania obovata and Stevia serrata) [72], and one (Packera candidissima) is a regional endemic to the northern zone of the Sierra Madre Occidental [76], although abundant and with a high adaptation capability favoured by its colonial habit. The data presented here regarding the correlation with canopy and gaps are based on occurrences (presence/absence), and different results may be obtained when density or other quantitative data are considered.
Another explanation for the low proportion of plants restricted to growing either in gaps or under the canopy may be the relative openness of the forests from the northern Sierra Madre Occidental [40]. In this region, tree density is low, ranging from 383 to 1573 trees per hectare in the Colonia Nicolás Bravo 2 and the Ejido Socorro Rivera, respectively [28] and even in the latter case the canopy cannot be considered closed. Humidity, rather than light, is the main limiting factor in those areas, together with the effects of extreme drought and frosts, which are common in the zone [28]. The low density may also affect soil water availability, as direct radiation and high evapotranspiration affect the understorey composition [77], in contrast to what occurs in dense forests, where light is the main limiting factor and the cover and richness of herbs increase in the gap openings [78][79][80], favouring colonization; gap size also influences the presence of some species [81,82]. Species composition can increase around the tree canopy, favoured by environmental edaphic variables and the availability of water in the soil [81][82][83], as also occurs for regeneration in the study area [28]. Soil conditions influence the herbaceous layer under the forest canopy [79,84,85] and, as already mentioned, the soil properties and the broad ecological adaptation of many species in the study area may have a stronger influence on the local distribution of understorey plants.
Although canopy provides some protection in the study area, it is not as essential as in denser forests. Shelter from the canopy is helpful to conserve humidity for water claiming species, as recorded by Tang et al. [86], or to mitigate extremely high temperatures [87], but does not explain the preference of the few species that appeared most frequently under the canopy in the present study. Humidity is not a particularly limiting factor for the most common, well-adapted herbaceous species in the region, as most precipitation falls during a short season (Southwest monsoon or Mexican monsoon) [88], during which heavy rains provide plants with more water than needed. During the dry, cold season, most of the plants are dormant and the canopy shade/protection is therefore not as necessary as in drier areas. These hypotheses remain to be explored for the understorey species in the region.
Nevertheless, occurrence data indicate that canopy may be an important, favourable factor for a few of the studied species. Of the 12 most frequent species, five grow almost equally in gaps and below the canopy, five show a preference for gaps and two were more frequent under the canopy. Penstemon miniatus is found under P. engelmannii, P. arizonica and P. leiophylla, mainly occupying areas with low organic matter and sandy soils. These findings are consistent with those of Villers et al. [89], who recorded Penstemon sp. in closed and semi-closed canopy of a Pinus forest, although other species of Penstemon prefer open areas. Among the shrubs, Ceanothus depressus was more frequent under the canopy of Quercus spp., Pinus arizonica and P. strobiformis. Ceanothus is a widely distributed genus in North America, being common in pine and oak forests [90]. Ceanothus depressus is endemic to the Sierra Madre Occidental, and its preference for shaded sites may be associated with the aforementioned symbiosis with actinomycetes and nitrogen fixation [67,68]. The relationship between the abundant C. depressus with the nitrogen deposition and the fertility of the forest soils would be worth exploring in the study region.
Several of the understorey species in the area are heliophytes, which can colonize open areas due to their fast-growth rates and good dispersal capacity [91][92][93][94]. The more frequent species in open areas include Agastache pallida, Commelina dianthifolia, Cyperus sphaerolepis, Geranium wislizeni, Pseudognaphalium arizonicum, Ratibida mexicana, Milla biflora and Arctostaphylos pungens ( Table 2). All of these species belong to very well represented and widely distributed taxonomic groups in Mexico [91][92][93][94], and they can therefore be considered well adapted plants capable of growing under different conditions.
The understorey of temperate forests contains the highest floristic diversity in Mexico (about 7000 species, including trees) [36]. This, along with the role of the understorey cover in maintaining forest structure and function, highlights the value of the understorey and the importance of considering it in forest management plans.

Conclusions
In the study area, understorey species are more strongly influenced by physical and chemical soil properties than by their location under canopy or in gaps. The low correlation between species occurrence and the forest canopy or gaps can be partly explained by the relatively open canopy in the forests under study. The results highlight the usefulness of soil variables for local scale modelling, in order to determine which edaphic elements influence the presence of understorey species. This study contributes to a better understanding of the ecological factors that regulate the composition and local distribution of understorey plants in the Sierra Madre Occidental. The findings may also serve as a basis for further research on combining environmental and management data with species composition. Many other aspects of the forest understorey remain to be explored in the study region, e.g., the relationship between nitrogen fixing plants and soil fertility, the influence of grazing and the importance of understorey species in soil retention. Also, a better knowledge of how human activities influence the forest would provide useful information to enhance its management. The application of other models, such as Generalized Linear Mixed Model (GLMM) may help to understand better the complex relationships of the species and the environment. Such information could be used to improve environmentally sound forest management practices.