Generalized Models: an Application to Identify Environmental Variables That Significantly Affect the Abundance of Three Tree Species

Statistical models are essential tools in modern ecology for defining the environmental preferences of plant species. However, conventional linear models rest on parametric assumptions, and when these assumptions are not met the applicability of the model is seriously limited. In this study, the effectiveness of linear and nonlinear generalized models was examined to identify the unitary effect of the principal environmental variables on the abundance of three tree species growing in the natural temperate forests of Oaxaca, Mexico. The covariates that showed a significant effect on the distribution of tree species were the maximum and minimum temperatures and the precipitation during specific periods. Results suggest that the generalized models, particularly smoothed models, were able to detect the increase or decrease of abundance in response to changes in an environmental variable; they also revealed the inflection of the regression. In addition, these models allow partial characterization of the realized niche of a given species according to some specific variables, regardless of the type of relationship.


Introduction
Researchers have used different modeling methods to identify the potential geographic or environmental space where organisms can establish and develop successfully. Among them are correlative-type models, which aim to relate the presence or abundance of a species to different predictor variables via a mathematical function [1]. The multiple linear model has been one of the most preferred analysis techniques to explain the stochastic relationship between descriptive variables and the distribution of species [2], often providing satisfactory results [3]. However, it is also common for data to violate some parametric assumptions. For example, in the field of forestry and forest ecology, predictors usually have high interdependence; moreover, the data often do not follow a theoretical normal distribution and there is heteroscedasticity. Sometimes there is a linear relationship between a variable of interest and some covariates, while with the remaining covariates a curvilinear relationship may exist [4,5].
Several techniques have been explored as options to deal with those dilemmas in ecology, such as the Chi-squared Automatic Interaction Detector (CHAID) [6], Machine Learning Techniques (MLTs) [7] or the generalized additive models (GAMs) proposed by Hastie and Tibshirani [8]. GAMs, which incorporate non-linearity and non-parametric regression, have become one of the most used techniques [1]. These models are an extension of Generalized Linear Models (GLMs), constructed by the sum of smooth functions of predictor variables, where it is common to use polynomials defined over intervals, known as "splines" [8-10]. For this reason, a function of this type loses its purely parametric nature and becomes a semi-parametric or non-parametric model [11]. In addition, GAMs can be applied even when the regressors are not independent or the sample does not follow a normal distribution [10]. Furthermore, the fitting algorithms of a GAM accept different distributions (e.g., Binomial, Poisson or Gamma), which enables the researcher to identify and select the best representation of the data and makes the GAM technique a viable alternative for forestry and ecology.
GLMs and GAMs are common tools in ecology to model the response of living organisms to environmental factors using discrete or continuous variables [12-14]. In essence, there are no substantial discrepancies between these models: sometimes a GLM is more accurate, and in other cases a GAM is preferable. When a GAM is better, this is often attributed to its semi-parametric structure [3,15].
Many studies have been carried out to understand the relationship and interactions between environmental variables and the presence of a species in a given locality [15,16], to model habitat distribution [1] or to evaluate the effects of environmental variables [17]. However, in forestry, our knowledge of the relationship between the abundance of forest species and their habitat conditions is scarce, although it is essential when we want to find efficient ways to reforest or restore specific areas.
The main purposes of this study were: (i) to find the variables that best explain the behavior of the abundance of three tree species that grow naturally in the temperate forests of Oaxaca, Mexico, using 18 environmental variables; and (ii) to identify which type of model (linear or nonlinear) best describes the relationship between the abundance of each species and the analysed covariates. To address these objectives, generalized linear models (GLMs) and nonlinear generalized additive models (GAMs) were tested.

Study Area
The study area was the forest of Santiago Comaltepec, which is a portion of the "Sierra Juárez" in the Oaxaca region, in southwestern Mexico, located at 17° 33′ 35″ north latitude and 99° 26′ 32″ west longitude, covering an area of 26.5 km² (Figure 1). The elevation above sea level ranges from 1800 to 3000 m. The annual average maximum temperature is 13.4 °C; the annual average minimum is 4.7 °C; the summer rainfall from June to September ranges between 600 mm and 1200 mm, while the dry season lasts from December to May with an average precipitation of 225.5 mm and an average temperature of 16.7 °C [18,19]. According to direct observations from field data and database information that originated from the last forest inventory (2014-2015), there are different types of vegetation in the study area. Oak forest is the main vegetation, followed by tropical deciduous forest and cloud forest.

Variables and Sample
The frequency and density of a species are common indicators of its abundance [20]. This work refers to density, which is defined as the number of trees of a species per sampling plot. We counted only trees with a diameter greater than or equal to 7.5 cm at 1.3 m above ground level. Data were captured from a total of 634 circular plots of 1000 m² each. These plots were established by a local forestry and technical services office (Union Zapoteca Chinanteca-UZACHI). The plots were laid out and the data recorded following the method of the National Forestry Commission for the planning of the temperate forest in Mexico [21].
The species selected for this study were Pinus pseudostrobus Lindl. var. apulcensis (Lindl.) Shaw, Pinus patula Schl. et Cham. and Quercus macdougallii Martínez. The first is characterized by its fast growth and is commonly used to reforest degraded areas or forest sites without vegetation; the second is in high demand by sawmills, furniture factories, and pulp and paper industries [22]. Quercus macdougallii, besides fulfilling ecological functions, is a rare species in the study area; it was found in just 33 out of the 634 plots; moreover, it is a vulnerable species, included in the red list of the International Union for Conservation of Nature (IUCN).
For this study, 18 climatic and physiographic variables were considered, given that several studies have shown that they affect, in different ways and at different scales, the geographical distribution of species [23-26]. Table 1 shows the variable acronyms, their meanings and some descriptive statistics. The climatic variables for each site were obtained from the server of the Forest Service, Department of Agriculture of the United States (see Algorithms). These data were computed using methods proposed by Rehfeldt et al. [27] with climate data from 1976 to 1990, originating from about 4000 weather stations in Mexico, the southern part of the United States of America, Guatemala, Belize and Cuba [27-29]. The physiographic variables and elevation above sea level were directly obtained in the field (Table 1).
In developing this study, we did not find quantifiable information useful to evaluate the effect of human activities on the abundance of species in the study area; therefore, we did not analyse this factor. It is worth noting, however, that in Santiago Comaltepec the area under exploitation is incorporated into a certified Management Plan and that the current forest management schemes focus on sustainable development, which takes into account biodiversity conservation [30].

Data Analysis
In order to identify the variables that significantly affect the abundance of the species and to detect the type of relationship (linear or nonlinear), parametric models (GLM) and non-parametric models (GAM) were tested. These two model types were chosen because they have demonstrated satisfactory results for these types of studies [15,31,32] and, in some cases, surpass other analysis techniques [3,33].
A total of 18 explanatory variables were used that, in previous studies, have demonstrated significant effects on forest species (Table 1). An exploratory data analysis using Pearson's linear correlation coefficients detected a high correlation between the covariates, mainly among temperature and precipitation variables. For this reason, the modeling was conducted using two approaches: (i) taking into account collinearity, to reduce as much as possible the spatial autocorrelation [34]; and (ii) in a scenario with an absence of collinearity.
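The exploratory screening for collinearity can be illustrated with a minimal sketch. It assumes the covariates are available as equal-length numeric lists keyed by acronym; the helper names, the cutoff of 0.7 and the sample values are hypothetical and are not taken from the study.

```python
import math

def pearson_r(xs, ys):
    """Pearson's linear correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def correlated_pairs(columns, threshold=0.7):
    """List covariate pairs whose |r| reaches the (hypothetical) threshold."""
    names = sorted(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(columns[a], columns[b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs

# Invented values for three covariates measured on five plots.
example = {
    "MAT": [10.0, 11.0, 12.0, 13.0, 14.0],
    "MTWM": [14.0, 15.0, 16.0, 17.0, 18.5],
    "AST": [30.0, 5.0, 22.0, 9.0, 18.0],
}
flagged = correlated_pairs(example)
```

With these invented values only the two temperature variables are flagged, which mirrors the kind of redundancy the exploratory analysis reports among temperature and precipitation covariates.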
In the first case, the variables that were independent of each other, but at the same time were significantly correlated with the abundance of the species, were retained. The chosen variables included one precipitation variable, an index whose estimate includes temperature and precipitation, and two physiographic variables: mean precipitation in the growing season (GSP), annual aridity index (AI), average slope (AST) and dominant aspect (ASP). In the other approach, we fitted non-parametric models (GAMs) assuming a scenario with an absence of collinearity, initially including the 18 variables. They were then eliminated one by one, leaving in the models only the variables that made a significant contribution (p < 0.01) to the global deviance explained by each model (DE) and that therefore reduced the mean squared error of prediction (SME). For this purpose, simple models (including just one variable) and multiple models (including several variables at a time) were tested. In the multiple models, ASP and AST were always included because these variables can change drastically over short distances within the study area.
Measures of the prediction error (SME) for both GLMs and GAMs were obtained by cross-validation, using the cv.glm function in the "boot" R package for GLMs [35] and the CVgam function in the "gamclass" R package for GAMs [36].
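The idea behind a cross-validated prediction error can be sketched in a few lines. This is only an illustration of k-fold cross-validation: it uses an ordinary least-squares line as a stand-in for the fitted GLM or GAM, it does not reproduce the R functions named above, and the function names and the choice of five folds are assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def k_fold_mse(xs, ys, k=5):
    """Mean squared prediction error over k interleaved folds."""
    n = len(xs)
    folds = [list(range(i, n, k)) for i in range(k)]
    squared_errors = []
    for held_out in folds:
        held = set(held_out)
        train_x = [xs[i] for i in range(n) if i not in held]
        train_y = [ys[i] for i in range(n) if i not in held]
        a, b = fit_line(train_x, train_y)  # fit without the held-out plots
        squared_errors.extend((ys[i] - (a + b * xs[i])) ** 2 for i in held_out)
    return sum(squared_errors) / len(squared_errors)

# Invented data: an exactly linear response gives an error close to zero.
xs = [float(x) for x in range(20)]
ys = [2.0 + 3.0 * x for x in xs]
cv_error = k_fold_mse(xs, ys, k=5)
```

Each observation is predicted by a model that never saw it, so the resulting error estimates how the model would perform on new plots rather than on the plots used for fitting.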
In addition to SME, other indicators were used to support the filtering of variables and the choice of the degrees of freedom in the GAMs: the global deviance statistic (DE), which expresses the percentage of variance explained by the model, the Akaike Information Criterion (AIC) and the analysis of residuals by quantile-quantile plot (Q-Q plot).
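For a Poisson model these two indicators can be computed directly from the observed counts and the fitted means. The sketch below assumes that DE is the proportion of the null deviance explained by the model and that AIC is computed as 2k minus twice the log-likelihood; the function names and the sample values are illustrative only.

```python
import math

def poisson_deviance(y, mu):
    """Poisson deviance: 2 * sum(y*log(y/mu) - (y - mu)), with 0*log(0) taken as 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += term - (yi - mi)
    return 2.0 * total

def deviance_explained(y, mu):
    """Proportion of the null (intercept-only) deviance explained by the fit."""
    null_mu = [sum(y) / len(y)] * len(y)
    return 1.0 - poisson_deviance(y, mu) / poisson_deviance(y, null_mu)

def poisson_aic(y, mu, n_params):
    """AIC = 2k - 2*logL for a Poisson likelihood."""
    loglik = sum(yi * math.log(mi) - mi - math.lgamma(yi + 1)
                 for yi, mi in zip(y, mu))
    return 2 * n_params - 2 * loglik

# Invented counts per plot and fitted means from a hypothetical model.
counts = [0, 1, 3, 8]
fitted = [0.5, 1.5, 3.0, 7.0]
de = deviance_explained(counts, fitted)
```

A model that predicts the overall mean for every plot has a DE of zero, and lower AIC values indicate a better trade-off between fit and the number of parameters.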
Mathematical expressions, as well as the theoretical framework of GLMs, can be consulted in Nelder [9], McCullagh and Nelder [12] and McCulloch [37], among others. A generalized linear model (GLM) has a random component (the probability distribution of the variable of interest), a systematic component (the linear predictor) and a link function that specifies the link between the variable of interest and the systematic part of the model [9].
Any coefficient of an explanatory variable was considered statistically significant at 5%, if its p-value was less than 0.0028 (after the Bonferroni correction) [38].
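The corrected threshold can be checked with simple arithmetic. The sketch below assumes that the Bonferroni correction divides the initial significance level by the number of covariates tested; the function name is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after a Bonferroni correction."""
    return alpha / n_tests

# 18 covariates at an initial level of 0.05.
threshold_18 = bonferroni_threshold(0.05, 18)
# Four covariates at the same initial level.
threshold_4 = bonferroni_threshold(0.05, 4)
```

Under this assumption, 18 covariates give a threshold of about 0.0028, while four covariates give 0.0125, which is close to the reference value of 0.01 used for the models with four predictors.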
A GAM is constructed by the sum of smoothed functions of the predictor variables, which can identify the types of effects and nonlinear relationships between variables. For this purpose, it is common to use polynomials defined over intervals, known as splines [8,10].
Here, we reproduce the general expression of a GAM according to Wood [10]:
$$g(\mu_i) = X_i^{*}\theta + f_1(x_{1i}) + f_2(x_{2i}) + f_3(x_{3i}, x_{4i}) + \dots$$
where µ_i ≡ E(Y_i) and Y_i ∼ some exponential family distribution. Y_i is the variable of interest, g is the link function, X*_i is a row of the model matrix for any strictly parametric model components, θ is the corresponding parameter vector, and the f_j are smooth functions of the covariates.
GLMs and GAMs allow the inclusion of a specific theoretical distribution in their initial structure. In this study, the Poisson distribution was used because the abundances are count data and take only non-negative integer values [39,40].
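For a Poisson response, the general expression of a GAM can be specialized as sketched below. This assumes the canonical log link, which the text does not state explicitly; the intercept β_0 and the number of smooth terms p are notation introduced here for illustration.

```latex
Y_i \sim \mathrm{Poisson}(\mu_i), \qquad
\log(\mu_i) = \beta_0 + \sum_{j=1}^{p} f_j(x_{ji})
```

Under this assumption the expected abundance is the exponential of the additive predictor, so each smooth function acts multiplicatively on the expected number of trees per plot.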
Due to the flexibility of GAMs, there is a risk of over-fitting these models [41]. To select the optimal smoothing parameters for Pinus patula and Pinus pseudostrobus, the Generalized Cross Validation (GCV) criterion described by Craven and Wahba [42] was used, as implemented in the "mgcv" package in R [43], while for Quercus macdougallii the degrees of freedom were estimated manually by trial and error, which was feasible since there were only 33 observations. In the case of parametric models, a reference threshold of α = 0.01 (after the Bonferroni correction) was used to test the evidence against the null hypothesis of no significant contribution of each covariate.
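The way a GCV-type criterion guards against over-fitting can be sketched as follows. The score used here, n·D/(n − edf)², is the commonly quoted deviance-based form; it is given as an assumption about the general shape of the criterion, not as the exact computation performed by "mgcv", and the candidate values are invented.

```python
def gcv_score(deviance, n_obs, edf):
    """Deviance-based GCV score: n * D / (n - edf)^2 (lower is better)."""
    if edf >= n_obs:
        raise ValueError("effective degrees of freedom must be below n_obs")
    return n_obs * deviance / (n_obs - edf) ** 2

# Invented candidates: (effective degrees of freedom, model deviance).
candidates = [(2.0, 130.0), (5.0, 100.0), (20.0, 90.0)]
n_plots = 50
best_edf, best_deviance = min(
    candidates, key=lambda c: gcv_score(c[1], n_plots, c[0])
)
```

In this invented comparison the most flexible candidate has the lowest deviance but not the lowest score, because the denominator shrinks as the effective degrees of freedom grow.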

Results
The coefficients associated with each environmental variable whose contribution was significant for the parametric models and the degrees of freedom for the non-parametric models are shown in Table 2. The reader may appreciate the magnitude of the evidence against the null hypothesis (no significant contribution of a variable) by observing the p-values associated with each covariate. P-values marked with an asterisk were significant (p < 0.01, after a Bonferroni correction of an initial level of 0.05). Although most p-values were less than 0.01, the adjustment indicators suggested a poor linear relationship (GLM) between the abundance of trees and the predictors compared. With four variables included, the global deviance (DE) explained by the linear models was just 8.8% for Quercus macdougallii, 14.08% for Pinus patula and 11.45% for P. pseudostrobus, clearly lower than the percentages explained by the GAMs. However, the cross-validated prediction error (SME) and the AIC values were similar in both model types, except for Q. macdougallii (Table 2). Comparing the adjustment among species, Q. macdougallii showed the lowest values of AIC in both model types, meaning that the models' adjustment for this species was better than for the other two. The mean squared error was also remarkably lower in both Q. macdougallii models (Table 2).
AST and ASP in the GAM were significant for Q. macdougallii, but were not significant in the corresponding GLM, suggesting that there could be a relationship other than linear between these covariates and the abundance of this species.
GLMs suggested that precipitation from April to September and the annual aridity index linearly explain the variability of species abundance, since their corrected p-values were significant (p < 0.01) in the three species. However, due to the poor fit of all the models (see the SME, DE and AIC values), it is not possible to make a robust prediction of abundance with a parametric model using only these predictors.
The simple GAMs detected the variables that showed a greater effect on the abundance of each species, judged by the individual deviance (DE) of each variable (Table 3). The three variables that separately explained the highest percentage of the variability of the abundance of Quercus macdougallii were MTCM, MTWM and SPRP. For Pinus patula, FFP, SMRSPRPB and ELEV showed high values of deviance, while MMAX, GSP, WINP and MTWM explained the largest percentage of the variability of Pinus pseudostrobus.
The results for multiple models revealed another perspective; in this case, all variables included for Pinus patula and Pinus pseudostrobus were significant, while AST, ASP, ELEV and FFP were significant in the Quercus macdougallii GAM (Table 3).
The resulting polynomial functions for a GAM include a constant term in their first component (Intercept); for example, for Pinus patula, the function (in a simple form) would be as follows: where Y is the abundance of the species. Figure 2a,b show the results of the non-parametric GAM using the Poisson distribution. These results reveal two behaviors: on one hand, the type of relationship between the variables studied and, on the other, the behavior of the abundance of each species at different gradients of the covariates. Mostly nonlinear relationships between the abundance of species and the covariates were identified. In some cases they were clearer, such as in Pinus pseudostrobus against the elevation above sea level (Figure 2a) or Pinus patula in relation to the precipitation from April to September (Figure 2c). In addition, signs of linear effects were observed for some variables, such as the slope and the exposure, which influenced the abundance of both pine species; still more evident was the effect of ASP, AST, ELEV or FFP on the abundance of Pinus patula.
Figure 2d showed a linear trend of Quercus macdougallii against ELEV values.No drastic changes are perceived as ELEV values increase; however, the strip of uncertainty is more noticeable when the elevation above sea level takes values less than 2500 m.
Using the multiple GAM, the Quercus macdougallii model showed a high value of the global deviance (76.8%), followed by the Pinus patula model with 33.6% and the P. pseudostrobus model with 31.9%. The good fit for Quercus macdougallii was reflected in the graphs of the residuals, which were mostly distributed within the confidence range of the Q-Q plot (gray shaded area) at a 99% CI (Figure 3b), and there was moderate dispersion between the adjusted values and the response values (Figure 3a).

Discussion
GAMs, both in their simple and multiple structure, successfully described the abundance of a species in response to unitary changes of a climatic variable. Graphical results of GAMs not only revealed the trend or turning points of the regression, but also showed the possible limits within which the optimum abundance of each species could occur (Figure 2a-d). For example, Figure 2b shows that between 9 and 13 °C there is a greater distribution of data, the trend of the curve is relatively stable and there is a marked reduction in the uncertainty region (95%). This means that the range of maximum abundance of Quercus macdougallii in relation to mean annual temperature would be between these values.
In this study, we did not find data to include the effect of human activities; neither did we find any study addressing such an effect on the distribution and abundance of species in the study area. However, since 1983, the communities of the Sierra Juárez have adopted projects aimed at recovering forest quality, focusing also on the regeneration and conservation of the native pine species, such as Pinus patula, P. pseudostrobus and P. ayacahuite, which are the most abundant in the study area. Furthermore, the forest management scheme of the timber harvesting area in Santiago Comaltepec has the certification of good practices, awarded by the Forest Stewardship Council (FSC). Nonetheless, the effect of human activities ought to be addressed in future studies, because management certification by itself does not guarantee the conservation of biodiversity [44], and anthropogenic activity has had increased influence on natural ecosystems in recent years [45,46].
Knowing the variables that have a significant effect on the abundance of a species is important because these variables could condition a higher or lower presence of that species in a locality. This information could be vital for decision making. The models tested in this research suggest that the aridity index, extreme temperature (the maximum of the warmest month) and the rainfall during specific periods (winter precipitation, summer precipitation, and the precipitation from April to September) significantly affect the abundance of species (Tables 2 and 3). The aridity index has been reported as a variable that is correlated with the altitudinal gradient, which is useful for a possible assisted migration of species under different climate change scenarios [47], whereas extreme temperatures and specific rainfall have also been observed as key variables for determining the density of pine and oak in the northwest of Mexico [25].
Environmental preferences and geographical distribution of a given species are inherent aspects. Therefore, a review is needed of some limitations of mathematical models such as the ones used in this study. Firstly, a simple regression model permits only the testing of the hypothesis that there is a high probability that the tested covariate, and not other influences or chance, explains the predicted variable [48]. That is, a good prediction model only increases the probability of finding a species of interest in a given locality where that species really exists, but there will always be a degree of uncertainty as well.
Hence, possible errors in modeling must be taken into account, such as: (i) Predicting the abundance of a species based on a list of variables; the model implicitly assumes that the predicted range or potential space is fully occupied by the modeled species, which does not always happen [49]; furthermore, the spatial distribution of organisms has adopted a dynamic behavior over time [49], so that a potential site may or may not be sparsely vegetated by certain species for a certain period (e.g., during sampling) due to progressive succession of plants; or a temporary absence could be found due to natural causes, such as season of the year, attack of pests or diseases or inter-species competition; (ii) The current presence of a species reflects the contexts of the past [49] which in turn could give rise to uncertainty (though on a smaller scale) in predicting habitats; (iii) The global environmental conditions follow changing trends of different duration [50,51], so it is possible that in a certain case, an observed species may be declining or at risk of extinction in a locality or it could be suffering displacement, dispersion or fragmentation of its habitat [52,53], but the prediction model does not detect this dynamic behavior.
Despite the limitations of predictive models [1,54], several studies have demonstrated the usefulness of correlative models [55,56]. The results reported here could contribute to the identification of environmental preferences of the species studied, as in the case of Quercus macdougallii, which is included on the red list of the IUCN. In general, the tested models served the purpose of the study, but we do not rule out exploring other tools in the future to complement these results, such as spatial autocorrelation regression models [57,58] or the Chi-squared Automatic Interaction Detector (CHAID) together with Regression Trees [6], as well as probability functions, which would be useful to find the climatic values at which the maximum likelihood of abundance occurs and to generate maps of maximum abundance.

Conclusions
In this study, the GAMs served our purpose better than GLMs, considering that their results not only revealed the trend of the response variables or the inflection of the regression, but also served to detect when species abundance increased or decreased. GAMs also allow partial characterization of the realized niche of a given species, according to some specific variables, regardless of the type of relationship (Figure 2a-d), which represents a substantial advantage for these types of studies. Furthermore, GAMs are useful in studies with multiple variables in instances when the observations do not follow a specific (normal) distribution. In particular, smoothed models, both in their simple and multiple forms, are useful tools that help answer questions like those raised in this report: Which variables have a greater and more significant effect on the abundance of each studied species? Or, is there a linear or a curvilinear relationship between the variable of interest and its covariates? The results obtained with GAMs could also be useful to find potential areas for endemic or threatened taxa, because through adequate models it is possible to obtain specific values for each variable that might affect where each species could grow with greater chances of success.

Figure 1. Location of the study area.

Figure 2. Response of abundance rate of: (a) Pinus pseudostrobus to the smooth function f(ELEV); (b) Pinus pseudostrobus to the f(MAT); (c) Pinus patula to the f(GSP) and (d) Quercus macdougallii to the f(ELEV). Dashed lines indicate the point-wise standard errors for each smoothing term with a 95% confidence interval (CI). The values on the vertical axis are estimated degrees of freedom.

Figure 3. Fitted against response values of Quercus macdougallii smooth function (a), and distribution of its residuals in a Q-Q plot, showing a 99% confidence interval (b).

Table 1. Descriptive statistics of the variables used in this study.

Table 2. Coefficients and adjustment indicators for GLMs and GAMs.
AST: average slope of the terrain; ASP: dominant aspect or geographic orientation of the ground; GSP: mean precipitation in the growing season (April-September); AI: annual aridity index; SME: mean squared error estimated by cross-validation; DE: global deviance, expressing the percentage of variance explained by the model; AIC: Akaike Information Criterion values; Parms: parameter values in the GLMs; EDF: estimated degrees of freedom in the GAMs; * indicates significant p-values (less than 0.01), after the Bonferroni correction at an initial significance level of 0.05.

Table 3. Estimated degrees of freedom and adjustment indicators of the GAMs. Estimated degrees of freedom associated with each independent variable estimated by the multiple model; AIC: Akaike Information Criterion; SME: mean squared error estimated by cross-validation; Model type: M = multiple model, S = simple model; DE: proportion of the null deviance explained by the model; UBRE: Un-Biased Risk Estimator criterion; * p-values < 0.0028, after the Bonferroni correction at an initial significance level of 0.05.
AST: average slope of the terrain; ASP: dominant aspect or geographic orientation of the ground; ELEV: elevation above sea level; GSP: mean precipitation in the growing season; SPRP: spring precipitation (April + May); SMRSPRPB: summer/spring precipitation balance: (July + August)/(April + May); MTCM: mean temperature in the coldest month; MTWM: mean temperature in the warmest month; FFP: average length of frost-free period; MAP: mean annual precipitation; SMRP: summer precipitation (July + August); DD5: degree-days > 5 °C (based on mean monthly temperature); D100: Julian date on which the sum of degree-days above 5 °C reaches 100; MMIN: mean minimum temperature in the coldest month (January); WINP: winter precipitation (November + December + January + February); MMAX: mean maximum temperature in the warmest month (June); MAT: mean annual temperature; EDF: