Optimized Maxent Model Predictions of Climate Change Impacts on the Suitable Distribution of Cunninghamia lanceolata in China

: Climate change signiﬁcantly inﬂuences changes in ecological phenomena and processes, such as species distribution and phenology, thus accelerating the rate of species extinction or prosperity. Climate change is considered to be one of the most important threats to global biodiversity in the 21st century and will pose signiﬁcant challenges to biodiversity conservation in the future. The use of niche modelling to predict changes in the suitable distribution of species under climate change scenarios is becoming a hot topic of biological conservation. In this study, we use data from China’s National Forest Continuous Inventory as well as specimen collection data of Cunninghamia lanceolata (Lamb.) Hook to run optimized Maxent models to predict potential suitable distribution of the species in the present day, 2050s, and 2070s under di ﬀ erent climate change scenarios in China. In the modeling process, the most important uncorrelated variables were chosen, and the sample-size-adjusted Akaike information criterion (AICc) was used to select the optimal combination of feature type and regularization multiplier. Variable selection reduced the number of variables used and the complexity of the model, and the use of the AICc reduced overﬁtting. Variables relating to precipitation were more important than temperature variables in predicting C. lanceolata distribution in the optimal model. The predicted suitable distribution areas of C. lanceolata were di ﬀ erent for the di ﬀ erent periods under di ﬀ erent climate change scenarios, with the centroids showing a degree of northward movement. The suitable distribution area is predicted to become more fragmented in the future. Our results reveal the climate conditions required for the suitable distribution of C. lanceolata in China and the likely changes to its distribution pattern in the future, providing a scientiﬁc basis for the sustainable management, protection, and restoration of the suitable habitat of this economically important tree species in the context of climate change.


Introduction
Climate is the main factor determining the distribution pattern of plants, and changes in this distribution pattern are a direct reflection of climate change [1][2][3][4][5]. The Fifth Assessment Report (AR5) of the Intergovernmental Panel on Climate Change (IPCC) indicates that the annual mean temperature of the Earth's surface has increased by 0.85 • C over the past 130 years (1880-2012). While the mean temperature of the past 60 years (1951-2012) has increased by 0.72 • C. The mean global surface temperature is predicted to increase by an additional 0.3-4.8 • C by the end of the 21st century, according to simulations of greenhouse gas emission scenarios [6]. Several observational studies have shown that climate change has caused or is causing significant changes to ecological processes, including distribution ranges, morphological characteristics, and phenology that are accelerating species extinction or prosperity [7][8][9]. Because of this, quantitative analysis of climate factors and predictions of climate change impacts on potential suitable species distributions have become key to biogeography and global change research [8,10,11].
Niche modelling (also known as species distribution modelling or habitat suitability modelling) uses statistical or biophysical methods to infer the environmental requirements for the survival of a species based on the environmental variables taken from the known distribution areas, and then predicts their potential or future distribution [12][13][14]. Niche modelling has been widely used in ecology, biogeography, and conservation biology studies, such as the adaptive analysis of invasive alien species [15,16], conservation of endangered species [17,18], selection of natural reserves [19,20], and impacts of climate change on species distribution [21][22][23]. At present, there are many niche models used for species distribution prediction, including the bioclimate analysis and prediction system (BIOCLIM) [24], the genetic algorithm for rule set production (GARP) [25], the ecological niche factor analysis (ENFA) [13], and the Maximum Entropy Model (Maxent) [26,27]. These niche models, established on the basis of the statistical relationship between species distribution and environment variables (rather than the causality), only consider the environmental factors that control species distribution and do not consider the interactions between species, species evolution, extreme interference events, or species diffusion processes [12,28]. The current correlation between species and the environment (climate) does not necessarily guarantee that it can be used to predict future species distribution; however, niche modelling is still the most commonly used method for assessing the impacts of climate change on species distribution at the regional scale. Niche modelling is required to predict changes in species distribution, due to the difficulty of effectively carrying out field experiments at the geographical distribution scale of an entire species [28,29]. Niche models can be easily constructed and can provide rapid predictions for the changes to species distribution and the risk of habitat loss.
The Maxent model, which is based on the theory of maximum entropy, is an effective model of species distribution. The model simulates the potential geographical distribution of a species using information on its current (present day) distribution as well as various environmental data [27,30]. The Maxent model is relatively simple and quick to run, has a small sample demand, provides stable operation results, and allows prediction results to be tested [27,31,32]; thus, it is favored by many researchers [14,33,34]. However, the model is often employed using the default parameters or those published previously without consideration of the details of the algorithm or input parameters, and all the environment variables that can be collected are included in the model indiscriminately. This not only increases the complexity and operation time of the model, but also greatly limits its scope of application and accuracy. Previous studies have shown how different environmental variables and parameters used in niche models affect the results [35][36][37]. Complex models show low performance in identifying the most important environmental variables and in transferring habitat suitability to new environmental conditions [38][39][40]. When using the default parameters of the Maxent model to predict the potential suitable distribution of species, overfitting can occur, leading to a decline in species-transfer ability [41,42]. Therefore, the selection of appropriate environmental variables and model parameters is an important step that must be considered in the application of the Maxent model.
Cunninghamia lanceolata (Lamb.) Hook is a fast-growing and high-yielding tree that is native to the subtropical region of South China, where it is the main species used in afforestation and for timber production. It is a long-lived evergreen tree and plays an important role in regional ecological construction. Therefore, it is important to study the impact of climate change on the suitable habitat of C. lanceolata. In this study, the Maxent model is used to simulate the suitable distribution of C. lanceolata and how this might change under different climate change scenarios after parameter optimization and climate variable selection. The specific objectives of this study are as follows: (1) to explore the influence of environmental variable selection and parameter settings on the performance of the Maxent model; (2) to understand the importance of environmental variables in predicting the potential suitable distribution of C. lanceolata; and (3) to compare the differences in the predicted distribution of C. lanceolata under different climate change scenarios in different time periods. Our optimized model, based on the variables with the highest explanatory power, reveal the climate conditions required by C. lanceolate that account for its present-day distribution. The model also suggests marked changes to this distribution pattern under predicted climate change scenarios and increased fragmentation of suitable habitat by the 2070s.

Species Occurrence Data
Species occurrence data from the basis of potential suitable distribution predictions under climate change scenarios in niche modelling and the accuracy and reliability of the occurrence data are directly related to the plausibility of the predicted results [43,44]. Two sources of C. lanceolata occurrence data were used for this study: (1) the National Forest Continuous Inventory (NFCI) of China; and (2) records of specimen collection of the China National Specimen Information Infrastructure (NSII) and the Chinese Virtual Herbarium (CVH). The NFCI is the first level of the forest inventory system of China designed to provide reliable data on the current status of, and changes in, forests in the form of an integrated spatial database [45]. The NFCI survey is carried out every five years at the provincial scale. The permanent sample plots are systematically located at the graticule intersection of the national topographic map (scale 1:100,000 or 1:50,000) [46]. The NFCI data are the only high-quality forest inventory data available at the national scale at present [47]. The NFCI plots, in which the dominant tree species was C. lanceolata, were selected. Records of collected C. lanceolata specimen that had accurate associated coordinates were obtained from the China National Specimen Information Infrastructure (NSII) [48] and the Chinese Virtual Herbarium (CVH) [49]. We selected only one point closest to the center in the 30" grid plots (approximately 1 km 2 ), which was matched with climate data layers, to reduce spatial autocorrelation [50]. In total, 1307 NFCI points and 570 specimen record points were obtained for niche modelling (Figure 1).

Climate Data
Environmental variables were selected according to their relevance to plant survival and growth, thus precipitation and temperature are the primary environmental factors affecting plant distribution, since the formation of the plant distribution pattern is closely associated with the precipitation and temperature [51]. The climate data used in the prediction of suitable species distributions under climate change scenarios included the baseline climate condition data (time period: present day [average for 1970-2000]) and the climate scenario data (time periods: 2050s [average for 2041-2060] and 2070s [average for 2061-2080]). The baseline climate condition data were obtained using a thin plate spline function to establish the relationship between the observation values (24,254 mean temperature observatories, 14,835 minimum and maximum temperature observatories, and 47,554 precipitation observatories) and the corresponding latitude, longitude, and elevation of the observatories; the data were then interpolated to the area without the original observation values [52]. The climate scenarios for the 2050s and 2070s were derived from the downscaled global climate model (GCM) of the IPCC AR5. The Beijing Climate Center Climate System Model version 1.1 (BCC-CSM1-1), developed by the Beijing Climate Center of China Meteorological Administration, provided the data on China for the IPCC AR5 [53,54]. The BCC-CSM1-1 included data on four climate change scenarios under the different Representative Concentration Pathways (RCPs), which had a high temperature and precipitation prediction accuracy, and included detailed consideration of the impact of greenhouse gas emissions through various policies and strategies for coping with global climate change [6]. The four RCPs (RCP2.6, RCP4.5, RCP6.0, and RCP8.5) are defined by the possible range of radiative forcing values (2.6, 4.5, 6.0, and 8.5 W/m 2 , respectively) in the year 2100 [6]. Previous work has indicated that the RCPs of BCC-CSM1-1 have a high accuracy for China [55]; thus, the RCP data from the BCC-CSM1-1 was used to predict future suitable distributions of C. lanceolata in this study.
The climate variable data used in this study can be downloaded from the WorldClim-Global Climate Data website [56]. The data includes 19 climate variables, which were derived from the monthly temperature and precipitation values to generate more biologically meaningful data that reflect a range of temperature and precipitation summaries (e.g., trends, seasonality, and extremes) [57]. The spatial resolution of the data was 30" (approximately 1 km 2 ). The description and statistics of these climate variables are presented in Tables 1 and A1, respectively. Forests 2020, 11, x FOR PEER REVIEW 4 of 25 BCC-CSM1-1 have a high accuracy for China [55]; thus, the RCP data from the BCC-CSM1-1 was used to predict future suitable distributions of C. lanceolata in this study. The climate variable data used in this study can be downloaded from the WorldClim-Global Climate Data website [56]. The data includes 19 climate variables, which were derived from the monthly temperature and precipitation values to generate more biologically meaningful data that reflect a range of temperature and precipitation summaries (e.g., trends, seasonality, and extremes) [57]. The spatial resolution of the data was 30" (approximately 1 km 2 ). The description and statistics of these climate variables are presented in Table 1 and Table A1, respectively.

Model
The Maxent model is a niche modelling based on the maximum entropy theory [27]. Maximum entropy theory, proposed by Jaynes, in 1957, holds that anything with the maximum entropy is closest to its real state under known conditions [58]. The Maxent model is used to estimate the target probability distribution by finding the probability distribution of the maximum entropy (i.e., that is most spread out, or closest to uniform) that is subject to a set of constraints that represent our incomplete information regarding the target distribution [27]. In a study of species' distribution, the constraints are climate, soil, terrain, or other environmental variables influencing the species' occurrence points. Maxent software was developed by Phillips et al., in 2004 [26].

Optimization of Model Parameters
The Maxent model was used to predict the potential suitable distribution of C. lanceolata in three time periods (present day, 2050s, and 2070s). The predictive performance of the model is influenced by the choice of two parameters that need to be optimized: (i) the regularization multiplier (RM), and (ii) the feature combination (FC). RM can affect how focused the output distribution is. A smaller RM value will result in a more localized output distribution that fits the given occurrence records better, but is more prone to overfitting. Overfitted models are poorly transferable to novel environments and, thus, not appropriate to project distribution changes under environmental change. A larger RM value will produce a prediction that can be applied more widely [27,30]. We used RM constants of 0.5 to 5 with steps of 0.5.
The features, which were produced from the climate variable layers, constrain the computed probability distribution. The available feature types include linear (L), quadratic (Q), product (P), threshold (T), and hinge (H). The functions of these feature types are as follows [27,30]: (1) Linear features constrain the output distribution for each species to have the same expectation of each of the continuous climate variables as the sample locations for that species. A linear feature is simply one of the continuous climate variables. (2) Quadratic features (when used together with linear features) constrain the output distribution to have the same expectation and variance of the climate variables as the samples. A quadratic feature is the square of one of the continuous climate variables. (3) A product feature is the product of two continuous climate variables; when used with linear and quadratic features, product features constrain the output distributions to have the same covariance for each pair of climate variables as the samples. (4) A threshold feature is derived from a continuous climate variable. For a threshold value v, the threshold feature is binary (taking values 0 and 1) and is 1 when the variable has value greater than v. The effect of a threshold feature is to make the total probability of grid cells with a value greater than the threshold be equal to the fraction of sample locations with the value above the threshold v. (5) A hinge feature is also derived from a continuous environmental variable. It is like a linear feature, but it is a constant below a threshold v.
The default RM value is 1. The selection of FC is related to the number of species occurrence points. Generally, linear features are always used in the model, quadratic features are used when the number of occurrence points are more than 10, hinge features are used when the number of occurrence points are more than 15, and threshold and product features are used when the number of occurrence points total more than 80 [59]. Two feature types and five FCs (i.e., L, H, LQ, LQH, LQP, LQHP, and LQHPT) were used in this study.
The grid search approach (i.e., an exhaustive search method for setting parameter values) was used to find the best combination of RM and feature type-i.e., the performance of the 70 models (run using different combinations of RM and feature type) was compared.

Variable Selection
Along with the selection of the combination of RM and feature type, the complexity of the Maxent model should also be considered. Generally, the accuracy of simple niche models is low but the transferability to novel climate conditions is high, while the opposite is true for complex models [36,38,60]. The number of climate variables affects the complexity of the Maxent model. On the one hand, too many climate variables will affect the calculation time and efficiency when predicting the suitable distribution of species on a large regional scale; on the other hand, the climate variables with obvious collinearity will affect the interpretation of the results. Therefore, it is particularly important to reduce the number of climate predictor variables and select the appropriate ones to improve the transferability of the Maxent model to different climate change scenarios. Data dimensionality reduction methods such as correlation analysis and principal component analysis are widely used in the selection of climate variables before the prediction of suitable habitats for species [61][62][63]. However, these methods of variable selection before modeling cannot effectively identify which variables are most important for the predictions using the Maxent model. Maxent provides two metrics to determine the importance of input climate variables in the final model: percent contribution and permutation importance [27]. Each step of the Maxent algorithm increases the gain of the model by modifying the coefficient for a single variable, and the program can assign the increase to the climate variables that the variable depends on; the percent contribution is obtained at the end of the training process by converting the increase to a percentage [64]. The permutation importance of each variable is determined by randomly permuting the values of that variable among the training points and measuring the resulting decrease in training of the curve of the receiver-operating characteristic plot, with a large decrease indicating that the model depends heavily on that variable [64].
In this study, the climate variables were selected on the basis of their percent contribution. However, the interpretation of variable contributions is unsuitable when the predictor variables are correlated [27]. Therefore, the correlation between the variables was also considered to exclude those with high correlations with the important contributing variables. The acquisition of the suitable variable subset was a continuous search process consisting of four steps: (1) Subset generation: generation of a candidate variable subset according to a certain search strategy.
We used a generalized sequential backward selection approach where variables were continuously excluded from the start point of the search (the original full variable set). First, an initial Maxent model was compiled with all climate variables included. Then, all variables that had a percent contribution score below the set value of contribution threshold were removed. Variables that correlated with the variable showing the highest percent contribution above the set value of correlation coefficient threshold were then excluded. The remaining variable set was used to compile a new Maxent model. (2) Stopping criterion: used to determine when the variable search algorithm should stop. The process of subset generation was repeated until the variable subset met the stopping criterion. The search process stopped only when the variable subset met two criteria: (i) the percent contribution score of variables must be above the contribution threshold (equal to 1); and (ii) the correlation of variables must be below the correlation coefficient threshold (equal to 0.9). (3) Subset validation: used to verify the validity of the selected variable subset. A 10-fold cross-validation approach was performed to evaluate the performance of the variable subset in each round; therefore, the subset validation was not an independent step in the process. (4) Subset evaluation: evaluation of the prediction performance of the final variable subset through an evaluation function. We used two main evaluation criteria: (i) the sample-size-adjusted Akaike information criterion (AICc) [65]; and (ii) the area under the curve (AUC) estimated from test data [66].
The R package dismo [67] was used to run the Maxent software (Version 3.4.1, Cambridge, MA, USA) which performed the Maxent modeling and variable selection. The contribution of the variables was obtained using the jackknife approach [68]. The correlation between the climate variables was calculated using the Pearson correlation coefficient in the software SPSS (Version 25, Armonk, NY, USA) (Table A2). Each model runs using a combination of RM and feature type needed to search the variable subset. The variable selection workflow is shown in Figure 2. (4) Subset evaluation: evaluation of the prediction performance of the final variable subset through an evaluation function. We used two main evaluation criteria: (i) the sample-size-adjusted Akaike information criterion (AICc) [65]; and (ii) the area under the curve (AUC) estimated from test data [66].
The R package dismo [67] was used to run the Maxent software (Version 3.4.1, Cambridge, MA, USA) which performed the Maxent modeling and variable selection. The contribution of the variables was obtained using the jackknife approach [68]. The correlation between the climate variables was calculated using the Pearson correlation coefficient in the software SPSS (Version 25, Armonk, NY, USA) (Table A2). Each model runs using a combination of RM and feature type needed to search the variable subset. The variable selection workflow is shown in Figure 2.

Evaluation Criteria
Evaluation of model performance is an important topic in niche modelling [69]. In the Maxent model, the AUC of the receiver-operating characteristic plot is used as a default evaluation criterion to assess the accuracy of the model. The AUC, which is a threshold-independent evaluation criterion that is not sensitive to species prevalence [70], is widely used in niche modelling [23,27,66,71]. Generally, the AUC value ranges from 0.5 to 1, and the higher the value, the better the prediction of the model. Five grades of model prediction accuracy can be classified according to the AUC value [72]: 0.5-0.6, fail; 0.6-0.7, poor; 0.7-0.8, fair; 0.8-0.9, good; and 0.9-1, excellent.
However, it has been suggested that high AUC values are associated with the degree to which a model can successfully distinguish occurrence from background localities, but this rank-based nonparameter metric cannot be used to evaluate the model goodness-of-fit [73,74]. That is, the maximization of test AUC values favors models that can discriminate between occurrence and background sites well but may overfit the realized niche of a species [38,66]. Therefore, using multiple evaluation criteria to reduce the deficiency of single evaluation criterion has become a popular method for evaluation of the niche modelling [75,76].
The AIC, based on the entropy theory, is a criterion to measure the goodness-of-fit of a statistical model and can weigh the complexity of the estimated model [65]. The AICc was developed by Warren et al. [38,40] as an AIC corrected for small sample size; however, it has been widely used regardless of sample size based on the recommendation of Burnham et al. [77]. The AICc is calculated based on the unpartitioned (full) dataset [78]: where k is the number of parameters in the model (i.e., number of non-zero parameters in the Maxent lambda file) which record the coefficients of the Maxent model for a species; n is the number of occurrence localities; v is a vector of Maxent raw values at occurrence localities; and t is the sum of

Evaluation Criteria
Evaluation of model performance is an important topic in niche modelling [69]. In the Maxent model, the AUC of the receiver-operating characteristic plot is used as a default evaluation criterion to assess the accuracy of the model. The AUC, which is a threshold-independent evaluation criterion that is not sensitive to species prevalence [70], is widely used in niche modelling [23,27,66,71]. Generally, the AUC value ranges from 0.5 to 1, and the higher the value, the better the prediction of the model. Five grades of model prediction accuracy can be classified according to the AUC value [72]: 0.5-0.6, fail; 0.6-0.7, poor; 0.7-0.8, fair; 0.8-0.9, good; and 0.9-1, excellent.
However, it has been suggested that high AUC values are associated with the degree to which a model can successfully distinguish occurrence from background localities, but this rank-based non-parameter metric cannot be used to evaluate the model goodness-of-fit [73,74]. That is, the maximization of test AUC values favors models that can discriminate between occurrence and background sites well but may overfit the realized niche of a species [38,66]. Therefore, using multiple evaluation criteria to reduce the deficiency of single evaluation criterion has become a popular method for evaluation of the niche modelling [75,76].
The AIC, based on the entropy theory, is a criterion to measure the goodness-of-fit of a statistical model and can weigh the complexity of the estimated model [65]. The AICc was developed by Warren et al. [38,40] as an AIC corrected for small sample size; however, it has been widely used regardless of sample size based on the recommendation of Burnham et al. [77]. The AICc is calculated based on the unpartitioned (full) dataset [78]: where k is the number of parameters in the model (i.e., number of non-zero parameters in the Maxent lambda file) which record the coefficients of the Maxent model for a species; n is the number of occurrence localities; v is a vector of Maxent raw values at occurrence localities; and t is the sum of Maxent raw values across the entire study area. The AICc was calculated using the "calc.aicc" function in the R package ENMeval [79]. Generally, minimization of AICc values favors models that recognize the fundamental niche of a species and that are more transferable to future climate scenarios [38]. Therefore, the minimal AICc value has been used to select the optimal model in numerous studies [80][81][82].
In addition, a relative measure (i.e., the difference between training and testing AUC values [AUC.Diff]) was also used to quantify the degree of overfitting. The AUC.Diff is expected to be high for models overfit to the training data [38].

Division of Suitable Distribution Area
The suitable distribution area was divided into three grades using the Jenks' natural breaks approach according to the value of suitability in previous studies [83,84]: 0-0.25, non-suitable area; 0.25-0.5, low-suitability area; and 0.5-1, high-suitability area. However, the natural breaks approach did not produce clear divisions of suitable distribution areas and the area defined as high-suitability was too large, which hinders detailed analysis for a species. Therefore, five grades of suitable distribution area were classified using the equal interval approach: 0-0.2, non-suitable area; 0.2-0.4, low-suitability area; 0.4-0.6, general-suitability area; 0.6-0.8, medium-suitability area; and 0.8-1, high-suitability area.
The change in suitable distribution area is a common species distribution analysis criterion. The size of the suitable area plays an important role in maintaining the spatial continuity of a species, but the structure and function of a suitable area are also important aspects. Therefore, we used four landscape metrics-aggregation index (AI), fractal dimension index (FRAC), Shannon's diversity index (SHDI), and contagion (CONTAG)-to evaluate spatial pattern change in suitable areas from a landscape aspect (Table A3). The AI and FRAC are class metric measures that aggregate properties of the patches belonging to a single class in the landscape, while SHDI and CONTAG are landscape metrics that measure the aggregate properties of the entire patch mosaic. These landscape metrics are calculated by the software FRAGSTATS: Spatial Pattern Analysis Program for Categorical Maps (Version 4.2, Amherst, MA, USA).
In addition, the direction and range of the species migration under the climate change scenarios were analyzed by comparing changes to the location of the suitable-area centroid. Figure 3 shows how the performance of the Maxent models altered when different combinations of RM and feature type were included after variable selection. Variable selection could significantly reduce the number of variables included, but the models with a low RM or a complex combination of feature type required more variables. Thirteen variables were selected which differed in importance. Bio2, Bio6, Bio8, Bio12, Bio14, and Bio15 were frequently selected, highlighting the importance of these variables in Maxent models of C. lanceolata.

Variable Selection and Accuracy Evaluation
The evaluation metric result showed that the test AUC values of all models were larger than 0.86, suggesting that the models worked well and had high prediction accuracy, but the values of change were not too large. By contrast, the minimum AICc value of the model (RM = 3.5, FC = LQH) was significantly lower than the default model value (RM = 1, FC = LQHPT). The AUC.Diff value of the default model was larger than the minimum AICc value, indicating that the default model was overfitted to the training data. Additionally, the number of variables of the model with the minimum AICc value (variable count = 5) was also lower than the default model (variable count = 7). Therefore, to reduce the overfitting and complexity, the model with the minimum AICc was selected as the optimal model to predict the suitable distribution area for the present day, 2050s, and 2070s under different climate scenarios in China (Table 2).

Variable Importance
Five variables were included in the optimal model: Bio14 (precipitation of driest month), Bio12 (annual precipitation), Bio6 (min temperature of coldest month), Bio8 (mean temperature of wettest quarter), and Bio5 (max temperature of warmest month) ( Table 3). The cumulative contribution (i.e., importance) of Bio14 and Bio12 was more than 90%, rising to over 95% with the inclusion of Bio6, indicating that these variables best explain the data. Therefore, precipitation variables appear more important to the Maxent predictions for C. lanceolata than temperature variables. The selected variables determining the suitable distribution of C. lanceolata represent near-extreme climate variables. To understand how these variables affected the predicted distribution of C. lanceolata, the response curves were calculated to estimate the relationship between the probability of presence and climate factors. The red lines in Figure 4 show how the variables affect the predicted probability of suitable conditions when all other variables are set to their average value; that is, the curves showed the marginal effect of changing only one variable. High correlations between the variables after variable selection was not observed (Table A2), and thus, the marginal response curves can be interpreted. The probability of presence value increased with increasing Bio14, Bio12, and Bio6 values, before stabilizing at a high probability, or decreased after reaching the maximum probability; this indicates the importance of these variables for predicting distribution patterns of C. lanceolata. By contrast, the probability value was high for most Bio8 and Bio5 values although it decreased rapidly at high variable values; however, the roles these variables were not obvious and the significance of other variables to the optimal model suggests that they are not of high importance in predicting distribution patterns.
The blue lines in Figure 4 show how each of the five optimal-model variables independently affect the predicted probability of suitable conditions, namely, a Maxent model created using only the corresponding variable. The probability of presence of C. lanceolata is close to 0 when the precipitation of the driest month (Bio14, the most significant variable), is less than 5 mm, then increases rapidly and reaches the maximum when Bio14 is over 30 mm. Similarly, the probability of presence is close to 0 when annual precipitation (Bio12) is less than 500 mm, then increases rapidly and reaches the maximum when Bio12 is over 1200 mm. Therefore, there appears to be no upper limit but a strict lower limit of precipitation variable values for the suitable distribution area of C. lanceolata. According to the suitability grade, the suitable distribution area (probability ≥0.6) for C. lanceolata requires more than 20 mm of precipitation during the driest month and an annual precipitation of over 1100 mm.
In contrast to the precipitation factors, the suitable distribution area of C. lanceolata has strict upper and lower temperature limits. The probability of presence was close to 0 when Bio6 (min temperature of coldest month), Bio8 (mean temperature of wettest quarter), and Bio5 (max temperature of warmest month) were less than −15 • C, 5 • C, and 10 • C, respectively, before increasing rapidly and reaching the maximum level when these values rose to 2 • C, 23 • C, and 33 • C, respectively, again showing a sharp drop-off above these values. According to the suitability grade, the suitable distribution area (probability ≥0.6) for C. lanceolata requires the minimum temperature in the coldest month, mean temperature of the wettest quarter, and maximum temperature of the warmest month to be 0-8 • C, 19-25 • C, and 28-34 • C, respectively.

Analysis of Potential Suitable Distribution
The present-day predicted suitable distribution area of C. lanceolata was situated in Guizhou, Hunan, Guangdong, and Fujian Province, coincident with the actual distribution ( Figure 5). The percent of area of general-, medium-, and high-suitability distribution was 6.86%, 5.25%, and 0.76%, respectively ( Table 4). The suitable distribution areas markedly changed for different time periods under different climate change scenarios. Compared to the present-day predictions, the total area of suitable distribution (general-, medium-, and high-suitability) gradually decreased in the 2050s and 2070s ( Table 4). The ranking of the total area under different climate change scenarios was RCP2.6 > PCP4.5 > PCP6.0 > PCP8.5, indicating that as the greenhouse gas emissions increase, the distribution of C. lanceolata is likely to decrease. It is of note that the area of high-suitability distribution for RCP4.5 greatly increased in the 2050s.

Analysis of Potential Suitable Distribution
The present-day predicted suitable distribution area of C. lanceolata was situated in Guizhou, Hunan, Guangdong, and Fujian Province, coincident with the actual distribution ( Figure 5). The percent of area of general-, medium-, and high-suitability distribution was 6.86%, 5.25%, and 0.76%, respectively ( Table 4). The suitable distribution areas markedly changed for different time periods under different climate change scenarios. Compared to the present-day predictions, the total area of suitable distribution (general-, medium-, and high-suitability) gradually decreased in the 2050s and 2070s ( Table 4). The ranking of the total area under different climate change scenarios was RCP2.6 > PCP4.5 > PCP6.0 > PCP8.5, indicating that as the greenhouse gas emissions increase, the distribution of C. lanceolata is likely to decrease. It is of note that the area of high-suitability distribution for RCP4.5 greatly increased in the 2050s.   Because the edges of the suitable distribution area were not regular, it is more suitable and representative to express their change using the centroid movement. The centroids of the suitable distribution area were located in Hunan and Jiangxi Provinces and showed a degree of northward movement compared to the present-day position under different climate change scenarios ( Figure 6). However, the distance of centroid movement differed between suitability grades in different time periods. The movement distance of the general-suitability grade was smaller than the medium grade, which was in turn smaller than the high-suitability grade and the predicted movement distance was greater in the 2070s than in the 2050s. Therefore, the distance of centroid movement was positively correlated with the range of change between the different climate change scenarios and the present day ( Table 4), indicating that the larger the moving distance of the centroids, the greater the change of the suitable distribution area. In addition, the centroids of the high-suitability distribution area for RCP2.6, RCP4.5, and RCP8.5 in the 2070s show a clear eastward progression compared to the 2050s.  Because the edges of the suitable distribution area were not regular, it is more suitable and representative to express their change using the centroid movement. The centroids of the suitable distribution area were located in Hunan and Jiangxi Provinces and showed a degree of northward movement compared to the present-day position under different climate change scenarios ( Figure 6). However, the distance of centroid movement differed between suitability grades in different time periods. The movement distance of the general-suitability grade was smaller than the medium grade, which was in turn smaller than the high-suitability grade and the predicted movement distance was greater in the 2070s than in the 2050s. Therefore, the distance of centroid movement was positively correlated with the range of change between the different climate change scenarios and the present day (Table 4), indicating that the larger the moving distance of the centroids, the greater the change of the suitable distribution area. In addition, the centroids of the high-suitability distribution area for RCP2.6, RCP4.5, and RCP8.5 in the 2070s show a clear eastward progression compared to the 2050s.   The general-, medium-, and high-suitability distribution areas were predicted as more fragmentated in the 2050s and 2070s compared to the present day ( Figure 5). The landscape metric results also indicated that the suitable distribution areas will become increasingly fragmented (Table 5). For the class metrics, the AI values of the general-, medium-, and high-suitability distribution areas gradually decreased from RCP2.6 to RCP8.5, indicating that the predicted patches became more discrete; conversely, the FRAC values of these classes gradually increased, suggesting that the predicted edges of these patches became more complex. Because the areas of non-suitable distribution increased, the CONTAG values also gradually increased while the SHDI values gradually decreased. Although these metrics express the structural characteristics of different suitability grades from different aspects, they all predict that the suitable area of C. lanceolata will gradually decrease or even be lost altogether as a consequence of climate change.

Discussion
Niche modelling is a powerful tool for the prediction of potential species distribution that can help inform applied ecology. However, there are several limitations to niche modelling, such as low transfer ability and discrepancies between the simulation results and the actual species distribution, that can result in an inappropriate conclusion being drawn from the models. Improving the transfer ability of niche models not only helps to accurately simulate the potential species distribution, but can also generate important reference data for theoretical issues such as niche conservation.
Maxent has been widely used in the prediction of potential distribution of species in recent years [34,85]. The model assumes that species will be present in all areas with suitable climate conditions and will not present in any area with unsuitable conditions; it is believed that the larger the entropy of a species under known conditions, the closer it is to the reality [27]. The default parameters of Maxent were originally set through the testing of data from 266 species in six different geographical regions by early model developers [30]. They used the massive distribution data of the test species and a variety of experimental schemes to define optimal model parameters as the default to promote and simplify the application of the model; however, the purpose of application of these models was to predict the present-day distribution of a species, and the model did not have to be transferable after construction [30,86]. Adjustments to the simulation scheme are necessary to meet different simulation requirements depending on the research question, although the species distribution simulated by the Maxent model is between the potential distribution and the real distribution [86]. To simulate the potential distribution of a species, the default parameters of the Maxent model may not be applicable. For example, the parameters of our optimal model differ from the default parameters ( Figure 3, Table 2).
If only the default parameters are used, the transfer of the model will be reduced because of overfitting, although it can fit the current distribution of the species for the study area well.
The complexity of a Maxent model has a significant influence on its transfer ability. Both simple and complex models have advantages and disadvantages. Simple models may show a poor fit to the species distribution data of the study area, and while complex models can produce a better fit, the response relationship between species and environmental variables can deviate from the reality. The adverse effect of Maxent model complexity on prediction results when simulating potential distributions is more serious because of the need to transfer the model to other geographical areas after model construction [38,40]. The prediction ability of complex models will be reduced, and the simulation results may not be reliable or can deviate from the actual situation, because of overfitting when they are transferred to other geographical areas.
In this study, steps were taken to optimize the two main factors known to influence the prediction results of the Maxent model. First, as much species distribution data as possible were collected to reduce error caused by incomplete data collection; a total of 1877 sample points were collected. Generally, the species distribution points should not only adequately cover the geographical distribution and ecological space of a species, but also reduce the inaccuracy of sampling points. It has been shown that the sample size of species distribution data significantly affects the accuracy of model simulation results [87][88][89][90]. In this study, the modeling data covered the southern provinces of China and only one point was reserved in the same pixel. Second, we optimized the parameters of the model. We constrained the complexity of the model by reducing the number of variables through variable selection and using AICc values to select the optimal combination of feature types and RM. Along with decreasing the complexity of the model, reducing the number of variables can reduce the operation time and improve the interpretability. The Maxent model uses the default criterion (i.e., the AUC) to evaluate the prediction ability of the model, but the AUC may overestimate this ability and directly affect the species distribution results when test data and training data are affected by sampling error [40,91]. Nevertheless, using the AICc to select the feature type and the built-in RM can constrain the complexity of model [38].
Five factors (i.e., climate change, habitat change, overexploitation, pollution, and invasive alien species) are viewed as the principal threat to global biodiversity in the 21st century [92]. As an important aspect of global change, climate change mainly manifests as temperature rises, changes to precipitation patterns, and increasing extreme climate events; in addition, changing climate factors will also affect the growth and reproduction, phenology, and geographical distribution of species at the regional scale [1,2,93]. Current biodiversity conservation measures, such as the establishment of nature reserves, are usually based on the current distribution of species, but are facing increasing challenging as the effects of climate change increase [20,94]. It is of key importance to understand changes in suitable areas of distribution under the future climate change scenarios, and implement targeted conservation measures as early as possible to improve the effectiveness of biodiversity conservation [29,95]. With the exponential growth of global greenhouse gas emissions, the trend of climate warming in China will be further intensified in the future. The results of this study show that climate change will have a significant impact on the suitable distribution area of C. lanceolata, and the suitable habitat area will likely gradually decrease. In addition, we found that the prediction results using RCP4.5 data were different from other climate scenarios. Previous studies have shown that the climate scenario of RCP4.5 is the closest to the actual predicted climate situation in China [55]; thus, the suitable distribution area of C. lanceolata has not been significantly reduced over a short time period under this climate scenario. However, the suitable distribution area of C. lanceolata will still likely see a large reduction with the likely effects of climate change caused by continuing increases in greenhouse gas emissions.
Climate change, especially global warming, not only causes temperature changes in different regions, but also changes the distribution pattern of precipitation. When the change of these climatic factors is close to or beyond the adaptive threshold of current plant growth, it will lead to the migration of their distribution [7]. Water availability has a significant impact on plant height, leaf area, branch number, photosynthesis and growth [96], thus, precipitation can constrain species distribution and influence distribution in various ways [97]. The increase of precipitation during the driest month results in a longer growth season and helps species migrate to more suitable habitats within their distribution [98]. In addition, extreme high and low temperatures also have a significant influence on plant growth. The decrease of the minimum temperature of the coldest month results in premature freezing injuries to plants, and long-term low temperatures will lead to the death of plants at the distribution limit [97]; while the increase of the maximum temperature of the warmest month will destroy the water balance of plants, promote the coagulation of proteins and the internal mechanism of harmful metabolites, thus hindering plants' growth [99]. Climate extremes have been shown to have important impacts on species' distribution and diversity, and adding extreme climate variables in niche modelling can improve the predicted accuracy and limit of species distributions under future climate change scenarios [100][101][102][103][104].
Because of changes in environmental factors, particularly those of the climate, the suitable habitat of tree species has changed in China, which makes it difficult to correctly formulate a long-term forest management plan. Sustainable forest management planning at the regional scale includes predicting the spatial distribution of different types of forests and forest management division; it is therefore a spatial decision-making process. The impact of forest management on ecosystems and the evaluation of forest management effectiveness are long-term processes; therefore, a long-time scale must be adopted to measure the sustainability of forest management. When developing forest management plans, optimized model simulations and decision analysis of the management process should be carried out over at least one forest management cycle. During the process of long-term (>30-year) forest management planning, climate change will lead to changes in the environment, and thus suitable habitats, for tree species. Therefore, it is necessary to consider the impact of climate change in the process of long-term forest management planning. The Maxent model is a key tool for simulating the distribution of suitable habitats of tree species, and therefore predicting suitable distribution areas, under future climate change scenarios. Our optimized model can provide vital reference data for the selection of C. lanceolata afforestation site in the formulation of long-term forest management planning.
Accurate simulation of species distribution areas is of key importance for species protection and restoration. However, many biological and non-biological factors must be considered to achieve this goal. In practice, important factors are often ignored to simplify the data needed for the model for the feasibility of the research, which affects the operation results to a certain extent, leading to errors between the predicted results and reality. Different simulation results of a species may be generated because of different sample counts and parameters applied to the model by different researchers, even for the same species. Liu et al. used the data from the counties in the subtropical area in China as distribution sample sites and the WorldClim data with 2.5' (approximately 16 km 2 ) spatial resolution in combination with BIOCLIM model to predict the potential distribution area under the present and future climate conditions for C. lanceolata. The results indicated that the suitable distribution area would shrink along with distinct fragmentation when the greenhouse gas emission was doubled [105]. While Lu et al. used the process-based growth model and the WorldClim data with 2.5´spatial resolution as well as the digital version of Vegetation Map of China (1:1,000,000) to map the distribution and productivity of C. lanceolata. The results showed that a species is likely to experience a northward shift with minor changes in the south, and the central regions of China are likely to become more suitable for C. lanceolata under future climate conditions [106]. By contrast, with the optimized Maxent model, we not only obtain the potential distribution results with high accuracy and high spatial resolution, but also improve the transfer ability of the Maxent to predict the potential distribution of C. lanceolate in the future climate scenarios. Besides climate factors, there are many variables that affect the distribution of C. lanceolata, including land use, soil conditions, and topography. Additionally, human activity is an important factor affecting species distribution and the key factor determining the actual niche of a species. While human activity directly changes the habitat characteristics of species, we can also implement measures that can also affect how species distribution is influenced by climate change. We did not consider these factors because there are no suitable data on soil and land use under different greenhouse gas emission scenarios. The impact of climate change on soil and land use change is currently poorly understood, and the interaction between human activity and climate change is complex. However, as more data on variables affecting species distributions become available, niche models can better predict the likely effects of climate change.

Conclusions
In this study, we used present-day occurrence data to predict the potential suitable distribution area of C. lanceolata in China based on an optimized Maxent model under different climate change scenarios. The results indicate the following: (1) the number of variables used for the optimal model was greatly reduced compared to the default model after variable selection-using the AICc to select the parameters of the model reduced overfitting issues and complexity compared to using the AUC; (2) the optimal model incorporated five climate variables, with precipitation factors being more important than temperature factors; and (3) the predicted potential suitable distribution area of C. lanceolata of the present day matched the actual occurrence well. The centroids of the general-, medium-, and high-suitability distribution areas showed a degree of northward movement and the patches became more fragmentated under predictions for the 2050s and 2070s compared to the present-day under different climate change scenarios. To accurately simulate changes in suitable distribution area of a species under future climate change scenarios, responses to the influencing factors need to be fully considered in future research. It is also important that future niche models are used to comprehensively investigate the impact of human activities on species niches.

Acknowledgments:
The authors would like to thank our editors, as well as the anonymous reviewers for their valuable comments, and also Taylor & Francis (www.taylorandfrancis.com) for linguistic assistance during the preparation of this manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.  Table A3. The criteria of landscape spatial pattern for the potential suitable distribution area.

Landscape Metrics Equations Description References
Aggregation Index (AI) AI = g ii max·g ii × 100 1 < AI ≤ 100. The AI is calculated from an adjacency matrix, which shows the frequency with which different pairs of patch types (including like adjacencies between the same patch type) appear side-by-side on the map. The lower the AI value, the more discrete the type (class). [107] Fractal Dimension Index (FRAC) FRAC = 2ln(0.25P i j ) ln a i j 1 ≤ FRAC ≤ 2. The FRAC reflects the shape complexity across a range of spatial scales (patch size). The higher the FRAC value, the more complex the type (class). [108,109] Shannon's Diversity Index (SHDI) SHDI = − m i=1 P i ln P i SHDI ≥ 0. The SHDI, based on information theory, is a popular measure of diversity in community ecology, applied here to landscapes. The SHDI reflects the landscape heterogeneity and is sensitive to the uneven distribution of patch types in a landscape. The higher the SHDI value, the greater the diversity of landscape. [110] Contagion (CONTAG) 1 < CONTAG ≤ 100. The CONTAG expresses the degree of aggregation or extension trend of different patch types in the landscape. A high CONTAG value indicates that a dominant patch type has a good connectivity in the landscape, while low values indicate that there is a dense pattern with multiple elements in the landscape, and the fragmentation degree is high. [111,112] g ii is the number of like adjacencies (joins) between pixels of patch type (class) i based on the single-count method; max.g ii is the maximum number of like adjacencies (joins) between pixels of patch type (class) i based on the single-count method; P ij is the perimeter (m) of patch ij; a ij is the area (m 2 ) of patch ij; P i is proportion of landscape occupied by patch type (class) i; g ik is the number of adjacencies (joins) between pixels of patch types (classes) i and k based on the double-count method; and m is the number of patch types (classes) present in the landscape, including the landscape border, if present.