Optimization of the Fuzzy Matter Element Method for Predicting Species Suitability Distribution Based on Environmental Data

Over the years, with the efforts of many researchers, the field of species distribution model (SDM) has been well explored. The model of fuzzy matter elements (FME), which, combined with GIS to predict species distribution, has received extensive attention since its emergence. Based on previous studies, this paper improved FME, extended the scope of the membership degree and habitat suitability index, and explored the unsuitable areas of species. We have enhanced the limitation effect of key variables on species habitats, making the operation of FME more consistent with biological laws. By optimizing the FME, it could avoid the accumulation of predicted errors with multi-variables, and make the predicted results more reasonable. In this study, Gynostemma pentaphyllum (Thunb.) Makino was used as an example. The experimental process used several major environmental variables (climate, soil, and terrain variables) to predict the habitat suitability distribution of G. pentaphyllum in China for its current and future period, which includes the period of 2050s (average for 2041–2060) and 2070s (average for 2061–2080) under representative concentration pathways 4.5 (RCP4.5). The results of the analysis showed that the model performed well with a high accuracy by reducing the redundancy of the environmental data. The study could relieve the reliance on a large database of environmental information and propose a new approach for protecting the G. pentaphyllum in unsuitable areas under climate change.


Introduction
Currently, there are many researchers studying species distribution models (SDMs) in terms of the theory [1][2][3], model [4][5][6][7], and model evaluation method [8,9] in academic circles.For the detection of marginal peripheral populations, provenance distribution modelling may represent a valuable step forward in spatial analysis [10].To improve the accuracy of the model and deal with the uncertainty, ensemble models which are based on different consensus approaches could be used to balance the accuracy and enhance the robustness of SDM models [11,12].In particular, under the climate regime shifts, bioclimate envelope models, which are a kind of SDM, were developed for north America to guide assisted migration and successfully predict the transformation of Douglas-fir provenance to Europe [13].Besides, researchers have tried to measure the climatic requirements of species using the data that is beyond their realized niches, according to the advent of large biodiversity databases and some revised SDM analysis approaches [14].Furthermore, to evaluate habitat suitability for plants that have no specific habitat requirement, the fuzzy matter element model (FME) which is established on the basis of Fuzzy and GIS, could provide a quantified and effective analysis.Researchers have applied FME to predict the suitability distribution of Schisandra sphenanthera Rehd.et Wils. in the Qinling Mountains of China [15].
The traditional SDMs, such as the generalized linear model (GLM) [16], generalized additive model (GAM) [17], genetic algorithm for rule-set production (GARP) [18], Maxent [19], and random forest (RF) [20], are based on the location of sample sites of species.However, the FME fits the relationship between environmental variables data of sample sites and species characteristics information, such as the content of marker compounds which are extracted from the species [21].Therefore, the model could determine the appropriate range of various ecological environmental variables (EEVs), which provides a more accurate result for research and has a certain guiding significance for the protection of plant resources.
Over the years, researchers have explored the FME, and applied this method to predict species distribution.From 2012 to the present, our team applied the FME to study the suitability distribution of many medicinal plant species (Table 1).The methodology of using FME is based on the following process: (1) According to previous research and the specific conditions of the plant growth environment, the EEVs are selected to construct its habitat suitability index (HSI) system; (2) EEVs, together with the characteristic values (such as the content of marker compounds of medicinal plant species), could be used to construct the membership function, which also represents the membership degree of EEVs on the habitat suitability of species; (3) the weights of EEVs for habitat suitability are calculated by using the maximum entropy theory; and (4) the spatial distribution of habitat suitability range for various EEVs are obtained by GIS layer calculations.Thus, the model adds the membership degrees of all variables and combines them with the weight, and finally obtains the average value of the membership degree.Based on a large number of actual operations of FME, we found that FME needs to be optimized and improved in the following aspects: (1) The Gaussian membership function often used by researchers might not reasonably describe the relationship between characteristic values of species and EEVs of sample sites in the low membership range.For the Gaussian membership function, since the range of the single-variable's membership value is 0 to 1, it is possible that the HSI may present the result of the accumulated comprehensive error.It is also possible that the suitable habitat which is predicted from the model is not actually appropriated for the survival of species because of the accumulated comprehensive error; (2) any EEVs we choose for modelling have two effects on the survival of species, promotion and inhibition.For inhibition, the previous FME might not be adequately considered.Therefore, a better way is needed to describe the inhibitive effects of EEVs on the survival of species; (3) for classification of the suitability index, previous research on SDMs emphasizes the probabilistic interpretation from 0 to 1, and almost all models have limited the suitability index interval to [0, 1] [30,31].However, such a classification is not very conducive to unsuitable area analysis and research; (4) we also find that the FME has the disadvantages of containing too much EEV data, and redundancy, which leads to a huge workload for researchers.
To cope with the above issues, this research tries to provide the following optimization.Firstly, we propose a quadratic function instead of a Gaussian function to fit the relationship between species characteristic values and the EEVs, because the quadratic function could describe the symmetry, as well as the Gaussian function.Secondly, extending the numerical range of membership from [0, 1] to [−1, 1], and a new criterion of habitat suitability, are also proposed.Thirdly, EEVs are filtered and key EEVs are retained to participate in modeling.Lastly, we take bio-climatic variables, topographic variables, and soil variables as EEVs in FME model predictions of future scenarios.
This research aims to optimize the key steps of the FME method; improve the fuzzy membership function, making it more conform to the biological and ecological characteristics; and avoid the accumulation of errors due to function fitting.We used Gynostemma pentaphyllum (Thunb.)Makino as an example, which was studied by FME before [28].G. pentaphyllum is a herbaceous climbing plant [32].The gypenosides extracted from the G. pentaphyllum exhibit anti-tumor and anti-aging properties, lower cholesterol, and enhance immunity [33,34].We selected key EEVs to be incorporated in the modeling, which could change the previous demand for a large number of environmental variables and reduced the heavy workload.After screening, we tried to use six kinds of EEVs, and gypenosides data to show the optimized FME.BCC-CSM1-1 (BC) is an atmospheric-sea-land and sea-ice global coupling model of climate.This system is also used by China's national climate center for the historical test and interdecadal test, which makes it more reliable for participation in the coupled model of the CMIP5 [35].Therefore, it is reasonable to select the data under BC for future forecasting in China.As for representative concentration pathways (RCPs), RCP4.5 and RCP6.0 are both intermediate stable paths, but RCP4.5 has a higher priority than RCP6.0 [36].Hence, in this research, we chose RCP4.5 as a representative concentration pathway.Finally, we used bio-climatic data for the 2050s (average for 2041-2060) and 2070s (average for 2061-2080) of RCP4.5 scenarios to show the advantages of the FME in predicting the future climate suitability of species.

Study Area
The research area of this article was located throughout China.Through scientific and standardized processing, we obtained the gypenosides (marker compounds) content (%) of G. pentaphyllum.This article selected 39 sample sites (Figure 1) with their gypenosides content data to participate in the model construction.
pentaphyllum.This article selected 39 sample sites (Figure 1) with their gypenosides content data to participate in the model construction.

Collection of Environmental Variables and Future Scenarios
We chose EEVs, including 19 bio-climatic variables (Bio1-Bio19) (www.worldclim.org/current),eight soil variables (Toc, Tph, Soc, Sph, TTEX, Clay, Sand, and Silt) (HWSD, http://www.fao.org/soilsportal/en),three terrain variables (Aspect, Slope, and EL) (http://www.resdc.cn),and one light variable (Totalrad13, which means annual total solar radiation) (CERN, http://www.cern.ac.cn).We used three steps to filter the EEV data.Firstly, we screened the functions between marker compounds and environmental variables using R-squared, the goodness of fit.The lowest goodness of fit previously documented was 0.64 [28].In this research, we have improved the selection criteria and retained EEVs which fitted with an R-squared greater than 0.7.Secondly, in the process of correlation analysis and screening, we conducted a correlation analysis of all environmental variables, and classified those variables whose coefficient of correlation was greater than 0.8 into one category, because if the coefficient of correlation was equal to or greater than 0.8, we should be cautious of a serious multicollinearity problem [37].By using this method, the aim was to effectively prevent the model from overfitting [38].Finally, according to the growth habit of G. pentaphyllum and combining previous research on the habitat of G. pentaphyllum, we retained six EEVs (four bio-climatic variables, one topographic variable, and one topographic variable) to be used in the modeling (Table 2).

Collection of Environmental Variables and Future Scenarios
We chose EEVs, including 19 bio-climatic variables (Bio1-Bio19) (www.worldclim.org/current),eight soil variables (Toc, Tph, Soc, Sph, TTEX, Clay, Sand, and Silt) (HWSD, http://www.fao.org/soils-portal/en), three terrain variables (Aspect, Slope, and EL) (http://www.resdc.cn),and one light variable (Totalrad13, which means annual total solar radiation) (CERN, http://www.cern.ac.cn).We used three steps to filter the EEV data.Firstly, we screened the functions between marker compounds and environmental variables using R-squared, the goodness of fit.The lowest goodness of fit previously documented was 0.64 [28].In this research, we have improved the selection criteria and retained EEVs which fitted with an R-squared greater than 0.7.Secondly, in the process of correlation analysis and screening, we conducted a correlation analysis of all environmental variables, and classified those variables whose coefficient of correlation was greater than 0.8 into one category, because if the coefficient of correlation was equal to or greater than 0.8, we should be cautious of a serious multicollinearity problem [37].By using this method, the aim was to effectively prevent the model from overfitting [38].Finally, according to the growth habit of G. pentaphyllum and combining previous research on the habitat of G. pentaphyllum, we retained six EEVs (four bio-climatic variables, one topographic variable, and one topographic variable) to be used in the modeling (Table 2).For future climate scenarios, we used the same bio-climatic variables to provide insights into the future.We downloaded bio-climatic variables data for RCP4.5 of BCC-CSM1-1 (BC) in both 2050s and 2070s periods from WorldClim (http://www.worldclim.org/cmip5_30s).Considering the stability of EL and TTEX in a very short historical period, we directly involved these two current variables in the simulation of future scenarios.

Fuzzy Matter Element Model
The concept of habitat suitability evaluation has ambiguity and there are many evaluation indicators [15].We could construct FME based on matter element analysis.The matter name M and its variable C, in addition to the magnitude V, constitute a matter element R = (M, C, V), and the magnitude V in the matter element model has fuzziness, which represents the value of the variables.A composite fuzzy matter element R nm is composed of m things and n kinds of variables (Equation ( 1)).
where µ ij (i = 1, 2, . . ., n; j = 1, 2, . . ., m) is the fuzzy value corresponding to the i-th characteristic of the j-th object.In the fuzzy matter element model of G. pentaphyllum, M stands for different sample sites and C is the selected ecological environmental variable (EEV).

Fitting Function Optimization and Membership Degree Extension
Based on the selection criteria of the standardization method and traditional membership functions, we re-fitted the relationship between the six selected EEVs and the gypenosides content values.The five fitting functions are shown in Table 3, and Table 4 shows the fitting degree of TTEX.We dealt with the fitting function and converted the value of the function into the membership degree.The range of membership degree is [−1, 1] [39].In the process of converting the function obtained by the EEV data and species characteristic values into the membership degree, we chose [−2|Max|, 0) to deal with the negative value of the function as the research interval, in which |Max| represents the absolute value of the function's maximum value (Figure 2A).The membership degree was converted into [−1, 0) (Figure 2C) to study the changing process of the environmental variables' inhibition of species habitat.The main reason for using this range is to narrow down the error.When experimenting in the negative interval, if we choose [−|Max|, 0) as the study interval (Figure 2B), it would narrow the research scope of the whole independent variable and would not match the actual situation.But, if we use [−2|Max|, 0) for the study interval, the range of values is more reasonable, which is consistent with the actual situation.As a result, the error of model validation is smaller.As for the number of variables beyond the scope of study, we default its membership to a minimum of −1, indicating complete inhibition.2A).The membership degree was converted into [−1, 0) (Figure 2C) to study the changing process of the environmental variables' inhibition of species habitat.The main reason for using this range is to narrow down the error.When experimenting in the negative interval, if we choose [−│Max│, 0) as the study interval (Figure 2B), it would narrow the research scope of the whole independent variable and would not match the actual situation.But, if we use [−2│Max│, 0) for the study interval, the range of values is more reasonable, which is consistent with the actual situation.As a result, the error of model validation is smaller.As for the number of variables beyond the scope of study, we default its membership to a minimum of −1, indicating complete inhibition.Relationship between Bio1 and membership degree which was based on D. For A and D, we used the data of gypenosides value as the characteristics of G. pentaphyllum, and it was also the standard for evaluating the quality of G. pentaphyllum.The content of gypenosides reflects the adaptation of the G. pentaphyllum to the surrounding habitat, and the characteristics value of the positive and negative reflects the promotion and inhibition effect of the environment on the growth of G. pentaphyllum, respectively.For B, Cm and E, they reflect the influence level of the single variable Bio1 on the content of gypenosides, namely the degree of membership of Bio1 to the gypenosides.
Based on repeated experiments and experiences, our process of the membership degree is based on the following rules: (1) For function values greater than or equal to 0, we divide the function value by the maximum value of the function to obtain the membership degree [0, 1], as the red curve shows in Figure 2C; (2) For the negative function value, when the absolute value of the function is less than or equal to twice the maximum function value, we divide the negative function value by twice the maximum value of the function to obtain the membership degree [−1, 0), as the yellow curve shows in Figure 2C; (3) For the negative function value, when the absolute value of the function is greater than twice the maximum function value, the degree of membership is −1, as the green curve illustrates in Figure 2C.

Weight Decision and Habitat Suitability Evaluation
The optimized FME uses the information entropy method to determine the objective weights of the evaluation indicators.If the entropy weight of the EEVs's membership degree is smaller, it indicates that the degree of variation of such index values is greater.At the same time, the more Based on repeated experiments and experiences, our process of the membership degree is based on the following rules: (1) For function values greater than or equal to 0, we divide the function value by the maximum value of the function to obtain the membership degree [0, 1], as the red curve shows in Figure 2C; (2) For the negative function value, when the absolute value of the function is less than or equal to twice the maximum function value, we divide the negative function value by twice the maximum value of the function to obtain the membership degree [−1, 0), as the yellow curve shows in Figure 2C; (3) For the negative function value, when the absolute value of the function is greater than twice the maximum function value, the degree of membership is −1, as the green curve illustrates in Figure 2C.

Weight Decision and Habitat Suitability Evaluation
The optimized FME uses the information entropy method to determine the objective weights of the evaluation indicators.If the entropy weight of the EEVs's membership degree is smaller, it indicates that the degree of variation of such index values is greater.At the same time, the more information the EEVs provide, the greater the weight of such environmental variables on the habitat of the species.
In the calculation of the habitat suitability index (HSI), Equation ( 2) is employed.
where R k denotes an HSI composite matter consisting of m things, and R w is an information entropy weight vector of the indicator.R ξ is the compound fuzzy matter element of the correlation coefficient.In this study, the ArcGIS 10.2 (ESRI, Redlands, CA, USA) spatial analysis module was used to calculate the grid data of six EEVs, where K j is the total score of the j-th grid habitat suitability (Equation (3)).
where w i is the weight of the i-th index, and ξ ij is the i-th index membership degree of the j-th grid.The value range of K j [−1, 1] indicates that the larger the value is, the more suitable the grid cell is, for the growth of G. pentaphyllum.
Finally, the HSI of the 1 km × 1 km grid cell of G. pentaphyllum and the spatial distribution of potential habitat suitability of G. pentaphyllum were obtained.According to the previous researcher's definition of the HSI of the FME [28] and automatic classification in ArcGIS software, we define the classification of the HSI as Table 5.

Model Validation
After obtaining all degrees of membership between EEVs and characteristic values of species, all membership degrees are calculated to obtain the comprehensive impact on species habitats, and that is the effect of all EEVs on the habitat suitability of species.For FME, when calculating the average degree of membership (HSI), an additive operation was used.If all membership degrees are positive values, it is very easy to cause an accumulation of errors.Based on optimized FME, it could make the final average degree of membership of the model more consistent with the actual state of our species.In the function fitting process of the model, the value of the average fitting degree of all the functions was 83.6%, which varied from 72% to 89% (Table 3).Through the entropy weight method, we obtained the weight of the influence of the current EEVs on the habitat suitability of G. pentaphyllum, as shown in Table 6.The basis of FME modeling is the relationship between the marker component and the environmental data.Thus, these relationships are used in predicting the suitability of the species and could be applied to all study areas, since there is a corresponding functional relationship between the suitability of a sample site location and the marker component content of species at that location.Specifically, the higher the content of the marker component, the higher the suitability of the species.Therefore, we intend to find out this relationship, and use it to convert the marker component content of the species into an actual suitability index.Following this, we aim to compare the actual suitability index and the predicted suitability index.We used the current EEV data to test the optimized FME.The test is divided into four parts.Firstly, run the model to get the predicted HSI of 39 sampling sites in the current environment.Secondly, fit the relationship between the content (gypenosides) data and the predicted HSI of the sampling sites.Thirdly, convert the marker component content data into the actual HSI of the sampling sites by the function relationship.Finally, compare the matching degree between the actual and predicted HSI based on habitat suitability level classification (Table 5).

Results
The test results show that the matching degree between the estimated result and the actual data is as high as 87.18%, indicating that the model operation is very reasonable [40].After testing, we confirmed the reliability of the relationship between current EEV data and the suitability of G. pentaphyllum.Therefore, with this relationship, we predicted the probably extent of G. pentaphyllum habitat suitability in the 2050s and 2070s periods (Figure 3).Under the current EEVs, we predicted six different habitat suitabilities of G. pentaphyllum in China.The growth suitability areas of G. pentaphyllum are mainly located in the hilly and mountain terrain areas in the monsoon region of eastern China (Figure 3A).The highly suitable habitat is mainly distributed in the provinces of Yunnan, Guizhou, Sichuan, Shaanxi, Henan, Hunan, Guangdong, Jiangxi, Fujian, Zhejiang, and Guangxi Zhuang Autonomous Region.In addition, a small number of highly suitable regions are located in Shandong, Hubei, Hebei, and Hainan.In regards to climate type, highly suitable areas are mainly affected by the subtropical monsoon and monsoon humid climate.The moderately suitable habitat is mainly distributed in Yunnan, Hubei, Hunan, Anhui, and Jiangsu provinces.The marginally suitable habitat is mainly distributed in Shandong, Hebei, and southern Tibet.
For the unsuitable habitats of G. pentaphyllum, they are mainly located in the plateau of western and northern China, and a small part of the area is located in the plains of eastern China.The highly unsuitable habitat of G. pentaphyllum is mainly located in the western and northern parts of the Qinghai-Tibet Plateau, as well as in the desert areas in Gansu province, Inner Mongolia Autonomous Region, and Xinjiang Uighur Autonomous Region.The moderately unsuitable areas of G. pentaphyllum are mainly located in the southern and eastern parts of the Qinghai-Tibet Plateau, Inner Mongolia Autonomous Region, Ningxia Hui Autonomous Region, Xinjiang Uygur Autonomous Region, and Gansu Province, with a small portion located in the western part of the Tibet Autonomous Region.The marginally unsuitable regions of G. pentaphyllum are mainly located in parts of the Xinjiang Uygur Autonomous Region and northeast of China.A few marginally unsuitable regions are located in Gansu, Shaanxi, Shanxi, and Hebei provinces, as well as the areas where Sichuan and Yunnan provinces border the Qinghai-Tibet Plateau and the coastal areas of Guangdong province.
Under the selected climate change scenarios, the suitable region of G. pentaphyllum will undergo some changes in the 2050s and 2070s periods (Table 7), and the overall habitat suitability of G. pentaphyllum will move to the north (Figure 3B,C).In general, the habitat suitability of G. pentaphyllum in China will gradually shift to high latitudes during 2050s and 2070s periods, mainly occurring in the five regions (Figure 3B).In the 2050s period, the marginally suitable areas for the growth of G. pentaphyllum will first appear in the northern and southern regions of the Tianshan Mountains in Xinjiang Uighur Autonomous Region (I), and these regions will show signs of expansion during the 2070s period (Figure 3C).For the border between the North China Plain and the Northeast Plain (II), from the current to the 2050s, some areas will change from marginally unsuitable to marginally suitable habitats, and some areas will change from marginally suitable to moderately suitable habitats.Furthermore, in the 2070s, the marginally and moderately suitable areas will display the trend of expanding to the northeast.For the region III where the Loess Plateau and the Inner Mongolian Plateau border, from the current to the 2050s and 2070s, the moderately unsuitable habitat of G. pentaphyllum will gradually turn into marginally unsuitable habitat, and the marginally unsuitable areas will expand northwards.For region IV, in Yunnan province, the current highly suitable areas will gradually become moderately suitable areas.In the foothills of the southern Himalayas, the current marginally suitable areas will gradually become marginally unsuitable areas.In the coastal areas of Guangdong, Fujian, and parts of Hainan Island (V), there are clear signs of a decline in habitat suitability.

Optimization of Fitting Functions
The FME, which relied on the relationship between environmental variables and the characteristic values of species to fit the function, has been used in predicting species suitability distribution for several years (Table 1).In the past, researchers might have ensured that the value of the fitting function (Figure 2D) was greater than 0. By normalizing the fit function, researchers could get the membership function, and the value was from 0 to 1.This method conforms to the range of values of the membership functions of Zadeh [41].However, in practice, we should consider the potential restrictive impact of a single environmental variable on the habitat of a particular species.Because many EEVs have symmetry effects on the habitat of the species (Table 1), almost all researchers are accustomed to Gaussian functions to fit the relationship between the environmental variables and the characteristic values of species.However, we note that Gaussian functions are suitable for fitting a moderate or large membership degree of values, but are not very suitable for low membership relationships (Figure 2E) since they may easily cause the accumulation of membership errors and affect the final simulation results.
In our case, specifically, G. pentaphyllum would suffer from freezing injury at a low temperature in three to five consecutive days, and through a continuous 40-day test, it was found that the biomass of G. pentaphyllum was extremely small at a temperature of 10 • C [42].This experiment shows that G. pentaphyllum is difficult to grow in an environment with an average temperature of 10 • C. A researcher has previously pointed out that G. pentaphyllum begins to grow when the temperature reaches 10 • C in March, and stops growing when the temperature drops to 4 • C [43].We can conclude that the growth of G. pentaphyllum will be inhibited in an environment where Bio1 is below 10 • C, and the synthesis of the gypenosides is also affected by the inhibition of temperature.The lower the temperature, the stronger the inhibition, until G. pentaphyllum ceases to grow.From the membership degree (Figure 2C,E) obtained by the quadratic function (Figure 2A) and the Gaussian function (Figure 2D), their results are consistent in the range of 12 to 22 • C.However, when the Bio1 is 10 • C, the two ways show a difference, and their membership degrees (Figure 2C,E) are −0.07 and 0.27, respectively.Additionally, the membership degree (Figure 2C) obtained by the quadratic function shows the inhibitory effect of temperature on G. pentaphyllum correctly and intuitively.The membership function curve (Figure 2E) obtained by Gaussian function fitting does not perform well in the low membership range.
From the perspective of biology, the optimized model is more in line with the actual situation.Therefore, the optimization measures we have adopted are very reasonable.

Membership Extension and Classification of HSI
In the practice of applying fuzzy sets, the extension of the membership function is quite important [44][45][46].When it comes to practical research topics, the range of values of fuzzy membership could be extended.In this research, because the influence of any EEVs on the habitat of a species is two-sided, we extended the membership function to negative values, to indicate the adverse effects of EEVs on the habitat of the species, instead of defining the unsuitable habitats simply by dividing them by the index as in the past [28].We narrowed down the function value to [−1, 0) and [0, 1], setting different methods of membership induction according to the actual situation (Figure 2C).
In the classification of HSI, for non-negative HSI, this article considered previous research experience [28], in which K i represents HSI, and unsuitable habitats (K i < 0.3), suitable habitats (0.3 ≤ K i < 0.6), highly suitable habitats (K i ≥ 0.6).For negative HSI, we refer to the automatic classification in ArcGIS software, and classify the HSI with [−1, 0).Finally, the habitat suitability index is divided into six categories (Table 5), and the division of unsuitable areas can be a detailed description of the adaptation of G. pentaphyllum in harsh climate regions.This classification is meticulous and efficient.
The changes in the unsuitable area of the G. pentaphyllum habitat can be clearly observed (Figure 3).Under climate change, large-scale moderately unsuitable transitions to marginally unsuitable habitats were found in region (III).We could also see that there is no suitable area for G. pentaphyllum in the current environment in the northern and southern regions of the Tianshan Mountains (I) (Figure 3A).However, it firstly appeared marginally suitable areas in the 2050s (Figure 3B), and the area will expand in the 2070s (Figure 3C).At the same time, the phenomenon of transition from an unsuitable habitat to a suitable habitat has also been reflected in region (II) (Figure 3).This is the first time that the inhospitable area of the habitat has been categorized for long-term planning in the areas where the studied species are marginally unsuitable, moderately unsuitable, and highly unsuitable under the context of climate change.

Sample Size
Almost all SDMs need to take into account the sample size when predicting species distribution.For GLM and GAM, considering environmental conditions (known as background or pseudo-absence data), researchers recommend the use of a large number (e.g., 10,000) of pseudo-absences with equal weighting for presences and absences when using regression techniques [47].With presence-only records, by comparing them with the sample size of 1000, when the training data is increased to 50, the prediction accuracy of Maxent is stable [48,49].A certain number of presence points are fundamental to derive ecological niches.However, for models that use sample locations as the basis for computation, the real absence data are also important, because a much higher degree of uncertainty lies in the absence of a given species at a particular location, whether this is due to ecological or human-related reasons [11].FME mainly studies medicinal plants based on presence-only data.As a medicinal plant, G. pentaphyllum is widely distributed in China.However, because of unreasonable development, over-harvesting, and habitat destruction, the number of wild G. pentaphyllum specimens is being rapidly reduced, threatening the relevant resources [50].We collected 39 samples available in nine provinces.Through experiments, we found that the number of samples collected can support our research to proceed normally.At the same time, for the sample size of FME, we believe that it can be further studied to improve the FME model in the future.
As for SDMs, there were three general categories of methods, including mechanistic models, correlational models, and process-based simulations [51].FME takes into account the mechanism of biology and the correlation of environmental conditions at the location of occurrence of the species.However, compared with the classic SDM techniques, such as Maxent, GLM, GAM, RF, and consensus models, FME has a narrower range of research objects because of the need for marker components or other characteristic values of the species.

EEVs Screening
Excessive environmental variables involved in modeling can cause data redundancy [29].In the previous study, the number of modeling variables selected by our research team was from 12 to 21 (Table 1).Therefore, we need to screen for EEVs and keep key variables.The optimized FME can use several key variables to make relatively accurate predictions of species suitability, reducing the large number of types of environmental variables required for traditional FME.For plant species, according to current research, the main environmental variables affecting the habitat suitability of species are climate factors [52][53][54].Climatic factors are divided into two categories, precipitation and temperature.There is a strong correlation between most temperature variables or precipitation variables.If too many variables are selected in the same category, the model will be distorted from the data level.Therefore, for temperature, when constructing the model, several temperature variables that mainly affected the habitat of the species were sifted.Moreover, the differences among the variables were taken into account to ensure that the habitat range of the species was limited by fewer temperature variables.In the same way, the precipitation factor also needs to be deleted for reducing data redundancy.In this paper, after screening by the goodness of fit and coefficient of correlation, we built the FME with six EEVs (Table 2).Bio1 and Bio2 were used as temperature factors, Bio12 and Bio18 were used as precipitation factors, and EL and TTEX were used as terrain factors and soil factors.The accuracy of the model was finally verified, with a value of 87.18%.As for previous studies of G. pentaphyllum, Zhao [28] used 19 EEVs to build FME under the current period, and the RMSE of the model was 9.75%.

Model Accuracy for Predicting Future Suitability
In terms of the study of future distribution changes of species, many SDMs predict the future using the current location of species sample sites with future climate data [55][56][57].The FME uses the specific matching relationship between species characteristics and EEVs, and the inferred parameters could be used to project the new areas in the current period or under climate scenarios in the future.Hence, FME has unique advantages over many models that predict the future distribution of species [29].The incorporation of topographic factors and soil factors (Table 2) has improved the accuracy of the model in predicting the habitat of G. pentaphyllum in the future scenarios.

Suitability Change of G. pentaphyllum in Future
We have obtained maps and the percentage of areas of habitat suitability with G. pentaphyllum in China for three different periods (Figure 3, Table 7).The areas with highly suitable, moderately unsuitable, and highly unsuitable habitats are gradually decreasing, while the moderately suitable, marginally suitable, and marginally unsuitable habitat areas are gradually increasing (Table 7).In the process of change, the maximum increase in moderately suitable habitat during the change is 4.05%.The moderately unsuitable habitat has the largest decline, which is −5.40%.
The most important influence on the habitat of G. pentaphyllum is the bio-climatic factors (Table 6).In the future climate change process, climate warming is a universally recognized trend [58].Therefore, the habitat suitability of G. pentaphyllum generally migrates from south to north.Because China's territory and mountains are criss-crossing, and will change the local climate conditions in a small area [59], there will be inconsistent migration dynamics during the overall northward migration of habitats of G. pentaphyllum.For the five regions (I-V) that are most sensitive to the distribution changes of G. pentaphyllum (Figure 3B), temperature is the main influencing factor.Among all the EEVs, the weight of temperature factors accounts for up to 49.97% (Table 6), followed by precipitation (24.55%).At the same time, the role of elevation (21.63%) is also very obvious.In a previous study [28], the weights of temperature and precipitation factors were 48.24%, and 16.05%, respectively.Hence, regardless of the main EEVs or the weights affecting the habitat suitability of G. pentaphyllum, this paper is consistent with the previous results [28].From Figures 1 and 3, we can see that the areas where the habitat suitability changes are mostly distributed along the boundary of mountains.Both the future climate change and small regional terrain created the migration of G. pentaphyllum in China during the two periods of the 2050s and 2070s.

Conclusions
Over a long period of application, we have conducted in-depth thinking on the theoretical level from the previously constructed FME.In this article, we tried to improve the FME.We screened EEVs by the goodness of fit and coefficient of correlation to keep key variables, optimized the function fitting process between EEVs and characteristic values of species by the quadratic function, and extended the range of the membership degree and HSI to [−1, 1].Taking G. pentaphyllum as an example, we predicted its habitat suitability distribution in three different periods.At the same time, innovation of species' unsuitable classifications was introduced to show the dynamic changes of unsuitable areas under the climate change.Besides, the impact of climate change on species was meticulously demonstrated.It provides a new idea for the protection of unsuitable areas of G. pentaphyllum during climate change.

EL 0. 83 Table 4 .
The fitting degree of TTEX.dealt with the fitting function and converted the value of the function into the membership degree.The range of membership degree is [−1, 1] [39].In the process of converting the function obtained by the EEV data and species characteristic values into the membership degree, we chose [−2 │Max│, 0) to deal with the negative value of the function as the research interval, in which │Max │ represents the absolute value of the function's maximum value (Figure

Figure 2 .
Figure 2. Fitting relationship of Bio1 with the characteristic value (gypenosides content (%)) of G. pentaphyllum and membership degree.(A) The quadratic function, Bio1, and characteristic value fitted; (B) Relationship between Bio1 and membership degree with the negative value, for which we chose [−│Max│, 0) as the research interval of the function A; (C) Relationship between Bio1 and membership degree with the negative value, for which we chose [−2│Max│, 0) as the research interval of the function A; (D) The Gaussian function, Bio1, and characteristic value fitted; (E)Relationship between Bio1 and membership degree which was based on D. For A and D, we used the data of gypenosides value as the characteristics of G. pentaphyllum, and it was also the standard for evaluating the quality of G. pentaphyllum.The content of gypenosides reflects the adaptation of the G. pentaphyllum to the surrounding habitat, and the characteristics value of the positive and negative reflects the promotion and inhibition effect of the environment on the growth of G. pentaphyllum, respectively.For B, Cm and E, they reflect the influence level of the single variable Bio1 on the content of gypenosides, namely the degree of membership of Bio1 to the gypenosides.

Figure 2 .
Figure 2. Fitting relationship of Bio1 with the characteristic value (gypenosides content (%)) of G. pentaphyllum and membership degree.(A) The quadratic function, Bio1, and characteristic value fitted; (B) Relationship between Bio1 and membership degree with the negative value, for which we chose [−|Max|, 0) as the research interval of the function A; (C) Relationship between Bio1 and membership degree with the negative value, for which we chose [−2|Max|, 0) as the research interval of the function A; (D) The Gaussian function, Bio1, and characteristic value fitted; (E) Relationship betweenBio1 and membership degree which was based on D. For A and D, we used the data of gypenosides value as the characteristics of G. pentaphyllum, and it was also the standard for evaluating the quality of G. pentaphyllum.The content of gypenosides reflects the adaptation of the G. pentaphyllum to the surrounding habitat, and the characteristics value of the positive and negative reflects the promotion and inhibition effect of the environment on the growth of G. pentaphyllum, respectively.For B, Cm and E, they reflect the influence level of the single variable Bio1 on the content of gypenosides, namely the degree of membership of Bio1 to the gypenosides.

Figure 3 .
Figure 3.The distribution of suitability of G. pentaphyllum in the current, 2050s, and 2070s periods.(A) The current period; (B) the 2050s of RCP4.5 in BC; (C) the 2070s of RCP4.5 in BC.There are five ellipses in Figure 3B, which represent the five regions where the suitability of habitats changes significantly.(I) Represents the northwestern part of the Xinjiang Uighur Autonomous Region; (II) represents the bordering area between the North China Plain and the Northeast Plain; (III) represents the bordering area between the Loess Plateau and Inner Mongolia Plateau; (IV) represents the southern Himalayas and Yunnan province; (V) represents the coastal areas of Guangdong and Fujian, and Hainan island.

Table 1 .
Research on FME by our team.

Table 2 .
Explanatory six ecological environmental variables selected for FME.

Table 3 .
Fitting function of evaluation variables and fitting degree R 2 .

Table 4 .
The fitting degree of TTEX.

Table 5 .
The habitat suitability classification of the habitat suitability index.

Table 6 .
The optimal value, threshold value, and weights of each EEV value.

Table 7 .
The percentage of areas of habitat suitability distribution of G. pentaphyllum in three periods of China.