Predicting Potential Distribution and Evaluating Suitable Soil Condition of Oil Tea Camellia in China

Oil tea Camellia, a major cash and oil crop, holds an important place in China's forestry cultivation systems. To meet current market demand, its potential distribution and suitable soil conditions were investigated to guide its cultivation and popularization. The potential distribution of oil tea Camellia in China was predicted with the maximum entropy model, using global environmental and soil databases. We then collected 10 years of literature data on oil tea Camellia soils and applied multiple imputation and factor modeling for an in-depth analysis of soil suitability for growing oil tea Camellia. The prediction indicated that oil tea Camellia was mainly distributed in Hunan, Jiangxi, Zhejiang, Hainan, East Hubei, Southwest Anhui and most of Guangdong. Climatic factors were more influential than soil factors. The minimum temperature of the coldest month, mean temperature of the coldest quarter and annual precipitation were the most significant contributors to the habitat suitability distribution. In the cultivated area of oil tea Camellia, soil fertility was poor, and organic matter was the most significant factor for the soil conditions. Based on the climatic and soil factor analyses, our data suggest there is great potential to expand the oil tea Camellia cultivation industry.


Introduction
Tea oil, also known as "eastern olive oil", refers to the oil extracted from the seeds of oil tea Camellia, which contain 20%-70% oil with a high oleic acid content [1]. It has been used as a fine cooking oil in China for hundreds of years [2]. Besides oleic acid, the oil is rich in palmitic, stearic, linoleic, and linolenic acids, and its consumption contributes significantly to reducing the risk of cardiovascular disease, enhancing immunity, lowering cholesterol levels, and preventing and treating hypertension [2][3][4]. Oil tea Camellia is considered one of the four major woody oil crops in the world, together with oil palm, olive, and coconut [5]. Camellia oleifera Abel. is the most widely distributed species and the main source of commercial oil production, but C. reticulata Lindl., C. chekangoleosa Hu, C. yuhsiensis Hu, C. meiocarpa Hu and C. vietnamensis T. C. Huang ex Hu are also widely distributed. According to previous studies, all parts of oil tea Camellia are valuable, and its byproducts also have great economic potential in agriculture, industry, and medicine [6][7][8]. Besides its economic value, oil tea Camellia also plays a role in water and soil conservation, biological fireproofing, and the improvement of ecological environments [9]. At present, oil tea Camellia is widely distributed in southern China, and its forestland covers 18 provinces/municipalities/autonomous regions [10]. The cultivated area of oil tea Camellia in China is about 3.33 million ha [11]. However, oil tea Camellia forestland commonly suffers from inferior quality and low yield, caused by improper planting sites and a lack of scientific guidance [12]. A recent change in the Chinese government's policy has encouraged farmers to plant more oil seeds instead of corn. Moreover, according to the "National Oilseed Development Plan (2016-2020)" published by China's National Development and Reform Commission (NDRC), in collaboration with China's Ministry of Agriculture (MOA) and the State Forestry Administration, China's total planted area for oil seed production is predicted to increase by 1.4% to 23.3 million hectares (Mha). The plan also sets a target of increasing total oil seed production from the present 56.25 million tons to 59.8 million tons by 2020 [13]. To satisfy the requirements of this industrial development, it is valuable and necessary to understand the potential distribution of oil tea Camellia.
With the development of advanced statistical methods and geo-information science, ecological niche modeling has become a valuable tool for assessing the potential geographical distribution of species, and it is applied in several fields of research, such as ecology, biogeography, and evolution [14,15]. The maximum entropy model (MaxEnt) has many advantages, such as avoiding commission errors, which make it well suited to species distribution modeling [16][17][18]. The method makes use of known occurrences and of pseudo-absence data resampled from the set of pixels where the species in question is not known to occur [19].
Beyond suitable environmental conditions, the suitability of the soil is also worth studying. In research on suitable soil conditions for oil tea Camellia, soil quality assessment has received wide attention, as it is the main reference for forecasting distribution areas [20]. However, evaluation systems based on soil physical, chemical, and biological indicators are complicated, and subjectivity is involved in selecting soil quality indicators according to the soil functions of interest [21,22]. Moreover, testing a large number of indicators demands considerable labor, financial and material resources, and the results of studies evaluating one soil type are difficult to extrapolate to others, even in adjacent areas [23,24]. In this study, we focused on basic nutrient indicators to increase the feasibility of universal use. The aims of this study were to: (i) ascertain the potential distribution of oil tea Camellia in China with the MaxEnt model, using global environmental and soil databases, and (ii) collect the related literature data from the last 10 years to identify the basic nutrient conditions suitable for the growth of oil tea Camellia. Within the predicted suitable region, the feasibility of cultivation and popularization can then be determined by testing the soil conditions.

Databases
The environmental data layers were obtained from the WorldClim dataset (http://www.worldclim.org) in grid format (30 arc-seconds). The data cover the years 1960-1990 and include 19 yearly environmental variables (Table 1). The soil data were obtained from the Harmonized World Soil Database (HWSD), which includes 36 soil-based indicators (Table 2). The Chinese Virtual Herbarium (http://www.cvh.ac.cn/) and related literature were used to extract the locations of oil tea Camellia. Finally, 355 points (of which 270 were C. oleifera, and the remainder were C. chekangoleosa, C. meiocarpa, C. vietnamensis and C. reticulata) were selected as records of species presence.
A database of oil tea Camellia soil conditions was compiled by surveying peer-reviewed publications in Web of Science and the China National Knowledge Infrastructure Database (CNKI), using the subject terms "Camellia oleifera/oil tea" and "soil". All searches were conducted on 31 December 2017 and spanned 10 years (2008-2017). In total, 518 articles in CNKI and 83 in Web of Science were selected. Furthermore, the following criteria were applied to select the papers and data for review: (i) research articles and dissertations only, (ii) the data were not repeated in other articles, and (iii) a detailed description of the experimental site and soil was provided. Finally, 160 groups of effective soil data were obtained and analyzed.

Modeling Procedure
The Maximum Entropy Species Distribution Modeling software (MaxEnt, version 3.4.1, Princeton University, Princeton, NJ, USA) was used, which employed a set of four possibilities: logistic regression, bioclimatic rules, range rules and negated range rules. The algorithm runs either for 500 iterations of these processes or until convergence. The model produced prediction values ranging from 0 to 100, representing cumulative probabilities of occurrence. According to the natural breaks method, prediction values above 62 were defined as high suitability, and values below 36 were defined as low suitability or unsuitable. Different colors in the potential distribution map correspond to different fitting indices.
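The thresholding described above can be sketched as a small classifier. This is only an illustration under stated assumptions: the cut-offs 62 and 36 come from the text, but the "moderate" label for the interval between them, the handling of values exactly at a cut-off, and the function name `classify_suitability` are assumptions, and the further split between "low" and "unsuitable" is not given in the text.

```python
# Cut-offs reported in the text (natural breaks on the 0-100 MaxEnt output).
HIGH_THRESHOLD = 62
LOW_THRESHOLD = 36


def classify_suitability(value: float) -> str:
    """Map a MaxEnt prediction value (0-100) to a suitability class.

    Assumption: values between the two cut-offs (inclusive) are "moderate".
    The text does not say how boundary values are treated.
    """
    if not 0 <= value <= 100:
        raise ValueError("prediction value must lie between 0 and 100")
    if value > HIGH_THRESHOLD:
        return "high"
    if value < LOW_THRESHOLD:
        return "low or unsuitable"
    return "moderate"


# Hypothetical pixel values, for illustration only.
for pixel_value in (80, 50, 10):
    print(pixel_value, classify_suitability(pixel_value))
```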
The Jackknife test was used to assess the importance of individual variables for the MaxEnt predictions. The area under the curve (AUC) of the receiver operating characteristic (ROC) is widely employed for comparing the performance of MaxEnt species distribution models [16][17][18]25]. The ROC curve plots sensitivity against the false-positive fraction for all available probability thresholds. The AUC provides a single measure of model performance, independent of any particular choice of threshold, which allows the accuracy of the prediction to be evaluated.
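To make the threshold-independence of the AUC concrete, the sketch below computes it through its rank interpretation: the probability that a randomly chosen presence point receives a higher score than a randomly chosen background point, with ties counted as one half. This is equivalent to the area under the ROC curve, but it is not MaxEnt's own implementation; the function name and the sample scores are hypothetical.

```python
def roc_auc(presence_scores, background_scores):
    """AUC as the fraction of (presence, background) pairs ranked correctly.

    Ties contribute 0.5. No threshold is chosen, which is why the AUC is
    independent of any particular cut-off.
    """
    if not presence_scores or not background_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


# Hypothetical model outputs at presence and background locations.
presence = [0.9, 0.4]
background = [0.5, 0.3]
print(roc_auc(presence, background))  # 3 of 4 pairs ranked correctly -> 0.75
```

An AUC of 0.5 corresponds to a random ranking and 1.0 to a perfect separation of presence and background points.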

Statistical Analysis
Missing data are a common problem when collecting data from the literature. In this study, the multiple imputation by chained equations (MICE) method was used to estimate the missing values. Soil indicators of oil tea Camellia were analyzed using principal component analysis, correlation analysis, redundancy analysis and regression analysis, with SPSS 20.0 (IBM, Chicago, IL, USA) and the R-project (https://www.r-project.org). Both layer superposition and area calculation were carried out in ArcGIS 10.5 (ESRI, Redlands, CA, USA).

Predicted Potential Geographical Distribution of C. oleifera
Figure 1 indicated that 8.25% of the area was moderately suitable, followed by 7.97% in the low suitability class, and only 4.94% corresponded to high habitat suitability. The land area of China is 9.32641 million square kilometers. The most suitable area for oil tea Camellia predicted by MaxEnt was therefore about 0.461 million square kilometers, mainly distributed in Hunan, Jiangxi, Zhejiang, Hainan, East Hubei, Southwest Anhui, and most of Guangdong. Areas such as Guangxi, Fujian, East Guizhou, Northwest Hunan, and the south of Hubei and Anhui were moderately suitable growing areas for oil tea Camellia. In addition, parts of Yunnan and Chongqing, East Sichuan, Central Anhui, and the south of Shanxi, Henan and Jiangsu were also potential growing regions. The AUC scores for the training and test data were 0.933 and 0.915, respectively, indicating a high level of accuracy in the model prediction (Figure 2a). Based on the Jackknife test, the 10 climatic factors and 10 soil factors with the greatest influence on the distribution were selected from the 55 variables used for model development (Figure 2b). Overall, the climatic factors were more influential than the soil factors. The minimum temperature of the coldest month (MTCM), mean temperature of the coldest quarter (MTCQ), and annual precipitation (AP) were the most significant contributors to the habitat suitability distribution. This result showed that low temperature and moisture were the main factors limiting the distribution of oil tea Camellia. Among the soil variables, subsoil (30-100 cm) properties were more important than topsoil (0-30 cm) attributes. Clay cation exchange capacity (s-cec-clay and t-cec-clay), total exchangeable bases (s-teb) and silt fraction (s-silt) of the subsoil were the most significant soil contributors to the habitat suitability distribution.
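The conversion from class shares to areas is simple arithmetic and can be checked with a short sketch. The percentages and the land area are taken from the text; the variable and function names are illustrative.

```python
# Figures reported in the text.
CHINA_LAND_AREA_MKM2 = 9.32641  # million square kilometers
CLASS_SHARE_PERCENT = {"high": 4.94, "moderate": 8.25, "low": 7.97}


def class_area_mkm2(share_percent, total_mkm2=CHINA_LAND_AREA_MKM2):
    """Area (million km^2) covered by a class given its share of land area."""
    return total_mkm2 * share_percent / 100.0


areas = {name: class_area_mkm2(share) for name, share in CLASS_SHARE_PERCENT.items()}
for name, area in areas.items():
    print(f"{name}: {area:.3f} million km^2")
```

The high suitability class works out to about 0.461 million square kilometers, matching the value stated above.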

Suitable Soil Conditions of Oil Tea Camellia
In the collected literature data, the most commonly reported soil indicators were soil organic matter, soil pH, total nitrogen, phosphorous, and potassium, and effective nitrogen, phosphorous, and potassium. Figure 3 shows the numerical distribution range of each soil indicator; effective phosphorous and total nitrogen contained many outliers. The main distribution ranges were soil organic matter 1.29%-2.75%, soil pH 4.30-5.00, total nitrogen 0.41-1.25 g·kg⁻¹, total phosphorous 0.11-0.35 g·kg⁻¹, total potassium 12.24-24.82 g·kg⁻¹, effective nitrogen 58.33-124.60 mg·kg⁻¹, effective phosphorous 2.51-7.56 mg·kg⁻¹, and effective potassium 33.27-68.51 mg·kg⁻¹. According to the classification data of China's second soil survey, the soil fertility of oil tea Camellia land was poor. Apart from some uncharacteristically high levels of total potassium and effective nitrogen, all other soil attribute values were considered low. Factor analysis was conducted on the soil data collected from the literature. Figure 4 showed that three common factors could largely explain the soil conditions. FA1 (factor 1) was closely related to organic matter, total nitrogen, total phosphorous, and effective nitrogen, and it showed a strong positive correlation with organic matter. The correlation coefficient between FA2 and soil pH reached 0.9. FA3 was related only to effective phosphorous. The factor score was y = 0.436 × FA1 + 0.307 × FA2 + 0.128 × FA3. According to the factor scores of the samples, the soil conditions of Hainan were the most suitable for growing oil tea Camellia. Next, a comprehensive evaluation of the soil conditions was undertaken, which identified organic matter and total nitrogen as the most significant factors for the soil conditions, whereas total potassium had little effect.
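The composite factor score given above is a weighted sum of the three factor scores, and samples can be ranked by it. The sketch below uses the weights from the text; the site names and their factor values are hypothetical and serve only to show the ranking step.

```python
# Weights from the factor score y = 0.436*FA1 + 0.307*FA2 + 0.128*FA3.
FACTOR_WEIGHTS = (0.436, 0.307, 0.128)


def composite_factor_score(fa1, fa2, fa3):
    """Weighted sum of the three factor scores reported in the text."""
    w1, w2, w3 = FACTOR_WEIGHTS
    return w1 * fa1 + w2 * fa2 + w3 * fa3


# Hypothetical standardized factor scores for three made-up sampling sites.
samples = {
    "site_a": (1.2, 0.3, -0.1),
    "site_b": (-0.5, 0.8, 0.4),
    "site_c": (0.1, -0.2, 1.5),
}

# Rank sites from most to least suitable according to the composite score.
ranking = sorted(samples, key=lambda s: composite_factor_score(*samples[s]), reverse=True)
print(ranking)
```

Because FA1 carries the largest weight and is closely tied to organic matter, a sample with a high FA1 score tends to rank near the top.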

Comparison of the Potential Distribution and Previous Prediction
Cui applied MaxEnt and the genetic algorithm for rule set production (GARP) models to investigate the potential distribution of wild C. oleifera [10]. That work showed that the highly suitable growing regions were around Wuyi Mountain, Nanling Mountain, and Wuling Mountain. In this study, we focused on cultivated oil tea Camellia and predicted a highly suitable growing region larger than that previously determined. This study also indicated that climatic factors were more important than soil factors for the distribution of oil tea Camellia, in agreement with Zhu and Zhuang [26,27]. Furthermore, our study demonstrated that minimum temperature and precipitation were the main restrictive factors on the potential distribution. Similarly, Cui suggested that the major environmental factors affecting the distribution of wild C. oleifera were the mean monthly diurnal temperature range and precipitation during the driest and warmest quarters [10]. In addition, Hu and Yin proposed that the minimum temperature of the coldest month has the greatest influence on the geographical distribution of Liaodong oak (Quercus wutaishanica Mayr) and Indian sandalwood (Santalum album L.) [28,29]. These conclusions all indicate that minimum temperature and precipitation are key restrictive factors for the distribution of various plants.

Difference Between the Potential Distribution and the Actual Planting Area
At present, the cultivated area of oil tea Camellia in China is about 3.33 Mha [11]. The top three producing provinces are Hunan, Jiangxi, and Fujian. In this study, the high suitability area was predicted to be about 46.07 Mha, of which agricultural acreage was around 15.72 Mha. Therefore, it is conservatively estimated that at least 27.02 Mha of non-forest land is suitable for planting oil tea Camellia, which is 8.11-fold more than the current cultivated area. Adequate cultivable area is a reliable guarantee for the development of the oil tea Camellia cultivation industry.
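The figures in this paragraph can be reproduced with a few lines of arithmetic. The text does not spell out how the 27.02 Mha estimate was derived; one reading that is consistent with the reported numbers, and which is an assumption here, is that both the agricultural acreage and the currently cultivated area were subtracted from the high suitability area.

```python
# Figures reported in the text.
HIGH_SUITABILITY_MKM2 = 0.4607  # million km^2 (4.94% of China's land area)
AGRICULTURAL_MHA = 15.72        # agricultural acreage inside that area
CURRENT_CULTIVATED_MHA = 3.33   # current oil tea Camellia cultivation

# 1 km^2 = 100 ha, so 1 million km^2 = 100 Mha.
high_suitability_mha = HIGH_SUITABILITY_MKM2 * 100

# Assumed derivation: remove agricultural land and the existing plantations.
remaining_mha = high_suitability_mha - AGRICULTURAL_MHA - CURRENT_CULTIVATED_MHA
fold_of_current = remaining_mha / CURRENT_CULTIVATED_MHA

print(f"high suitability area: {high_suitability_mha:.2f} Mha")
print(f"remaining suitable land: {remaining_mha:.2f} Mha")
print(f"ratio to current cultivated area: {fold_of_current:.2f}")
```

Under this reading the remaining land is 27.02 Mha and the ratio to the current cultivated area is 8.11, as stated above.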
In this study, Hainan was predicted to be highly suitable for planting oil tea Camellia, yet it was not mentioned in the Chinese government's report "The planning of national C. oleifera industry development (2009-2020)" [30]. Zheng reported that there were wild and cultivated resources of oil tea Camellia in Hainan, intensively distributed among 38 townships of nine cities and comprising an approximate area of 1167.3 ha [31]. The reason for this small area may be that the climate of Hainan suits the growth of many kinds of plants and species diversity is high, so the area of any one forest type is relatively small. Another explanation is that in Hainan, oil tea Camellia grows dispersed in the understory or on the margins of rubber tree (Hevea brasiliensis Müll. Arg.), betel nut (Areca catechu L.), and secondary shrub forests, and rarely as pure forest on sloping land. In general, our prediction was consistent with previous research, in which the planting area of oil tea Camellia was mainly limited to three boundaries and nine regions [30]. The difference, however, was that a few places in Shandong and Tibet showed low suitability. Fang demonstrated that Southeast Tibet had a tropical or subtropical monsoon climate, where the forest coverage rate was 46.1%, and the climate was very suitable for plant growth [32].

Supplement and Evaluation of Missing Data
Missing data are the most common problem in data mining analysis [33]. The predictive mean matching method of multiple imputation was used in this study. Figure 5b showed that complete records accounted for 70.46% of all data, and the variable with the most missing values was total potassium (Figure 5a). The amount of information contained in the data essentially met the statistical requirements. A comparison of the distributions of the initial and interpolated data indicated that the interpolated values matched the trend of the observed values well, that is, the interpolated values approximated the actual values (Figure 6).
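The idea behind predictive mean matching can be shown with a deliberately simplified, single-pass sketch: a regression is fitted on the complete cases, and each missing value is replaced by the observed value of a "donor" whose predicted value is close to the prediction for the incomplete case. This is not the MICE procedure used in the study, which iterates over all variables, draws regression parameters at random and produces several imputed data sets. The single predictor, the function names and the sample values are assumptions made for illustration.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def pmm_impute(xs, ys, k=3, seed=0):
    """Fill None entries of ys by predictive mean matching on predictor xs."""
    rng = random.Random(seed)
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    slope, intercept = fit_line([x for x, _ in observed], [y for _, y in observed])
    # Each donor is (predicted value, actually observed value).
    donors = [(slope * x + intercept, y) for x, y in observed]
    completed = []
    for x, y in zip(xs, ys):
        if y is not None:
            completed.append(y)
            continue
        target = slope * x + intercept
        nearest = sorted(donors, key=lambda d: abs(d[0] - target))[:k]
        # Impute an observed value, so the result is always a realistic one.
        completed.append(rng.choice(nearest)[1])
    return completed


# Hypothetical data: organic matter (%) as predictor, total nitrogen with a gap.
organic_matter = [1.3, 1.8, 2.1, 2.5, 2.7]
total_nitrogen = [0.45, 0.70, None, 1.05, 1.20]
print(pmm_impute(organic_matter, total_nitrogen))
```

Because imputed values are always drawn from observed ones, their distribution stays within the range of the real data, which is the property the comparison of observed and interpolated values is checking.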

Conclusions
In this study, we predicted the potential geographical distribution of oil tea Camellia in China. Although the predicted region was consistent with the existing main production region, the potential cultivable area was much larger. Minimum temperature and precipitation were the most significant contributors to the habitat suitability distribution. The nutrient content of oil tea Camellia soil was poor, and organic matter and total nitrogen were the most significant soil factors in the cultivable area of oil tea Camellia. Overall, our results indicate great potential to extend the oil tea Camellia cultivation industry in China.

Figure 1. The potential geographical distribution of oil tea Camellia in China using the maximum entropy species distribution modeling (MaxEnt).

Figure 2. Evaluation of the model and the importance of variables. (a) ROC of MaxEnt modelling; (b) Jackknife test for variable importance of oil tea Camellia habitat suitability distribution.

Figure 3. The numerical distribution range of each soil indicator.

Figure 4. Factor analysis of soil conditions. (a) Eigenvalues of principal factors; (b) Coefficient of factor loading.

Figure 5. Description of the missing data. (a) Histogram of missing data; (b) pattern and proportion of missing data.

Figure 6. Comparison of interpolated data and missing data.

Table 2. Indicators of soil data.