Mapping Global Environmental Suitability for Sorghum bicolor (L.) Moench

Abstract: Sorghum bicolor (L.) Moench, called sweet sorghum, is a drought-resistant and heat-tolerant plant used for ethanol bioenergy production, and it can reduce the competition between growing crops for energy and growing crops for food. Quantitatively mapping the marginal lands of sweet sorghum is essential for the development of sorghum-based fuel ethanol production. However, knowledge of the contemporary marginal lands of sweet sorghum remains incomplete, and usually relies on sample data or is evaluated at a national or regional scale based on established rules. In this study, a novel method was demonstrated for mapping the global marginal lands of sweet sorghum based on a machine learning model. The total amount of global marginal lands suitable for sweet sorghum is 4802.21 million hectares. The model was applied to training and validation samples and achieved high predictive performance, with area under the receiver operating characteristic (ROC) curve (AUC) values of 0.984 and 0.978, respectively. In addition, the results illustrate that maximum annual temperature contributes more than any other variable to the predicted distribution of sweet sorghum, with a contribution rate of 40.2%.


Introduction
Fossil fuels are non-renewable energy resources that continue to be consumed as a driver of social and economic development. This consumption not only depletes fossil fuel reserves but also leads to the intensification of global climate change [1]. At the 21st Conference of the United Nations Framework Convention on Climate Change in December 2015, the Paris Agreement was adopted. Its purpose was to limit global warming to below 2 °C and to "pursue efforts" to hold it to no more than 1.5 °C [2,3]. The main strategies for reducing global greenhouse gas emissions include reducing energy demands, improving energy efficiency, and developing renewable energy [4][5][6]. Based on Europe's 2009 Renewable Energy Directive, each member state needs to supply 10% of transport energy from renewable sources, a proportion of which will come from biofuels, by 2020 [7]. Therefore, the development of biofuels is imperative.
Several researchers have suggested that bioenergy can play an essential part in reducing long-term carbon dioxide emissions due to the high net energy gain and low production costs of bioenergy [8][9][10]. The technology for producing ethanol from grain crops is mature, but it creates conflict between using cropland to produce energy or food supplies [11]. Sweet sorghum, a bioenergy plant used for ethanol production, could reduce this competition because its seeds can be used for both food and fuel [12,13]. The net energy balance of sweet sorghum is high, and its carbon dioxide balance is more favorable than that of maize, sugar beets, and sugarcane [14][15][16].
Quantitatively mapping the marginal lands of energy plants is key to developing plans for ethanol fuel production [17]. Hao et al. evaluated the marginal lands in China from the standpoint of water resources, which showed that the marginal land resources suitable for sweet sorghum are much larger than those suitable for cassava [18]. Fu et al. analyzed the marginal land resources suitable for developing bioenergy in Asia, which illustrated that China has more development potential in the bioenergy sector than other countries [19]. Tuck et al. mapped the distribution of marginal lands of sweet sorghum in Europe, and revealed that it could potentially be grown in up to 25% of Southern Europe [20]. These studies were carried out on national or regional scales based on a multiple-criteria evaluation method, while knowledge of the global marginal lands of sweet sorghum remains incomplete.
In the multiple-criteria evaluation process, the construction of the rule set was mainly based on the prior knowledge of the experts. However, the multi-factor impact mechanism was so complicated that the expert's prior knowledge was sometimes not applicable. In this study, a novel method was demonstrated to map the marginal lands of sweet sorghum at the global scale using a machine learning model. The machine learning model is a data-driven method capable of identifying the complex relationships between the related environmental factors and occurrence records of sweet sorghum. Based on the investigated patterns, our model predicted the global potential marginal lands of sweet sorghum at a high spatial resolution for the first time. The technical flowchart is shown in Figure 1.

Materials and Methods
The boosted regression trees (BRTs) model adopted in this study is a kind of machine-learning algorithm capable of analyzing complex non-linear relationships such as the probability of the occurrence of a given species and multiple environmental variables [21,22]. To make an accurate map of the marginal lands of sweet sorghum, the BRT model required three types of data sets: (a) a set of high-spatial-resolution global environmental variables affecting the growth of bioenergy plants such as sweet sorghum; (b) a global georeferenced dataset for known sweet sorghum areas; and (c) a set of absence points that define unsuitable environment conditions for sweet sorghum. All data used in this research are based on the WGS-84 geographic coordinate system, and a global environment suitability map with 5 × 5 km² spatial resolution was produced for sweet sorghum. In this study, C++ programming and GIS software were employed to preprocess the above datasets.

Land-Cover and Environmental Variables
Previous studies show that the distribution of several bioenergy plants is affected by environmental factors such as climate, soil, and topography conditions [23][24][25]. Land cover was adopted as an additional condition to reduce the risk of competition with existing food crop production. In this research, comprehensive environmental factors were used to obtain gridded maps of the global marginal lands of sweet sorghum. The rationale for our inclusion of each environmental factor is described in the following sections. The datasets adopted in this research are listed in Table 1.

Climate
The WorldClim database (http://www.wclim.org/) contains freely available global climate datasets at a 1 × 1 km² spatial resolution. The datasets of version 2.0 span the period 1970-2000 and were derived from world-wide weather stations using ANUSPLIN-SPLINA software [26]. Precipitation and air temperature are important for plant growth. Several studies have described the link between precipitation and plant growth [27,28]. High temperature can cause male sterility in sweet sorghum, which significantly compromises crop yields [29]. Leaf expansion in sweet sorghum is severely inhibited by night temperatures below 5 °C [30]. Therefore, this research incorporates mean annual precipitation, as well as maximum and minimum annual temperature layers.

Soil
There is evidence that soil quality affects plant growth [31][32][33]. Soil quality is determined by several parameters, such as effective soil depth, soil type, and soil water content. Map layers for effective soil depth and soil class type were obtained from the World Soil Information website (http://www.isric.org/). The soil water content, which quantifies available soil water for vegetation, was downloaded from the Consortium for Spatial Information (http://www.cgiar-csi.org/).

Topography
Topography plays an important role in vegetation growth. For example, soil moisture is easily lost on steep slopes, which makes such areas less conducive to plant growth. Because the global Digital Elevation Model (DEM) dataset (90 m) provided by the Shuttle Radar Topography Mission (SRTM) (http://srtm.csi.cgiar.org) does not cover regions at high latitudes, a DEM dataset downloaded from NASA's Earth Observatory Group (https://www.nasa.gov/) that includes high-latitude areas was used. Based on this dataset, the global spatial distribution of land surface slope was obtained.
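The text does not state how slope was derived from the DEM. As a minimal illustration only, the sketch below computes slope with central differences on a small elevation grid. It is written in Python for convenience (the study itself used C++ and GIS software); the function name `slope_degrees` and the assumption of a projected grid with a constant cell size in metres are hypothetical.

```python
import math


def slope_degrees(dem, cell_size_m):
    """Return slope in degrees for interior cells of an elevation grid.

    dem: list of rows of elevations in metres (hypothetical input layout).
    cell_size_m: grid spacing in metres, assumed constant in both directions.
    Border cells are left as None because they lack a full neighbourhood.
    """
    rows, cols = len(dem), len(dem[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Central-difference elevation gradients along x and y.
            dz_dx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell_size_m)
            dz_dy = (dem[r + 1][c] - dem[r - 1][c]) / (2 * cell_size_m)
            # Slope is the angle whose tangent is the gradient magnitude.
            out[r][c] = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    return out
```

On a grid stored in geographic coordinates such as WGS-84, the ground distance covered by one cell changes with latitude, so a real workflow would need to account for that before applying a constant `cell_size_m`.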

Land Cover
To ensure the safety of food production, land cover type was used as a limiting condition. The global land use dataset at approximately 5 × 5 km² spatial resolution, which was derived from MODIS Terra and Aqua data based on supervised decision tree and artificial neural network classification algorithms [34], was downloaded from NASA's Earth Observatory Group (https://lpdaac.usgs.gov/). Some studies use several land cover types, such as shrub land, forest land, and grassland, for planning the cultivation of bioenergy plants [17,19]. However, this study takes the opposite approach by regarding some land cover types, including urban, barren, and cropland, as unsuitable for sweet sorghum growth.

Occurrence Records
The database adopted in this study contains information on the known global occurrences of sweet sorghum (http://doi.org/10.15468/39omei), which include records of sweet sorghum from 1831-2015. This database can be downloaded from the Global Biodiversity Information Facility (http://www.gbif.org), and it contains 45,012 records with georeferenced location (latitude and longitude) information. This study assumed that the existence of sweet sorghum in these locations is reasonable. In other words, the georeferenced occurrence records reflect natural environment conditions suitable for the growth of sweet sorghum. To match the spatial resolution of the other environmental variables, the occurrence records were rasterized. During this process, some occurrence points were merged into a single grid cell because their locations were close to each other. In total, 6052 grid cells for sweet sorghum were assembled.
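The rasterization step described above, in which nearby occurrence points collapse into a single grid cell, can be sketched as follows. This is an illustrative Python sketch, not the preprocessing code used in the study; the function name and the 0.05° cell size (an approximation of a 5 km cell near the equator) are assumptions.

```python
import math


def rasterize_occurrences(points, cell_deg=0.05):
    """Map (lat, lon) points in decimal degrees to a set of unique grid cells.

    cell_deg is a hypothetical cell size in degrees. Rows count southward from
    90° N and columns count eastward from 180° W.
    """
    cells = set()
    for lat, lon in points:
        row = math.floor((90.0 - lat) / cell_deg)
        col = math.floor((lon + 180.0) / cell_deg)
        # Points falling in the same cell are merged by the set.
        cells.add((row, col))
    return cells
```

Applied to the 45,012 georeferenced records, a step of this kind is what reduces the sample to the 6052 occupied grid cells reported above; the exact count depends on the grid definition actually used.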

Absence Records
The absence records, which correspond to areas where environmental factors are not suitable for this bioenergy plant, are essential for mapping the spatial distribution of sweet sorghum [35]. As mentioned above, grid cells where the minimum temperature is below 5 °C, or where the land cover type belongs to the urban, barren, or cropland classes, are less likely to support sweet sorghum. Absence points were randomly selected from this restricted region, and their total number was set equal to the number of occurrence records.
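A minimal sketch of this absence-sampling rule is given below, in Python for illustration. The dictionary keys `tmin_c` and `land_cover`, the class labels, and the function names are hypothetical; only the 5 °C limit, the three excluded land cover classes, and the one-to-one balance with occurrence records come from the text.

```python
import random

# Land cover classes treated as unsuitable in the text (labels are hypothetical).
UNSUITABLE_LAND_COVER = {"urban", "barren", "cropland"}


def is_unsuitable(cell):
    """True if the cell meets either exclusion rule described in the text."""
    return cell["tmin_c"] < 5.0 or cell["land_cover"] in UNSUITABLE_LAND_COVER


def sample_absences(cells, n_presence, seed=0):
    """Randomly draw as many absence cells as there are occurrence cells."""
    candidates = [c for c in cells if is_unsuitable(c)]
    if len(candidates) < n_presence:
        raise ValueError("not enough unsuitable cells to match the presences")
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return rng.sample(candidates, n_presence)
```

Sampling without replacement keeps the absence cells distinct, and matching the count to the occurrence cells gives the model a balanced set of presences and absences.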

Modeling
The 64-bit version of R 3.3.1 was used to tune parameters, build the model, and assess performance. The dismo and gbm packages were used to train the BRT model. To train and test the BRT, 75% of the sample data were randomly selected as training data, and the remaining data were used as the validation dataset. To evaluate the predictive performance of the BRT model during tenfold cross-validation, the area under the receiver operating characteristic (ROC) curve (AUC) was used, which exhibits a number of desirable properties when compared to overall accuracy [36]. Following Messina et al. [37], the main tuning parameter values were set as follows (tree.complexity = 4, learning.rate = 0.005, bag.fraction = 0.75, step.size = 10, cv.folds = 10, max.trees = 1000), and the other parameters of the BRT model were held at their default values. A detailed description of the BRT model can be found elsewhere [38][39][40].
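To make the evaluation procedure concrete, the sketch below shows a 75/25 random split and a direct computation of AUC as the probability that a randomly chosen presence receives a higher score than a randomly chosen absence. It is a plain-Python illustration with hypothetical function names, not the R code (dismo and gbm) used in the study, and it does not train a BRT model.

```python
import random


def train_validation_split(samples, train_fraction=0.75, seed=0):
    """Shuffle the samples and split them into training and validation lists."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def auc(labels, scores):
    """Area under the ROC curve from 0/1 labels and predicted scores.

    Counts, over all presence/absence pairs, how often the presence scores
    higher; ties count as half. This pairwise form is quadratic in sample
    size and is meant for illustration only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one presence and one absence")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of presences from absences, which is why it is less sensitive to the choice of a classification cut-off than overall accuracy.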

The Predicted Distribution of Sweet Sorghum
The final map derived from the BRT model, which shows the predicted potential distribution of sweet sorghum, is presented in Figure 2. Sweet sorghum was predicted to grow primarily in the tropics and sub-tropics, with concentrations in southern North America, northeastern South America (Brazil), central Africa (Central African Republic, Guinea, Ivory Coast, and Mozambique), Southeast Asia, and northern Oceania. The areas of highest environmental suitability in Southeast Asia are mainly concentrated in Myanmar, Thailand, Laos, and Cambodia. India is also suitable for sweet sorghum, but only a small amount of land can be used to grow the bioenergy plant because most of the land is needed for food production. In China's case, the environmental suitability in the southern part of the country is higher than that of the north. In Europe, there are few areas suitable for sweet sorghum growth because the environmental suitability is not high. Based on occurrence probability, the average level of environmental suitability for sweet sorghum was calculated by latitude and longitude. According to this figure, the change in environmental suitability from south to north is obvious, while the change from west to the east is irregular. For example, Figure 2 shows that the average level of environmental suitability from 60 °S to 70 °N increased and then decreased, and the mean value of environmental suitability from 30 °S to 10 °N was higher than 0.8. Additionally, the magnitude of the rise or fall of environmental suitability was more pronounced for changes in longitude.
Figure 2 also shows that the BRT model applied to training samples and validation samples achieved high predictive performance, with AUC values of 0.984 and 0.978, respectively. The spatial distribution of the validation samples illustrates that sweet sorghum has a wide geographic distribution, mainly concentrated in India and central and southern Africa (Figure 3). Compared with the above two areas, the distribution of validation samples in the remainder of the regions was relatively scattered. The results also reveal that the predicted distribution of sweet sorghum covered the area of the validation samples well. This map reflects the potential land resources for sweet sorghum growth.
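The latitudinal averages discussed above can be illustrated with a simple zonal mean. The sketch below groups cells into latitude bands and averages their suitability; the 10° band width, the input layout of (latitude, suitability) pairs, and the function name are assumptions made for illustration, not details taken from the study.

```python
import math
from collections import defaultdict


def mean_by_latitude(cells, band_deg=10.0):
    """Average suitability per latitude band.

    cells: iterable of (latitude_in_degrees, suitability) pairs.
    Returns a dict keyed by the southern edge of each band, in ascending order.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, suitability in cells:
        # Southern edge of the band containing this latitude.
        band = math.floor(lat / band_deg) * band_deg
        sums[band] += suitability
        counts[band] += 1
    return {band: sums[band] / counts[band] for band in sorted(sums)}
```

The same grouping applied to longitude instead of latitude would give the west-to-east profile.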
A threshold environmental suitability value of 0.5 was used to classify each 5 × 5 km² unit on our final binary map as suitable or unsuitable for sweet sorghum growth. The total amount of marginal lands suitable for sweet sorghum was 4802.21 million hectares, as shown in Table 2. Africa had the largest marginal land area suitable for the bioenergy plant with 1549.32 million hectares, and South America was second, with 1226.08 million hectares. Europe had the smallest suitable land area, with 121.02 million hectares, less than the Republic of Indonesia. For a single country, Brazil had the largest amount of marginal land resources suitable for sweet sorghum, with 676.86 million hectares, followed by Australia (602.94 million hectares), the United States (260.29 million hectares), the Republic of Congo (188.64 million hectares), and China (159.61 million hectares). The amount of marginal land resources suitable for sweet sorghum in each of the top ten countries was greater than 100.07 million hectares, while the amount of the land resources in each remaining country was below 100 million hectares.
Figure 4 shows that the most important environmental factor in the BRT model was maximum annual temperature, which contributed 40.2% to the predicted distribution of sweet sorghum. Environmental suitability for sweet sorghum was also influenced by land cover, which contributed 30.9%. The names of the land cover types are described in Table A1. These were followed by minimum annual temperature (23.2%), mean annual precipitation (2.6%), and soil water content (1.6%). Slope (0.7%), soil class (0.6%), and effective soil depth (0.2%) made very little contribution to the BRT model.
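The 0.5 threshold and the area totals reported in this section can be related through a short sketch. It assumes that every cell covers exactly 25 km² (2500 hectares) and that a value equal to the threshold counts as suitable; neither detail is stated in the text, and on a WGS-84 grid the true cell area varies with latitude. The function name is hypothetical.

```python
# Assumed cell area: 5 km x 5 km = 25 km², and 1 km² = 100 ha.
CELL_AREA_HA = 5 * 5 * 100


def suitable_area_million_ha(suitability, threshold=0.5):
    """Total area of cells classified as suitable, in million hectares.

    suitability: iterable of predicted values in [0, 1]; None marks cells
    with no prediction (for example ocean) and is skipped.
    """
    n_suitable = sum(1 for s in suitability if s is not None and s >= threshold)
    return n_suitable * CELL_AREA_HA / 1e6
```

Under the same fixed-area assumption, the reported 4802.21 million hectares would correspond to roughly 1.92 million suitable cells (4802.21 × 10⁶ / 2500).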

Discussion and Conclusions
By combining high-dimensional environmental and land-cover factors and a large volume of occurrence records, this study evaluated the global distribution of environmental suitability for sweet sorghum at a 5 × 5 km² spatial resolution based on the BRT model. Notably, all the data used in this research are available at no charge. This initial map serves as a baseline for understanding the potential marginal lands of sweet sorghum, which is important for forming strategies for developing biomass energy sources.
Previous studies have discussed the distribution of sweet sorghum. For example, sweet sorghum was thought to be mainly distributed in tropical and subtropical regions, which extend from Sierra Leone along the moist belt surrounding the coastal tropical forest to approximately 15 °S, the Mediterranean region, and the islands of Southeast Asia [41,42]. Based on survey data collected in 79 villages spanning the entire cereal growing zone from 1976 to 2003, changes in the diversity and geographical distribution of sweet sorghum in Niger were compared at different scales (country, region, and village) [43]. In Kenya, the ecological and geographical distribution of sweet sorghum was analyzed, and the results, derived from knowledge of the autecology of wild sweet sorghum and from cluster analysis, illustrated that crop-wild gene flows are frequent and are found predominantly in sweet sorghum growing areas [44]. These studies were mainly based on sample data, and they did not identify the potential patterns between the sample data and environmental factors. In this study, the relative contribution of different environmental factors was calculated, and patterns between related environmental factors and the known occurrences of sweet sorghum were analyzed. For instance, maximum annual temperature and land cover were the main influential factors for evaluating the potential distribution of sweet sorghum, with contribution rates of 40.2% and 30.9%, respectively. The multiple-criteria evaluation method adopted in previous studies relied mainly on the prior knowledge of experts. The determination of the thresholds in the rule set was somewhat arbitrary, and the rationality of the thresholds could not be verified, which made the multiple-criteria evaluation method less suitable for global-scale research. The BRT model, which has been used for evaluating the potential distribution of several species (e.g., cassava and fruit bats), can effectively solve this problem [45,46].
In the present study, the BRT model had good predictive performance for both training and test data sets, with AUC values of 0.984 and 0.978, respectively. Therefore, this data-driven method is more reasonable for the assessment of marginal lands when compared with the multiple-criteria evaluation method used in previous works.
The estimated map derived from the BRT model illustrates that global marginal land resources suitable for sweet sorghum are abundant, totaling 4802.21 million hectares. In terms of potential marginal land resources, Africa (1549.32 million hectares) was found to be the most suitable region for the development of sweet-sorghum-based ethanol fuels, especially in sub-Saharan countries. South America (1226.08 million hectares) was second, followed by Asia (822.75 million hectares), Oceania (623.95 million hectares), North America (459.09 million hectares), and Europe (121.02 million hectares). Note that the development of sweet-sorghum-based ethanol fuels is a complex system, and subsequent research will analyze it further from economic and technological perspectives. Moreover, future work should use a biophysical, biogeochemical model to assess the potential emission reductions and energy savings, in order to evaluate the value of developing fuel ethanol in the areas suitable for sweet sorghum.

[Table A1, final rows: Urban and built-up; 14 = Cropland/Natural vegetation; 15 = Permanent snow and ice; 16 = Barren or sparsely vegetated]