A Land Evaluation Framework for Agricultural Diversification

: Shortlisting ecologically adaptable plant species can be a starting point for agricultural diversification projects. We propose a rapid assessment framework based on an ecological model that can accelerate the evaluation of options for sustainable crop diversification. To test the new model, expert-defined and widely available crop requirement data were combined with more than 100,000 occurrence data for 40 crops of different types (cereals, legumes, vegetables, fruits, and tubers/roots). Soil pH, texture, and depth to bedrock data were obtained and harmonised based on the optimal rooting depths of each crop. Global baseline temperature and rainfall data were used to extract averages at each location. To evaluate the ability of the method to capture intraspecies variation, a test was performed using more than 1000 accession records of bambara groundnut ( Vigna subterranea (L.) Verdc.) as an exemplar underutilised crop. Results showed that a suitability index based on soil pH and an index that combines the thermal suitability moderated by the soil pH, texture, and depth suitability have the potential to predict crop adaptability. We show that the proposed method can be combined with traditional land use and crop models to evaluate diversification options for sustainable land and agrobiodiversity resources


Introduction
A major barrier to diversification of agriculture is the lack of knowledge about the potential suitability of neglected and underutilised crops [1,2]. Crop suitability assessment is an integral part of land resource management projects, and several techniques have been developed to facilitate such assessments using land potential and soil productivity methodologies [3,4]. Although the required data varies, all these methods use qualitative environmental thresholds as a major determinant for shortlisting a group of crops in a particular area. Land suitability assessment programs are often executed at different scales and harmonising the data inputs and results in a way that is useful for decision making is challenging. Classification guidelines of the Food and Agriculture Organization of the United Nations (FAO) [5] were primarily developed to cater to the need for a standard framework in land evaluation and crop selection. Since then, many methods have been devised to describe crop suitability as a function of land biophysical processes at limited spatial scales; a few have included considerations of economic and social externalities [6]. However, these land system models often lack robustness in methodology and the data to be used for minor crops. The rigidity of a model can limit wider application of its outputs, as detailed biophysical information can be missing for many locations, crops, and scales [7]. Furthermore, information about crop productivity, as a function of management practices, costs, and risks, is largely missing from many land assessment methods [8,9]. The application of fuzzy-threshold methods has improved the development of these techniques, as detailed comparison between many options can be explained quantitatively [10]. This method has been used to address environmental and spatial planning issues, but its application in enhancing land-use decisions has not been fully explored. Methods that rely only on soil limitations have recently gained popularity for utilisation of marginal lands [11]. Addition of climate information to the data used in these methods requires more investigation.
Provided that up-to-date methodology [12] and local data describing the soil-plantatmosphere-management relationships are available, measurements of suitability of a crop at a location (biomass/yield and income) can be estimated. Deriving an estimate for the production of a crop at a location should be the ultimate goal of any land-use analysis framework [13]. However, this is often not possible due to lack of data, the most important of which are phenological information, recorded yields, primary economic data such as costs, management and agricultural inputs, and detailed pedoclimatic information. This problem increases when suitability assessment is needed for underutilised or neglected crops, where limited information and research is available [14,15]. Also, as Muller et al. [4] argue, although successful at a local scale, data requirements prevent these models from performing acceptably at larger scales. Following attempts to combine land evaluation methods with crop modelling, newly developed hybrid methods have appeared. For example, Bonfante et al. [16] developed and tested a hybrid land assessment methodology to demonstrate the impact of climate change on maize varieties in Italy. Application of these methodologies to minor crops and their landraces will require some compromise in defining unknown crop growth parameters [17].
The use of mechanistic ecological models with biophysical thresholds has gained acceptance due to their ability to adapt to a wide range of current and future biophysical conditions. Shortlisting based on ecological adaptability can be a starting point for agricultural diversification in a region. Empirical methods have been used for climate change impact studies with only climate information [15,18]. Recently, there have been some attempts to enhance the niche-based models with soil fertility (soil organic matter) and with other soil information [19,20]. Providing data and computational power for these models is becoming easier, and the application of fuzzy methods in this area is on the increase [21,22]. Although these models cannot be reliably used to predict yield [15], they provide initial estimates of suitability prior to detailed assessments and local experiments. Insights generated by these methods have been used to provide a global outlook of the evolution of agricultural lands [22]. Furthermore, combination of mechanistic species distribution modelling with pedoclimatic data has been shown to produce results comparable with empirical crop models [23]. The application of this method for agricultural diversification using neglected and underutilised species remains unexplored.
Land capability assessment projects could benefit from a simple methodology that estimates initial suitability and likely productivity of cropping options [24]. Application of the law of the minimum provides a robust and yet simple method for predicting the suitability of a crop in a certain environment. EcoCrop is one example of such method [18,21] which is also the name of a database containing ecological niche information for more than 2500 agricultural species [25]. This database was developed by the FAO Land and Water division in collaboration with experts in agriculture and agro-biodiversity to provide qualitative and quantitative crop environmental descriptors.
The objectives of the work reported in this paper were i) to examine the improvements that can be made to the EcoCrop model with the addition of soil information and ii) to develop a global resource suitability assessment framework for use with underutilised crops based on available data at global and local scales. To enhance the EcoCrop model, two climate variables and three soil variables were included from globally available databases to build a pedoclimatic database. To test the enhanced models, occurrence data for 40 crops were used to evaluate their performance to predict the suitability of individual crops. As genetic variation exists within crop species, we chose an exemplar crop to test the thresholds of within species variation for climate and soil variables used in the enhanced EcoCrop model. Finally, we propose a framework that can be used to delineate locationspecific options for crop diversification. Figure 1 summarises the employed methodology and shows the sources of data and processing steps used to derive the final results.

Data
Species occurrence data for 40 crops representing both major and underutilised crops were obtained from the GBIF (Global Biodiversity Information Facility) database [26]. The chosen species represented a combination of crop types including cereals, legumes, vegetables, tubers/roots, and fruits (Table 1).     (Table 1) combined with accession data locations for bambara groundnut IITA (International Institute of Tropical Agriculture) data.
Agroecological requirements for individual crops were extracted from FAO EcoCrop [25] (Appendix A). To demonstrate the ability of these environmental thresholds to encompass genetic variation, the same methodology was used to assess more than 1000 accessions of bambara groundnut (Vigna subterranea (L.) Verdc.) obtained from IITA (International Institute of Tropical Agriculture) [27].
Climate data were acquired from the WorldClim interpolated global climate dataset at 1 km spatial resolution [28]. Soil data were acquired from the SoilGrids global dataset that provides gridded data based on a database of 150,000 soil profiles and physically based covariates including geomorphological data and global datasets of satellite images at 250 m spatial resolution [29]. Soil chemical and structural properties known to affect crop yield were identified. Soil texture was defined by three binary variables describing suboptimal and optimal conditions from "light" to "heavy" texture classes. This form of data specification was adopted due to the format of the FAO EcoCrop database that describes texture using qualitative keywords such as "light", "medium", and "heavy" rather than quantitative variables. This information was used to prespecify soil texture in the form of % sand and % clay using known soil texture classifications [30]. An expert-defined optimal soil depth based on the EcoCrop data was used to determine the effective depth for aggregating soil properties (pH, % sand, and % clay) within the standard SoilGrids depth intervals using numerical integration methods. For each coordinate, soil and climate data were extracted from the soil and climate layers and harmonised for use in calculating suitability indices.

Computational Aspects
All data processing and analysis used open source software. Species presence and accession data were manually downloaded from the GBIF and IITA databases and stored in Comma-separated values (CSV) format. A script was developed in R statistical language version 3 [31] to automatically extract scientific names from each record. This script excluded records without coordinates and preserved only those records in the GBIF dataset that were real observations in the field (about 95% of total observations) [32]. Another automatic script was developed to extract records of climate (temperature and rainfall) and soil (pH, % sand, % clay, and depth to bedrock) from worldClim and International Soil Reference and Information Centre (ISRIC) data layers. The scripts were designed to run on a central server to facilitate data extraction for a large number of coordinates.
Another R script was written to calculate both individual and combined suitability indices for each occurrence record and to aggregate the results for analysis and to display as tables and figures. All R scripts were run on a Macintosh computer (MacOS 11); the run time for calculating the indices and for preparing the results was about 15 hours. All scripts and data are available at https://github.com/geoej/LEFAD [33].

Deriving the Indices
Calculation starts with determining a temperature suitability index for all possible growing seasons starting from January and then proceeding monthly. The index specifies suitability for optimal and suboptimal conditions for each month, : where is the average monthly temperature for the location i, and are the marginal or absolute temperature averages for a species to survive, and and are the optimal minimum and maximum temperature averages for the species to grow and produce yield based on the original EcoCrop model [18]. This suitability index distinguishes suboptimal from optimal conditions by assigning a lower index to temperature ranges that might result in plant germination but are generally less hospitable for sustained growth and may lead to yield loss.
To reduce the computational burden, only average temperature as an indicator of thermal requirement was used. This allows the algorithm to run once instead of twice for and as proposed by Ramirez-Villegas et al. [15].
Next, the algorithm calculates the length of growing season by determining the growing days from sowing/transplanting to the end of the growing season or next harvest (for perennial crops). For any location, the seasonal temperature suitability for a crop is the minimum of all monthly indices (n) that fall within season s: where is total temperature suitability for season s at location i. The same calculation is repeated for rainfall suitability, except that rainfall is aggregated for all months within all possible seasons (n) in the calendar year first: where is the sum of rainfalls ( ) for season s at location i and , , , and are the rainfall requirements for marginal and optimal conditions in millimetres/season based on data provided by the FAO EcoCrop database.
For optimal depth suitability, , a binary system was developed: where is the optimal depth of soil the crop requires to establish its rooting system and is depth to bedrock at location i. The rooting system of plants can often change in response to environment, but due to lack of information describing that relationship, the cutoffs in Equation (5) were used to simulate the impact of soil depth on plant growth.
A pH suitability index was calculated by defining optimal and suboptimal conditions for plant growth: The pH value for any location, i, is dependent on the optimal soil depth and was aggregated for three effective depths: low (0-50 cm), medium (0-150 cm), and deep (0-<200 cm) with the trapezoidal rule used to integrate soil data at different depths [29].
where N is the number of standard depths that can fit into the optimal soil depth for a crop for any specific soil property (i.e., pH, % sand, % clay, etc.), ( ) is the value of soil property at depth k, and a and b are target depths in centimetres. The optimal soil texture requirements (light, medium, and heavy) were converted to % sand and % clay based on the particle-size classification method for linking texture classes to sand and clay percentages by United States Department of Agriculture (USDA) soil survey manual [30]. The sand and clay values were "depth standardised" as for pH. The final rule for deriving the texture index was is the optimal soil texture required for a given crop. Soil texture was considered as a function of and variables which are the relative presences of sand and clay particles. To reduce the computational burden, a simple set of rules were selected to reflect the soil texture and its relation to soil structure as an important driver of soil quality [4]. For example, if a crop grows optimally in "heavytextured" soils, then a soil containing more than 65% sand particles will be inhospitable for that crop. Likewise, a clay content >15% poses a serious growth limitation on crops that prefer "light-textured" soils. Soils that contain <52% sand and <27% clay were considered loamy and suitable for crops that require "medium-textured" soils. These limits were chosen based on the USDA soil survey manual [30,34] that relates soil particle-size percentages to textural classes (see Appendix B). Table 2 shows a range of final suitability indices for a season s, ( ) and the maximum of all possible yearly seasons (n) will determine final suitability ( = max ( , , … , )).

Accounting for Intraspecies Variation
To test the ability of the expert-defined species threshold data to describe within species variation, an assessment was undertaken using multiple cultivars of bambara groundnut (Vigna subterranea (L.) Verdc.) as an exemplar underutilised crop [1]. Soil and climate data were extracted based on accession locations [27] and checked for consistency, and kernel density plots (frequency per 0.1 increment) of their pedoclimatic distribution were created. According to the EcoCrop database, the optimal soil texture for bambara groundnut is "light". Therefore, to create quantitative thresholds, soils with clay particles of 10%-20% ("sand" and "loamy sand" texture classes) were considered optimal [30].

Results
Of the 28 indices used to calculate suitability at the locations where crops were observed, the pH suitability index and the combined weighted average of thermal suitability index with all soil indices (numbers 1 and 2, Table 3) estimated the presence for all crops better than 80%. Although the pH index best predicted the presence in terms of overall mean, the interaction index (No. 2) had the lowest standard deviation among all indices. The other soil indices did not perform well compared to the climate indices. Giving more weight to pH suitability had a generally positive impact on those indices that averaged all the variables. Table 3. A correlation analysis of mean value of "predicted presence" versus "actual presence" of all crops in Table 1 Figure 3 shows a boxplot comparison of the indices for the groups of crops. Again, the best result in terms of mean and SD was for indices at the top of the list for all groups of crops with the exception of cereals that were more variable (reflected in the long tail of the boxplot). The results show that, despite poor performance, the soil-weighted averages had the lowest variation, suggesting that all available soil information could still be used advantageously in the suitability assessments.

Intraspecies Test
The majority of locations where bambara groundnut accessions were collected fell between the solid lines, representing the optimal soil pH conditions based on expert knowledge provided by the FAO EcoCrop database (Figure 4). Some locations had soils containing more than 20% clay-values which are suboptimal for the growth of bambara groundnut due to poor drainage/aeration issues [35]. For depth to bedrock, the majority of locations where accessions were collected had a depth >175 cm. . Density plot frequency of bambara groundnut accessions (locations) against pH, % clay, and depth to bedrock (cm) with thresholds demonstrating optimal (solid lines) and suboptimal (dashed lines) conditions for 5 standards soil depths [29]. Figure 5 shows the same pattern for monthly average temperature as that shown for pH. The grey curve, which represents average temperature for collected accession locations, is within the optimal temperature range for the majority of calendar months. The cumulative seasonal rainfall for all calendar seasons showed very low precipitation for all locations where accessions were collected, suggesting that the majority of accessions might have been collected from irrigated fields. Therefore, the amount Number of accessions of required precipitation alone was not a suitable criterion for determining suitability as information about management practices of the accessions are largely unknown.
(a) (b) Figure 5. Density plot frequency of bambara groundnut accession locations per average monthly temperature (a) and seasonal rainfall (b): Expert-defined thresholds are drawn for optimal (vertical solid lines) and suboptimal (vertical dashed lines). The grey curves in Figure 5a show the mean value, and the blue and red curves show the minimum and maximum monthly temperatures.

Intraspecies Variation
Analysis of the accession data for bambara groundnut showed that the species threshold information could correctly represent intraspecies variation. Although some locations fell outside the ranges defined by expert knowledge, it was evident that the within species variation of bambara groundnut could be reliably represented by these expert-defined thresholds when compared with the known distribution of the crop as compiled from Food and Agriculture Organization Corporate Statistical Database (FAOSTAT) [36]. Threshold values can also be validated using statistical methods. For example, Ramirez-Villegas, Jarvis, and Läderach [15] used simple population statistics to derive these parameters for grain sorghum. Nevertheless, this requires extensive presence/accession data to be available for all crops.
Georeferenced accession data can be useful in determining ecological niches at the sub-taxa level and can be used to select specific genotypes that can perform well at a particular location [37]. The method proposed here could also potentially benefit breeding programmes for desirable traits such as drought and heat tolerance by selecting accessions that are collected at places with high temperature and low precipitation ( Figure 5).

Combined Suitability Assessment
Many Ecocrop-based models consider crop suitability only as a function of climate information [14,15,21]. In this work, pH was chosen as representative of soil chemical properties that can affect nutrient availability, with texture and depth to bedrock as representatives of soil physical properties. Overall, the results (Table 3) showed that local climate information alone is not a good indicator of suitability. Furthermore, within major crop groups, the climate-only indices had the highest variability, particularly for the indices that used aggregated seasonal rain; this is evident in the bottom half of the group boxplots ( Figure 3). The main reason for this could be that information about the cropping systems are not exhaustively documented by the gene banks [38], and therefore, distinguishing between rainfed, irrigated, and other management practices for all locations is not possible.
Using soil pH as an indicator of crop suitability resulted in an overall median accuracy of estimation of 89%. However, a suitability index that took account of the interaction between thermal requirements with weighted averages of all soil properties produced the lowest standard deviation (Table 3 and Figure 3). This confirms improvements reported by Piikki et al. [19] for common bean (Phaseolus vulgaris L.) in Tanzania using soil organic matter. Also, Velazco et al. [20] reported the same results using similar edaphic variables. However, their study focused on habitat prediction only for trees, shrubs, herbs, and palm plants at a continental extent. Comparison between the pH-only index and the more complex weighted average interaction index (rows number 1 and 2, Table 3) suggests that adding more local information with a reasonable amount of accuracy (see Section 4.4) reduces variation in the final results. Setting aside the costs of soil remediation and unless for specific acid or alkali tolerant crops, pH-related issues can be remedied with proper management strategies. Therefore, it might be argued that pH alone is not a suitable indicator for crop suitability because many crops can grow in a range of soils provided pH issues can be remedied. Therefore, the use of the more complex interaction index is a better reasonable solution to delineate local options.

Combining with Traditional Land System Models
The index based on quantitative and qualitative environmental thresholds introduced here can be combined with traditional land models such as the FAO land evaluation system [5] to classify suitability based on designated suitability classes (S1: 80-100, S2: 60-80, and S3: 40-60; N1:20-40 and N2: 0-20), after combining the index with the local land-use units (e.g., agriculture land, etc.) using known land use classification methods [39]. This can provide an initial estimate of crop suitability for a large number of crops; for example, there are data for more than 2500 species in the FAO EcoCrop inventory. Moreover, suitability of cropping systems such as monocropping, intercropping, and mixed-cropping for both arable and marginal lands [11] can be assessed by combining crops from different groups that have the same index at a particular location. This method can also be used to delineate adaptation strategies for climate change in agroecological zones [40] where change in season length and occurrence or intensity of precipitation have affected current practices. More robust diversification strategies can be investigated by combining other important factors including germplasm availability and socioeconomic factors such as desired income level for farmers by delineating distance to market [41] and crop macro-and micronutrients to alleviate nutrition deficiency.
Bouma et al. [42] defined a classification system for land-use modelling approaches (empirical, mechanical, qualitative, and quantitative) based on available soil data at different spatial scales (global, continental, regional, watershed, catena/farm/field/plot, soil horizon and structure, and molecular interaction) and available soil/land knowledge levels (1 = user, 2 = expert, 3 = semiquantitative, and 4 and 5 = different levels of quantitative knowledge) when dealing with landuse questions. This is a useful schema for determining what type of approach is required for dealing with different users with different levels of knowledge at different scales. The approach described in this article is at level two in terms of "crop knowledge" (agrobiodiversity knowledge) and at four in terms of "soil and climate knowledge". However, the output "crop suitability information" still remains at level two (expert/estimated knowledge), since information such as "bambara groundnut is a moderately suitable crop for this field" is insufficient for farmers but can be enough for a regional planner looking for possibilities beyond current practices at the mesoscale (regional or national).
Other soil variables such as organic content, salinity, and bulk density can be added to this methodology, provided reliable data are available. This will require the development of analytical methods that can convert "qualitative" information such as "soil fertility level = medium", "saline tolerance = low", etc. used in describing favourable conditions for a crop, into quantitative scales such as organic carbon content (g/kg), cation exchange capacity (cmol + /kg), and volumetric % of coarse fragment in soil. Figure 6 shows a decision framework for global diversification projects in which selection can be initiated for a location following a baseline study of the local farming systems to ascertain local priorities for crop diversification. The results of shortlisting using pedoclimatic analysis can be combined with land system models to elevate the information to level 3 (semiquantitative knowledge) in Bouma's classification. Combining the results with crop models will improve the output to levels 4 and 5 (quantitative knowledge ) with the inclusion of other data types such as solar radiation, crop water and nutrient use efficiency, etc. Such quantitative modelling efforts can be carried out with either experimental data from known crop genotypes/varieties or landraces [43] or with simulations of suitability for accessions collected in similar pedoclimatic conditions using globally available datasets [37,44]. Finally putting all of these insights together in open access software tools can help with rapid delineation of alternative cropping options.

Limitations in Method, Data, and Computational Aspects
The two datasets used for species occurrence data for crop groups (GBIF) and bambara groundnut accession (IITA) contained data of different quality. For instance, although GBIF has introduced the concept of coordinate uncertainty in its database, IITA does not include that information. Also, coordinates are documented without adequate metadata (in the case of numerical coordinates, that means latitudes and longitudes are recorded without additional decimal points). Also, both of these databases do not usually record details of plant and environmental interactions such as the type of cropping system from which the presence/accessions were observed/acquired [38].
As Hengl et al. [29] state, predicted soil variables in the SoilGrids dataset are of different levels of certainty, particularly for areas where soil profile data are scarce. The uncertainty level is also different among predicted soil properties. For example, although the predicted pH values had an overall R-square of 83.4% and root mean square error (RMSE) of 0.5, the depth to bedrock model

Location
Stop could only explain close to half of the variation (R-square of 54.0%). This issue is manifested in Figure  4 for depth to bedrock, where the expert-defined thresholds and soil depth where the accessions were collected do not agree.
The global cross validation for the climate dataset using the weather station data showed high accuracy for temperature with a correlation coefficient of 0.99 and RMSE of 1.1 [28]. However, the model accuracy was low for predicted precipitation (RMSE of 49.5) due to lack of local weather station data or poor model specification. This confirms the findings here that the amount of reported precipitation at a location may not be a good indicator of suitability as it is possible to grow a crop economically and productively with irrigation [45]. Therefore, the present analysis should firstly be restricted to rain-fed systems.

Conclusions
Assessing potential of land for crop diversification using neglected and underutilised crop species at a specific location requires a practical approach that takes advantage of available data and knowledge. In this research, we used quantitative (temperature, rainfall, soil pH, and season length) and qualitative (optimal soil depth and soil texture) data to test a new method for estimating suitability of a wide range of crops. The analysis using more than 100,000 curated data representing crop presence showed that a suitability criterion based on soil pH and/or a moderated average suitability of temperature, pH, depth, and soil texture can predict the presence of a crop regardless of its type. The suitability index developed here distinguishes itself from traditional crop suitability evaluation methods in considering both climate and soil chemical and dynamic properties as well as potentially available local data. It can also be applied to a wide range of agricultural species, particularly those that are neglected and underutilised.
This method can be combined with traditional land evaluation systems and crop models to expand the range of cropping options in agricultural diversification projects. It can also be utilised in areas with shortage of resources such as water and where agrotechnology transfer is limited to propose diversification options. Although the output of this method is neither an estimate of crop biomass nor yield, it takes advantage of the availability of big environmental data and improvements in agricultural knowledge to improve our understanding of production systems involving minor crops. The same index can be used to compare many options at regional and local scales, where detailed data are available. The next logical step after suitability classification is to develop field trial programmes with local stakeholders that will lead to development of harmonised data inventories for the purpose of detailed yield estimations and crop insurance models such as those used for major crops.
Supplementary Materials: The following are available online at https://github.com/geoej/LEFAD ( [33]); computer codes and input and output data related to the analysis referred to in the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.