Simulation Model for Time to Flowering with Climatic and Genetic Inputs for Wild Chickpea

: Accurate prediction of ﬂowering time helps breeders to develop new varieties that can achieve maximal efﬁciency in a changing climate. A methodology was developed for the construction of a simulation model for ﬂowering time in which a function for daily progression of the plant from one to the next phenological phase is obtained in analytic form by stochastic minimization. The resulting model demonstrated high accuracy on the recently assembled data set of wild chickpeas. The inclusion of genotype-by-climatic factors interactions accounted to 77% of accuracy in terms of root mean square error. It was found that the impact of minimal temperature is positively correlated with the longitude at primary collection sites, while the impact of day length is negatively correlated. It was interpreted as adaptation of accessions from highlands to lower temperatures and those from lower elevation river valleys to shorter days. We used bootstrap resampling to construct an ensemble of models, taking into account the inﬂuence of genotype-by-climatic factors interactions and applied it to forecast the time to ﬂowering for the years 2021–2099, using generated daily weather in Turkey, and for different climate change scenarios. Although there are common trends in the forecasts, some genotypes and SNP groups have distinct trajectories.


Introduction
Chickpea (Cicer arietinum L.), the second most cultivated grain legume crop, is grown in more than 50 countries across the globe from subtropical regions in South Asia, East Africa, and Australia to more pole-ward regions of Australia, and temperate North and South America. It occupies more than 13.5 million ha globally (FAOSTAT) [1,2]. Its cultivation is of particular importance to food security in the developing world where chickpea seeds are a primary source of human dietary protein 1 [3]. Its wild progenitor, Cicer reticulatum, is a quantitative long day (LD) plant that exhibits responsiveness to vernalization [4,5]. Domestication resulted in the widespread adoption, early in the domestic crop's history, of photoperiod-responsive, vernalization-insensitive genotypes suited to spring cropping, which has left modern cultivated chickpea germplasm with greatly depleted genetic diversity [4,6].
The temperature regime and water availability, day length or competition for resources with other crops in rotation set the time limit for the chickpea to reach its reproductive phase [5,7]. The chickpea is sown in spring in temperate areas, such as the Mediterranean, and North America, and grows while the day length and temperature increases. It is examples of GWAS on agriculturally important traits in crops. Genotype-by-environment (G × E) interactions can be investigated, using bioclimatic variables at a site of an accession's origin as a GWAS phenotype; associations found in these studies indicate climatic adaptation [51]. Typical GWAS designs require the controlled planting of replicated accessions, which can quickly become logistically daunting and expensive across many sites.
Crop models may complement GWAS approaches by accounting for the influence of environmental factors [26]. However, the existing models use some "genetic coefficients" instead of actual genomic data [52] to consider genotype influence. This limits the ability of these models to simulate gene-by-environment interactions and to forecast the phenological characteristics of cultivars across different geographical locations and genotypes [15]. Due to the depleted genetic diversity of cultivated chickpeas, there is a compelling need to re-introduce genetic variation into crop lines, which can be achieved by including wild relatives in breeding programs. Mathematical models and tools that combine genetic and climate data to predict agronomic traits will greatly benefit breeders by simulating the performance of any given well-characterized genotype in any given well-characterized environment [53,54].
Recent crop model comparison under the Agricultural Model Intercomparison and Improvement Project (AgMIP) framework [55,56] revealed that one model is not enough to make forecasts in a changing climate, due to the increasing uncertainties in model predictions with elevated future temperatures [57,58]. Consequently, studies employing ensembles of crop models that give valuable information about model accuracy and uncertainty are needed. Four crop models, as well as their ensemble, were used to simulate an attainable soybean yield in Southern Brazil [31]. Ensemble simulations, such as the mean or median of simulated values, gave better results than any individual model of 27 wheat models tested in four contrasting locations for their accuracy in simulating multiple crop growth and yield variables [58]. To improve the accuracy in the investigation of the elevated CO 2 effect on wheat, four models were calibrated to the observed data in [59].
In this work, we applied an approach to model building developed earlier and used to model the flowering time in Vigna radiata [60]. We developed a new dynamical model using the flowering time of two species of wild chickpeas (Cicer reticulatum L. and C. echinospermum) collected at 21 different sites in Turkey and grown in 4 distinct environmental conditions. In contrast to our previous study that used a non-linear regression approach [61], the new dynamical model uses the daily values of climatic factors and GWAS results for genetic information and genotype-by-environment interactions to model a daily progression rate from sowing to flowering. We further used the model to validate the hypothesis of adaptation of genotypes to environmental factors at the geographic origin. Finally, we constructed an ensemble of models using bootstrap resampling, and used it to forecast the time to flowering for the years 2021-2099, using generated daily weather in Turkey. Though chickpea is a promising crop for Russia, forecasting was performed for Ankara, Turkey, in order to use the same sowing days and to be able to compare the results with the measured data.

The Overview
Our methodology consists of model construction using cross-validation and testing to select the best model with the lowest validation score; investigation of impacts of different factors to the model using a permutation test; collection of the ensemble of models with probable G × E interactions using Bootstrap resampling; forecast of time to flowering using generated daily weather for Ankara; and different scenarios (see Figure 1).  These steps are further described below.

Data Set of Wild Chickpeas
The data set consists of wild chickpea (Cicer reticulatum L. and Cicer echinospermum). Accessions were collected at 21 sites in five regions in Turkey by von Wettberg and coauthors [62] (see a map in Figure S1). The data set spans a large elevational gradient such that C. echinospermum populations generally occur at lower elevations than C. reticulatum, and temperatures are higher and rainfall is lower, on average, in low C. echinospermum sites, compared to higher C. reticulatum sites. The three most easterly and highest elevation C. reticulatum sites are distinguished by having the lowest monthly average temperatures (−4.8 to −2.2 • C versus −2.2 to −0.6 • C) in winter; vegetative phase frost in spring is more frequent in these sites.
These wild accessions were planted in climatically distinct sites in Turkey (Sanliurfa and Ankara, autumn-and spring-sowing) and Australia (Floreat, near Perth, WA and Mt. Barker, in extreme southern WA). As they were grown in contrasting environments, the phenotype data on time to flowering are highly diverse.
To build the simulation model for time to flowering, we used 2174 accessions sown in autumn on 290, 294 or 339 day of year (DOY) in Sanliurfa and Ankara (see Figure 1). The time to flowering for this data set ranges from 117 to 221 day (see Figure 2). We used 2088 samples from the data set sown in spring as a negative control for model testing. Due to the artificial watering conditions not included in the model, these samples cannot be used for model training, validation or testing. The time to flowering ranges from 64 to 152 days for these samples (see Figure S2) [63].
Here, we used the same six suggestive polymorphic sites associated with flowering time identified earlier [64] in 825 out of 2174 accessions (see Figure 1). These 825 accessions of Cicer reticulatum represent 15 out of 21 collection sites (see Table S1). These SNPs were identified as the best SNPs after running a mixed linear model (MLM) in TASSEL, which associated flowering time (phenotype) with the genotypes, using site/year/season as a factor to account for their effect on phenotype (Table S2). Some alleles are fixed in certain collection sites (Table S3).
These SNPs are already prominently figured in the previous paper, where a full description and rationale are presented [61]. We added them into the analysis here to illustrate the much improved power of the new tests in the context of reanalyzing the same exact phenotypic and genetic data used before. Genes related to flowering time, such as CONSTANS-LIKE 3 gene, FLC-like gene, CONSTANS-LIKE 9 gene, FLC EX-PRESSOR gene, PTL-like gene and AGL61 gene are found within 5 mb of these SNPs. To access the genotype-environment interactions, we grouped plants into 18  The data set of the autumn-sown accessions includes 30 distinct combinations of alleles at 6 positions (Table S2) because not all of the 3 6 combinatorial combinations are found in the sequenced plants. These combinations are hereinafter referred to as genotypes by the number assigned in the last column of Table S2. The same genotypes may originate from different collection sites, and several genotypes may be found at one site. Genotype #1 corresponds to unsequenced accessions, collected around Siverek village (see a map in Figure S2).

Climate Data
The data on climatic conditions for each day in the period of field experiments were taken from a publicly available site https://rp5.ru/Weather_in_the_world accessed on 10 September 2019 (the website provides weather forecasts for 172,500 locations as well as observational data reported from weather stations) and NASA [65] (these data were obtained from the NASA Langley Research Center (LaRC) POWER Project funded through the NASA Earth Science/Applied Science Program):

1.
D is day length.

2.
T n is minimal temperature.

3.
T x is maximal temperature.

5.
S is solar radiation.
The measurements for these factors, such as minimal and maximal temperatures ( • C), are provided for each day as floating point values.

A Model with Climatic Factors
We model an impact v(i, t) of the weather conditions X(t) at each day t, t = 0, . . . , m i on the fitness u of a plant i, i = 0, . . . , I − 1 to start flowering as the linear combination of N non-linear control functions that depend on climatic factors (1): where β n , n = 0, . . . , N − 1 are coefficients; F n , n = 0, . . . , N − 1 are non-linear control functions; vector X(t) = (D(t), T n (t), T x (t), P(t), S(t)) combines climatic factors; and I is the number of plants.
Consequently, the plant's fitness to start flowering at each day can be written as follows: where H() is a Heaviside function. The number of days to flowering m i predicted by the model is defined by (3): where U is the fitness threshold that the plant needs to reach to be able to flower; v(i, t) depends on the environment at day t; and the interaction of the plant with the environment, v(i, t), can be considered the resource accumulation rate of a plant. Several forms of dependence are proposed in the literature. We propose a more general approach in which this function is determined automatically in analytic form.

Genotype-to-Climatic Factors Interactions
Formula (1) represents only the influence of climatic factors on the time to flowering. In [52], the framework was developed for combining weather and SNP data. We further enhance the method by introducing non-linearity and automatic function selection. Our genetic information (Table S2) A model then takes the following form: where new coefficients β (3k+j+1)N+n define the effect of genotype-by-climatic factor interaction, and B min is the minimal coefficient value. Thus, the form of the function adapts to the allele combination of a plant by changing the weights of the control functions. The six polymorphic sites define 18 SNP groups, and thus the number of coefficients for genotype-by-climatic factor interactions equals to 90 for N = 5 climatic control functions.

Analytic Form of the Control Function
Several forms of control functions have been proposed in the literature. Here, we propose a more general approach in which the control functions are determined automatically in analytic form, using grammatical evolution (GE) [66,67]. In GE, the analytic function form is built by decoding the sequence called "word" of W integers called codons. Decoding is performed according to simple rules of substitution that establish a correspondence between codons and either an elementary arithmetic operation, '+', '−', '*', '/', or an expression, X, (X − X b ), 1/(X − X b ), where X is a name of a climatic factor and X b is a constant base value of this factor. These constants are specific to factors, so there are T b x base maximal and T b n minimal temperatures, P b precipitation, D b day length and S b solar radiation.

Model Adaptation
The vector of model parameters θ include coefficients β n , n = 0, . . . , N − 1 and β (3k+j+1)N+n together with fitness threshold U, the minimal coefficient value B min , W GE codons (integer numbers, see Section 2.6) and constants T b x , T b n , P b , D b and S b . Parameters are inferred automatically by stochastic minimization of the deviation of the model output from data (Formula (6)).
where α and λ are regularization parameters, τ i and m i (θ) are a number of days from planting to flowering measured experimentally and predicted by the model with parameters θ, respectively. Regularization parameters α and λ were set to small values (10 −5 ) to limit the effect of penalty in terms in the objective function. The differential evolution entirely parallel (DEEP) method [68,69] was used in this work. Differential evolution was proposed by Storn and Price in 1995 as a heuristic stochastic optimization method [70]. DEEP was developed by us for application in the field of bioinformatics [68,71]. It includes several recently proposed enhancements [72].

Model Cross-Validation and Negative Control
The work started by fitting the model (1) to the data set of autumn-sown accessions. To obtain a model with better predictive ability, a 4-fold cross-validation of the model was performed. The whole data set containing 2174 plants was split into training, validation and test sets. To assess the model ability to predict the time to flowering in prospective data sets, we saved 200 randomly selected records as the test set. Then, 3/4 of the remaining records were randomly selected for training and 1/4 for validation. The split was performed 100 times; the model was fitted to the training set and the accuracy of prediction was estimated on a validation set as a root mean error in time to flowering between the predictions and data. Next, the best model with the smallest prediction error was selected and tested on the test set (model M, see Figure 1). The insignificant difference in mean square errors in the time to flowering for the training and validation sets in cross-validation runs and the high-prediction accuracy of flowering time in the test data set ensures that the model was not overfitted and can be generalized to independent prospective data sets.
The set of 2088 spring-sown samples with time to flowering depending on artificial experimental conditions was used for a negative control test, which is considered successfully passed if the model adaptation fails (see Figure 1).

Estimation of Impacts of Climatic Factors and Genotype Information to the Model
A permutation test was performed to analyze the contribution of climatic factors T x , T n , P, S and D to time to flowering. The essence of the method is to rearrange the data of the factor of interest between dates and record the change in the error. The impact of the climatic factor is then an average error increase over all the permutations. The impacts are then normalized and converted to percentages over all factors (see Figure 1).
To assess the impact of genotype information on flowering time prediction, we fitted a model with G × E interactions (5) to the same data set, using the same 4-fold crossvalidation approach with 100 splits (see Section 2.8). The percentage of increase in the average score for validation runs between models with and without genotype information can be considered the estimate of the impact of the these interactions to the model accuracy.
Such an estimate uses information from over 100 runs for both cases, i.e., with and without genotype information, and thus is more reliable than, for example, the estimate using one best model and the test set.

Model Ensemble with Defined Interactions for Forecasting
In order to make forecasting with our model more reliable, we constructed an ensemble of models with a fixed analytic form of control functions obtained in cross-validation runs for model (1) (see Section 2.8). To be able to statistically analyze the estimates of the influence of genotype-by-environment interactions to time to flowering and calculate the confidence intervals, the coefficients β n:n≥N were estimated using the Bootstrap resampling method [73]. In brief, model adaptations were performed using data sets resampled from the original one with repetitions, i.e., some samples may have been omitted while others were duplicated. We applied a Mann-Witney-Wilcoxon criterion to the resulting samples of each parameter β n:n≥N to find coefficients that were significantly non-zero.
The following heuristic procedure was applied to construct the ensemble of models for forecasting. The resulting empirical distributions for each parameter β n:n≥N are represented as histograms, and the most probable parameters belong to the highest bins of the histogram. Consequently, the models with the majority of parameters with this property are considered the best models and selected into the ensemble. We required that more than 97.5% of coefficients β n:n≥N belonged to the highest bins in the histogram.

Synthetic Weather Generation and Time to Flowering Forecast
The daily weather forecasts for Ankara, Turkey, were produced, using the weather generator program MarkSim, covering the period between 2021 and 2099 years. To achieve better results, 30 replications were recorded. MarkSim was designed to simulate weather from known sources of monthly climate data [74][75][76][77]. MarkSim implements several global models and takes into account the socio-economic development scenarios described by the four representative carbon dioxide concentration profiles (RCPs) adopted by the Intergovernmental Panel on Climate Change (IPCC) in the fifth assessment report (AR5) in 2014. The profiles correspond to a wide range of possible changes in future anthropogenic emissions of greenhouse gases and are called rcp26, rcp45, rcp60 and rcp85 in accordance with the possible violation values for radiation earth balance in 2100 with respect to the preindustrial epoch (+2.6, +4.5, +6.0 and +8.5 W/m 2 , respectively) [78].
The safest socio-economic scenario for global climate change due to the proposed nature preservation actions is called rcp26. rcp85 is the most dangerous scenario, while rcp45 and rcp60 lay in between these extremes. Following [79], we used GFDL-ESM2M [80] and HadGEM2-ES [81] models to forecast the climate in Ankara, Turkey. HadGEM2 is a comprehensive model of the Earth system developed by the Hadley Center, U.K. The standard atmospheric component has 38 levels extending to a height of 40 km. The horizontal resolution is 1.25 • latitude and 1.875 • longitude (≈112.5 km) (MetOffice). GFDL-ESM2M is a comprehensive model of the Earth system developed by the Laboratory Geophysical Hydrodynamics (GFDL) of the National Oceanic and atmospheric research (NOAA). This is one of the preferred models in the Coupled Model Intercomparison Project 5 (CMIP5). This version has a resolution of 2.5 • longitude, 2 • latitude (≈220 km) horizontally and 24 levels vertically.

A Model with Climatic Factors
The model cross-validation and testing were performed to select the best and most reliable model (see Figure 1). The data set was split into training, validation and test sets. The number of samples was 1482, 492 and 200, respectively. Optimization was performed with the training set, and then the model with a lowest score equal to the sum of squared differences between the model solution and data in terms of the number of days from sowing to flowering (18,580.0) for the validation set was selected and run on a test set. The histograms for the training and validation sets are presented in Figure 3.  To verify the validity of the constructed model, we compared the mean values of the root mean scores with a Mann-Whitney-Wilcoxon criterion. The difference between the training and validation scores, 30.7 and 36.6, respectively, was not statistically significant (p = 0.14 > 0.01).
F 0 shows that the resource accumulation rate increases with positive minimal temperature, but with growing solar radiation, the weight of this term decreases. Linear terms F 1 and F 2 indicate that increasing minimal temperature by one degree above ≈13.6 • C or day length values by one hour above ≈7 h lead to the increase in the resource accumulation rate by the values of ≈8 or ≈3, respectively. Functions F 3 and F 4 correct the resource accumulation rate at high temperatures, i.e., slow down the rate increase in this case.
The best model with optimized parameters accurately describes the training and validation data (see Figure 4); the Pearson correlation coefficient is ≈0.98 in both cases.
The model solutions for the validation and test sets closely coincide with the experimental data in terms of the mean, median, maximal and minimal difference in the number of days to flowering (see Table 1).  For a negative control test for the model, we used a set of 2088 samples sown in spring (see Figure 1). The time to flowering of these samples cannot be described by the model due to the artificial experimental conditions (i.e., water addition). We performed the same model adaptations and did not find any acceptable parameter combinations. The numerical limit for time to flowering of 152 days is predicted for most samples (see Figure 5).

Impacts of Different Factors to the Model
To assess the impacts of climatic factors to the model, we performed a permutation test with 100 permutations (see Figure 1). The minimal temperature T n has a maximal impact on the model-over 85%-and the day length impact is 14.4%. The impacts of maximal temperature, solar radiation and precipitation are almost negligible-0.02%, 0.08% and 0%, respectively (see Figure 6). To assess the impact of genotype information, a model with G × E interactions (5) was built using the same methodology with 4-fold cross-validation and 100 splits (see Figure 1). The root mean sum of squared differences between the model solution and the data in terms of the number of days from sowing to flowering was used as the score (see Figure 7).  The impact of genotype information on model accuracy was estimated as the difference in means between the root mean scores for the validation set for the model (7) (that equals 36.6) and the model with the genotype information (8.6). The decrease in the root mean score for adding the genotype information equals to 28 or 77% of the value for model (7).

Comparison of Genotypes
We used the results of the permutation test and applied a Mann-Whitney-Wilcoxon criterion to compare the impacts of climatic factors for all pairs of genotypes (see Table S5). The effect of maximal temperature T x is the same for all pairs, while the effect of minimal temperature T n is different for all pairs, except #20-#21 (originated from Egill and Kalkan, respectively), #9-#27 (originated from Barisptepe and Sarikaya, respectively) and #18-#29 (originated from Egill and Besevler, respectively). Table S4). The impact of minimal temperature is positively correlated with the longitude at primary collection sites with Pearson's r = 0.71 (p = 0.002 < 0.05) and Spearman's ρ = 0.80 (p = 0.0005 < 0.05), while the impact of day length is negatively correlated with longitude with Pearson's r = −0.71 (p = 0.003 < 0.05) and Spearman's ρ = 0.80 (p = 0.0005 < 0.05). The impacts of minimal temperature and day length are correlated in the same way with elevation but with lower significance, p < 0.1 (see Table 2). Note that elevation is correlated with longitude with r = 0.66 (p = 0.007 < 0.05) and ρ = 0.70 (p = 0.004 < 0.05).

The impact of climatic factors varies for accessions from different collection sites (see Section 2.2 and
According to the description of site environments (see Section 2.2 and [62]) the climate is cooler at higher elevations and the day length is limited in valleys surrounded by mountains. Thus, our results can indicate that accessions from highlands are adapted to lower temperatures, while those that are originated in valleys are used to shorter days.
Day length has a significant Spearman's correlation (ρ = 0.70, p = 0.005 < 0.05) and insignificant Perason's correlation (r = 0.37, p = 0.17 > 0.05) with latitude. While the difference in latitude between collection sites is relatively low, this can indicate the photoperiod sensitivity of the wild accessions.
The impact of minimal temperature has a significant Spearman's correlation with latitude and weakly significant correlation coefficients with elevation, while correlations between other pairs of climatic factor impacts and geographical characteristics are statistically insignificant (see Table 2). Table 2. Correlations (Pearson's r and Spearman's ρ) of impacts of maximal and minimal temperature-T x and T n , solar radiation S, day length D, with geographical characteristics of collection sites. For genotypes #27, #17, #10, #9 and #4 (originated from Baristepe, Cudi and Sarikaya) the impact of minimal temperature T n is over 90%. In contrast, for genotypes #29 and #30 (originated from Besevler, Sarikaya and Kalkan) the effect of T n is significantly smaller and equals to approximately 70%. The impact of day length D for these two genotypes is about 30%, while all other genotypes have D impact less than 25% (see Table S7).
The comparison of genotypes by reached fitness u (2) is done with a bootstrap approach. We applied a Mann-Whitney-Wilcoxon test to find pairs with statistically significant differences in u. Genotypes #29 and #30 (originated from Besevler, Sarikaya and Kalkan) again differ from all other ones (see Figure 8).
Such diversity in response to environmental factors can be beneficial for breeding programs that can include wild relatives of cultivated chickpea.

Comparison of SNP Groups
We used the results of the permutation test and applied a Mann-Witney-Wilcoxon criterion to compare the impacts of climatic factors for different SNP groups (see Table S6). The mean impacts of climatic factors for different SNP groups are approximately the same.
The results of the permutation test showed that the difference in impact to flowering time of minimal temperature T n , solar radiation S and day length D between all pairs of alleles in all SNP groups is statistically significant according to Mann-Whitney-Wilcoxon criterion. In contrast, the difference in impact to the flowering time of maximal temperature T x is insignificant between all pairs of alleles in all SNP groups.

Model Ensemble with Genotype-to-Climatic Factors Interactions
The difference in mean values of the root mean scores between training (7.7 days) and validation sets (8.6 days) (see Figure 7) for the model with the genotype information (see Section 2.5, Formula (5)) was statistically significant according to a Mann-Whitney-Wilcoxon criterion at the level of 5% (p = 0.03) and insignificant at the level of 1%. Thus, the constructed model is weakly validated. This can be explained by the fact that the number of samples with and without genotype in the training and validation sets are not balanced.
In order to construct a reliable model, we applied a bootstrap approach and performed B = 500 model adaptations using data sets resampled from the original one with repetitions (see Figure 1). The analytic form of the climatic control functions was taken to be the same as that found in the model (7) and fixed. We constructed histograms that represent empirical distributions for coefficients β n:n≥N for genotype-to-climatic factors interactions.
It was found that 75 out of 90 coefficients are significantly non-zero, according to a Mann-Whitney-Wilcoxon criterion (see Table S7). Interactions of all SNPs with control functions, and, consequently, with minimal day temperature, day length and solar radiation, influence the time to flowering in our model because for any SNP, there are allele combinations for which the coefficients of at least some control functions are statistically and significantly non-zero.
Following the heuristic procedure described in Section 2.10, we calculated for each model the number of coefficients β n:n≥N in the highest bins and constructed the histogram (see Figure 9) that represented the empirical distribution of the number of such coefficients with mean and standard deviations equal to 55.81 and 4.83, respectively.
In order to select the models containing more than 97.5% of the coefficients from the highest bins, we calculated the threshold value B = 55.81 + 1.96 × 4.83 = 65.28 using the mean, standard deviation of the empirical distribution and the z score 1.96 for the 97.5 percentile.
We selected 8 models with a number of coefficients β n:n≥N in the highest bins greater than threshold value B (see Table S8). The ensemble models are of high accuracy (see Table 4), i.e., the mean deviation of the model solution from the experimental data is less than 6 days.

Forecasting
To achieve more reliable results, we generated 30 replications of weather forecasts for Ankara for each year from 2021 to 2099, using the MarkSim weather generator for representative carbon dioxide concentration profiles rcp26, rcp45, rcp60 and rcp85 described in Section 2.11 (120 weather forecasts in total). Between 2021 and 2099, an increase in maximum temperature is expected for rcp26 by 0.7 degrees, for rcp45 by 1.5, rcp60 2.5, and rcp85 5.0. Minimum temperature will increase for rcp2.6 by 0.5 degrees, rcp45 by 1.0, rcp60 by 1.5, and rcp85 by 2.5.
Generated weather forecasts were used to predict time to flowering for 30 genotypes and 16 SNP groups, using the ensemble of models for sowing dates DOY 339 and DOY 294 (see Figures 10-13, respectively).
For sowing date DOY 339, the time to flowering decreases in the majority of cases, though in some models it can vary (see Figure 10). For all genotypes and for all models, except m144, the time to flowering has the same behavior. Its value for the first year of forecast (2021) is different for each genotype but for all of them it is less than the experimental value measured for the data set. The time to flowering stays approximately at the same level until the last year of forecast (2099) for rcp26. For rcp45, it continues to decrease until 2060-2070 and then stays approximately at the same level or even increases, for example, for model m107 and genotype #1 (unsequenced accessions) and for model m389 and genotypes #22, #9 (originated from Baristepe and Savur); these examples are marked with an asterisk ('*') in Figure 10 (see Section 2.2 and Table S2 for genotype numbers). For rcp60 and rcp85, the time to flowering decreases almost monotonically. The behavior of the genotypes in model m144 differs from that scenario. Genotypes #8, #21, #26 (originated from Baristepe, Kalkan and Sarikaya) and #1 (unsequenced accessions) have a significant drop in the time to flowering between the value measured in the experiment and the year 2021 forecast (this example is marked with two asterisks ('**') in Figure 10). The forecasts for later years for Genotypes #8, #21, #1 (originated from Baristepe and Kalkan) are either constant or continue to decrease, while the forecast for Genotype #26 (originated from Sarikaya) increases. The forecasts for time to flowering for other genotypes fluctuate around the value found in the data set for rcp26 and rcp45, and have a trend to decrease for rcp60 and a more significant trend to decrease for rcp85. For sowing date DOY 339, the trajectories for the time to flowering forecast for different SNP groups show different behaviors (see Figure 11). For models m107, m144, m204 and all RCPs, the trajectories cluster closely together. For models m240 (SNP2AA), m242 (SNP1AA, SNP6AR, SNP3AA), m389 (SNP1RR, SNP2RR, SNP6RR), and m445 (SNP4AA, SNP5AA), one, two or three trajectories do not belong to the main cluster but have a greater drop in the time to flowering between the observed value in the data and year 2021 forecast; they stay separated from other trajectories at the lower level until 2099. Thus, modeling predicts that these SNP groups are the most sensitive to the climate change. For sowing date DOY 294 for rcp26 for all models except m144 time to flowering drops significantly between an observed value in the data set and year 2021 forecast and later stays approximately at the same levels for all genotypes (see Figures 12 and 13). For rcp45, rcp60 and rcp85 and all models except m144 time to flowering for all genotypes drops significantly between an observed value in the dataset and year 2021 forecast and continues to decrease later for genotypes with relatively bigger forecasted value at 2021 so that for rcp60 and rcp85 all trajectories converge almost in one point at 2099.

Discussion
The lifecycle of the chickpea and its phenology is strongly predicted by its geographic origin and local environmental factors as demonstrated for domestic chickpea cultivars and landraces originating from the Mediterranean and southern India [8]. Here, we validate the hypothesis that adaptation to environments at sampling sites is blueprinted in genomes by dynamically modeling the genotype responses to changing climate conditions. Usually, a limited number of genetic coefficients are included in dynamical crop models, such as DSSAT or SSM [52]. We implemented a solution in which the analytic form of the impact of climatic factors and their interactions with genotype to the fitness of a plant to flower is automatically inferred by a stochastic optimization technique. We performed model parameterization on a wild chickpea data set collected at 21 different locations in Turkey [62] grown in two contrasting environments. GWAS analysis of the data identified six polymorphic sites responsible for the flowering time variation, independent of the environmental conditions [64].
The constructed model (7) is similar to that developed in [16] due to the absence of dependency on precipitation in the sowing-flowering period and the possibility to estimate resource accumulation. The model [16] has a linear function for the sowing-emergence interval and is multiplicative-for emergence-flowering. The model (7) is a combination of two types of functions, as only the data on flowering were available. The advantages of our model are the automatically constructed analytic form and the presence of genotypeto-climatic factors interactions.
The absence of dependency on precipitation can be considered a limitation of the constructed model, as watering is important for plant growth [5,7,82]. This can be due to the properties of the available data that include only one phenotype-the time to flowering, measured at two locations. Additional data, such as time to emergence, may improve this situation.
In comparison to regression models developed earlier [61], formula (7) is simpler and thus more interpretative. That is because the daily values of the climatic factors are used as the model input instead of summaries over different periods, which are not always well defined.
The estimated impact of genotype-by-environment interactions on model accuracy amounts to 77%, which is more than four times greater in comparison to [61] (17.2%). This can be due to the model adaptation to a more consistent data set.
Though the amount of accumulated data is constantly growing, the understanding of the role of temperature and day length in adaptation to different habitat types is still incomplete [83]. Our model predicts the diversity in response to climatic factors for different genotypes. According to the permutation test, minimal temperature had the maximal impact to the model-over 85%-while the day length impact was 14.4%. The impacts of maximal temperature, solar radiation and precipitation were almost negligible. While the minimal and maximal temperatures were strongly correlated for our data set (see Table S9) the precipitation had weak correlation with all other factors. It is worth noting that minimal temperature and day length, which had the largest effects on time to flowering, were moderately correlated (r = 0.66, p < 2.2 × 10 −16 ). The effect of maximal temperature was the same in all genotype pairwise comparisons, while the effect of minimal temperature was different for the majority of pairwise comparisons. The impact of minimal temperature was over 90% for genotypes #27, #17, #10, #9 and #4 (originated from Baristepe, Cudi and Sarikaya) and, in contrast, it was significantly smaller-approximately 70% for genotypes #29 and #30 (originated from Besevler, Sarikaya and Kalkan). The impact of day length for these two genotypes was about 30% in contrast to less than 25% for all other genotypes. Such diversity in response of different genotypes to climatic factors can be beneficial for breeding programs that target different environments.
The influence of the geographic site of origin on plant phenology was further confirmed by analyzing the impacts of climatic factors to accessions from different populations. We found that the impact of minimal temperature was positively correlated with the longitude at primary collection sites, while the impact of day length was negatively correlated, which can be explained by the adaptation of accessions from highlands to lower temperatures and those that were originated in valleys to shorter days.
To increase the reliability of the model predictions, we constructed an ensemble of eight models using bootstrap resampling. We found that the majority of coefficients that define the genotype-by-climatic factors interactions are significantly non-zero, according to a Mann-Whitney-Wilcoxon criterion. Thus, interactions of all SNPs with minimal day temperature, day length and solar radiation influence the time to flowering in the model ensemble. SNPs interacted with temperature in the previous model [61].
We used weather forecasts by MarkSim generator for Ankara, for each year from 2021 to 2099 and for four representative carbon dioxide concentration profiles (RCP) to predict the time to flowering for 30 genotypes and 16 SNP groups, using the ensemble of models. Though there are common trends in the forecasts, some genotypes and SNP groups have distinct trajectories. Time to flowering decreases almost monotonically for rcp60 and rcp85. Its value for the first year of forecast (2021) is different for each genotype but for all of them, it is less than the experimental value measured for the data set. A numerical study of the function of the daily impact of weather conditions on the fitness of a plant to start flowering showed that there are ranges of values of the minimum temperature and day length that are not realized in the experimental data, but provide positive values to this function and thus may reduce the time to flowering. These ranges of values can be reached, for example, with climate change, when the increase in the minimum temperature will occur earlier during the year, i.e., with a shorter day length. The acceleration of the onset of flowering was explained by climate warming and reduction of the length of the growing period by increasing evapotranspiration [83]. The considerable variation in crop responses to future climate was described in previous studies for South Asia and East Africa [84]. According to the constructed forecasts, the selection of appropriate genotypes can be used to match the growing period, which is in agreement with previous studies with the CROPGRO-chickpea model in Northeastern Ethiopia [43].

Conclusions
The model developed in this work illustrates the approach based on the combination of Grammatical Evolution and DEEP for construction of dependencies of plant phenotype on climatic factors. Due to extensive cross-validation and a negative control, the obtained parameters of the model provide high accuracy on the available data set.
The inclusion of genotype-to-climatic factors interactions accounted for 77% of accuracy in terms of root mean square error and allowed us to construct the ensemble of models to forecast the time to flowering for future years and different climate change scenarios. Though there are common trends in the forecasts, some genotypes and SNP groups have distinct trajectories.
Our results confirm the influence of the geographic site of origin on the chickpea's phenology and simultaneously, the diversity of behaviors of genotypes originated from closely located sites. The time to flowering is reduced in constructed forecasts for years 2021 to 2099 and a changing climate. Though there are common trends in the forecasts, some genotypes and SNP groups have distinct trajectories. This genotypic and phenotypic diversity of wild chickpeas can be used in breeding programs to produce varieties with the desired flowering time.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/agronomy11071389/s1, Figure S1: A map of southeastern Anatolia with collection sites, Figure S2: Histogram of time to flowering in the dataset, Figure S3: Histograms for coefficients for GxE, Table S1: List of SNPs associated with flowering time, Table S2: Number of times the reference allele for a SNP associated with flowering time is present in plant genotypes, Table S3: Reference allele frequency at six polymorphic sites associated with flowering time for different collection sites, Table S4: Relative impacts of climatic factors for accessions from different collection sites, Table S5: Comparison of impacts of factors between genotypes in per cent, Table S6: Comparison of impacts of factors between SNP groups in per cent, Table S7: Interactions between SNPs and control functions,  Table S8: Coefficients of 8 functions in ensemble, Table S9

Data Availability Statement:
The data analyzed in this study are available in this paper.