Weather-Based Neural Network, Stepwise Linear and Sparse Regression Approach for Rabi Sorghum Yield Forecasting of Karnataka, India

Abstract: Sorghum is an important dual-purpose crop of India, grown for food and fodder. Prevailing weather conditions during the crop growth period determine the yield of sorghum. Hence, crop yield forecasting models based on weather parameters are an appropriate option for policymakers and researchers developing sustainable cropping strategies. In the present study, six multivariate weather-based models, viz., least absolute shrinkage and selection operator (LASSO), elastic net (ENET), principal component analysis (PCA) in combination with stepwise multiple linear regression (SMLR), artificial neural network (ANN) alone and in combination with PCA, and ridge regression, were examined by fixing 90% of the data for calibration and the remaining data for validation to forecast rabi sorghum yield for different districts of Karnataka. The R² and root mean square error (RMSE) during calibration ranged from 0.42 to 0.98 and from 30.48 to 304.17 kg ha⁻¹, respectively, without actual evapotranspiration (AET), whereas they varied from 0.38 to 0.99 and from 19.84 to 308.79 kg ha⁻¹, respectively, with AET included. During validation, the RMSE and normalized root mean square error (nRMSE) varied from 88.99 to 1265.03 kg ha⁻¹ and from 4.49 to 96.84%, respectively, without AET; with AET included as one of the weather variables, they were 63.48 to 1172.01 kg ha⁻¹ and 4.16 to 92.56%, respectively. A comparison of the six multivariate models revealed that LASSO was the best model, followed by ENET, compared to PCA_SMLR, ANN, PCA_ANN and ridge regression. LASSO and ENET weather-based models can therefore be effectively utilized for the district-level forecast of sorghum yield.


Introduction
Sorghum (Sorghum bicolor (L.) Moench) is an essential dual-purpose food crop of India grown on an area of 4.09 million hectares (Mha) with a production of 3.47 million tons and productivity of 849 kg/ha (www.indiastat.com, 2019). It is also utilized in industry for ethanol, adhesive, starch and paper production. Sorghum is mainly cultivated under rainfed conditions during both the kharif (rainy) and rabi (winter) seasons, and is concentrated in southern and central India. With the introduction of high-yielding varieties and hybrids during the past few decades, India's sorghum production and productivity have shown remarkable growth. Climate plays a vital role in determining the production and productivity of sorghum. Any change in climate that leads to moisture reduction in the root zone might reduce productivity and production to a great extent [1]. Karnataka and Maharashtra are the two major states contributing to national sorghum acreage during the rabi season. These two states account for about 90% of the rabi sorghum area and 81.5% of production in India (Directorate of Economics and Statistics data of 2012-13 to 2016-17). It is also grown in small areas of Andhra Pradesh, Gujarat, Rajasthan, Uttar Pradesh and Tamil Nadu, primarily for fodder.
Rabi sorghum is grown during the months of October to November. Sorghum is a hardy crop that can tolerate high temperatures and moisture stress to a large extent. It requires a temperature range of 15 to 40°C with an annual rainfall of 400 to 1000 mm for successful cultivation (https://www.indiaagronet.com/indiaagronet/crop%20info/jower.htm). It is grown on various soil types, but clayey loam soil rich in organic matter is found to be ideal. It requires well-drained soils, although it withstands waterlogging to some extent compared to maize. Sorghum grown during the rabi season has excellent grain quality and is thus mainly used for food in India, but several production constraints in the rabi season, such as moisture stress/drought and infestation by many pests and diseases, have led to a decline in yield. As sorghum is a short-day plant, its sensitivity to photoperiod and temperature determines its yield potential [2]. Hence, it becomes necessary to study the importance of weather parameters in enhancing the yield and quality of sorghum. Timely and reliable estimates of crop acreage and yield play an important role in developing food policies, economic plans and food security programs for a country [3]. Forecasting crop yield well before harvest is crucial, especially in regions characterized by climatic uncertainties. These forecasts enable planners and decision-makers to plan how much to import or export in case of a shortfall or surplus in production. There are two main approaches to forecasting crop yield: crop simulation models and empirical statistical models [4]. Crop simulation models are process-based and input data-intensive. Though crop simulation models are precise, the lack of sufficient data sets limits their application to smaller areas rather than regional scales.
Empirical statistical models, in contrast, are a common alternative to process-based simulation models because of their simplicity and lower input data requirement. Hence, empirical statistical models using historical crop yield and weather data with simple regression techniques have been widely used [5,6]. Since regression-based empirical models capture the relationships between crop yield and weather parameters, they can be employed to update other advanced models [5,7,8]. Calibrated and tested statistical models can be used for successful crop yield forecasting based on weather information.
Since sorghum is predominantly cultivated under rainfed conditions, its productivity is largely affected by weather elements [9]. Grain sorghum yield is significantly influenced by in-season management practices, growing season, the quantum and distribution of rainfall, root zone soil moisture at sowing, and other prevailing climatic conditions [10]. Sorghum growth and the completion of different phenophases are inversely proportional to variation in temperature, and grain yield is directly proportional to rainfall variability [11]. Rabi sorghum yields of three genotypes of western Maharashtra, M-35-1, Vasudha and Yeshoda, were quantified through correlation and regression analysis using 15 years of meteorological variables, and the resulting yield prediction models showed very high coefficients of determination, ranging from 0.88-0.90 for M-35-1 to 0.75-0.80 for Yeshoda [12]. Similarly, a statistical yield forecasting model based on weather indices was developed for rice and wheat in eastern Uttar Pradesh [13]; the models could explain 51% to 79% of the yield variability for rice and 65% to 92% for wheat. Most of the earlier studies developed statistical yield forecasting models using multiple linear regression (MLR) [14-16]. However, over-fitting when the number of samples is less than the number of predictors, and multicollinearity when the independent variables are correlated, are some of the pitfalls of MLR [17]. To overcome these pitfalls, feature selection by stepwise multiple linear regression (SMLR), least absolute shrinkage and selection operator (LASSO) or elastic net (ENET) can be used. Feature extraction techniques (e.g., principal component analysis, PCA) can also be used [18]. Only a few researchers have developed forecasting models using PCA in combination with MLR [17,19].
Meanwhile, studies assessing the performance of SMLR, LASSO, ENET or artificial neural network (ANN) along with PCA for forecasting crop yield are rare. In this context, the main objective of our study is to develop and select a statistical forecasting model for sorghum using various regression techniques for the major sorghum growing regions of Karnataka, and to evaluate the yield forecasting ability and efficiency of the developed weather-indices-based models.

Study Area
The district-wise acreage and production estimates were compiled for 12 major sorghum growing districts of Karnataka, which contribute 26% and 36.3% of the acreage and production of the country, respectively. The details of the geographical position along with average yield and standard deviation for the districts under study are provided in Table 1.

Table 1. The co-ordinates, altitude, mean and standard deviation of yield data for 12 major sorghum growing districts considered in this study.
Sl. No. | Name of the District | Latitude | Longitude | Elevation (meters)

Data Collection
Time series data of sorghum (Sorghum bicolor L.) yield covering 26 years (1993-2018) for the major sorghum growing districts of Karnataka were obtained from the Directorate of Economics and Statistics, Government of Karnataka.
Grid-level daily weather data were collected from the India Meteorological Department (IMD). The data on weather variables, viz., maximum and minimum temperature (Tmax and Tmin, °C), relative humidity (RH, %), actual evapotranspiration (AET) and rainfall (mm), were used to calculate the different weather indices. The data corresponding to the 39th to 4th standard meteorological weeks (SMW), spread over 18 weeks of crop cultivation, were used in this study. Weekly averages were worked out from the daily weather data for Tmax, Tmin and RH, whereas weekly totals were used for rainfall and AET. Out of the total dataset of 26 years, 24 years of data were used for model calibration while the remaining two years were used for model validation.
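The weekly aggregation described above can be sketched as follows. This is a minimal illustration, not the authors' processing script: the function names, the lowercase variable labels and the assumption that every week has exactly seven days are hypothetical simplifications.

```python
from statistics import mean

# Assumed labels: variables averaged within a week vs. totalled within a week.
MEAN_VARS = {"tmax", "tmin", "rh"}
SUM_VARS = {"rain", "aet"}


def smw_window(start=39, end=4, weeks_per_year=52):
    """Return the SMW numbers from `start` to `end`, wrapping after week 52."""
    weeks = []
    week = start
    while True:
        weeks.append(week)
        if week == end:
            break
        week = week % weeks_per_year + 1  # 52 -> 1, otherwise week + 1
    return weeks


def weekly_aggregate(daily, variable):
    """Collapse daily values (length a multiple of 7) into weekly values."""
    if variable not in MEAN_VARS | SUM_VARS:
        raise ValueError(f"unknown variable: {variable}")
    if len(daily) % 7 != 0:
        raise ValueError("expected whole 7-day weeks")
    reducer = mean if variable in MEAN_VARS else sum
    return [reducer(daily[i:i + 7]) for i in range(0, len(daily), 7)]


if __name__ == "__main__":
    print(len(smw_window()))                    # number of weeks in the window
    print(weekly_aggregate([1] * 14, "rain"))   # weekly rainfall totals
```

The wrap-around from SMW 52 to SMW 1 is what makes the 39th to 4th week window span 18 weeks.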

Calculation of Weather Indices
Unweighted and weighted weather indices were developed for each weather variable. Unweighted indices were calculated by summing individual weather variables or their interactions, while weighted indices were generated from the sum product of individual weather variables or their interactions and their correlation with the detrended yield of sorghum. The unweighted index represents the total amount of a weather parameter received by the crop during the period under consideration, while the weighted index represents the distribution of the weather parameter, with special reference to its importance in different weeks in relation to the detrended yield of the crop [20]. This weather-indices-based approach was successfully used for forecasting yields of rice, wheat, sugarcane and potato for Uttar Pradesh, India [21]. The formulae for the calculation of unweighted and weighted weather indices are given below.
Unweighted weather indices (j = 0):
$$Z_{ij} = \sum_{w=1}^{m} X_{iw}, \qquad Z_{ii'j} = \sum_{w=1}^{m} X_{iw} X_{i'w}$$
Weighted weather indices (j = 1):
$$Z_{ij} = \sum_{w=1}^{m} r_{iw}^{j} X_{iw}, \qquad Z_{ii'j} = \sum_{w=1}^{m} r_{ii'w}^{j} X_{iw} X_{i'w}$$
where $X_{iw}$/$X_{i'w}$ is the value of the i-th/i′-th weather variable under study in the w-th week; $r_{iw}^{j}$/$r_{ii'w}^{j}$ is the correlation coefficient of detrended yield with the i-th weather variable/product of the i-th and i′-th weather variables in the w-th week; and m is the week of forecast. In total, 42 weather variables (Table 2) were generated as per the procedure explained by Das et al. [22]. One set of models used the 30 weather variables that exclude AET and the other used all 42 variables, to assess the impact of AET on the yield of sorghum.
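As a hedged illustration of the index construction, the sketch below builds unweighted and weighted indices for every variable and every pairwise product. The data layout (a dictionary of year-by-week lists), the function names and the key naming are assumptions made for the example. The counting helper shows that 5 base variables give 30 indices and 6 give 42, which is consistent with the totals quoted above.

```python
from itertools import combinations
from math import comb, sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def weather_indices(weekly, detrended_yield):
    """weekly maps a variable name to a list (years) of lists (weeks).

    Returns a dict of index name -> one value per year.
    """
    series = dict(weekly)
    # Interaction terms: week-by-week product of each pair of variables.
    for a, b in combinations(weekly, 2):
        series[f"{a}*{b}"] = [
            [p * q for p, q in zip(ya, yb)]
            for ya, yb in zip(weekly[a], weekly[b])
        ]
    indices = {}
    for name, years in series.items():
        n_weeks = len(years[0])
        # Correlation of each week's values (across years) with detrended yield.
        r = [pearson([yr[w] for yr in years], detrended_yield)
             for w in range(n_weeks)]
        indices[f"{name}_unweighted"] = [sum(yr) for yr in years]
        indices[f"{name}_weighted"] = [
            sum(rw * x for rw, x in zip(r, yr)) for yr in years
        ]
    return indices


def count_indices(n_vars):
    """Unweighted plus weighted indices for n_vars variables and all pairs."""
    return 2 * (n_vars + comb(n_vars, 2))


if __name__ == "__main__":
    print(count_indices(5), count_indices(6))
```

In a forecasting setting the weekly correlations would be estimated on calibration years only; the sketch uses all supplied years for brevity.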

Detrending of Yield Time-Series Data
Fluctuations in yield time series data over the years are influenced by technology differences, climatic variability etc., leading to a nonlinear and non-stationary trend which has to be removed before computing the basic correlation function in order to improve the prediction performance of the model [23]. Thus, detrending the yield data becomes necessary to remove long-term mean changes from the time series. One of the most popular methods is to fit a predetermined function, i.e., a simple linear regression model or a second-order polynomial regression model, against time. Many researchers have applied this method to detrend crop yield data and to examine the impacts of climate variability [24-27]. In the present investigation, a simple linear regression model was applied to detrend the yield of sorghum.
A simple linear regression model can be fitted against time using the method of least squares:
$$Y_t = \beta_0 + \beta_1 t + \varepsilon_t$$
where $Y_t$ is the crop yield at time t; time t is the predictor; $\beta_0$ and $\beta_1$ are the coefficients; and $\varepsilon_t$ is the residual. The residuals (detrended yield) of this model were used for indices calculation.
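The detrending step can be illustrated with a short least-squares fit. This is a sketch under the assumption that time is coded 1, 2, ..., n when no year labels are supplied; the function name is hypothetical.

```python
def detrend_linear(yields, years=None):
    """Fit Y_t = b0 + b1 * t by ordinary least squares.

    Returns (b0, b1, residuals); the residuals are the detrended yield.
    """
    n = len(yields)
    t = list(years) if years is not None else list(range(1, n + 1))
    t_mean = sum(t) / n
    y_mean = sum(yields) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, yields))
    b1 = sxy / sxx
    b0 = y_mean - b1 * t_mean
    residuals = [yi - (b0 + b1 * ti) for ti, yi in zip(t, yields)]
    return b0, b1, residuals


if __name__ == "__main__":
    b0, b1, res = detrend_linear([820.0, 790.0, 905.0, 870.0, 960.0])
    print(round(b1, 2), [round(r, 1) for r in res])
```

Because the model includes an intercept, the residuals sum to zero, so the detrended series fluctuates around zero regardless of the original yield level.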

Multivariate Techniques
The multivariate analysis methods used in this study to develop crop yield prediction models are described below:

(i) Principal component analysis
In our study, principal component analysis (PCA) was performed using all 42 or 30 weather indices for each district. PCA aims to reduce the dimensionality of a data set while retaining most of the information. All the input variables were normalized before PCA by subtracting the minimum from each value and dividing by the range, (x − min)/(max − min). Only the principal components (PCs) with eigenvalues greater than 1 were considered [28]. PCA was performed to reduce overfitting caused by high dimensionality and interdependency among the independent variables. The first PC explains most of the variability present in the dataset, and each subsequent PC explains a share of the remaining variability [29].
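The two preprocessing rules in this paragraph, min-max normalization and retention of PCs with eigenvalues above 1, can be sketched as below. The function names and the example eigenvalues are illustrative assumptions; the eigenvalues themselves would come from a PCA routine that is not shown here.

```python
def min_max_scale(values):
    """Rescale values to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def retained_components(eigenvalues, threshold=1.0):
    """Return the 1-based positions of PCs whose eigenvalue exceeds threshold."""
    return [k for k, ev in enumerate(eigenvalues, start=1) if ev > threshold]


if __name__ == "__main__":
    print(min_max_scale([2, 4, 6]))
    print(retained_components([3.2, 1.4, 0.9, 0.5]))
```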

(ii) Shrinkage regression models
Shrinkage regression models handle multicollinearity by penalising the magnitude of the regression coefficients. Ridge regression retains all predictors in the final model, shrinking their coefficients without setting any of them exactly to zero. Least absolute shrinkage and selection operator (LASSO) regression performs both variable selection and regularization to enhance the prediction accuracy and interpretability of the statistical model [30]. Elastic net (ENET) regression combines the LASSO and ridge penalties to address the shortcomings of each and improve model performance [31]. Two parameters, lambda and alpha, need to be optimized. The optimal lambda values for ridge, LASSO and ENET regression were selected by minimising the average mean square error in leave-one-out cross-validation [32]. Alpha is fixed at 0 for ridge regression and at 1 for LASSO regression. In ENET, alpha can take any value between 0 and 1. In our study, a constant alpha value of 0.5 was used for ENET. An alpha value of 0.5 provides an equal combination of penalties, whereas alpha < 0.5 applies a heavier ridge penalty and alpha > 0.5 applies a heavier LASSO penalty. In the present study, the 'glmnet' package was used for implementing LASSO and ENET in R software version 3.6.1 [33].
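For reference, the roles of lambda and alpha can be read from the penalized least-squares objective that the 'glmnet' package documents for a Gaussian response. The symbols N (number of calibration years), y_i (yield), x_i (vector of weather indices) and β (coefficients) are notation introduced here for illustration and do not appear in the original text.

```latex
\min_{\beta_0,\,\beta}\;
\frac{1}{2N}\sum_{i=1}^{N}\left(y_i - \beta_0 - x_i^{T}\beta\right)^2
\;+\;
\lambda\left[\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^{2} + \alpha\,\lVert\beta\rVert_1\right]
```

Setting α = 0 leaves only the squared (ridge) penalty, α = 1 leaves only the absolute-value (LASSO) penalty, and α = 0.5 mixes the two; λ controls the overall strength of the shrinkage.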

(iii) Artificial Neural Network
Artificial neural networks (ANNs) are computational models inspired by the central nervous system and capable of machine learning and pattern recognition. They are usually presented as systems of interconnected "neurons" that compute values from inputs by feeding information through the network [34]. In the present study, a three-layer feed-forward ANN with input, hidden and output layers was used. Each layer consists of neurons or nodes interconnected with one another. The number of neurons in the input and output layers is fixed by the dataset used. The principal issue in using an ANN is finding the ideal number of hidden neurons or nodes. In the present study, the number of hidden nodes was selected by the "train" function of the "caret" package using the method "nnet" with 10-fold cross-validation in R software version 3.6.1 [35]. All 30 or 42 indices, without or with AET, were used as inputs, whereas yield was the dependent variable.

(iv) Principal components analysis-stepwise multiple linear regression and principal components analysis-artificial neural network
PCA followed by SMLR or ANN combines feature extraction with feature selection. The multicollinearity problem among weather variables was addressed by using PC scores as regressors for SMLR and ANN to develop the crop yield models [17]. PCA decomposes the original data matrix X into two matrices P and T such that $X = TP^{t}$. The matrix P is usually referred to as the loadings matrix and the matrix T is an orthogonal score matrix. The superscript t indicates the transpose of a matrix. The loadings are the coefficients of the linear combinations of the original variables that define each PC. The matrix T contains the original data in the rotated coordinate system.
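A short worked relation may clarify how the scores enter the regression. It assumes the columns of P are orthonormal, as is standard for PCA loadings; the symbols K (number of retained PCs), t_k (k-th column of T) and b_0, b_k (regression coefficients) are introduced here only for illustration.

```latex
X = T P^{t}, \qquad P^{t} P = I \;\;\Rightarrow\;\; T = X P

\hat{y} = b_0 + \sum_{k=1}^{K} b_k \, t_k
```

Because the columns of T are mutually orthogonal, the regressors t_k are uncorrelated, which is why using PC scores in place of the original weather indices sidesteps multicollinearity.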

Testing the Performance of the Model
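The performance of the statistical models used for forecasting was tested using R², root mean square error (RMSE), normalized root mean square error (nRMSE) and modelling efficiency (EF), which were calculated using the formulae given below:
$$R^2 = \left[\frac{\frac{1}{n}\sum_{i=1}^{n}\left(M_i - \bar{M}\right)\left(O_i - \bar{O}\right)}{\sigma_M\,\sigma_O}\right]^2$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(M_i - O_i\right)^2}$$
$$\mathrm{nRMSE} = \frac{\mathrm{RMSE}}{\bar{O}} \times 100$$
$$\mathrm{EF} = 1 - \frac{\sum_{i=1}^{n}\left(M_i - O_i\right)^2}{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2}$$
where $M_i$ is the model output; $\bar{M}$ and $\sigma_M$ are the mean and standard deviation of the model output, respectively; $O_i$ are the observations; $\bar{O}$ and $\sigma_O$ are the mean and standard deviation of the observations, respectively; and n is the number of observations. R² values nearer to 1 and RMSE values close to 0 indicate better model performance. According to nRMSE, the model performance is judged as excellent, good, fair and poor when the values are in the range of <10%, 10-20%, 20-30% and >30%, respectively [36]. EF ranges from −∞ to 1. EF values closer to 1 indicate accurate model predictions, while an EF value of 0 means that the model does not predict better than the average of the observed values [37].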
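The evaluation statistics can be computed as in the sketch below. The function names are hypothetical, R² is taken as the squared Pearson correlation between model output and observations, and the handling of values falling exactly on a class boundary (10, 20 or 30%) is an assumption, since the text gives only the ranges.

```python
from math import sqrt


def rmse(model, obs):
    """Root mean square error between model output and observations."""
    return sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))


def nrmse(model, obs):
    """RMSE expressed as a percentage of the observed mean."""
    return 100.0 * rmse(model, obs) / (sum(obs) / len(obs))


def modelling_efficiency(model, obs):
    """EF = 1 - SSE / total sum of squares of the observations."""
    o_mean = sum(obs) / len(obs)
    sse = sum((m - o) ** 2 for m, o in zip(model, obs))
    sst = sum((o - o_mean) ** 2 for o in obs)
    return 1.0 - sse / sst


def r_squared(model, obs):
    """Squared Pearson correlation between model output and observations."""
    n = len(obs)
    m_mean, o_mean = sum(model) / n, sum(obs) / n
    cov = sum((m - m_mean) * (o - o_mean) for m, o in zip(model, obs))
    var_m = sum((m - m_mean) ** 2 for m in model)
    var_o = sum((o - o_mean) ** 2 for o in obs)
    return cov ** 2 / (var_m * var_o)


def rate_nrmse(value):
    """Classify nRMSE (%); boundary handling is an assumption."""
    if value < 10:
        return "excellent"
    if value < 20:
        return "good"
    if value <= 30:
        return "fair"
    return "poor"


if __name__ == "__main__":
    observed = [100.0, 200.0, 300.0]
    predicted = [110.0, 190.0, 310.0]
    print(rmse(predicted, observed), nrmse(predicted, observed))
    print(modelling_efficiency(predicted, observed))
    print(rate_nrmse(nrmse(predicted, observed)))
```

Predicting the observed mean for every year gives EF = 0, which matches the interpretation of EF stated above.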

Results
Sorghum being a dry land crop, water stress management is of prime importance to achieve sustainable yield levels. Potential evapotranspiration (ET) is a key factor that decides the sorghum production under water deficit conditions and also an important factor required in crop simulation models to predict the yield. Hence, an attempt was made in the present investigation to know the impact of ET on sorghum yield without and with the inclusion of actual evapotranspiration in the predictive models.

Weather Variables without Actual Evapotranspiration (AET)
The sorghum yield forecasting model developed using LASSO is presented in Tables 3 and A1. The maximum R² was recorded in Belagavi district (0.98) with an RMSE of 31.47 kg ha⁻¹, whereas the minimum R² of 0.72 was observed in Koppal district with an RMSE of 174.70 kg ha⁻¹. Most of the Z variates showed a positive influence on the yield of sorghum. The important meteorological variables included in the LASSO model were Tmax, followed by Tmin and RHI, revealing a linear relationship of temperature and humidity with crop yield. The RMSE during validation varied from 88.99 to 863.87 kg ha⁻¹. The validation nRMSE (nRMSEV) revealed that LASSO model performance was excellent for Dharwad (5.28%) and Gulbarga (8.19%), good for Belagavi, Raichur and Davanagere, and fair for Bijapur, Haveri and Bidar. However, predictions were unsatisfactory for Gadag, Bellary and Shivamogga districts, with nRMSEV of 81.85%, 40.01% and 32.16%, respectively. The modelling efficiency (EF) of the LASSO model across districts ranged from 0.60 for Koppal to 0.99 for Belagavi.

Weather Variables with Actual Evapotranspiration (AET)
The addition of AET as one of the weather variables improved the LASSO model's accuracy in forecasting rabi sorghum yield for different districts (Tables 3 and A3).

Weather Variables without Actual Evapotranspiration (AET)
The ENET predictions of rabi sorghum yield revealed that Belagavi and Davanagere recorded the maximum R² values of 0.98 and 0.97 with RMSE of 32.70 and 48.30 kg ha⁻¹, respectively. The minimum R² was recorded in Koppal district (0.73) with an RMSE of 171.60 kg ha⁻¹. The RMSE obtained from validation ranged from 96.89 to 870.98 kg ha⁻¹. Though almost all the weather variables were included in the ENET model for yield predictions, Tmax, Rain and RHI were found most important. Validation of the ENET model revealed that the yield predictions were excellent for Dharwad (5.13%) and Gulbarga (8.67%) districts. The nRMSEV values showed that the predictions were good for Belagavi, Raichur and Davanagere and fair for Bijapur, Haveri and Bidar, whereas weak predictions were noticed for Gadag, Bellary and Shivamogga districts with 75.75%, 40.34% and 33.05%, respectively (Tables 4 and A2). The modelling efficiency ranged from 0.62 for Koppal to 0.99 for Belagavi district.

Weather Variables with Actual Evapotranspiration (AET)
The ENET model predictions including AET are presented in Tables 4 and A4. The maximum and minimum R² were recorded in Davanagere (0.99) and Koppal (0.74) districts, respectively. The RMSE of the calibrated data ranged from 22.47 kg ha⁻¹ in Bijapur to 169.05 kg ha⁻¹ in Koppal. However, the highest RMSE during validation was recorded in Bellary (898.80 kg ha⁻¹) and the lowest in Gulbarga (90.81 kg ha⁻¹). Based on nRMSEV values, the model performance was excellent for Dharwad, Gulbarga and Davanagere with 4.59, 7.71 and 8.94%, respectively. Good predictions were noticed for Belagavi and Raichur, whereas predictions were fair for Bijapur, Koppal, Shivamogga and Bidar. The model's performance was inadequate for Gadag, Haveri and Bellary districts, which recorded nRMSEV of more than 30%. The modelling efficiency of ENET was improved for Bijapur, Gulbarga, Haveri, Raichur, Bellary, Bidar and Davanagere districts by the inclusion of AET.

Weather Variables without Actual Evapotranspiration (AET)
The sorghum yield predictions developed using the PCA feature extraction method followed by SMLR showed that, even though the maximum R² of 0.94 was recorded for Haveri district with an RMSE of 61.42 kg ha⁻¹ during calibration, the predictions were poor based on the nRMSEV value (31.16%). This was followed by Davanagere district (R² = 0.90 and RMSE = 96.99 kg ha⁻¹), which had excellent yield predictions (9.90%). The minimum R² of 0.54 was recorded for Koppal with an RMSE of 187.56 kg ha⁻¹. The maximum RMSE during validation among the districts was recorded in Bellary (761.81 kg ha⁻¹) while the minimum was observed in Gulbarga (117.14 kg ha⁻¹). The yield predictions of sorghum developed by PCA_SMLR were excellent for Dharwad, Davanagere and Gulbarga districts, with nRMSEV values of 4.83, 9.90 and 9.94%, respectively. Belagavi and Shivamogga showed good performance under the PCA_SMLR model, while performance was fair for Bijapur, Raichur and Bidar, with nRMSEV of 27.75%, 26.77% and 21.33%, respectively. In contrast, the model's performance was poor for Gadag, Haveri and Bellary districts (Table 5).

Weather Variables with Actual Evapotranspiration (AET)
Adding AET as one of the weather variables did not improve the performance of the PCA_SMLR model (Table 5). The R² ranged from 0.51 to 0.90. The minimum RMSE was recorded in Dharwad (58.22 kg ha⁻¹), with excellent prediction and an nRMSEV value of 5.07%. Maximum RMSE of 144.09 and 144.07 kg ha⁻¹ was recorded in Bellary and Koppal districts, respectively. Besides Dharwad, the model performance was also excellent for Shivamogga and Gulbarga districts, with 9.06 and 9.16%, respectively. Fair predictions were found for Belagavi, Bijapur, Haveri, Raichur and Bidar, while predictions were poor for Gadag, Koppal and Bellary, with nRMSEV of 92.56, 31.47 and 38.78%, respectively.

Weather Variables without Actual Evapotranspiration (AET)
The predictive performance of the ANN model alone, as indicated by R² and RMSE during calibration, ranged from 0.49 to 0.97 and from 42.20 to 254.98 kg ha⁻¹, respectively. During validation with an independent dataset, the RMSEV and nRMSEV ranged from 95.83 to 1265.03 kg ha⁻¹ and from 4.81 to 58.59%, respectively. The ANN model's performance was excellent for Dharwad, good for Bijapur, Gulbarga and Koppal, and fair for Bidar and Davanagere districts (Table 6).
For the PCA_ANN model, R² varied from 0.42 to 0.96 with RMSE ranging from 66.01 to 304.17 kg ha⁻¹, and the RMSEV and nRMSEV ranged from 91.37 to 1028.88 kg ha⁻¹ and from 4.49 to 56.09%, respectively. The performance of the PCA_ANN model was better than that of ANN with respect to validation nRMSE, with excellent predictions for Dharwad, Shivamogga and Davanagere (nRMSEV = 4.49, 8.71 and 9.65%, respectively). It was good for Bijapur, Gadag and Gulbarga and fair for Haveri, whereas it was poor for the remaining districts, viz., Belagavi, Koppal, Raichur, Bellary and Bidar (Table 7). The variable importance of ANN (10 most important indices) and PCA_ANN is depicted in Figures 1 and 2. The variable importance of ANN revealed Z21 as the most important variable, followed by Z121. Among the PCs, PC1 was the most important variable to be included in the model, followed by PC2, PC3, PC4 and time.

Weather Variables with Actual Evapotranspiration (AET)
During calibration, the R² and RMSE obtained by ANN including AET ranged from 0.53 to 0.95 and from 37.63 to 261.63 kg ha⁻¹, respectively (Table 6). The ANN model's performance was excellent for two districts, namely Dharwad (4.44%) and Haveri (8.86%). The predictions were fair for Bijapur, Gulbarga, Koppal, Raichur and Shivamogga districts based on nRMSEV values. During validation, the RMSE was lowest for Haveri (66.97 kg ha⁻¹) and highest for Bellary district (1172.01 kg ha⁻¹).
The inclusion of AET in the PCA_ANN model did not have much effect on yield prediction accuracy. The maximum R² of 0.95 was observed in Haveri and the minimum R² was recorded in Raichur (0.38). The RMSE during calibration ranged from 54.47 to 308.79 kg ha⁻¹, while during validation it ranged from 63.48 to 1040.65 kg ha⁻¹. Excellent model performance was noticed in only two districts, namely Dharwad and Davanagere (nRMSEV = 4.16 and 5.11%, respectively). Good predictions were recorded for Belagavi and Gulbarga districts, fair predictions were obtained for Bijapur and Haveri, and predictions were poor for the remaining districts (Table 7).

Weather Variables without Actual Evapotranspiration (AET)
The ridge regression model for predicting sorghum yield showed that the maximum R² (0.85) and lowest RMSE (79.84 kg ha⁻¹) during calibration were recorded in Bijapur district, while the minimum R² (0.58) and highest RMSE (269.44 kg ha⁻¹) were in Raichur. During validation, the highest RMSE was found in Bellary (1071.57 kg ha⁻¹) and the lowest in Haveri (84.59 kg ha⁻¹). The nRMSEV values showed that the model performance was excellent only for Dharwad district, with 3.98%. Most of the districts recorded nRMSE values greater than 30%, depicting its poor performance for Bellary, Gadag, Bidar, Belagavi and Raichur districts with 49.63%, 46.81%, 38.06%, 34.81% and 33.46%, respectively. The modelling efficiency of the ridge regression model ranged from 0.40 (Shivamogga) to 0.67 (Haveri), indicating poor efficiency compared to the other multivariate models (Table 8).

Weather Variables with Actual Evapotranspiration (AET)
The inclusion of AET as one of the weather parameters in the ridge model did not improve the model performance much. The maximum R² (0.86) and lowest RMSE (73.22 kg ha⁻¹) during calibration were observed in Bijapur district. The minimum R² of 0.58 was in Bellary; however, the highest RMSE during calibration was found in Raichur (254.31 kg ha⁻¹). The RMSE during validation ranged from 86.09 to 1102.69 kg ha⁻¹. The predictions were excellent for Dharwad district with 4.25% nRMSE during validation and good for Haveri (11.40%), while the model performance was poor for Bellary, Bidar, Raichur, Belagavi and Gadag.

Cross-Comparison of Models Based on their Performance
The performance of the six multivariate models used in this study was compared to know the better performing model to predict rabi sorghum yields with more accuracy. The cross-comparison revealed the order as LASSO > ENET > PCA_SMLR > PCA_ANN > ANN > Ridge where AET was not included as a weather variable. In case of AET included models, the order of performance was found as LASSO > ENET > PCA_SMLR > ANN > PCA_ANN > Ridge.
All the six multivariate models exhibited excellent performance in predicting the sorghum yield for Dharwad district without and with AET. For Gulbarga district, predictions were found excellent using LASSO, ENET and PCA_SMLR models without and with AET. For Shivamogga district, PCA_ANN model gave excellent predictions without AET whereas it was PCA_SMLR model with AET. PCA_SMLR, PCA_ANN models performed excellently in predicting sorghum yield of Davanagere without the inclusion of AET whereas LASSO, ENET as well as PCA_ANN gave excellent predictions for Davanagere district including AET as one of the weather variables (Table 9).

Discussion
The influence of weather parameters on sorghum yield has been reported by many researchers [10,38,39]. Rainfall at the seedling stage, and minimum temperature and rainfall at the grain filling stage, had the greatest influence on sorghum yield, as reported by Deshmukh et al. [40], who analysed weather data of more than 15 years (2001-2015). Sorghum phenology is inversely proportional to change in temperature and grain yield is directly proportional to change in rainfall [11]. A preharvest estimate of crop yield is of prime importance in formulating the agricultural policies of the government. Thus, in the present study, an attempt was made to develop reliable models based on weather variables using recent techniques such as penalized regression and neural networks to forecast rabi sorghum yield with minimal prediction error, rather than relying on a basic statistical model yielding a simple regression equation. The weather parameters most commonly used in such models are temperature, humidity and precipitation. As several findings indicate that evapotranspiration (ET) is a key factor in determining the yield levels of rabi sorghum, the present study was carried out with and without AET. Field experiments conducted by Howell and Hiler [41] revealed that the yield response of grain sorghum was highly dependent on the timing of the ET deficit. Plaut et al. [42] observed a sigmoidal ET-yield function for sorghum. The present findings also revealed that the inclusion of AET as one of the weather variables improved the performance of some yield prediction models. Kumar et al. [43] assessed SMLR and LASSO regression models to predict wheat yield and showed that R², RMSE and MAPE were 0.81, 195.90 and 4.54 per cent, respectively, for SMLR. In contrast, they were 0.95, 99.27 and 2.7 per cent, respectively, for LASSO regression [44].
Regression equations developed through stepwise linear regression analysis to estimate the grain yield of rabi sorghum revealed a greater influence of maximum temperature on sorghum yield than of other weather parameters [39]. In the present study, the developed model equations likewise showed a greater influence of maximum temperature on rabi sorghum yield than of other weather variables. Singh et al. [43] used the LASSO model for forecasting wheat yield at different growth stages of the crop. The mean square error (MSE) and RMSE of LASSO regression were better than those of SMLR, which improved crop yield forecasting. In the present study, forecasting yield with the LASSO model also improved forecasting accuracy, with better RMSE values during prediction.
The results of PCA_SMLR in the present study for predicting rabi sorghum yields are in line with the outcomes of Sharma et al. [45], who analyzed the effect of high precipitation on rainfed sorghum yields using MLR with and without PCA. Their results showed that although the R² values appeared smaller, the regression relationships were significant. The forecasts of rabi sorghum yield developed by ANN and PCA_ANN in the present study corroborate the findings of Uno et al. [46], who reported the potential of ANN for the development of in-season yield mapping and forecasting systems for corn in eastern Canada. They opined that greater prediction accuracy (about 20% validation RMSE) was obtained with an ANN model than with conventional empirical models based on normalized difference vegetation index, simple ratio or photochemical reflectance index. The present findings also agree with those of Das et al. [47], who compared ANN and PCA_ANN for rice and coconut yield forecasting for India's west coast. Arvind [48] carried out multistage wheat yield estimation using six different multivariate weather-based models, viz., SMLR, PCA_SMLR, ANN, PCA_ANN, LASSO and ENET. The nRMSEV ranged from 5.0 to 15.9% for ENET, 4.2 to 16.9% for LASSO, 3.7 to 20.0% for SMLR, 6.2 to 15.9% for PCA-SMLR, 7.8 to 22.3% for ANN and 11 to 16.1% for PCA-ANN, revealing excellent performance of LASSO and ENET compared to other models. Similar outcomes were obtained in the current investigation, wherein nRMSEV varied from 5.28% to 81.85% and from 4.54% to 65.45% without and with AET, respectively, for LASSO, and from 5.13% to 75.75% and from 4.59% to 69.08% without and with AET, respectively, for ENET. This may be due to the prevention of overfitting and the reduction of model complexity achieved by penalizing the magnitude of the regression coefficients.

Conclusions
In the present study, six multivariate models were examined to forecast rabi sorghum yield based on different weather variables without and with AET. The results revealed that the LASSO model was more reliable than the other multivariate models considered in this study, based on the nRMSE values obtained during validation. The next best model was ENET, which performed similarly to LASSO. Thus, it can be concluded from the present findings that LASSO and ENET were the best models for district-level yield forecasting of sorghum in Karnataka compared to the PCA_SMLR, ANN, PCA_ANN and ridge regression models. Prevention of overfitting and shrinkage of the regression coefficients through penalization made LASSO and ENET perform better.