Exploration of Machine Learning Approaches for Paddy Yield Prediction in Eastern Part of Tamilnadu

Abstract: Agriculture is the principal basis of livelihood and a mainstay of any country. Farmers face several challenges owing to factors such as water shortage, uncertain prices driven by demand and supply, weather uncertainties, and inaccurate crop prediction. The prediction of crop yield, notably paddy yield, is an intricate task owing to its dependency on several factors such as crop genotype, environmental factors, management practices, and their interactions. Researchers have conventionally predicted paddy yield using statistical approaches, but these failed to attain high accuracy due to several factors. Therefore, machine learning methods such as support vector regression (SVR), general regression neural networks (GRNNs), radial basis functional neural networks (RBFNNs), and back-propagation neural networks (BPNNs) are demonstrated to predict the paddy yield accurately for the Cauvery Delta Zone (CDZ), which lies in the eastern part of Tamil Nadu, South India. The performance of each developed model is examined using assessment metrics such as coefficient of determination (R²), root mean square error (RMSE), mean absolute error (MAE), mean squared error (MSE), mean absolute percentage error (MAPE), coefficient of variance (CV), and normalized mean squared error (NMSE). The observed results show that the GRNN algorithm delivers superior evaluation metrics, with R², RMSE, MAE, MSE, MAPE, CV, and NMSE values of about 0.9863, 0.2295, 0.1290, 0.0526, 1.3439, 0.0255, and 0.0136, respectively, which ensures more accurate crop yield prediction than the other methods. Finally, the performance of the GRNN model is compared with other available models from several studies in the literature, and it is found to offer higher prediction accuracy on the evaluation metrics.


Background
Due to the growth of the global population and living standards, demand for food grains is predicted to increase by 60% by the middle of the 21st century [1]. The present change in climatic conditions threatens crop yield, which raises the risk to farmers and their dependents. Considering this urgent need, sustainable crop prediction requires a forecasting system that can precisely evaluate the crop conditions, crop kind, and its yield [2]. Crop yield methods are time-dependent and nonlinear by nature due to the amalgamation of an extensive array of interrelated factors influenced by non-arbitration and exterior features [3]. Conventionally, farmers predicted the crop yield based on their previous practices and reliable historical evidence to make essential cultivation decisions. Notably, statistical methods adapt several regression approaches to associate historical crop yields with historical weather statistics, which can be used to create yield predictions under changed weather settings [4] such as the availability of water resources, rainfall, temperature, drought, etc. Due to the growing accessibility and enhanced quality of the observed historical data, statistical methods achieve good accuracy [5,6]. In addition, remote sensing (specifically, satellite and airborne multispectral scanning, photography, and video) enables precision weed management through the generation of sensible and precise weed maps [7]. Furthermore, recently developed machine learning (ML) algorithms have a greater ability than statistical methods to discover weather-yield relations [8].
Machine learning (ML) approaches are used for crop prediction using several mathematical and statistical methods, namely artificial neural networks, fuzzy information networks, decision tree, regression analysis, clustering, principal component analysis, Bayesian belief network, time series analysis, and Markov chain model. The application of these machine learning techniques in crop cultivation shows considerable advantages due to the availability of large amounts of data from several resources from which hidden knowledge can be obtained [9]. Considering the need for machine learning techniques, a wide-ranging literature survey is essential to derive a novel proposition that further improves the accuracy of crop yield prediction.

Existing Methods-ML Algorithms for Yield Prediction
The agricultural forecasting process plays a crucial role in yield prediction using several advanced methodologies. Dozens of research works have already been carried out to attain high accuracy of crop yield prediction. Some of the notable pieces of literature are illustrated below (Table 1):

Ref No | Year | Methodologies | Inferences
[10] | 2016 | Weighted histograms regression | Proposed the design strategy for selecting soybean varieties to exploit maximum yield in the best season based on the knowledge attained from heterogeneous historical data. The outcomes with the existing regression algorithm proved that the proposed algorithm offered an optimal selection of seed varieties.
[11] | 2016 | Regression analysis (RA) | Focused on analyzing the environmental constraints that impact the crop yield, namely area under cultivation, annual rainfall, and food price index. RA analyzed the factors and grouped them into explanatory and response variables that aid in attaining a decision.
[12] | 2017 | Gaussian process component and spatio-temporal structure | Presented a scalable, accurate, and inexpensive technique to forecast crop yields using accessible remote sensing statistics (open source). The proposed scheme improved the accuracy of the yield prediction significantly along with a novel dimensionality reduction technique.
 |  | Generalized regression neural network and radial basis function neural network | The suggested method forecasted the yield of potato crops that were sown in flat and rough regions. Among the two methods, the generalized regression neural network had greater accuracy.
[13] | 2017 | Improved genetic algorithm-back propagation neural network prediction algorithm | The proposed algorithm was used to advance the yield-irrigation water model for forecasting the yield for various irrigation schemes under subsurface drip irrigation. It offered more precise predictions of the yield with an average error of only about 0.71%.
[8] |  | Remote sensing and machine learning algorithms | Discussed research developments within the last fifteen years on machine learning-based methods for accurate crop yield prediction and compared them with remote sensing methods. Concluded that the fast developments in sensing tools and machine learning techniques could deliver cost-effective and wide-ranging solutions for improved crop and decision making.
[14] |  | Multiple linear regression and radial basis function artificial networks | Demonstrated the applications of the proposed algorithm to compute the probability of working days. Performance criteria such as RMSE, MAPE, and R² were considered. Radial basis function offered the highest R² compared with multiple linear regression.
[15] |  | Aggregated rainfall-based modular artificial neural networks and support vector regression | Predicted the extent of monsoon rainfall using modular artificial neural networks. Predicted the yield of the chief Kharif crops from the rainfall data and area using support vector regression.
[16] |  | Hybrid particle swarm optimization imperialist competitive algorithm, support vector regression | Evaluated the performance of a proposed method to forecast apricot yield and identified significant factors affecting the yield. The suggested scheme achieved greater prediction accuracy, with a root mean square error (RMSE) of 12% of the average yield and 50% of the standard deviation for the validation dataset using predicted weather data.
[20] |  | Artificial neural network | Evaluated five different ANN methods, namely generalized feed-forward, multilayer perceptron, Jordan/Elman, principal component analysis, and radial basis function. Among these models, multilayer perceptron offered the best prediction.
[21] |  | Machine learning and big data | Various machine learning algorithms were examined to verify their usefulness in predicting crop yield. Prediction of crop yield using machine learning methods in a big data computing pattern was demonstrated.
[22] |  | Support vector machine, random forest, and neural network | Used the enhanced vegetation index from MODIS and solar-induced chlorophyll fluorescence from GOME-2 and SCIAMACHY as metrics to predict crop production. The machine learning method offered the best yield prediction compared with the regression method.
[23] |  | Hybrid genetic algorithm-based back-propagation neural network (GA-BPNN) model | The proposed scheme was used to offer complementary data on maize growth at the vital growth phase. The hybrid concept enhances the yield significantly compared with the pure back-propagation scheme.
[24] |  | Proximal sensing (PS) and machine learning algorithms | PS surveyed the soil and crop variables potentially responsible for variations in yield. Four algorithms were demonstrated (linear regression, elastic net, k-nearest neighbor, and support vector regression) to forecast potato yield from soil and crop data properties collected through proximal sensing.
[25] |  | Partial least squares and radial basis function neural network | Carried out to estimate the feasibility of using Vis/near-infrared spectroscopy to determine the potassium concentration and petioles of distinct variety and mixed lettuce leaves of two varieties. Partial least squares offered an R² of 0.83, residual predictive deviation of 1.95, and RMSE of 39.07.

Objectives
Considering the above inferences, crop yield prediction needs a more accurate and reliable method to attain more precision using evaluation metrics. Based on these needs, this work focused on the following objectives:

• To assess the paddy crop yield data from high potential real-time locations.
• To estimate the crop yield prediction using a statistical model (MLR).
• To demonstrate advanced machine learning techniques such as BPNNs, RBFNNs, GRNNs, and SVR for crop yield prediction.
• To analyze the adapted machine learning techniques using evaluation metrics such as R², RMSE, MAE, MSE, MAPE, CV, and NMSE.
• To select and recommend the most accurate prediction technique to evaluate the crop yield.

Data Collection
The historical data (paddy crop) of the CDZ, which lies in the eastern part of Tamil Nadu, South India, is considered for this study. The CDZ has a total geographic land area of 14.47 lakh hectares. It covers several districts of Tamilnadu, namely Thanjavur, Thiruvarur, Nagapattinam, Trichy, Ariyalur, Cuddalore, and Pudukkottai (Figure 1). In this zone, paddy is the principal crop. In the rice-based cropping system, it is either single or double cropped. In this work, data from 50 fields in the Thirichirapalli, Perambalur, and Pudukottai districts in CDZ (Figure 1) are collected for two seasons (June 2018-September 2018 and October 2018-January 2019). There are two main reasons for selecting these three regions: the foremost percentage of paddy yield is harvested in these regions of Tamilnadu, and the soil types play a critical role, i.e., Thirichirapalli mostly has alluvial soils, Perambalur has mostly heavy clay soils, and Pudukottai consists of alluvial and laterite soils. To get a better overview of the independent variables (features), they can be grouped into soil information (pH value), humidity (rainfall), solar information (temperature), nutrients (nitrogen, phosphorus, and potassium), and field management (urea). The data were collected from the meteorological department of India [26], agricultural department of Tamilnadu [27], and the statistical department of Tamilnadu [28]. The complete description of the considered parameters is illustrated in Table 2.
Furthermore, the data of the selected sites, such as mean rainfall, temperature, fertilizer, nitrogen, phosphorus, potassium, pH value, and yield, are obtained from digital sources. A total of 280 samples are analyzed initially, and the repeated and insignificant data are merged to attain 100 rows of data. From the finalized data, the minimum, maximum, mean, and standard deviation are computed and illustrated in Table 3. The central part of data collection is creating a training network that can forecast the atmospheric components, namely temperature, rainfall, etc., for a specific station. Rainfall is a significant factor in agricultural production, and its variation can affect crop production. Temperature is one of the essential meteorological parameters that support crop growth. In this work, data are collected from online sources such as data.gov.in and indiastat.org. The datasheets are prepared from the retrieved sources for analysis. Notably, this work adapted annual abstracts about a crop for two periods in a year. The input datasets are prepared with several samples, arranged in an Excel sheet, and later loaded into MATLAB for analysis. Of the loaded data, 70% are used for training and 30% for testing. The test data offer an independent measure of neural network performance in terms of MSE.
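The 70%/30% partition described above can be sketched in a few lines. The following Python fragment is only an illustration: the source states the proportions but not how the rows were assigned, so the seeded random shuffle, the function name `train_test_split`, and the placeholder records are assumptions rather than the authors' MATLAB procedure.

```python
import random

def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle rows reproducibly and split them into training and test subsets."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # assumption: rows are assigned at random
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# Placeholder records standing in for the 100 finalized data rows.
rows = list(range(100))
train, test = train_test_split(rows)
print(len(train), len(test))  # 70 30
```

Fixing the seed keeps the partition repeatable, so that the evaluation metrics of different models are computed on the same test rows.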
To attain the designated objectives, the following steps are carried out to demonstrate the application of the selected machine learning algorithms.
Step 1: Collect the data using available sources.
Step 3: Develop the machine learning model to assess the crop yield.
Step 4: Predict the crop yield using adapted techniques.
Step 5: Determine the evaluation metrics for each model.
Step 6: Recommend the best-rated technique for crop yield using observed outcomes.

Statistical Analysis
A statistical analysis, namely multiple linear regression (MLR), is adapted first to determine the effect of several independent variables on the dependent variable and to compute the linear dependence of the variables [29]. It defines an association between known (x) and unknown (y) variables based on the random noise and its parameters, and it is expressed as below:

$$y_i = x_i \beta + \varepsilon_i \qquad (1)$$

where $y_i$ denotes a predicted rate; $x_i = (1, x_{i1}, x_{i2}, x_{i3}, \ldots, x_{ik})$ are the terms for the explanatory vector variables; $\beta = (\beta_0, \beta_1, \beta_2, \ldots, \beta_k)^T$ represents a vector coefficient; and $\varepsilon_i$ denotes a random error for the $i$th observation.
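As an illustration of how the coefficient vector of such a model can be estimated, the sketch below fits an MLR by ordinary least squares through the normal equations. It is a minimal example under stated assumptions: the function names `fit_mlr` and `predict_mlr` and the toy data are hypothetical, and the source does not describe which MATLAB routine was actually used.

```python
def fit_mlr(X, y):
    """Ordinary least squares: solve (X^T X) beta = X^T y with an intercept term."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend the constant 1
    k = len(rows[0])
    # Augmented normal-equation matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(A[idx][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def predict_mlr(beta, x):
    """Evaluate beta_0 + beta_1 * x_1 + ... + beta_k * x_k for one observation."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))

# Toy data generated from y = 1 + 2*x1 + 3*x2 (hypothetical, noise-free).
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [1 + 2 * a + 3 * b for a, b in X]
beta = fit_mlr(X, y)
```

On noise-free data the recovered coefficients match the generating ones up to rounding; with real field data the residual term absorbs whatever the linear form cannot explain.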

Machine Learning Techniques
Soft computing is a collection of practices applied in many fields and falls under several computational intelligence categories. It includes fuzzy systems (FS), evolutionary computation (EC), artificial neural network (ANN), probabilistic reasoning (PR), etc. As stated earlier, ANN-based models are adapted in this work to determine the performance for crop yield prediction, namely support vector machine (SVM), generalized regression neural network (GRNN), radial basis function neural network (RBFNN), and back-propagation neural network (BPNN). There are seven input parameters (rainfall, fertilizer, temperature, nitrogen, phosphorus, potassium, and soil pH) and one output, namely crop yield. The complete process is carried out in MATLAB 2018(b) software to implement the models based on the proposed algorithm.
Further, normalization is adapted to prepare the data and remove data redundancy for machine learning applications. It rescales the numeric columns in the specified dataset to a standard scale without distorting the differences in their ranges. Generally, the data must lie in the range of 0 to 1, which is essential before applying any soft computing model. Typically, three types of normalization techniques are used: Min-Max normalization, Z-score normalization, and decimal scaling. In this work, the Min-Max normalization technique is considered for data preparation using the following equation:

$$P' = \frac{P - \mathrm{Min}(P)}{\mathrm{Max}(P) - \mathrm{Min}(P)} \qquad (2)$$

where $P'$ is the normalized value of attribute P, and Min (P) and Max (P) indicate the minimum and maximum value of attribute P, respectively.
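A minimal Python sketch of the Min-Max normalization described above follows. The function name `min_max_normalize`, the handling of a constant column, and the rainfall values are assumptions made for illustration only.

```python
def min_max_normalize(values):
    """Rescale one attribute column linearly onto the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column carries no information, so map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical rainfall readings in mm, not taken from the study's dataset.
rainfall = [400.0, 464.0, 520.0, 610.0]
scaled = min_max_normalize(rainfall)
```

Each attribute is normalized separately, so the smallest value of a column maps to 0 and the largest to 1 regardless of the attribute's original unit.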

Support Vector Machine (SVM)
SVM is a supervised machine learning method for classification and regression that builds on advancements in statistical learning theory (Figure 2), and the required input parameters are shown in Table 4. It can train nonlinear models based on the principle of structural risk minimization (SRM), which minimizes an upper bound on the generalization error rather than the empirical error, as is done in neural networks [15]. It was conventionally developed from the Vapnik-Chervonenkis (VC) theory and has recently emerged as a general mathematical framework to determine dependencies from finite sample sets. This theory integrates fundamental concepts with associated learning principles, precise formulation, and a self-consistent mathematical model.

Generalized Regression Neural Network (GRNN)
Donald F. Specht proposed GRNN as a variation of the radial basis function neural network (RBF) in 1991. It is a one-pass neural network with a highly parallel construction [30]. It is an algorithm based on function approximation (estimation) and a statistical technique named kernel regression. Unlike other neural network algorithms, GRNN can be trained very quickly, because the data are propagated forward only once. The desired output is determined as a weighted average of the outputs of the training data set.
The weight of each training output is calculated from the Euclidean distance between the training sample and the testing sample. The larger the Euclidean distance, the smaller the weight assigned to that training output; the smaller the distance, the larger the weight and the greater its contribution to the predicted output.
GRNN comprises four layers: an input layer, a pattern layer, a summation layer, and an output layer (Figure 3). The number of neurons in the input layer depends on the total number of experimental parameters. The input layer feeds the input to the pattern layer, in which each neuron represents a training pattern and its output. The primary purpose of the pattern layer is to calculate the Euclidean distance, apply the activation function, and forward the result to the summation layer. The summation layer has two sub-parts: a numerator (N) and a denominator (D). The numerator is the sum of the products of the training output data and the activation function values, and the denominator is the sum of all activation function values. The summation layer feeds both the numerator and denominator to the output layer.
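The four GRNN layers described above can be sketched compactly in Python. This is an illustration, not the authors' MATLAB implementation: the Gaussian form of the activation function, the smoothing parameter `sigma`, and the function name `grnn_predict` are assumptions, since the source does not state them.

```python
import math

def grnn_predict(train_X, train_y, x, sigma=0.5):
    """Predict one output as a distance-weighted average of the training outputs."""
    # Pattern layer: squared Euclidean distance to each training pattern,
    # passed through a Gaussian activation (assumed kernel, width sigma).
    weights = []
    for pattern in train_X:
        d2 = sum((a - b) ** 2 for a, b in zip(x, pattern))
        weights.append(math.exp(-d2 / (2.0 * sigma ** 2)))
    # Summation layer: numerator N and denominator D.
    numerator = sum(w * target for w, target in zip(weights, train_y))
    denominator = sum(weights)
    if denominator == 0.0:
        raise ValueError("query is too far from every training pattern for this sigma")
    # Output layer: the ratio N / D.
    return numerator / denominator

# Hypothetical one-dimensional training data.
train_X = [[0.0], [1.0]]
train_y = [2.0, 4.0]
midpoint = grnn_predict(train_X, train_y, [0.5])           # equidistant from both patterns
near_first = grnn_predict(train_X, train_y, [0.0], sigma=0.1)
```

A query equidistant from both patterns receives the plain average of their outputs, while a query that coincides with one pattern (with a small `sigma`) reproduces that pattern's output almost exactly. No iterative training is involved, which is consistent with the one-pass character noted above.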

Radial Basis Functional Neural Network (RBFNN)
An RBFNN is an ANN that adapts radial basis functions (RBF) as activation functions and comprises an input layer, a hidden layer, and a linear output layer. It is derived from the concept of function approximation and is a well-known and popular alternative to the MLP, with a more straightforward structure and a quicker training process [14]. Basically, the number of neurons (perceptrons) in the single hidden layer is kept to the minimum at which a minimum error value is reached. The input layer has nodes that match the number of datasheet input parameters. In the hidden layer, every perceptron computes its response using a radial basis function, in general a Gaussian function, and the output layer creates a linear weighted sum of the hidden neuron outputs and supplies the response of the network. The structure of the RBFNN network is depicted in Figure 4 and the required input parameters are illustrated in Table 5.

Back-Propagation Neural Network (BPNN)
An ANN adapts several layers that can approximate multifaceted mathematical functions to process the data. BPNN is the most broadly adapted algorithm for ANN applications; it propagates the error gradient backward through the network [13]. In this work, the proposed BPNN algorithm is considered to adjust the simulated value to attain higher crop prediction accuracy. It comprises four different stages: initialization of weights, feed-forward, back-propagation of errors, and updating of weights and biases. The comprehensive architecture of the proposed model is depicted in Figure 5 and the input, hidden, and output parameters are given in Table 6.
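The four BPNN stages named above (initialization of weights, feed-forward, back-propagation of errors, and updating of weights and biases) can be illustrated with a small single-hidden-layer network. The sketch below is hypothetical: the sigmoid hidden units, linear output, learning rate, number of hidden neurons, and toy data are assumptions and do not reproduce the configuration of the study.

```python
import math
import random

def train_bpnn(X, y, n_hidden=3, lr=0.2, epochs=2000, seed=0):
    """Train a one-hidden-layer network; return the MSE before and after training."""
    rng = random.Random(seed)
    n_in = len(X[0])
    # Stage 1: initialize weights with small random values and biases with zero.
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0

    def forward(x):
        # Sigmoid hidden activations followed by a linear output neuron.
        h = [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(W1[j], x)) + b1[j])))
             for j in range(n_hidden)]
        return h, sum(w * a for w, a in zip(W2, h)) + b2

    def mse():
        return sum((forward(x)[1] - t) ** 2 for x, t in zip(X, y)) / len(X)

    initial = mse()
    for _ in range(epochs):
        for x, t in zip(X, y):
            h, out = forward(x)                      # Stage 2: feed-forward
            err = out - t                            # Stage 3: back-propagate the error
            grad_h = [err * W2[j] * h[j] * (1.0 - h[j]) for j in range(n_hidden)]
            for j in range(n_hidden):                # Stage 4: update weights and biases
                W2[j] -= lr * err * h[j]
                for i in range(n_in):
                    W1[j][i] -= lr * grad_h[j] * x[i]
                b1[j] -= lr * grad_h[j]
            b2 -= lr * err
    return initial, mse()

# Hypothetical normalized inputs and targets (y = x).
X = [[0.0], [0.25], [0.5], [0.75], [1.0]]
y = [0.0, 0.25, 0.5, 0.75, 1.0]
initial_mse, final_mse = train_bpnn(X, y)
```

Comparing the two returned values shows whether the repeated weight updates reduced the training error; the hidden-layer gradients are computed before the output weights are changed so that each update uses the weights of the forward pass.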

Model Performance
Standard statistical performance measures are used to evaluate the performance of the predictor models. The most widely used statistical measures are coefficient of determination (R²), root mean square error (RMSE), mean absolute error (MAE), mean squared error (MSE), mean absolute percentage error (MAPE), coefficient of variance (CV), and normalized mean squared error (NMSE). These metrics are defined following [5,31-33]. In their definitions, $A_i$ and $P_i$ are the measured and predicted values, respectively; $N$ is the number of observations; $X_i$ and $Y_i$ are the X and Y values of observation $i$, respectively; $\bar{X}$ and $\bar{Y}$ are the means of X and Y, respectively; $\sigma_x$ and $\sigma_y$ are the standard deviations of X and Y, respectively; and $S_i$ represents an intertemporal variance.
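For orientation, the seven metrics can be computed as in the Python sketch below. The forms used for R² (squared Pearson correlation), RMSE, MAE, MSE, and MAPE (in percent) are the commonly used ones. The forms of CV (RMSE divided by the mean measured value) and NMSE (MSE divided by the variance of the measured values) are assumptions, because the exact expressions used in this work cannot be recovered from the text; the function name and sample values are likewise hypothetical.

```python
import math

def evaluation_metrics(actual, predicted):
    """Return common error metrics for measured (actual) and predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n
    # MAPE in percent; undefined if any measured value is zero.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    ss_a = sum((a - mean_a) ** 2 for a in actual)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    r2 = cov * cov / (ss_a * ss_p)      # squared Pearson correlation
    cv = rmse / mean_a                  # assumed form: RMSE relative to the mean
    nmse = mse / (ss_a / n)             # assumed form: MSE relative to the variance
    return {"R2": r2, "RMSE": rmse, "MAE": mae, "MSE": mse,
            "MAPE": mape, "CV": cv, "NMSE": nmse}

# Hypothetical measured and predicted yields.
actual = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
metrics = evaluation_metrics(actual, predicted)
```

Under these definitions a better model has R² closer to unity and smaller values of all other metrics.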

Results and Discussions
In this section, statistical and proposed machine learning models are demonstrated in a virtual platform. The statistical analysis adapts several components: rainfall, fertilizer, temperature, nitrogen, phosphorus, and potassium. The model with the higher correlation and lower error will be considered the best technique for crop yield (kg/acre) prediction.

Statistical Analysis
The first case represents the outcome of the statistical approach, namely MLR, which offers a multiple R and R² of about 0.9427 and 0.8888, respectively (Table 7). Then, the adjusted R² and standard deviation are observed as 0.8803 and 0.6862, respectively (Table 3). The crop yield (Q/ha) between prediction and measured data is depicted in Figure 6. Moreover, the MSE and RMSE metrics are about 0.5247 and 0.6586, respectively. However, these outcomes are not excellent, because MLR assumes a simple linear association between the dependent and independent variables. This technique adapts a least squares model, which is simple in design, but its outcomes are limited. This method offers moderate results for developing models to reconstruct climate variables from tree-ring series (the error percentage is higher, i.e., −14% and +13%) [34], but it has not shown promising results for crop yield prediction. Therefore, the crop yield prediction can be further improved using machine learning techniques, as illustrated in subsequent sections.

Machine Learning Techniques
Several studies demonstrated the prediction of paddy yield using machine learning methods [35-39]. However, there is a need to enhance the prediction accuracy for reliable crop yield. As discussed above, the R², RMSE, MAE, MSE, MAPE, CV, and NMSE metrics are applied to evaluate the accuracy of the proposed algorithms (SVM, GRNN, RBFNN, and BPNN) for crop yield prediction. Furthermore, each model generates a plot that represents the crop yield prediction against the original yield data (Figure 7). From the illustrations, all the metrics are assessed using the above-mentioned formulas. Then, the metrics are evaluated to test the accuracy of the considered algorithms between predicted and original crop yield. The effectiveness of the proposed results is compared with the recent literature. Notably, Elavarasan et al. suggested deep reinforcement learning to develop the prediction scheme [35]. It was noted that the efficiency and accuracy of the proposed scheme were superior to those of other models, such as LSTM (long short-term memory), BAN, and RAE (regularized auto-encoder). However, limited evaluation metrics were considered for precise prediction of paddy cultivation. Notably, CV and NMSE were not adapted to ensure the better precision of the proposed model. Furthermore, Gopal et al. [36] designed a hybrid MLR-ANN model for crop yield prediction. In that work, MLR's coefficients and bias were used to initialize the ANN. The suggested hybrid model displayed improved prediction precision compared with SVR, K-NN (K-nearest neighbors), and RF (random forest). The precision evaluation metrics of the SVM showed a better result compared with BPNN and RF. However, few evaluation metrics were considered, and a more detailed evaluation is required to ensure the effectiveness of the suggested algorithms. Some works proposed novel algorithms such as hybrid CNN-RNN, MARS, and DNN for corn and soybean yield prediction. However, only the RMSE and correlation coefficient evaluation metrics were considered for prediction [18,40]. For paddy yield prediction, a few works proposed RF, MLR-ANN, and DT. However, only the evaluation metrics MAE, RMSE, and R were considered [22].
Furthermore, tomato yield prediction was carried out [41], but only the RMSE metric was considered. In addition, a prediction of palm yield was proposed using a genetic algorithm, but only R² and MSE were considered for evaluation [42]. Additionally, wheat and barley yield predictions were proposed using the CNN algorithm; however, only the MAPE metric was considered [43]. Consolidating all these inferences, the number of evaluation metrics adopted in the literature is limited, and therefore, this work focused on computing a wide range of evaluation metrics such as R², RMSE, MAE, MSE, MAPE, CV, and NMSE using the proposed algorithms to ensure their effectiveness. In addition, paddy yield prediction using SVM, RBFNN, GRNN, and BPNN has not been demonstrated extensively by researchers.
To simplify the comparative analysis between various algorithms, individual metrics are presented in Figure 8, namely R², RMSE, MAE, MSE, MAPE, CV, and NMSE. As stated above, higher accuracy corresponds to greater R² (closer to unity) and lower RMSE, MAE, MSE, MAPE, CV, and NMSE. In line with this statement, the results show that the GRNN algorithm performed well compared with the other ANN and statistical methods. Notably, the R² metric of GRNN attained a value of about 0.9863, which is far better than the other methods (Figure 8a). Furthermore, the RMSE metric of GRNN shows a lower value of about 0.2295 (Figure 8b), representing the accuracy of the crop yield prediction compared with the other adapted schemes. Similarly, the MAE, MSE, MAPE, CV, and NMSE metrics show the lowest values for the GRNN model, about 0.1279, 0.0526, 1.3439, 0.0255, and 0.0136, respectively, which are superior to those of the other methods such as MLR, SVM, RBFNN, and BPNN (Figure 8b-f). All these outcomes attest to the accuracy of the paddy yield prediction for the CDZ, i.e., the eastern part of Tamilnadu.
As the considered metrics show the higher effectiveness of the GRNN algorithm, it is also essential to compute the running time of all the adopted ANN models. Therefore, the individual run times of the models are computed and illustrated in Figure 9. It is observed that the GRNN model completed the prediction task within 880 ms, which is comparatively lower than the other ANN models such as SVM, BPNN, and RBFNN. This high-speed computation is not possible with other numerical models due to the complicated mathematical models that tend to increase the inaccuracy of the prediction. In addition, the numerical model has greater limitations regarding the number of input parameters, which is not a concern for the ANN model. Consolidating all the inferences and statements, it is perceived that the ANN algorithms have done well for crop yield prediction. Notably, the GRNN algorithm offered superior results compared with other adapted techniques such as SVM, BPNN, and RBFNN using three performance metrics. Furthermore, the prediction accuracy of the GRNN model is compared with other competitive methods from the literature. Notably, the regression analysis model was adopted by the authors for crop yield prediction accuracy, and the coefficient R² attained a maximum of about 0.7272 [30]. In addition, the same coefficient was evaluated using the particle swarm optimization-imperialist competitive algorithm-support vector regression (PSO-ICA-SVR) method, and it attained a best value of 0.874 [17]. In addition, the performance of the random forest method was considered for crop yield prediction using the R² coefficient, and it obtained better results, i.e., 0.92 [6]. Comparing these inferences, the proposed GRNN model shows a higher value of about 0.9863 (an increase of about 7.53% compared with the random forest method). Furthermore, other evaluation metrics such as RMSE and MAE are compared with existing methodologies; the PSO-ICA-SVR model offered minimum RMSE and MAE of about 1.418 and 1.737, respectively [17]. However, the proposed GRNN model shows the lowest values of about 0.2295 (RMSE) and 0.1279 (MAE). Other evaluation metrics (MSE, MAPE, CV, and NMSE) have not been reported widely by researchers using machine learning models. This work targeted all possible evaluation metrics to validate the effectiveness of the proposed model.
Moreover, the absolute yield of the selected location is compared with other Indian states, and the complete comparative case is illustrated in Figure 10. It is found that the state of Tamilnadu attained the highest yield: about 3191 kg/ha [44]. This is owing to the optimum parameters of the state: notably, a mean temperature of 28 °C, higher rainfall of 464 mm (3 months), and a pH value of about 6.9. In addition, the paddy cultivation parts of Tamilnadu comprise a wide range of alluvial soil, which is suitable for paddy cultivation. The predicted values of the machine learning models almost match the absolute yield of Tamilnadu, but with different accuracy depending on the effectiveness of the individual algorithm. It is already stated that the GRNN model shows better accuracy than the other selected machine learning models. These research findings confirm the consistency between predicted yields and the government's yield statistics. As per the literature survey, there are no benchmark data sets available for crop yield, and it is challenging to predict owing to diverse biological parameters. Therefore, GRNN can be adapted for crop yield prediction for effective outcomes that can reduce the risk factor for the farmers.

Conclusions
Prediction of crop yield is carried out using statistical and machine learning algorithms. Specifically, the statistical MLR technique and machine learning algorithms such as SVM, GRNN, RBFNN, and BPNN are evaluated to attain crop yield prediction of higher accuracy. Model performance metrics are adapted to scrutinize the accuracy level of the different algorithms. With the observed outcomes, the following conclusions are made:

• Machine learning algorithms attained considerably greater yield prediction accuracy than the statistical methodology based on the results of the evaluation metrics.
• Among the four machine learning algorithms (SVM, RBFNN, GRNN, and BPNN), GRNN predicted the yield most precisely.
• Compared with other existing models from the literature reports, the R² metric of the proposed model (GRNN) is improved by 7.53%.
• The absolute yields of Tamilnadu and other Indian states are compared, and it is found that Tamilnadu acquired the highest yield, about 3191 kg/ha, and the same is attained with the proposed GRNN prediction model with higher accuracy.
• It is also concluded that Tamilnadu has optimum parameters (rainfall, temperature, and pH value) for paddy cultivation that enable the farmers to attain higher yield.
• The recommended machine learning algorithm, notably GRNN, reduces the risk factor for paddy yield due to its superior performance metrics.

In the future, hybridization among the four adapted machine learning methods will be carried out using additional model performance metrics.
Author Contributions: Conceptualization, V.J. and S.M.P.; methodology, V.J.; software, V.J.; validation, S.M.P. and R.K.; writing-original draft preparation, V.J., S.M.P. and R.K.; writing-review and editing, V.J., S.M.P. and R.K. All authors have read and agreed to the published version of the manuscript.

Figure 1. CDZ belt in eastern part of Tamilnadu.

Figure 9. Run time of the ANN models.

Figure 10. Comparative study of yield among Indian states.

Table 2. Description of the selected sites.

Table 3. Data collection for yield prediction from selected sites.

Table 4. Input parameters of SVM.

Table 7. Implementation and outcomes of MLR method.
• The R², RMSE, MAE, MSE, MAPE, CV, and NMSE performance metrics of GRNN showed better values of 0.9863, 0.2295, 0.1290, 0.0526, 1.3439, 0.0255, and 0.0136, respectively.
• The run time of the GRNN model is 880 ms, which is comparatively less than that of the other ANN models.