# Crop Yield Prediction Using Hybrid Machine Learning Approach: A Case Study of Lentil (Lens culinaris Medik.)


## Abstract


## 1. Introduction

## 2. Materials and Methods

#### 2.1. Plant Material and Field Experiment

#### 2.2. Machine Learning Models

#### 2.2.1. Multivariate Adaptive Regression Spline (MARS) Model

where ${c}_{0}$ is a constant, ${B}_{m}(\overrightarrow{X})$ is the mth basis function, which may be a single spline basis function, and ${c}_{m}$ is the coefficient of the mth basis function. Both the variables to be introduced into the model and the knot positions for each individual variable have to be optimized. For a dataset $\overrightarrow{X}$ containing n objects and p explanatory variables, there are N = n × p pairs of spline basis functions, with one candidate knot location per object-variable pair (i = 1, 2, …, n; j = 1, 2, …, p). In the present study, n = 518 entries (206 exotic collections and 312 indigenous collections, including 59 breeding lines) and p = 9. MARS was used as a variable selection model in the present study.
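As a minimal sketch of the definitions above, the snippet below evaluates a constant plus a weighted sum of basis functions, assuming the common MARS choice of hinge (truncated linear) basis functions. The function names, the toy observation and the toy coefficients are hypothetical and are not the fitted model from this study.

```python
def hinge_pair(x, knot):
    """Return the reflected pair of hinge basis values at x for one knot."""
    return max(0.0, x - knot), max(0.0, knot - x)


def mars_predict(x, intercept, terms):
    """Evaluate intercept + sum_m c_m * B_m(x) for single-variable hinge terms.

    terms: list of (coefficient, variable_index, knot, direction), where
    direction = +1 uses max(0, x_j - knot) and -1 uses max(0, knot - x_j).
    """
    total = intercept
    for coef, j, knot, direction in terms:
        total += coef * max(0.0, direction * (x[j] - knot))
    return total


# Number of candidate basis-function pairs for the dimensions given in the text.
n_objects, n_predictors = 518, 9
n_candidate_pairs = n_objects * n_predictors

# Hypothetical two-term model evaluated on a toy observation.
toy_x = [3.0, 10.0]
toy_terms = [(0.5, 0, 2.0, +1), (-0.2, 1, 12.0, -1)]
toy_prediction = mars_predict(toy_x, 1.0, toy_terms)
```

With n = 518 and p = 9, the search space holds 518 × 9 = 4662 candidate pairs, which is why the forward selection and pruning of terms matter in practice.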

#### 2.2.2. Artificial Neural Network (ANN) Model

where ${y}_{t}$ is the output of the neural network model (yield per plant); n is the number of hidden nodes; m is the number of input nodes; f is the activation function; ${\beta}_{ij}$ {i = 1, 2, …, m; j = 0, 1, …, n} are the weights from the input nodes to the hidden nodes; ${\alpha}_{j}\{j=0,1,\dots ,n\}$ are the weights from the hidden nodes to the output node; and ${\alpha}_{0}$ and ${\beta}_{0j}$ are the weights of the arcs leading from the bias terms. The activation function is a differentiable function that smooths the result of the cross product of the covariates or neurons and the weights; it defines the output of a node given an input or set of inputs. In the present study, the logistic function was used as the activation function, and the Levenberg–Marquardt (LM) learning algorithm was used to adjust the weights in the multi-layered feedforward network.
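The forward pass described by these symbols can be sketched as follows, assuming a single hidden layer with a logistic activation and a linear output node. The nested-list layout `beta[j][i]` and all weight values are illustrative assumptions, not the fitted network.

```python
import math


def logistic(z):
    """Logistic activation, mapping any real input to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def ann_forward(inputs, beta, beta0, alpha, alpha0):
    """Single-hidden-layer forward pass with a linear output node.

    beta[j][i]: weight from input i to hidden node j
    beta0[j]:   bias weight of hidden node j
    alpha[j]:   weight from hidden node j to the output node
    alpha0:     bias weight of the output node
    """
    output = alpha0
    for j in range(len(alpha)):
        net = beta0[j] + sum(w * x for w, x in zip(beta[j], inputs))
        output += alpha[j] * logistic(net)
    return output


# Toy network: m = 2 inputs, n = 2 hidden nodes, all weights hypothetical.
toy_output = ann_forward(
    inputs=[0.0, 0.0],
    beta=[[0.3, -0.1], [0.2, 0.4]],
    beta0=[0.0, 0.0],
    alpha=[1.0, 2.0],
    alpha0=0.5,
)
```

Training (by whichever learning algorithm is chosen) only changes the weight values; the prediction step keeps this form.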

#### 2.2.3. Support Vector Regression (SVR) Model

Here, the coefficient indexed by i is the associated weight, and $Ker({x}_{i},{x}_{j})$ is the nonlinear mapping function, known as the kernel function, for the input (independent) variables ${x}_{i}$ (i = 1, 2, …, 9).
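To illustrate how weights and a kernel function combine into a prediction, the sketch below uses a radial basis kernel, one of the kernels compared later in this article. The bias term, the `gamma` value and the toy support vectors are assumptions made for the example.

```python
import math


def rbf_kernel(a, b, gamma):
    """Radial basis kernel exp(-gamma * ||a - b||^2)."""
    sq_dist = sum((u - v) ** 2 for u, v in zip(a, b))
    return math.exp(-gamma * sq_dist)


def svr_predict(x, support_vectors, weights, bias, gamma):
    """Kernel expansion: bias + sum_i weight_i * K(x_i, x)."""
    return bias + sum(
        w * rbf_kernel(sv, x, gamma) for sv, w in zip(support_vectors, weights)
    )


# Hypothetical support vectors and weights, for illustration only.
toy_svs = [[0.0, 0.0], [1.0, 1.0]]
toy_weights = [0.5, -0.25]
toy_value = svr_predict([0.0, 0.0], toy_svs, toy_weights, bias=1.0, gamma=0.5)
```

Swapping `rbf_kernel` for a linear, polynomial or sigmoid kernel changes only the similarity measure; the weighted-sum structure stays the same.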

#### 2.2.4. Hyperparameter Tuning of Machine Learning Models

#### 2.3. Proposed MARS-Based Hybrid Model

- Step 1. Start model building with all available predictors.
- Step 2. Apply the MARS algorithm to extract the important predictors based on their importance.
- Step 3. Build the machine learning model (ANN/SVR) using the selected predictors.
- Step 4. Obtain predictions using the model built in Step 3.
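The four steps above can be sketched as a small pipeline. This is only an outline under stated assumptions: importance scores are taken as given (in the study they come from MARS), the threshold rule is hypothetical, and `fit_mean` is a stand-in learner used in place of an ANN or SVR so the example stays self-contained.

```python
def select_predictors(importance, threshold=0.0):
    """Step 2: keep predictors whose importance score exceeds the threshold."""
    return [name for name, score in importance.items() if score > threshold]


def subset_rows(rows, selected):
    """Restrict each observation (a dict of predictor values) to selected names."""
    return [[row[name] for name in selected] for row in rows]


def hybrid_fit_predict(rows, targets, importance, fit, new_rows, threshold=0.0):
    """Steps 1-4: start from all predictors, select, fit a learner, predict."""
    selected = select_predictors(importance, threshold)      # Step 2
    model = fit(subset_rows(rows, selected), targets)        # Step 3
    predictions = [model(x) for x in subset_rows(new_rows, selected)]  # Step 4
    return selected, predictions


def fit_mean(features, targets):
    """Stand-in learner (NOT an ANN or SVR): always predicts the training mean."""
    mean = sum(targets) / len(targets)
    return lambda x: mean


# Hypothetical importance scores and data, for illustration only.
toy_importance = {"a": 100.0, "b": 40.0, "c": 0.0}
toy_rows = [{"a": 1.0, "b": 2.0, "c": 9.0}, {"a": 3.0, "b": 4.0, "c": 9.0}]
toy_selected, toy_preds = hybrid_fit_predict(
    toy_rows, [2.0, 4.0], toy_importance, fit_mean, toy_rows
)
```

Passing the learner in as a `fit` callable keeps the selection stage independent of the second-stage model, which mirrors how the same selected predictors feed either the ANN or the SVR.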

#### 2.4. Model Performance and Accuracy of Fitted Models
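The results tables in this article report RMSE, MAD and MAPE. A minimal sketch of these measures is given below, assuming their usual definitions (root mean square error, mean absolute deviation, and mean absolute percentage error expressed as a fraction rather than a percentage, which is an inference from the magnitude of the reported values). ME is omitted because its definition is not stated in this text.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mad(actual, predicted):
    """Mean absolute deviation between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction; actual values must be nonzero."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Toy values, for illustration only.
toy_actual = [2.0, 4.0]
toy_predicted = [1.0, 5.0]
toy_scores = (
    rmse(toy_actual, toy_predicted),
    mad(toy_actual, toy_predicted),
    mape(toy_actual, toy_predicted),
)
```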

## 3. Results

#### 3.1. Data Processing and Statistical Analysis

where ${x}_{i}$ is the original data, ${x}_{n}$ is the normalized value, and ${x}_{max}$ and ${x}_{min}$ are the maximum and minimum values, respectively. Denormalization was performed prior to the calculation of the performance measures.
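A minimal sketch of this scaling and its inverse follows, assuming the usual min-max form in which the minimum is subtracted and the result is divided by the range. The function names and toy values are hypothetical.

```python
def min_max_normalize(values):
    """Scale values to [0, 1]; also return the minimum and maximum for later inversion."""
    x_min, x_max = min(values), max(values)
    scaled = [(x - x_min) / (x_max - x_min) for x in values]
    return scaled, x_min, x_max


def denormalize(scaled, x_min, x_max):
    """Map normalized values back to the original scale."""
    return [x * (x_max - x_min) + x_min for x in scaled]


# Toy values, for illustration only.
toy_values = [0.2, 5.45, 10.7]
toy_scaled, toy_min, toy_max = min_max_normalize(toy_values)
toy_restored = denormalize(toy_scaled, toy_min, toy_max)
```

Predictions made on the normalized scale are passed through `denormalize` before error measures are computed, so that the errors are expressed in the original units. A constant column (maximum equal to minimum) would need separate handling, since the range would be zero.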

#### 3.2. Input Variable Selection

−0.01 × BF7 + 0.01 × BF8 − 0.001 × BF9 + 0.002 × BF10 − 0.21 × BF11 − 0.0002 × BF12

#### 3.3. ANN Model Development

… 10^{5} to 1 × 10^{8} and threshold = 0.01. The schematic representation of the fitted ANN model with weights is shown in Figure 3. Table 5 summarizes the error rates and performance measures of the fitted ANN with different numbers of hidden nodes. The ANN models with 1, 2 and 3 hidden nodes had the same performance, while Table 5 reports somewhat lower errors for 5 hidden nodes. The ANN model with 4 hidden nodes gave the best result. Thus, the best-fitted replication of the ANN model with 4 hidden nodes was used for yield forecasting.

#### 3.4. SVR Model Development

## 4. Discussion

## 5. Conclusions

## Supplementary Materials

## Author Contributions

## Funding

## Institutional Review Board Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest


Model | Hyperparameter | Value |
---|---|---|
ANN | Training algorithm | Resilient back propagation (Rprop) |
ANN | Maximum steps up to which the neural network is trained (Stepmax) | 1 × 10^{7} |
ANN | The number of repetitions used to train the neural network model (Rep) | 3 |
ANN | Threshold (threshold value of the partial derivatives of the error function) | 0.01 |
SVR | Defining algorithms | Kernel |
SVR | Regularization parameter (C) | 1 |
SVR | Kernel coefficient (Gamma) | Scale |
SVR | Penalty function (Epsilon) | 0.1 |
SVR | Cross validation | 10 |
MARS | Number of model terms upper bound (Nmax) | 20 |
MARS | Penalty coefficient (b) | 3 |

Parameter | Range | Mean | Standard Deviation | CV (%) |
---|---|---|---|---|
DF | 58–106 | 78.69 | 10.75 | 13.66 |
PH | 17–47.6 | 30.79 | 4.79 | 15.55 |
DM | 114–140 | 126.03 | 4.70 | 3.72 |
SW | 1.2–4.1 | 2.43 | 0.53 | 21.75 |
BYP | 4.2–28 | 13.37 | 3.75 | 28.01 |
PB | 2–9 | 3.76 | 1.06 | 28.20 |
SB | 4–18 | 10.22 | 2.39 | 23.40 |
PPP | 3.7–309.3 | 116.16 | 47.53 | 40.92 |
YPP | 0.2–10.7 | 3.72 | 1.61 | 43.25 |
PHLP | 1–19 | 10.74 | 2.30 | 21.44 |

Degree | RMSE | MAD | MAPE |
---|---|---|---|
1 | 0.5972 | 0.5134 | 0.1792 |
2 | 0.4492 | 0.4866 | 0.1566 |
3 | 0.4356 | 0.4842 | 0.1565 |

Variable | GCV | RSS |
---|---|---|
PPP | 100 | 100 |
SW | 46.5 | 48.6 |
PH | 31.2 | 33.9 |
BYP | 31.1 | 33.9 |
PHLP | 25.5 | 28 |
PB | 18.4 | 20.1 |
DF | 6.6 | 7.9 |

No. of Nodes in Hidden Layer | RMSE | MAD | MAPE |
---|---|---|---|
1 | 1.6134 | 1.2169 | 0.4612 |
2 | 1.6134 | 1.2169 | 0.4612 |
3 | 1.6134 | 1.2169 | 0.4612 |
4 | 0.9627 | 0.6288 | 0.1828 |
5 | 1.1512 | 1.0191 | 0.3521 |

Kernel Function | RMSE | MAD |
---|---|---|
Radial basis | 0.6474 | 0.3602 |
Linear | 0.8599 | 0.5231 |
Polynomial | 0.827 | 0.5253 |
Sigmoid | 0.8269 | 0.5253 |

Sample | Model | RMSE | MAD | MAPE | ME |
---|---|---|---|---|---|
In-sample | ANN | 0.9827 | 0.6288 | 0.1828 | 8.1055 |
In-sample | MARS | 0.4356 | 0.4842 | 0.1565 | 4.2157 |
In-sample | MARS-ANN | 0.0802 | 0.0607 | 0.2478 | 0.3918 |
In-sample | MARS-SVR | 0.0826 | 0.0579 | 0.1834 | 0.8498 |
In-sample | MLR | 0.9869 | 0.6520 | 0.1840 | 9.10 |
In-sample | SVR | 0.6474 | 0.3602 | 0.1089 | 7.1265 |
Out-sample | ANN | 0.8142 | 0.6435 | 0.2308 | 2.4871 |
Out-sample | MARS | 0.9415 | 0.6147 | 0.2769 | 5.3540 |
Out-sample | MARS-ANN | 0.0802 | 0.0579 | 0.2214 | 0.7085 |
Out-sample | MARS-SVR | 0.0658 | 0.0579 | 0.1626 | 0.2206 |
Out-sample | MLR | 0.8520 | 0.0610 | 0.2852 | 3.6302 |
Out-sample | SVR | 0.6853 | 0.4902 | 0.2707 | 2.6435 |

Model | DM Value | p Value | Remarks |
---|---|---|---|
MARS-ANN vs. MARS-SVR | 4.185 | <0.01 | The accuracy of MARS-ANN is better than MARS-SVR. |
MARS-ANN vs. ANN | 5.304 | <0.01 | The accuracy of MARS-ANN is better than ANN model. |
MARS-ANN vs. MARS | 5.725 | <0.01 | The accuracy of MARS-ANN is better than MARS model. |
MARS-ANN vs. SVR | 5.955 | <0.01 | The accuracy of MARS-ANN is better than SVR model. |
MARS-SVR vs. ANN | 6.563 | <0.01 | The accuracy of MARS-SVR is better than ANN model. |
MARS-SVR vs. SVR | 6.823 | <0.01 | The accuracy of MARS-SVR is better than SVR model. |
MARS-SVR vs. MARS | 6.235 | <0.01 | The accuracy of MARS-SVR is better than MARS model. |
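For readers unfamiliar with the DM (Diebold-Mariano) statistic reported above, the sketch below shows a simplified version, assuming squared-error loss, one-step-ahead forecasts (no autocorrelation correction) and a normal approximation for the p-value. The sign convention and the toy error values are assumptions; they need not match how the values in the table were computed.

```python
import math


def diebold_mariano(errors_a, errors_b):
    """Simplified DM statistic under squared-error loss.

    With this sign convention, a positive statistic means model A has the
    larger average loss, i.e. model B is the more accurate one.
    """
    d = [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a, errors_b)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / n
    stat = mean_d / math.sqrt(var_d / n)
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value


# Hypothetical forecast errors from two models on the same four observations.
toy_errors_a = [1.0, 2.0, 1.0, 2.0]
toy_errors_b = [0.0, 1.0, 0.0, 0.0]
toy_stat, toy_p = diebold_mariano(toy_errors_a, toy_errors_b)
```

A small p-value indicates that the difference in average loss between the two models is unlikely to be due to chance alone.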


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Das, P.; Jha, G.K.; Lama, A.; Parsad, R.
Crop Yield Prediction Using Hybrid Machine Learning Approach: A Case Study of Lentil (*Lens culinaris* Medik.). *Agriculture* **2023**, *13*, 596.
https://doi.org/10.3390/agriculture13030596
