# Using Machine Learning and Feature Selection for Alfalfa Yield Prediction


## Abstract

Our best R^{2} of 0.90 beats a previous work’s best R^{2} of 0.87. Our primary contribution is the demonstration that ML, with feature selection, shows promise in predicting crop yields even on simple datasets with a handful of features, and that reporting accuracies in R and R^{2} offers an intuitive way to compare results among various crops.

## 1. Introduction

We extended previous work on machine learning (ML) for crop yield prediction by reporting accuracies in R as well as R^{2}, trying some different models, and using more accessible datasets with simpler features. Other previous work in this area generally used more complex data collection techniques, such as unmanned aerial vehicles (UAVs) [7], remote sensors [8], and satellite imagery [9]. Our primary contributions are as follows:

- We achieved prediction accuracies higher than the previous work, showing that simple, publicly available datasets with limited features, requiring no special instruments to collect, could be used to train models comparable to or better than state-of-the-art.
- We extended previous work in ML with feature selection for crop yield prediction to consider alfalfa, one of the world’s most important agricultural resources.
- We presented our results in terms of the coefficient of correlation (R) and the coefficient of determination (R^{2}), which is more meaningful across various domains with disparate units than the mean absolute error (MAE) used in some previous works.

#### 1.1. ML Models

#### 1.1.1. Linear Regression

Linear regression predicts the target as a weighted sum of the features:

$\hat{y} = {w}_{0} + \sum_{i} {w}_{i}{x}_{i}$

where ${x}_{i}$ is the value of a data point’s $i$th feature, ${w}_{i}$ is a coefficient associated with the $i$th feature, ${w}_{0}$ is the intercept, and $\hat{y}$ is the prediction of the linear regression.
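As a minimal illustration, the formula above can be fit with scikit-learn (the library used in this paper’s appendix); the toy data below is synthetic, not the paper’s alfalfa dataset:

```python
# Sketch: fitting y = w0 + sum_i(w_i * x_i) with scikit-learn.
# The data is hypothetical, with known coefficients so the fit can be checked.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(size=(100, 3))              # 100 samples, 3 features
y = 2.0 + X @ np.array([1.5, -0.5, 3.0])    # known intercept w0 and weights w_i

model = LinearRegression().fit(X, y)
print(model.intercept_)   # recovers w0 = 2.0 (data is noise-free)
print(model.coef_)        # recovers [1.5, -0.5, 3.0]
```

Because the toy data is noise-free, the fitted intercept and coefficients match the generating weights exactly (up to numerical precision).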

#### 1.1.2. Neural Networks

Each node in a neural network outputs $A(\sum_{j} {w}_{j}{x}_{j})$, with ${x}_{j}$ being the $j$th input, ${w}_{j}$ being the learned coefficient for the $j$th input, and $A$ being a predefined nonlinear function. To train a neural network, all the coefficients (${w}_{j}$’s) are initialized with random values. Then the training data is fed to the network and predictions are found. An error is calculated by finding the difference between the prediction and the true value. By finding the gradient of the error, the neural network can iteratively change the coefficients of each node to minimize the overall error. By changing the number of layers and nodes, a neural network can approximate many different functions [12].
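A minimal sketch of this training loop, using scikit-learn’s MLPRegressor (consistent with the neural network grid in the appendix); the data is synthetic, not the paper’s dataset:

```python
# Sketch: weights start random, predictions are compared with true values,
# and gradient-based updates iteratively reduce the error.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.sin(X[:, 0]) + X[:, 1] ** 2          # a nonlinear target to approximate

net = MLPRegressor(hidden_layer_sizes=(10, 10), activation="relu",
                   solver="adam", max_iter=2000, random_state=0)
net.fit(X, y)
print(net.loss_)   # final training error after the iterative updates
```

The final loss is far below the variance of the target, showing the network has approximated the nonlinear function rather than predicting the mean.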

#### 1.1.3. Support Vector Machines

#### 1.1.4. K-Nearest Neighbors

#### 1.1.5. Regression Trees

A regression tree makes a prediction by starting at a root node and asking a question about the value of a feature, for example, whether the total solar radiation exceeds some threshold in MJ/m^{2}. If the answer is yes, then it goes to another node and asks another question. If the answer is no, it goes to a different node. This process continues until an answer is given. In order to learn what questions to ask, the regression tree minimizes some impurity measure [13]. Note that a random forest is a collection of multiple regression trees, and the final output of a random forest is the average result of all its regression trees.
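The averaging behavior noted above can be verified directly in scikit-learn; the data here is synthetic and only serves to demonstrate the mechanism:

```python
# Sketch: a random forest's prediction equals the mean of its trees' predictions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(150, 4))
y = X[:, 0] * 10 + X[:, 1]                  # simple target for the trees to split on

forest = RandomForestRegressor(n_estimators=25, random_state=0).fit(X, y)
x_new = X[:1]
per_tree = [tree.predict(x_new)[0] for tree in forest.estimators_]
print(forest.predict(x_new)[0])             # forest output
print(np.mean(per_tree))                    # identical: the average over all trees
```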

#### 1.1.6. Bayesian Ridge Regression

#### 1.1.7. Feature Selection

## 2. Related Work

Much of the related work, like the current work, reported results in the coefficient of correlation (R) and coefficient of determination (R^{2}) metrics. R reflects accuracy and captures the direction or strength of correlations [19], and we found R^{2} to be a dominant accuracy metric in previous work. We hope both metrics help paint a more intuitive picture of accuracy than MAE across various crops with disparate yield units. Moreover, our work starts with a simpler, less esoteric dataset and features than previous work. For example, some previous datasets include several attributes of soil chemistry data; we instead added solar radiation, which has been shown to be a good predictor.

Feng et al. [7] predicted alfalfa yield from UAV-based hyperspectral imagery with ensemble learning and reported strong results in R^{2}, but they did not provide R results. Noland et al. similarly showed that data collected via UAVs and other remote sensors could be used to train predictive models, and they also measured success in terms of R^{2}. However, that work relied on canopy reflectance and light detection and ranging (LiDAR) data. Though the current work used simpler, more easily acquired data, our highest R^{2} scores of around 0.90 beat theirs of around 0.87 for alfalfa yield prediction [8].

Yang et al. [23] used ML to estimate land productivity in the contiguous US; R^{2} was their metric of choice, and their success with similar models such as random forest helped motivate the current work to apply ML to a related problem. Wang et al. [9] used ML to predict yields for winter wheat in the CONUS in their 2020 paper, where they combined multiple sources of data including satellite imagery, climate data, and soil maps to train a support vector machine (SVM), AdaBoost model, deep neural network (DNN), and a random forest, with positive results measured in R^{2} and mean absolute error (MAE), as in the current work, as well as root mean squared error (RMSE) [9]. Our work adopted a simpler approach, using fewer varieties of data, all of which were publicly available, and did not require processing image data. Leng and Hall [24] showed that ML aided in simulating yield averages for maize in their 2020 paper, while Nikoloski et al. [25] showed promise applying ML to estimating productivity in dairy farm grasslands in their 2019 work, which used the R^{2} metric among others. The current work is the first study we know of that shows promise for applying such popular ML techniques to predicting crop yields using only simple, publicly available weather and variety trial datasets.

## 3. Materials and Methods

For each model, the mean absolute error (MAE), coefficient of correlation (R), and coefficient of determination (R^{2}) value were all found and recorded. This was done for each of the 10 iterations. Note that this means that 10 different models were made for each method. We calculated and recorded the average MAE, R, and R^{2} value over all 10 models. We reported R^{2} scores because we found this to be the dominant metric for reflecting accuracy in similar work. On the other hand, we emphasized R scores in our results because R captured the direction of correlation, while R^{2} ignored it. Further, these two metrics followed the same trends and were usually not greatly different from each other [19]. We also reported MAE in keeping with previous work; because MAE was not always consistent with R and R^{2}, it may either support or undermine the other metrics.
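A hedged sketch of this evaluation protocol, assuming scikit-learn and SciPy; the data and the single model shown are stand-ins for the paper’s dataset and seven methods:

```python
# Sketch: 10 train/test iterations per method, averaging MAE, R, and R^2
# over the 10 resulting models.
import numpy as np
from scipy.stats import pearsonr
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 5))
y = X @ np.array([3.0, -2.0, 1.0, 0.5, 0.0]) + rng.normal(0, 0.1, 300)

maes, rs, r2s = [], [], []
for i in range(10):                          # 10 different models per method
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=i)
    model = RandomForestRegressor(random_state=i).fit(X_tr, y_tr)
    pred = model.predict(X_te)
    maes.append(mean_absolute_error(y_te, pred))
    rs.append(pearsonr(y_te, pred)[0])       # R: direction and strength of correlation
    r2s.append(r2_score(y_te, pred))         # R^2: proportion of variance explained
print(np.mean(maes), np.mean(rs), np.mean(r2s))
```

Averaging over the 10 splits, rather than reporting a single split, reduces the variance introduced by any one random partition of the data.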

## 4. Results

We recorded the average MAE, R, and R^{2} value for each model over the 10 iterations, as shown in this section’s tables. Note that the average yield in the dataset is 2020 lbs./acre. Using the SelectKBest feature selection method, we made all features available for feature selection and compared the results from K = 3 to K = 11. Notice that as K increased, the R value increased, but the increase in R levels tailed off at around K = 6 (Figure 1). These 6 features were the Julian day, number of days since the crop was sown, total solar radiation, average soil moisture, day length, and percent cover. The results of the models with no feature selection are shown in Figure 2 and Table 3. Here, the support vector regression model had the highest average R of 0.948.

The average R^{2} value was found to be 0.752.
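The SelectKBest sweep described above can be sketched as follows, assuming scikit-learn’s `f_regression` scoring; the data is synthetic, with 11 candidate features mirroring the paper’s feature count:

```python
# Sketch: for K = 3..11, keep the K highest-scoring features and train on them.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 11))              # 11 candidate features
y = X[:, 0] * 5 + X[:, 1] * 3 + rng.normal(0, 0.1, 200)  # only two features matter

for k in range(3, 12):                       # K = 3 to K = 11
    selector = SelectKBest(score_func=f_regression, k=k).fit(X, y)
    kept = selector.get_support(indices=True)
    X_reduced = selector.transform(X)        # the models would be trained on this
    print(k, kept)
```

In the paper’s experiments, the gain from adding features tailed off at around K = 6, which is why the six features named above were carried forward.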

## 5. Discussion

## Supplementary Materials

## Author Contributions

## Funding

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## Appendix A

**Regression tree:**

- ‘criterion’: [‘mae’];
- ‘max_depth’: [5,10,25,50,100].

**Random forest:**

- ‘n_estimators’: [5,10,25,50,100];
- ‘max_depth’: [5,10,15,20];
- ‘criterion’: [‘mae’].

**K-nearest neighbors:**

- ‘n_neighbors’: [2,5,10];
- ‘weights’: [‘uniform’, ‘distance’];
- ‘leaf_size’: [5,10,30,50].

**Support vector machine:**

- ‘kernel’: [‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’];
- ‘C’: [0.1, 1.0, 5.0, 10.0];
- ‘gamma’: [‘scale’, ‘auto’];
- ‘degree’: [2,3,4,5].

**Neural network:**

- ‘hidden_layer_sizes’: [(3), (5), (10), (3,3), (5,5), (10,10)];
- ‘solver’: [‘sgd’, ‘adam’];
- ‘learning_rate’: [‘constant’, ‘invscaling’, ‘adaptive’];
- ‘learning_rate_init’: [0.1, 0.01, 0.001].

**Bayesian ridge regression:**

- ‘n_iter’: [100,300,500];
- ‘lambda_1’: [1.e−6, 1.e−4, 1.e−2, 1, 10].
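The lists above are scikit-learn hyperparameter grids. A sketch of how such a grid would be searched with `GridSearchCV`, using a trimmed version of the SVM grid on synthetic data (the paper’s actual tuning setup may differ, e.g. in cross-validation folds):

```python
# Sketch: exhaustive grid search over a hyperparameter grid with 3-fold CV.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(size=(80, 4))
y = X[:, 0] * 2 + X[:, 1]

param_grid = {
    "kernel": ["linear", "rbf"],             # trimmed from the full grid above
    "C": [0.1, 1.0, 5.0, 10.0],
    "gamma": ["scale", "auto"],
}
search = GridSearchCV(SVR(), param_grid, cv=3).fit(X, y)
print(search.best_params_)                   # best combination found
```

Note that recent scikit-learn versions renamed the tree/forest criterion ‘mae’ to ‘absolute_error’, so the grids above reflect the library version current at the time of the paper.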

## References

- United Nations. Transforming our world: The 2030 agenda for sustainable development. In Resolution Adopted by the General Assembly; United Nations: New York, NY, USA, 2015. [Google Scholar]
- Copenhagen Consensus Center. Background. Available online: https://www.copenhagenconsensus.com/post-2015-consensus/background (accessed on 29 December 2020).
- Rosegrant, M.W.; Magalhaes, E.; Valmonte-Santos, R.A.; Mason-D’Croz, D. Returns to investment in reducing postharvest food losses and increasing agricultural productivity growth. In Prioritizing Development: A Cost Benefit Analysis of the United Nations’ Sustainable Development Goals; Cambridge University Press: Cambridge, UK, 2018; p. 322. [Google Scholar]
- Lomborg, B. The Nobel Laureates’ Guide to the Smartest Targets for the World: 2016–2030; Copenhagen Consensus Center USA: Tewksbury, MA, USA, 2015. [Google Scholar]
- Dodds, F.; Bartram, J. (Eds.) The Water, Food, Energy and Climate Nexus: Challenges and an Agenda for Action; Routledge: Abingdon, UK, 2016. [Google Scholar]
- Bocca, F.F.; Rodrigues, L.H.A. The effect of tuning, feature engineering, and feature selection in data mining applied to rainfed sugarcane yield modelling. Comput. Electron. Agric
**2016**, 128, 67–76. [Google Scholar] [CrossRef] - Feng, L.; Zhang, Z.; Ma, Y.; Du, Q.; Williams, P.; Drewry, J.; Luck, B. Alfalfa Yield Prediction Using UAV-Based Hyperspectral Imagery and Ensemble Learning. Remote Sens.
**2020**, 12, 2028. [Google Scholar] [CrossRef] - Noland, R.L.; Wells, M.S.; A Coulter, J.; Tiede, T.; Baker, J.M.; Martinson, K.L.; Sheaffer, C.C. Estimating alfalfa yield and nutritive value using remote sensing and air temperature. Field Crop. Res.
**2018**, 222, 189–196. [Google Scholar] [CrossRef] - Wang, Y.; Zhang, Z.; Feng, L.; Du, Q.; Runge, T. Combining Multi-Source Data and Machine Learning Approaches to Predict Winter Wheat Yield in the Conterminous United States. Remote Sens.
**2020**, 12, 1232. [Google Scholar] [CrossRef][Green Version] - Russell, S.J.; Norvig, P. Artificial Intelligence: A Modern Approach; Pearson Education Limited: London, UK, 2016. [Google Scholar]
- Rojas, R. Neural Networks-A Systematic Introduction; Springer: New York, NY, USA, 1996. [Google Scholar]
- Mitchell, T. Machine Learning; McGraw-Hill: New York, NY, USA, 1997. [Google Scholar]
- González Sánchez, A.; Frausto Solís, J.; Ojeda Bustamante, W. Predictive ability of machine learning methods for massive crop yield prediction. Span. J. Agric. Res.
**2014**. [Google Scholar] [CrossRef][Green Version] - Quinlan, J.R. Learning with continuous classes. In Proceedings of the 5th Australian Joint Conference on Artificial Intelligence, Hobart, Tasmania, 16–18 November 1992; Volume 92, pp. 343–348. [Google Scholar]
- Gelman, A.; Stern, H.S.; Carlin, J.B.; Dunson, D.B.; Vehtari, A.; Rubin, D.B. Bayesian Data Analysis; Chapman and Hall/CRC: Abingdon, UK, 2013. [Google Scholar]
- Dash, M.; Liu, H. Feature Selection for Classification. Intelligent Data Analysis; IOS Press: Amsterdam, The Netherlands, 1997; Volume 1, pp. 131–156. [Google Scholar]
- Hall, M.A. Correlation-Based Feature Selection for Machine Learning. Ph.D. Thesis, University of Waikato, Hamilton, New Zealand, 1999. [Google Scholar]
- Kononenko, I. Estimating Attributes: Analysis and Extensions of RELIEF. In Proceedings of the European Conference on Machine Learning, Catania, Italy, 6–8 April 1994; Springer: Berlin/Heidelberg, Germany, 1994; pp. 171–182. [Google Scholar]
- Ratner, B. The correlation coefficient: Its values range between +1/−1, or do they? J. Target. Meas. Anal. Mark.
**2009**, 17, 139–142. [Google Scholar] [CrossRef][Green Version] - Boote, K.J.; Jones, J.W.; Hoogenboom, G.; Pickering, N.B. The CROPGRO model for grain legumes. In Applications of Systems Approaches at the Field Level; Springer Science and Business Media LLC: Berlin/Heidelberg, Germany, 1998; pp. 99–128. [Google Scholar]
- Malik, W.; Boote, K.J.; Hoogenboom, G.; Cavero, J.; Dechmi, F. Adapting the CROPGRO Model to Simulate Alfalfa Growth and Yield. Agron. J.
**2018**, 110, 1777–1790. [Google Scholar] [CrossRef][Green Version] - Jing, Q.; Qian, B.; Bélanger, G.; Vanderzaag, A.; Jégo, G.; Smith, W.; Grant, B.; Shang, J.; Liu, J.; He, W.; et al. Simulating alfalfa regrowth and biomass in eastern Canada using the CSM-CROPGRO-perennial forage model. Eur. J. Agron.
**2020**, 113, 125971. [Google Scholar] [CrossRef] - Yang, P.; Zhao, Q.; Cai, X. Machine learning based estimation of land productivity in the contiguous US using biophysical predictors. Environ. Res. Lett.
**2020**, 15, 074013. [Google Scholar] [CrossRef][Green Version] - Leng, G.; Hall, J.W. Predicting spatial and temporal variability in crop yields: An inter-comparison of machine learning, regression and process-based models. Environ. Res. Lett.
**2020**, 15, 044027. [Google Scholar] [CrossRef] - Nikoloski, S.; Murphy, P.; Kocev, D.; Džeroski, S.; Wall, D.P. Using machine learning to estimate herbage production and nutrient uptake on Irish dairy farms. J. Dairy Sci.
**2019**, 102, 10639–10656. [Google Scholar] [CrossRef][Green Version] - McKinney, W. Data Structures for Statistical Computing in Python. In Proceedings of the 9th Python in Science Conference, Austin, TX, USA, 28 June–3 July 2010; pp. 51–56. [Google Scholar]
- Hunter, J.D. Matplotlib: A 2D Graphics Environment. Comput. Sci. Eng.
**2007**, 9, 90–95. [Google Scholar] [CrossRef] - Waskom, M.; Botvinnik, O.; Drewokane; Hobson, P.; David; Halchenko, Y.; Lee, A. Seaborn: v0.7.1. Zenodo
**2016**. [Google Scholar] [CrossRef] - Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Vanderplas, J. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res.
**2011**, 12, 2825–2830. [Google Scholar] - Oliphant, T.E. A Guide to NumPy; Trelgol Publishing: Wilmington, DE, USA, 2006; Volume 1, p. 85. [Google Scholar]
- Van Der Walt, S.; Colbert, S.C.; Varoquaux, G. The NumPy Array: A Structure for Efficient Numerical Computation. Comput. Sci. Eng.
**2011**, 13, 22–30. [Google Scholar] [CrossRef][Green Version] - Witten, I.H.; Frank, E.; Hall, M.A. Data Mining: Practical Machine Learning Tools and Techniques, 3rd ed.; Elsevier Morgan Kaufmann: San Francisco, CA, USA, 2011. [Google Scholar]

**Figure 1.**Performance of models with k features and all features made available for feature selection. The average R value of the models is shown. SelectKBest feature selection was used with K values from K = 3 to K = 11. Note that the average R values for Bayesian ridge regression and linear regression were much lower than those of any of the other models, so they are not shown here.

**Figure 2.**Results from models with no feature selection. The results from linear regression and Bayesian ridge regression were much lower than the other models, so their results are not shown here. The results are shown explicitly in Table 3.

**Figure 3.**Results from Cfs feature selection with all features. The results from linear regression and Bayesian ridge regression were much lower than the other models, so their results are not shown here. The results are shown explicitly in Table 4.

**Figure 4.**Correlation heat map between features. A heat map showing the value of the correlation coefficient between each possible pair of features. We see higher correlations, positive and negative, between yield and Julian day, time since sown, radiation, rainfall, day length, and others.

**Figure 5.**Results from Cfs feature selection with no percent cover. The results from linear regression and Bayesian ridge regression were too low to show. The results are shown explicitly in Table 5.

**Figure 6.**Results from ReliefF feature selection. The results from linear regression and Bayesian ridge regression were much lower than the other models, so their results are not shown here. The results are shown explicitly in Table 7.

**Figure 7.**Results from Wrapper feature selection operator. The results from linear regression and Bayesian ridge regression were much lower than the other models, so their results are not shown here. The results are shown explicitly in Table 8.

Feature Name | Value | Abbreviation |
---|---|---|
Julian day of harvest | 249.00 | JD |
Number of days since the crop was sown | 643.00 | DSS |
Number of days since last harvest | 30.00 | DSH |
Total solar radiation since the previous harvest (MJ/m^{2}) | 610.29 | Sol |
Total rainfall since the previous harvest (mm) | 98.83 | Rain |
Avg air temp since the previous harvest (°C) | 25.33 | T |
Avg max air temp since the previous harvest (°C) | 31.25 | MaxT |
Avg min air temp since the previous harvest (°C) | 19.10 | MinT |
Avg soil moisture since the previous harvest (%) | 0.11 | SM |
Interpolated percent cover for the day of the harvest (%) | 78.82 | PC |
Day length on the day of the harvest (hrs) | 12.62 | DL |

Classes | Yield (t) |
---|---|
1 | 0.01–0.74 |
2 | 0.75–1.24 |
3 | 1.25+ |

**Table 3.**Results from models with no feature selection.

Model | Mean Absolute Error (MAE) (lbs./acre) | R | R^{2} |
---|---|---|---|
Support vector machine | 209.888 | 0.948 | 0.895 |
K-nearest neighbors | 205.418 | 0.946 | 0.891 |
Random forest | 207.448 | 0.945 | 0.887 |
Neural network | 232.937 | 0.937 | 0.873 |
Regression tree | 236.039 | 0.927 | 0.849 |
Linear regression | 358.454 | 0.818 | 0.664 |
Bayesian ridge regression | 357.686 | 0.818 | 0.663 |

**Table 4.**Results from Cfs feature selection with all features. These average scores are from using the features Julian day, total solar radiation, total rainfall, and percent cover.

Model | Mean Absolute Error (lbs./acre) | R | R^{2} |
---|---|---|---|
Random forest | 228.651 | 0.933 | 0.865 |
Support vector machine | 248.458 | 0.925 | 0.851 |
K-nearest neighbors | 251.494 | 0.914 | 0.831 |
Regression tree | 272.247 | 0.900 | 0.800 |
Neural network | 293.606 | 0.887 | 0.778 |
Linear regression | 382.928 | 0.792 | 0.627 |
Bayesian ridge regression | 383.459 | 0.790 | 0.619 |

**Table 5.**Results from Cfs feature selection with no percent cover. The average scores from using the features Julian day, number of days since the sown date, total solar radiation, and total rainfall.

Model | Mean Absolute Error (lbs./acre) | R | R^{2} |
---|---|---|---|
K-nearest neighbors | 193.938 | 0.952 | 0.904 |
Random forest | 196.539 | 0.952 | 0.903 |
Regression tree | 200.052 | 0.950 | 0.899 |
Support vector machine | 231.222 | 0.936 | 0.871 |
Neural network | 260.651 | 0.911 | 0.821 |
Bayesian ridge regression | 372.945 | 0.800 | 0.632 |
Linear regression | 372.547 | 0.798 | 0.632 |

**Table 6.**p-values between the R^{2} values of the models trained by the two CfsSubsetEval feature sets. The results were found by performing unpaired two-tailed t-tests. The first feature set contained the Julian day, total solar radiation, total rainfall, and percent cover. The second feature set contained the Julian day, the number of days since the sown date, total solar radiation, and the total rainfall. Significant results are shown in bold.

Model | T Test Results |
---|---|
Random forest | **0.0046** |
K-nearest neighbor | **0.0007** |
Regression tree | **0.0103** |
Support vector regression | 0.2820 |
Neural network | 0.2070 |
Linear regression | 0.8940 |
Bayesian ridge regression | 0.7481 |
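The test behind these p-values can be sketched with SciPy’s `ttest_ind`, which is unpaired and two-tailed by default; the two score lists below are made up for illustration and are not the paper’s measured R^{2} values:

```python
# Sketch: unpaired two-tailed t-test comparing the 10 R^2 scores obtained
# under two different feature sets.
from scipy.stats import ttest_ind

r2_feature_set_1 = [0.86, 0.87, 0.85, 0.88, 0.86, 0.87, 0.85, 0.86, 0.88, 0.87]
r2_feature_set_2 = [0.90, 0.91, 0.89, 0.90, 0.91, 0.90, 0.89, 0.91, 0.90, 0.90]

t_stat, p_value = ttest_ind(r2_feature_set_1, r2_feature_set_2)
print(p_value)   # a small p-value indicates a significant difference
```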

**Table 7.**Results from ReliefF feature selection. The average scores from using the features number of days since the sown date, total rainfall, and the average minimum temperature since the previous harvest.

Model | Mean Absolute Error (lbs./acre) | R | R^{2} |
---|---|---|---|
K-nearest neighbors | 195.860 | 0.953 | 0.905 |
Random forest | 197.026 | 0.950 | 0.900 |
Regression tree | 199.584 | 0.948 | 0.897 |
Neural network | 357.532 | 0.842 | 0.700 |
Support vector machine | 344.604 | 0.830 | 0.688 |
Linear regression | 667.121 | 0.262 | 0.050 |
Bayesian ridge regression | 666.844 | 0.258 | 0.049 |

**Table 8.**Results from Wrapper feature selection operator. The average scores from using the features number of days since the sown date, total rainfall, day length, and the Julian day.

Model | Mean Absolute Error (lbs./acre) | R | R^{2} |
---|---|---|---|
K-nearest neighbors | 199.280 | 0.952 | 0.904 |
Random forest | 197.782 | 0.952 | 0.903 |
Regression tree | 200.208 | 0.951 | 0.902 |
Support vector machine | 261.395 | 0.917 | 0.835 |
Neural network | 300.245 | 0.883 | 0.776 |
Linear regression | 370.509 | 0.807 | 0.651 |
Bayesian ridge regression | 372.011 | 0.800 | 0.634 |

**Table 9.**p-values between R^{2} values of different feature selection operators. Results are from unpaired two-tailed t-tests. ‘All’ represents the results from Table 3, ‘Cfs’ represents the results which used the features from Figure 5/Table 5, ‘ReliefF’ represents the results from Figure 6/Table 7, and ‘Wrapper’ represents the results from Figure 7/Table 8. If a p-value is followed by a parenthesis, the value in the parentheses is an abbreviation of the feature selection method that resulted in the higher average R^{2} value. Lower p-values are better, and the lowest are bolded.

T Test | RF | KNN | RT | SVR | NN | Lin | Bayes |
---|---|---|---|---|---|---|---|
All vs. Cfs | 0.2973 | 0.3303 | 0.0086 (C) | 0.0559 | 0.0871 | 0.3758 | 0.3795 |
All vs. ReliefF | 0.4631 | 0.2306 | 0.0140 (R) | 0.0001 (A) | 0.0010 (A) | 2 × 10^{−13} (A) | 3 × 10^{−15} (A) |
All vs. Wrapper | 0.2398 | 0.3321 | 0.0045 (W) | 0.0038 (A) | 0.0035 (A) | 0.7555 | 0.3569 |
Cfs vs. ReliefF | 0.8331 | 0.9179 | 0.8967 | 0.0002 (C) | 0.0156 (C) | 3 × 10^{−12} (C) | 3 × 10^{−11} (C) |
Cfs vs. Wrapper | 0.9867 | 0.9804 | 0.7840 | 0.0685 | 0.2196 | 0.6726 | 0.9486 |
ReliefF vs. Wrapper | 0.8057 | 0.8924 | 0.6999 | 0.0014 (W) | 0.1052 | 5 × 10^{−10} (W) | 8 × 10^{−13} (W) |

**Table 10.**Best feature selection operators for each machine learning method. There is no significant difference between the results in the same cell. ‘All’ refers to all features being used, ‘Cfs’ refers to the set of features found by CfsSubsetEval, ‘ReliefF’ refers to the set of features found by ReliefFAttributeEval, and ’Wrapper’ refers to the set of features found by ‘WrapperSubsetEval’.

Machine Learning Method | Feature Selection Operator that Led to the Best Results |
---|---|
Random forest | All, Cfs, ReliefF, Wrapper |
K-nearest neighbors | All, Cfs, ReliefF, Wrapper |
Regression tree | Cfs, ReliefF, Wrapper |
Support vector regression | All, Cfs |
Neural network | All, Cfs |
Linear regression | All, Cfs, Wrapper |
Bayesian ridge regression | All, Cfs, Wrapper |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Whitmire, C.D.; Vance, J.M.; Rasheed, H.K.; Missaoui, A.; Rasheed, K.M.; Maier, F.W.
Using Machine Learning and Feature Selection for Alfalfa Yield Prediction. *AI* **2021**, *2*, 71-88.
https://doi.org/10.3390/ai2010006
