# Crop Yield Prediction through Proximal Sensing and Machine Learning Algorithms

## Abstract


…locations in each field, four times throughout the growing season, and yield samples were collected manually at the end of the growing season. Four datasets, namely PE-2017, PE-2018, NB-2017, and NB-2018, were then formed by combining data points from three fields to represent the province data for the respective years. Modeling techniques were employed to generate yield predictions, which were assessed with different statistical parameters. The SVR models outperformed all other models for the NB-2017, NB-2018, PE-2017, and PE-2018 datasets, with RMSE of 5.97, 4.62, 6.60, and 6.17 t/ha, respectively. The performance of k-NN remained poor in three of the four datasets, namely NB-2017, NB-2018, and PE-2017, with RMSE of 6.93, 5.23, and 6.91 t/ha, respectively. The study also showed that large datasets are required to generate useful results with any of these models. This information is needed for creating site-specific management zones for potatoes, which form a significant component of food security initiatives across the globe.

## 1. Introduction

…to collect soil and physicochemical properties.

## 2. Materials and Methods

#### 2.1. Collection of Data and the Study Sites

#### 2.2. Proximal Sensing Data

#### 2.3. Soil Sampling Data

#### 2.4. Yield Data

…was marked in each grid, and the soil within it was dug out manually to collect the potato tubers in separate plastic buckets. The tubers in each bucket were weighed on a digital field balance to determine tuber yield (kg). The potatoes were then reburied for the farmer's harvest.
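Converting the weighed tuber mass (kg) to the per-hectare yields reported in t/ha (Table 3) is unit arithmetic: kg/m² × 10,000 m²/ha ÷ 1000 kg/t, i.e., kg/m² × 10. The size of the marked sampling area is not recoverable from this text, so `sample_area_m2` below is a hypothetical parameter and the function is an illustrative sketch, not the authors' procedure.

```python
# Sketch: scale a manually harvested tuber sample to a per-hectare yield.
# ASSUMPTION: `sample_area_m2` is hypothetical; the actual sampled area
# is not stated in this text.

KG_PER_TONNE = 1000.0
M2_PER_HECTARE = 10_000.0


def tuber_yield_t_per_ha(tuber_mass_kg: float, sample_area_m2: float) -> float:
    """Convert tuber mass (kg) weighed from `sample_area_m2` into t/ha."""
    if sample_area_m2 <= 0:
        raise ValueError("sample_area_m2 must be positive")
    kg_per_m2 = tuber_mass_kg / sample_area_m2
    return kg_per_m2 * M2_PER_HECTARE / KG_PER_TONNE


if __name__ == "__main__":
    # Illustrative numbers only: 5.0 kg of tubers from a hypothetical 1 m^2 plot.
    print(tuber_yield_t_per_ha(5.0, 1.0))  # 50.0
```

Because the factor is a constant 10 for any area expressed in m², a sampled area of a different size changes only the denominator.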

#### 2.5. Machine Learning Algorithms

#### 2.5.1. Linear Regression

#### 2.5.2. Elastic Net

#### 2.5.3. k-Nearest Neighbors (k-NN)

#### 2.5.4. Support Vector Regression

#### 2.6. Tuning of Hyperparameter for Reproducibility

#### 2.7. Model Evaluation Criteria

The coefficient of determination (R²), mean absolute error (MAE), and root mean square error (RMSE) were used to evaluate how closely the model predictions matched the observed values. These well-known statistical metrics [27,28] were calculated as:
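For reference, the standard definitions are $\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\lvert y_i-\hat{y}_i\rvert$, $\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}$ and $R^2=1-\sum_{i}(y_i-\hat{y}_i)^2\big/\sum_{i}(y_i-\bar{y})^2$, where $y_i$ is the observed yield, $\hat{y}_i$ the predicted yield and $\bar{y}$ the mean observed yield. This sketch assumes, but cannot confirm from this text, that the study used this form of $R^2$ (some studies report the squared Pearson correlation instead). The plain-Python functions below implement these definitions.

```python
import math
from typing import Sequence


def _check(observed: Sequence[float], predicted: Sequence[float]) -> int:
    """Validate inputs and return the number of paired samples."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal length")
    return len(observed)


def mae(observed, predicted) -> float:
    """Mean absolute error, in the units of the target (t/ha here)."""
    n = _check(observed, predicted)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def rmse(observed, predicted) -> float:
    """Root mean square error, in the units of the target."""
    n = _check(observed, predicted)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    n = _check(observed, predicted)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    obs = [40.0, 50.0, 60.0]   # illustrative yields (t/ha), not study data
    pred = [42.0, 48.0, 63.0]
    print(round(mae(obs, pred), 3), round(rmse(obs, pred), 3), round(r_squared(obs, pred), 3))
    # 2.333 2.38 0.915
```

RMSE squares the errors before averaging, so it is never smaller than MAE and penalizes large misses more heavily; this is why each RMSE in Table 4 exceeds the corresponding MAE.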

## 3. Results and Discussion

#### 3.1. Descriptive Statistics

#### 3.2. Correlation Analysis

#### 3.3. Evaluation of Machine Learning Algorithms

The R² values for three runs of LR were 0.72, 0.62, and 0.75, respectively, while the mean R² was 0.70 with a standard deviation of 0.05 (Table 4). The R² values for three runs of EN were 0.61, 0.61, and 0.71, respectively. EN showed a lower mean R² (0.65) than LR, but a slightly lower standard deviation (0.04). The lowest mean R² (0.62) was recorded for the k-NN algorithm, which also had the highest standard deviation (0.09). The highest mean R² (0.72) was recorded for the SVR algorithm, with a slightly higher standard deviation (0.07) than LR and EN. Across all algorithms, the MAE and RMSE for NB-2017 ranged from 4.68 to 5.60 t/ha and from 5.97 to 6.93 t/ha, respectively. The lowest MAE and RMSE were recorded for the SVR algorithm, i.e., 4.68 and 5.97 t/ha, respectively.
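The per-algorithm results above are summarized as the mean and standard deviation of R² over three test runs. The exact splitting procedure and seeds are not given in this text, so the sketch below assumes repeated random hold-out with a 40-point test set (Table 1) and uses a linear-kernel SVR (Table 2). The data are synthetic stand-ins, not the study's measurements, and the helper name `repeated_holdout_r2` is hypothetical.

```python
import numpy as np
from sklearn.base import clone
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR


def repeated_holdout_r2(X, y, estimator, seeds=(0, 1, 2), test_size=40):
    """Refit a fresh copy of `estimator` on one random split per seed.

    Returns the per-run test R^2 values plus their mean and standard deviation.
    """
    scores = []
    for seed in seeds:
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=test_size, random_state=seed
        )
        model = clone(estimator).fit(X_train, y_train)
        scores.append(r2_score(y_test, model.predict(X_test)))
    scores = np.asarray(scores)
    # ddof=0 is NumPy's default (population SD); the study's convention is not stated.
    return scores, float(scores.mean()), float(scores.std(ddof=0))


# Synthetic stand-in data (NOT the study's measurements): 120 points and
# 7 predictors, so a 40-point test set leaves 80 training points as in Table 1.
rng = np.random.default_rng(42)
X = rng.normal(size=(120, 7))
y = 48.0 + X @ np.array([5.0, -3.0, 2.0, 0.0, 0.0, 1.0, 0.0]) + rng.normal(scale=2.0, size=120)

scores, mean_r2, sd_r2 = repeated_holdout_r2(X, y, SVR(kernel="linear", C=1.0, epsilon=0.1))
print(np.round(scores, 2), round(mean_r2, 2), round(sd_r2, 2))
```

`clone` gives each run an unfitted copy, so no state leaks between runs; with only three runs, the standard deviation is itself a rough estimate.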

…R² values of 0.54, 0.73, and 0.64 for three runs of the test set, respectively, with a relatively higher standard deviation (0.07) than the other algorithms. EN performed relatively better, with a higher mean R² of 0.65 and the lowest standard deviation (0.01) for the PE-2017 dataset in the testing phase. As with the NB-2017 dataset, k-NN performed poorly on the PE-2017 dataset in comparison with the other algorithms (Figure 3). The three test runs for SVR yielded R² values of 0.57, 0.71, and 0.67, with a standard deviation of 0.07. SVR reached the same mean R² (0.65) as the EN regressor; however, the lowest MAE (5.18 t/ha) and RMSE (6.60 t/ha) for the PE-2017 dataset were recorded for SVR.

…R² (0.63) for the NB-2018 dataset. However, LR recorded slightly lower MAE (3.59 t/ha) and RMSE (4.69 t/ha) than EN. The highest mean R² (0.65) for the NB-2018 dataset was recorded for SVR (Figure 3).

…R² of 0.54 was recorded for both the SVR and k-NN algorithms for the PE-2018 dataset.

#### 3.4. Comparative Analysis of Machine Learning Algorithms

…R² were observed for the NB datasets. For the NB-2018 dataset, mean validation accuracies (R²) ranged from 0.53 to 0.65, and a slightly higher range was observed for the NB-2017 dataset, i.e., 0.62–0.72. In comparison with the NB datasets, the PE-2017 dataset showed the narrowest range of accuracies (0.64–0.65). No major differences among algorithms were apparent for the PE-2018 dataset; however, the k-NN algorithm performed unexpectedly well compared with its results on the other datasets (Figure 4), as fewer variables were correlated in this dataset (Figure 2).
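The ranges quoted above follow directly from the mean R² column of Table 4. The snippet below only transcribes those published values (it adds no new results) and recomputes the minimum, maximum and spread for each dataset; the dictionary layout and the helper name `r2_range` are illustrative choices.

```python
# Mean R^2 values transcribed from Table 4.
MEAN_R2 = {
    "NB-2018": {"LR": 0.63, "EN": 0.63, "k-NN": 0.53, "SVR": 0.65},
    "NB-2017": {"LR": 0.70, "EN": 0.65, "k-NN": 0.62, "SVR": 0.72},
    "PE-2018": {"LR": 0.53, "EN": 0.49, "k-NN": 0.54, "SVR": 0.54},
    "PE-2017": {"LR": 0.64, "EN": 0.65, "k-NN": 0.64, "SVR": 0.65},
}


def r2_range(dataset: str) -> tuple:
    """Lowest and highest mean R^2 across the four algorithms for one dataset."""
    values = MEAN_R2[dataset].values()
    return min(values), max(values)


for name in MEAN_R2:
    lo, hi = r2_range(name)
    print(f"{name}: {lo:.2f}–{hi:.2f} (spread {hi - lo:.2f})")
```

The spread makes the comparison explicit: 0.12 for NB-2018, 0.10 for NB-2017, 0.05 for PE-2018 and 0.01 for PE-2017, the narrowest.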

## 4. Conclusions

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## References

1. Agriculture and Agri-Food Canada (AAFC). Potato Market Information Review 2016–2017. Available online: https://www5.agr.gc.ca/eng/industry-markets-and-trade/canadian-agri-food-sector-intelligence/horticulture/horticulture-sector-reports/potato-market-information-review-2016-2017/?id=1536104016530#a1.2.3 (accessed on 15 January 2020).
2. Drummond, S.T.; Sudduth, K.A.; Joshi, A.; Birrell, S.J.; Kitchen, N.R. Statistical and neural methods for site-specific yield prediction. Trans. Am. Soc. Agric. Eng. **2003**, 46, 5–14.
3. Varcoe, V.J. A note on the computer simulation of crop growth in agricultural land evaluation. Soil Use Manag. **1990**, 6, 157–160.
4. Farooque, A.A.; Zaman, Q.U.; Schumann, A.W.; Madani, A.; Percival, D.C. Response of wild blueberry yield to spatial variability of soil properties. Soil Sci. **2012**, 177, 56–68.
5. Kitchen, N.R.; Drummond, S.T.; Lund, E.D.; Sudduth, K.A.; Buchleiter, G.W. Soil electrical conductivity and topography related to yield for three contrasting soil-crop systems. Agron. J. **2003**, 95, 483–495.
6. Drummond, S.T.; Birrell, S.; Sudduth, K.A. Analysis and correlation methods for spatial data. ASAE **1995**, 95, 9.
7. Dai, X.; Huo, Z.; Wang, H. Simulation for response of crop yield to soil moisture and salinity with artificial neural network. Field Crop. Res. **2011**, 121, 441–449.
8. Cousens, R. An empirical model relating crop yield to weed and crop density and a statistical comparison with other models. J. Agric. Sci. **1985**, 105, 513–521.
9. Dourado-Neto, D.; Teruel, D.A.; Reichardt, K.; Nielsen, D.R.; Frizzone, J.A.; Bacchi, O.O.S. Principles of crop modeling and simulation: I. uses of mathematical models in agricultural science. Sci. Agric. **1998**, 55, 46–50.
10. Doraiswamy, P.C.; Moulin, S.; Cook, P.W.; Stern, A. Crop yield assessment from remote sensing. Photogramm. Eng. Remote Sens. **2003**, 69, 665–674.
11. Prasad, A.K.; Chai, L.; Singh, R.P.; Kafatos, M. Crop yield estimation model for Iowa using remote sensing and surface parameters. Int. J. Appl. Earth Obs. Geoinf. **2006**, 8, 26–33.
12. Kaul, M.; Hill, R.L.; Walthall, C. Artificial neural networks for corn and soybean yield prediction. Agric. Syst. **2005**, 85, 1–18.
13. Miao, Y.; Mulla, D.J.; Robert, P.C. Identifying important factors influencing corn yield and grain quality variability using artificial neural networks. Precis. Agric. **2006**, 7, 117–135.
14. Das, B.; Nair, B.; Reddy, V.K.; Venkatesh, P. Evaluation of multiple linear, neural network and penalised regression models for prediction of rice yield based on weather parameters for west coast of India. Int. J. Biometeorol. **2018**, 62, 1809–1822.
15. Shahhosseini, M.; Martinez-Feria, R.A.; Hu, G.; Archontoulis, S.V. Maize yield and nitrate loss prediction with machine learning algorithms. Environ. Res. Lett. **2019**, 14, 124026.
16. Pantazi, X.E.; Moshou, D.; Alexandridis, T.; Whetton, R.L.; Mouazen, A.M. Wheat yield prediction using machine learning and advanced sensing techniques. Comput. Electron. Agric. **2016**, 121, 57–65.
17. Farooque, A.; Zare, M.; Zaman, Q.; Abbas, F.; Bos, M.; Esau, T.; Acharya, B.; Schumann, A. Evaluation of DualEM-II sensor for soil moisture content estimation in the potato fields of Atlantic Canada. Plant Soil Environ. **2019**, 65, 290–297.
18. Taylor, R. Introducing Dualem to the IUSS Working Group on Proximal Soil Sensing. Available online: http://www.landbrugsinfo.dk/Planteavl/Praecisionsjordbrug-og-GIS/Filer/pl_11_562_b1_Dualem.pdf (accessed on 4 May 2020).
19. Heiri, O.; Lotter, A.F.; Lemcke, G. Loss on ignition as a method for estimating organic and carbonate content in sediments: Reproducibility and comparability of results. J. Paleolimnol. **2001**, 25, 101–110.
20. Patterson, G.T.; Carter, M.R. Soil Sampling and Methods of Analysis, 2nd ed.; Carter, M.R., Gregorich, E.G., Eds.; CRC Press Taylor & Francis Group: Boca Raton, FL, USA, 2007; Volume 44, ISBN 9780849335860.
21. Angrist, J.D.; Pischke, J.-S. Mostly Harmless Econometrics: An Empiricist’s Companion; Princeton University Press: Princeton, NJ, USA, 2008.
22. Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat. Methodol. **2005**, 67, 301–320.
23. Fix, E. Discriminatory Analysis: Nonparametric Discrimination, Consistency Properties; USAF School of Aviation Medicine: Dayton, OH, USA, 1951.
24. Cover, T.M.; Hart, P.E. Nearest Neighbor Pattern Classification. IEEE Trans. Inf. Theory **1967**, 13, 21–27.
25. Drucker, H.; Burges, C.J.C.; Kaufman, L.; Smola, A.; Vapnik, V. Support vector regression machines. In Advances in Neural Information Processing Systems; The MIT Press: Cambridge, MA, USA, 1997; Volume 9, pp. 155–161.
26. Kastens, J.H. Small sample behaviors of the delete-d cross validation statistic. Open J. Stat. **2015**, 5.
27. Zhang, J.; Zhu, Y.; Zhang, X.; Ye, M.; Yang, J. Developing a Long Short-Term Memory (LSTM) based model for predicting water table depth in agricultural areas. J. Hydrol. **2018**, 561, 918–929.
28. Afzaal, H.; Farooque, A.A.; Abbas, F.; Acharya, B.; Esau, T. Groundwater estimation from major physical hydrology components using artificial neural networks and deep learning. Water **2019**, 12, 5.
29. Üstün, B.; Melssen, W.J.; Buydens, L.M.C. Facilitating the application of Support Vector Regression by using a universal Pearson VII function based kernel. Chemom. Intell. Lab. Syst. **2006**, 81, 29–40.
30. Pan, Y.; Jiang, J.; Wang, R.; Cao, H. Advantages of support vector machine in QSPR studies for predicting auto-ignition temperatures of organic compounds. Chemom. Intell. Lab. Syst. **2008**, 92, 169–178.
31. Poudel, S.; Shaw, R. The relationships between climate variability and crop yield in a mountainous environment: A case study in Lamjung District, Nepal. Climate **2016**, 4, 13.
32. Maqsood, J.; Farooque, A.A.; Wang, X.; Abbas, F.; Acharya, B.; Afzaal, H. Contribution of Climate Extremes to Potato Tuber Yield: A Sustainability Prospective for Future Strategies. Sustainability **2020**, 12, 4937.
33. Farooque, A.A.; Zare, M.; Abbas, F.; Bos, M.; Esau, T.; Zaman, Q. Forecasting potato tuber yield using a soil electromagnetic induction method. Eur. J. Soil Sci. **2019**, 1–18.
34. Zare, M.; Farooque, A.A.; Abbas, F.; Zaman, Q.; Bos, M. Trends in the variability of potato tuber yield under selected land and soil characteristics. Plant Soil Environ. **2019**, 65, 111–117.
35. Afzaal, H.; Farooque, A.A.; Abbas, F.; Acharya, B.; Esau, T. Precision Irrigation Strategies for Sustainable Water Budgeting of Potato Crop in Prince Edward Island. Sustainability **2020**, 12, 2419.
36. Abera Guluma, D. Factors affecting potato (Solanum tuberosum L.) tuber seed quality in mid and highlands: A review. Int. J. Zambrut **2020**, 7, 24–40.
37. Nurmanov, Y.T.; Chernenok, V.G.; Kuzdanova, R.S. Potato in response to nitrogen nutrition regime and nitrogen fertilization. Field Crop. Res. **2019**, 231, 115–121.
38. Wang, X.; Guo, T.; Wang, Y.; Xing, Y.; Wang, Y.; He, X. Exploring the optimization of water and fertilizer management practices for potato production in the sandy loam soils of Northwest China based on PCA. Agric. Water Manag. **2020**, 237, 106180.
39. Kumar, N.; Prasad, V.; Pal Yadav, N. Effect of chemical fertilizers and bio fertilizers on flower yield, tuberous root yield and quality parameter on dahlia (Dahlia variabilis L.) cv. Kenya orange. J. Pharmacogn. Phytochem. **2019**, 8, 2265–2267.

**Figure 1.** Selected field locations in the provinces of Prince Edward Island and New Brunswick. Fields 1, 2, and 3 were in Prince Edward Island; Fields 1, 2, 3, 4, 5, and 6 were in New Brunswick.

**Figure 2.** Pearson correlation analysis of the variables selected for this study, showing all pairwise relationships among them. PRP is perpendicular/vertical coplanar geometry. HCP is horizontal coplanar geometry. SOM is soil organic matter (%). NDVI is normalized difference vegetation index.
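A correlation matrix such as the one in Figure 2 is the set of pairwise Pearson coefficients between all variables. The sketch below computes one with pandas and ranks the other variables by the strength of their correlation with yield. The column labels only mirror the variables named in Figure 2 and Table 3, the data are synthetic (one dependence of yield on HCP is injected on purpose), and the helper names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical column labels mirroring the variables in Figure 2 and Table 3.
COLUMNS = ["Yield", "HCP", "PRP", "SOM", "pH", "Moisture", "Slope", "NDVI"]


def pearson_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Pairwise Pearson correlation coefficients between all numeric columns."""
    return df.corr(method="pearson")


def strongest_with(corr: pd.DataFrame, target: str = "Yield") -> pd.Series:
    """Other variables ordered by the absolute value of their correlation with `target`."""
    r = corr[target].drop(target)
    return r.reindex(r.abs().sort_values(ascending=False).index)


# Synthetic stand-in data (NOT the study's measurements).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(120, len(COLUMNS))), columns=COLUMNS)
df["Yield"] += 0.8 * df["HCP"]  # injected dependence so the ranking is non-trivial

corr = pearson_matrix(df)
print(strongest_with(corr).round(2))
```

Ranking by absolute value keeps strong negative correlations visible alongside positive ones.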

**Table 1.** Description of study sites, study years, datasets, numbers of data points used for training and testing the machine learning algorithms, and the potato fields used for data collection.

Province | Year | Dataset Name | Training Points | Testing Points | Field Locations |
---|---|---|---|---|---|
Prince Edward Island | 2017 | PE-2017 | 80 | 40 | Fields 1, 2, 3 |
Prince Edward Island | 2018 | PE-2018 | 79 | 40 | Fields 1, 2, 3 |
New Brunswick | 2017 | NB-2017 | 80 | 40 | Fields 1, 2, 3 |
New Brunswick | 2018 | NB-2018 | 80 | 40 | Fields 4, 5, 6 |

Algorithm | Description | Hyperparameter | Value |
---|---|---|---|
Elastic net | Penalty multiplier | Alpha | 1 |
Elastic net | Mixing parameter of the L1 and L2 penalties | L1 Ratio | 0.5 |
Elastic net | Maximum number of iterations | Maximum iterations | 1000 |
Elastic net | Coefficient update order | Selection method | Cyclic |
Elastic net | Random number generator | Random state | Seed |
k-nearest neighbor | Number of neighbors | n_neighbors | 5 |
k-nearest neighbor | Weighting of neighbors | weight | uniform |
k-nearest neighbor | Leaf size of the tree search structure | leaf size | 30 |
k-nearest neighbor | Power parameter of the Minkowski distance | P | 2 |
k-nearest neighbor | Distance metric | Metric | Minkowski |
Support vector regression | Kernel type | Kernel | Linear |
Support vector regression | Regularization parameter | C | 1 |
Support vector regression | Kernel coefficient | Gamma | Scale |
Support vector regression | Width of the epsilon-insensitive tube (no penalty within it) | Epsilon | 0.1 |
Support vector regression | Shrinking heuristic | shrinking | TRUE |
Linear regression | Intercept calculation | Fit Intercept | TRUE |
Linear regression | Data normalization | Normalize | FALSE |
Linear regression | Copy input X | Copy_X | TRUE |
Linear regression | Number of parallel jobs | n_jobs | None |
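The hyperparameter names in the table above match constructor arguments of scikit-learn's `LinearRegression`, `ElasticNet`, `KNeighborsRegressor` and `SVR`, which suggests (though this text does not state it) that scikit-learn was used. The configuration sketch below assumes that. `SEED` is a hypothetical placeholder because the seed value is not reported, and `normalize=False` is omitted because it was the default and the argument no longer exists in recent scikit-learn releases.

```python
from sklearn.linear_model import ElasticNet, LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

SEED = 0  # hypothetical value: the seed actually used is not reported


def build_models(seed: int = SEED) -> dict:
    """Instantiate the four regressors with the settings listed in the table."""
    return {
        # `normalize=False` omitted: it was the default and was removed in scikit-learn 1.2.
        "LR": LinearRegression(fit_intercept=True, copy_X=True, n_jobs=None),
        # random_state only takes effect when selection="random".
        "EN": ElasticNet(alpha=1.0, l1_ratio=0.5, max_iter=1000,
                         selection="cyclic", random_state=seed),
        # p=2 with the Minkowski metric is the Euclidean distance.
        "k-NN": KNeighborsRegressor(n_neighbors=5, weights="uniform",
                                    leaf_size=30, p=2, metric="minkowski"),
        # gamma is ignored by the linear kernel; kept only to mirror the table.
        "SVR": SVR(kernel="linear", C=1.0, gamma="scale",
                   epsilon=0.1, shrinking=True),
    }


if __name__ == "__main__":
    for name, model in build_models().items():
        print(name, model)
```

Most of these values are also scikit-learn's defaults, so the main explicit choice in the table is the linear kernel for SVR (the library default is `"rbf"`).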

Dataset | Variable | Mean ± SD | Minimum | Maximum | Variable | Mean ± SD | Minimum | Maximum |
---|---|---|---|---|---|---|---|---|
NB-2017 | Yield (t/ha) | 53.9 ± 11.2 | 26.2 | 78.9 | Slope (%) | 2.42 ± 1.31 | 0.10 | 4.94 |
NB-2018 | Yield (t/ha) | 43.4 ± 8.17 | 23.3 | 61.0 | Slope (%) | 2.79 ± 1.79 | 0.20 | 8.10 |
PE-2017 | Yield (t/ha) | 47.7 ± 11.3 | 25.5 | 80.0 | Slope (%) | 2.04 ± 1.10 | 0.40 | 5.00 |
PE-2018 | Yield (t/ha) | 48.2 ± 10.0 | 26.2 | 83.2 | Slope (%) | 2.27 ± 0.68 | 0.76 | 4.69 |
NB-2017 | HCP (mS/m) | 5.85 ± 1.52 | 2.54 | 10.8 | SOM (%) | 3.82 ± 0.86 | 2.20 | 6.63 |
NB-2018 | HCP (mS/m) | 5.42 ± 1.54 | 2.40 | 8.60 | SOM (%) | 3.81 ± 0.79 | 2.60 | 5.90 |
PE-2017 | HCP (mS/m) | 6.31 ± 1.87 | 2.80 | 10.0 | SOM (%) | 2.22 ± 0.50 | 0.80 | 3.20 |
PE-2018 | HCP (mS/m) | 6.56 ± 1.32 | 3.38 | 10.5 | SOM (%) | 2.66 ± 0.37 | 1.25 | 3.85 |
NB-2017 | PRP (mS/m) | 5.14 ± 1.47 | 1.74 | 9.50 | Soil pH | 5.61 ± 0.40 | 4.85 | 7.10 |
NB-2018 | PRP (mS/m) | 4.04 ± 1.27 | 1.30 | 7.10 | Soil pH | 5.79 ± 0.55 | 4.60 | 7.20 |
PE-2017 | PRP (mS/m) | 4.69 ± 1.45 | 1.40 | 7.70 | Soil pH | 5.55 ± 0.21 | 5.10 | 6.10 |
PE-2018 | PRP (mS/m) | 4.09 ± 1.14 | 1.87 | 7.45 | Soil pH | 5.71 ± 0.26 | 5.15 | 6.50 |
NB-2017 | Soil Moisture (%) | 17.5 ± 3.89 | 9.96 | 27.7 | NDVI | 0.79 ± 0.06 | 0.66 | 0.92 |
NB-2018 | Soil Moisture (%) | 8.37 ± 2.85 | 3.40 | 16.3 | NDVI | 0.58 ± 0.06 | 0.50 | 0.70 |
PE-2017 | Soil Moisture (%) | 15.6 ± 3.95 | 6.80 | 25.7 | NDVI | 0.83 ± 0.06 | 0.70 | 0.90 |
PE-2018 | Soil Moisture (%) | 11.0 ± 1.77 | 6.33 | 16.7 | NDVI | 0.50 ± 0.10 | 0.35 | 0.92 |

Site | Year | Algorithm | MAE (t/ha) | RMSE (t/ha) | Mean R² | Std. Dev. (R²) |
---|---|---|---|---|---|---|
New Brunswick | 2018 | Linear Regression | 3.59 | 4.69 | 0.63 | 0.04 |
New Brunswick | 2018 | Elastic Net | 3.79 | 4.72 | 0.63 | 0.06 |
New Brunswick | 2018 | k-Nearest Neighbor | 4.21 | 5.23 | 0.53 | 0.07 |
New Brunswick | 2018 | Support vector regression | 3.60 | 4.62 | 0.65 | 0.06 |
New Brunswick | 2017 | Linear Regression | 4.77 | 6.19 | 0.70 | 0.05 |
New Brunswick | 2017 | Elastic Net | 5.60 | 6.67 | 0.65 | 0.04 |
New Brunswick | 2017 | k-Nearest Neighbor | 5.57 | 6.93 | 0.62 | 0.09 |
New Brunswick | 2017 | Support vector regression | 4.68 | 5.97 | 0.72 | 0.07 |
Prince Edward Island | 2018 | Linear Regression | 5.01 | 6.24 | 0.53 | 0.09 |
Prince Edward Island | 2018 | Elastic Net | 5.27 | 6.54 | 0.49 | 0.11 |
Prince Edward Island | 2018 | k-Nearest Neighbor | 4.85 | 6.49 | 0.54 | 0.12 |
Prince Edward Island | 2018 | Support vector regression | 4.95 | 6.17 | 0.54 | 0.09 |
Prince Edward Island | 2017 | Linear Regression | 5.23 | 6.70 | 0.64 | 0.07 |
Prince Edward Island | 2017 | Elastic Net | 5.57 | 6.74 | 0.65 | 0.01 |
Prince Edward Island | 2017 | k-Nearest Neighbor | 5.62 | 6.91 | 0.64 | 0.05 |
Prince Edward Island | 2017 | Support vector regression | 5.18 | 6.60 | 0.65 | 0.06 |

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Abbas, F.; Afzaal, H.; Farooque, A.A.; Tang, S. Crop Yield Prediction through Proximal Sensing and Machine Learning Algorithms. *Agronomy* **2020**, *10*, 1046.
https://doi.org/10.3390/agronomy10071046

**AMA Style**

Abbas F, Afzaal H, Farooque AA, Tang S. Crop Yield Prediction through Proximal Sensing and Machine Learning Algorithms. *Agronomy*. 2020; 10(7):1046.
https://doi.org/10.3390/agronomy10071046

**Chicago/Turabian Style**

Abbas, Farhat, Hassan Afzaal, Aitazaz A. Farooque, and Skylar Tang. 2020. "Crop Yield Prediction through Proximal Sensing and Machine Learning Algorithms" *Agronomy* 10, no. 7: 1046.
https://doi.org/10.3390/agronomy10071046