Leveraging Important Covariate Groups for Corn Yield Prediction
Abstract
1. Introduction
2. Materials and Methods
2.1. Datasets
Variable Category | Subcategory | Variable Name (Units) |
---|---|---|
Spatiotemporal | Time | Year |
Spatiotemporal | Space | Farm Resource Region (FRR); Latitude; Longitude |
Biophysical | Topography | Slope; Elevation |
Biophysical | Climate | Growing degree days; Temperature seasonality (standard deviation × 100); Mean temperature of the wettest quarter (°C); Mean temperature of the driest quarter (°C); Mean diurnal range (°C); Total growing season precipitation (mm); Precipitation seasonality (coefficient of variation); Precipitation of the warmest quarter (mm); Precipitation of the coldest quarter (mm); Irrigation (percent agricultural land irrigated) |
Biophysical | Soil | Topsoil organic carbon (% weight); Subsoil pH (H2O) (−log(H+)); Topsoil cation exchange capacity (cmol/kg); Topsoil reference bulk density (kg/dm³) |
Biophysical | Diversity | Shannon’s Diversity Index |
Farm(er) | Farm inputs/management | Fertilizer ($/acre); Chemicals ($/acre); Labor ($/acre); Machinery ($/acre); Corn acreage (% total agricultural acres) |
Farm(er) | Farm assistance | Government payments ($/acre) ([25], p. 759); Insurance (% total agricultural acreage) ([25], p. 761) |
Farm(er) | Farm(er) characteristics | Years farming; % farming as primary occupation; % tenants; Median farm size (acres per operation) |
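The diversity covariate in the table is Shannon’s Diversity Index. As a minimal sketch of the standard formula, H = −Σ p_i ln(p_i), the snippet below computes the index from the area occupied by each land-cover class in a county. The function name `shannon_diversity` and the example areas are hypothetical, chosen only for illustration; the source does not state exactly how the index was calculated.

```python
import math


def shannon_diversity(areas):
    """Shannon's Diversity Index, H = -sum(p_i * ln(p_i)), from class areas."""
    total = sum(areas)
    if total <= 0:
        raise ValueError("total area must be positive")
    # Classes with zero area contribute nothing to the index.
    proportions = [a / total for a in areas if a > 0]
    return -sum(p * math.log(p) for p in proportions)


# Hypothetical class areas (e.g., acres of three crops), not data from the paper.
mixed_landscape = [50, 30, 20]
corn_dominated = [90, 5, 5]
print(shannon_diversity(mixed_landscape), shannon_diversity(corn_dominated))
```

A landscape split evenly among more classes yields a higher index, while a landscape dominated by a single class approaches zero.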
2.2. Data Analysis
2.2.1. Variable Selection
2.2.2. Imputation
2.2.3. Modeling
- mtry: The number of randomly sampled candidate variables considered at each split in the regression trees that comprise the forest.
- nodesize: The minimum number of observations that a node of a regression tree must contain to be considered for further splitting. Larger values lead to less variability in prediction.
- ntree: The number of trees in the forest. Larger forests generally improve accuracy, but with quickly diminishing returns and increasing computational cost.
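To make the role of the three hyperparameters concrete, the following is a minimal, illustrative random forest for regression written in plain Python. It is a teaching sketch on synthetic data, not the implementation used in the study (the reference list points to R packages for that). All function names (`best_split`, `grow_tree`, `fit_forest`, `predict_forest`) are hypothetical, and `nodesize` follows the definition given in the list above: a node with fewer observations than `nodesize` is not split further.

```python
import random
from statistics import mean


def best_split(rows, targets, feature_ids):
    """Return the (feature, threshold) pair with the lowest squared error, or None."""
    best, best_sse = None, float("inf")
    for f in feature_ids:
        # Skip the smallest value so both sides of every split are non-empty.
        for t in sorted({r[f] for r in rows})[1:]:
            left = [y for r, y in zip(rows, targets) if r[f] < t]
            right = [y for r, y in zip(rows, targets) if r[f] >= t]
            m_left, m_right = mean(left), mean(right)
            sse = sum((y - m_left) ** 2 for y in left)
            sse += sum((y - m_right) ** 2 for y in right)
            if sse < best_sse:
                best, best_sse = (f, t), sse
    return best


def grow_tree(rows, targets, mtry, nodesize, rng):
    """Grow one regression tree; leaves are means, internal nodes are tuples."""
    # nodesize: nodes with too few observations become leaves.
    if len(rows) < nodesize or len(set(targets)) == 1:
        return mean(targets)
    # mtry: only a random subset of variables is considered at each split.
    feature_ids = rng.sample(range(len(rows[0])), mtry)
    split = best_split(rows, targets, feature_ids)
    if split is None:  # all sampled variables are constant in this node
        return mean(targets)
    f, t = split
    left = [(r, y) for r, y in zip(rows, targets) if r[f] < t]
    right = [(r, y) for r, y in zip(rows, targets) if r[f] >= t]
    return (
        f,
        t,
        grow_tree([r for r, _ in left], [y for _, y in left], mtry, nodesize, rng),
        grow_tree([r for r, _ in right], [y for _, y in right], mtry, nodesize, rng),
    )


def predict_tree(node, row):
    """Follow the splits down to a leaf and return its mean."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] < t else right
    return node


def fit_forest(rows, targets, mtry, nodesize, ntree, seed=0):
    """ntree: number of trees, each grown on a bootstrap sample of the data."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(ntree):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        forest.append(
            grow_tree([rows[i] for i in idx], [targets[i] for i in idx],
                      mtry, nodesize, rng)
        )
    return forest


def predict_forest(forest, row):
    """Average the predictions of all trees."""
    return mean([predict_tree(tree, row) for tree in forest])


# Synthetic data (not from the paper): the response depends on feature 0 only.
data_rng = random.Random(1)
rows = [[data_rng.uniform(0, 10), data_rng.uniform(0, 10)] for _ in range(60)]
targets = [2 * r[0] + data_rng.gauss(0, 0.5) for r in rows]

forest = fit_forest(rows, targets, mtry=2, nodesize=5, ntree=25)
low = predict_forest(forest, [1.0, 5.0])
high = predict_forest(forest, [9.0, 5.0])
print(round(low, 2), round(high, 2))
```

Raising `nodesize` produces shallower trees and smoother predictions, lowering `mtry` decorrelates the trees at the cost of weaker individual splits, and raising `ntree` mainly increases run time once the averaged prediction has stabilized.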
3. Results
3.1. US Corn Yield Predictions across Biophysical and Farm(er) Models before Group Exclusion
3.2. Group Exclusion, Predictive Accuracy, and Variable Importance
3.3. RF Results Including Only Spatiotemporal and Climate Variables
3.4. US Corn Yield in Ensemble Predicted Infilled Dataset
3.5. Comparison to Other Methods
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Bigelow, D.P.; Borchers, A. Major Uses of Land in the United States, 2012; U.S. Department of Agriculture, Economic Research Service: Washington, DC, USA, 2017.
- Liang, X.Z.; Wu, Y.; Chambers, R.G.; Schmoldt, D.L.; Gao, W.; Liu, C.; Liu, Y.A.; Sun, C.; Kennedy, J.A. Determining Climate Effects on US Total Agricultural Productivity. Proc. Natl. Acad. Sci. USA 2017, 114, E2285–E2292.
- Mueller, N.D.; Gerber, J.S.; Johnston, M.; Ray, D.K.; Ramankutty, N.; Foley, J.A. Closing Yield Gaps through Nutrient and Water Management. Nature 2012, 490, 254–257.
- Burchfield, E.; Matthews-Pennanen, N.; Schoof, J.; Lant, C. Changing Yields in the Central United States under Climate and Technological Change. Clim. Chang. 2020, 159, 329–346.
- Ray, D.K.; Ramankutty, N.; Mueller, N.D.; West, P.C.; Foley, J.A. Recent Patterns of Crop Yield Growth and Stagnation. Nat. Commun. 2012, 3, 1293.
- Zhao, C.; Liu, B.; Piao, S.; Wang, X.; Lobell, D.B.; Huang, Y.; Huang, M.; Yao, Y.; Bassu, S.; Ciais, P.; et al. Temperature Increase Reduces Global Yields of Major Crops in Four Independent Estimates. Proc. Natl. Acad. Sci. USA 2017, 114, 9326–9331.
- Moore, F.C.; Baldos, U.L.C.; Hertel, T. Economic Impacts of Climate Change on Agriculture: A Comparison of Process-Based and Statistical Yield Models. Environ. Res. Lett. 2017, 12, 065008.
- Jeong, J.H.; Resop, J.P.; Mueller, N.D.; Fleisher, D.H.; Yun, K.; Butler, E.E.; Timlin, D.J.; Shim, K.M.; Gerber, J.S.; Reddy, V.R.; et al. Random Forests for Global and Regional Crop Yield Predictions. PLoS ONE 2016, 11, e0156571.
- Rissing, A.; Burchfield, E.K.; Spangler, K.A.; Schumacher, B.L. Implications of U.S. agricultural data practices for sustainable food systems research. Nat. Food. 2023, accepted.
- Burchfield, E.K.; Nelson, K.S. Agricultural Yield Geographies in the United States. Environ. Res. Lett. 2021, 16, 054051.
- Estes, L.D.; Bradley, B.A.; Beukes, H.; Hole, D.G.; Lau, M.; Oppenheimer, M.G.; Schulze, R.; Tadross, M.A.; Turner, W.R. Comparing Mechanistic and Empirical Model Projections of Crop Suitability and Productivity: Implications for Ecological Forecasting. Glob. Ecol. Biogeogr. 2013, 22, 1007–1018.
- Lobell, D.; Asseng, S. Comparing estimates of climate change impacts from process-based and statistical crop models. Environ. Res. Lett. 2017, 12, 015001.
- Lobell, D.B.; Burke, M.B. On the use of statistical models to predict crop yield responses to climate change. Agric. For. Meteorol. 2010, 150, 1443–1452.
- Schlenker, W.; Roberts, M.J. Nonlinear Temperature Effects Indicate Severe Damages to U.S. Crop Yields under Climate Change. Proc. Natl. Acad. Sci. USA 2009, 106, 15594–15598.
- Landau, S.; Mitchell, R.A.C.; Barnett, V.; Colls, J.J.; Craigon, J.; Payne, R.W. A parsimonious, multiple-regression model of wheat yield response to environment. Agric. For. Meteorol. 2000, 101, 151–166.
- Sheehy, J.E.; Mitchell, P.L.; Ferrer, A.B. Decline in rice grain yields with temperature: Models and correlations can give different estimates. Field Crops Res. 2006, 98, 151–156.
- Bali, N.; Singla, A. Emerging Trends in Machine Learning to Predict Crop Yield and Study Its Influential Factors: A Survey. Arch. Comput. Methods Eng. 2022, 29, 95–112.
- Liakos, K.G.; Busato, P.; Moshou, D.; Pearson, S.; Bochtis, D. Machine learning in agriculture: A review. Sensors 2018, 18, 2674.
- van Klompenburg, T.; Kassahun, A.; Catal, C. Crop yield prediction using machine learning: A systematic literature review. Comput. Electron. Agric. 2020, 177, 105709.
- Shahhosseini, M.; Hu, G.; Huber, I.; Archontoulis, S.V. Coupling Machine Learning and Crop Modeling Improves Crop Yield Prediction in the US Corn Belt. Sci. Rep. 2021, 11, 1–15.
- USDA ERS. Farm Resource Regions. Agricultural Information Bulletin 760, Washington, DC: USDA Economic Research Service, 2000. Available online: https://www.ers.usda.gov/webdocs/publications/42298/32489_aib-760_002.pdf?v=42487 (accessed on 1 June 2022).
- PRISM Climate Group. Oregon State University, 2014. Available online: https://prism.oregonstate.edu (accessed on 1 June 2020).
- Cross, H.Z.; Zuber, M.S. Prediction of Flowering Dates in Maize Based on Different Methods of Estimating Thermal Units. Agron. J. 1972, 64, 351.
- Thornton, M.M.; Shrestha, R.; Wei, Y.; Thornton, P.E.; Kao, S.-C.; Wilson, B.E. Daymet: Monthly Climate Summaries on a 1-km Grid for North America, Version 4 R1; ORNL DAAC: Oak Ridge, TN, USA, 2022.
- USDA-NASS. 2017 Census of Agriculture: United States Summary and State Data. Volume 1, Geographic Area Series, Part 51, AC-17-A-51, 2019. Available online: https://www.nass.usda.gov/Publications/AgCensus/2017 (accessed on 1 June 2022).
- USDA-NASS. QuickStats Database. Available online: https://quickstats.nass.usda.gov/ (accessed on 1 June 2020).
- Pervez, M.S.; Brown, J.F. Mapping Irrigated Lands at 250-m Scale by Merging MODIS Data and National Agricultural Statistics. Remote Sens. 2010, 2, 2388–2412.
- Wieder, W.R.; Boehnert, J.; Bonan, G.B.; Langseth, M. Regridded Harmonized World Soil Database v1.2.; ORNL DAAC: Oak Ridge, TN, USA, 2012.
- USDA-NASS. USDA National Agricultural Statistics Service (NASS) Cropland Data Layer Published Crop-Specific Data Layer. Available online: https://nassgeodata.gmu.edu/CropScape/ (accessed on 1 June 2020).
- Burchfield, E.K.; Nelson, K.S.; Spangler, K. The Impact of Agricultural Landscape Diversification on U.S. Crop Production. Agric. Ecosyst. Environ. 2019, 285, 106615.
- Meyer, H.; Reudenbach, C.; Wöllauer, S.; Nauss, T. Importance of spatial predictor variable selection in machine learning applications – Moving from data reproduction to spatial prediction. Ecol. Modell. 2019, 411, 108815.
- Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
- Biau, G.; Scornet, E. A Random Forest Guided Tour. Test 2016, 25, 197–227.
- FAO/IIASA/ISRIC/ISS-CAS/JRC. Harmonized World Soil Database; Version 1.1; FAO: Rome, Italy; IIASA: Laxenburg, Austria, 2009; Available online: https://www.fao.org/3/aq361e/aq361e.pdf (accessed on 1 June 2020).
- Wright, M.N.; Wager, S.; Probst, P. Package ‘Ranger’, 2022. Available online: https://cran.r-project.org/web/packages/ranger/ranger.pdf (accessed on 1 June 2021).
- R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2022. Available online: https://www.R-project.org/ (accessed on 1 June 2020).
- Cutler, D.R.; Edwards Thomas, C.J.; Beard, K.H.; Cutler, A.; Hess, K.T.; Gibson, J.; Lawler, J.J. Random Forests for Classification in Ecology. Ecology 2007, 88, 2783–2792.
- Liaw, A.; Wiener, M. Classification and Regression by RandomForest. R News 2002, 2, 18–22.
- Grömping, U. Variable importance assessment in regression: Linear regression versus random forest. Am. Stat. 2009, 63, 308–319.
- Probst, P.; Wright, M.N.; Boulesteix, A.L. Hyperparameters and Tuning Strategies for Random Forest. WIREs Data Min. Knowl. Discov. 2019, 9, e1301.
- Hastie, T.; Tibshirani, R.; Friedman, J.H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 7th ed.; Springer: New York, NY, USA, 2017; pp. 181–184.
- Kutner, M.H.; Nachtsheim, C.J.; Neter, J.; Wasserman, W. Applied Linear Regression Models; McGraw-Hill/Irwin: New York, NY, USA, 2004; Volume 4, pp. 563–568.
- Haycock, S.; Bean, B. Stressor: Algorithms for Testing Models under Stress. 2023. Available online: https://github.com/beanb2/stressor (accessed on 22 January 2023).
- Ali, M. PyCaret: An Open Source, Low-Code Machine Learning Library in Python. 2020. Available online: https://www.pycaret.org (accessed on 22 January 2023).
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
- Troy, T.J.; Kipgen, C.; Pal, I. The Impact of Climate Extremes and Irrigation on US Crop Yields. Environ. Res. Lett. 2015, 10, 054013.
- Perrone, D.; Jasechko, S. Deeper Well Drilling an Unsustainable Stopgap to Groundwater Depletion. Nat. Sustain. 2019, 2, 773–782.
- Scanlon, B.R.; Faunt, C.C.; Longuevergne, L.; Reedy, R.C.; Alley, W.M.; McGuire, V.L.; McMahon, P.B. Groundwater Depletion and Sustainability of Irrigation in the US High Plains and Central Valley. Proc. Natl. Acad. Sci. USA 2012, 109, 9320–9325.
- Smidt, S.J.; Haacker, E.M.K.; Kendall, A.D.; Deines, J.M.; Pei, L.; Cotterman, K.A.; Li, H.; Liu, X.; Basso, B.; Hyndman, D.W. Complex Water Management in Modern Agriculture: Trends in the Water-Energy-Food Nexus over the High Plains Aquifer. Sci. Total Environ. 2016, 566, 988–1001.
- Ray, D.K.; Gerber, J.S.; Macdonald, G.K.; West, P.C. Climate Variation Explains a Third of Global Crop Yield Variability. Nat. Commun. 2015, 6, 5989.
- Rosenzweig, C.; Tubiello, F.N.; Goldberg, R.; Mills, E.; Bloomfield, J. Increased crop damage in the US from excess precipitation under climate change. Glob. Environ. Chang. 2002, 12, 197–202.
- Auffhammer, M.; Schlenker, W. Empirical Studies on Agricultural Impacts and Adaptation. Energy Econ. 2014, 46, 555–561.
- Landis, D.A. Designing Agricultural Landscapes for Biodiversity-Based Ecosystem Services. Basic Appl. Ecol. 2017, 18, 1–12.
- McDaniel, M.D.; Tiemann, L.K.; Grandy, A.S. Does Agricultural Crop Diversity Enhance Soil Microbial Biomass And. Ecol. Appl. 2014, 24, 560–570.
- Tscharntke, T.; Klein, A.M.; Kruess, A.; Steffan-Dewenter, I.; Thies, C. Landscape Perspectives on Agricultural Intensification and Biodiversity – Ecosystem Service Management. Ecol. Lett. 2005, 8, 857–874.
- Burchfield, E.K. Shifting Cultivation Geographies in the Central and Eastern US. Environ. Res. Lett. 2022, 17, 054049.
- Hatfield, J.L.; Walthall, C.L. Meeting Global Food Needs: Realizing the Potential via Genetics × Environment × Management Interactions. Agron. J. 2015, 107, 1215–1226.
- Grassini, P.; Thorburn, J.; Burr, C.; Cassman, K.G. High-yield irrigated maize in the Western U.S. Corn Belt: I. On-farm yield, yield potential, and impact of agronomic practices. Field Crops Res. 2011, 120, 142–150.
- Kayad, A.; Sozzi, M.; Gatto, S.; Whelan, B.; Sartori, L.; Marinello, F. Ten years of corn yield dynamics at field scale under digital agriculture solutions: A case study from North Italy. Comput. Electron. Agric. 2021, 185, 106126.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Schumacher, B.L.; Burchfield, E.K.; Bean, B.; Yost, M.A. Leveraging Important Covariate Groups for Corn Yield Prediction. Agriculture 2023, 13, 618. https://doi.org/10.3390/agriculture13030618