Diffuse reflectance spectroscopy (DRS) is emerging as a rapid and cost-effective alternative to routine laboratory analysis for many soil properties. However, it has primarily been applied in project-specific contexts. Here, we assess DRS at the scale of the continental United States using the large (n > 50,000) USDA National Soil Survey Center mid-infrared spectral library and its associated soil characterization database. We tested and optimized several advanced statistical approaches for providing routine predictions of numerous soil properties relevant to studying carbon cycling. On independent validation sets, the machine learning algorithms Cubist and memory-based learner (MBL) both outperformed random forest (RF) and partial least squares regression (PLSR) and produced excellent overall models, with a mean R² of 0.92 (mean ratio of performance to deviation = 6.5) across all 10 soil properties. We found that root-mean-square error (RMSE) was misleading as a guide to the actual uncertainty of any particular prediction; therefore, we developed routines to assess the prediction uncertainty for all models except Cubist. The MBL models produced much more precise predictions than global PLSR and RF. Finally, we present several techniques that can be used to flag predictions for new samples that may be unreliable because their spectra fall outside the calibration set.
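For readers unfamiliar with the summary statistics quoted above, the following is a minimal sketch of how RMSE, R², and the ratio of performance to deviation (RPD) are commonly computed from paired observed and predicted values on a validation set. It is not the authors' code. The function name `validation_metrics` and the sample values are hypothetical, and it assumes R² is defined as 1 − SSE/SST and RPD as the sample standard deviation of the observed values divided by the RMSE; other definitions are in use.

```python
import math
import statistics


def validation_metrics(observed, predicted):
    """Return RMSE, R2, and RPD for paired observed/predicted values.

    Assumed definitions (not taken from the source):
      RMSE = sqrt(mean squared residual)
      R2   = 1 - SSE / SST
      RPD  = sample standard deviation of observed / RMSE
    Undefined (division by zero) if the observed values are all equal
    or if the predictions are perfect.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    residuals = [o - p for o, p in zip(observed, predicted)]
    sse = sum(r * r for r in residuals)          # sum of squared errors
    rmse = math.sqrt(sse / len(residuals))

    mean_obs = statistics.fmean(observed)
    sst = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares

    r2 = 1.0 - sse / sst
    rpd = statistics.stdev(observed) / rmse
    return {"rmse": rmse, "r2": r2, "rpd": rpd}


# Hypothetical validation data, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
metrics = validation_metrics(observed, predicted)
```

All three statistics summarize the validation set as a whole, so none of them reports how uncertain an individual prediction is.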
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.