Can Low-Cost, Handheld Spectroscopy Tools Coupled with Remote Sensing Accurately Estimate Soil Organic Carbon in Semi-Arid Grazing Lands?

: Soil organic carbon inﬂuences several landscape ecological processes, and soils are be-coming recognized as a mechanism to mitigate the negative impacts of climate change. There is a need to deﬁne methods and technologies for addressing soils’ spatial variability as well as the time and cost of sampling soil organic carbon (SOC). Visible and near-infrared spectroscopy have been suggested as a sampling tool to reduce inventory cost. We sampled nineteen ranch properties totaling 17,347 ha across Oklahoma and Texas in 2019 to evaluate the effectiveness and accuracy of a handheld reﬂectometer (Our Sci, Ann Arbor, MI, USA) (370–940 nm) and existing remote sensing approaches to estimate SOC in semi-arid grazing lands. Our data suggest that the Our Sci Reﬂectometer estimated soil organic carbon with a precision of approximately ( ± 0.3% SOC); however, it was least accurate at higher carbon concentrations. The Our Sci reﬂectometer, although consistently accurate at lower SOC concentrations, was still less accurate than a model built using only remote sensing and digital soil map data as predictors. Combining the two data sources was the most accurate means of determining SOC. Our results indicated that the Our Sci handheld Vis-NIR reﬂectometer tested may have only limited applications for reducing inventory costs at scale.


Introduction
Soil is the largest terrestrial carbon store on Earth with approximately 1500 Gt of carbon (C) in the top meter [1]. This reservoir has decreased over time due to anthropogenic and natural disturbances, contributing to 78 ± 12 PgC being released to the atmosphere [2]. As a result, sequestering soil carbon by reversing these losses could be an important climate change mitigation strategy [3]. Researchers estimate that over the next 50 years, the global potential of soil organic carbon sequestration and restoration of degraded soils is approximately 0.6-1.2 PgC year −1 , suggesting a possible cumulative sink capacity of 30-60 PgC [4]. On decadal time scales, soil can serve as a carbon sink or source depending on climate and land-use history [5]. Consequently, applying soil-health-focused management in agricultural production systems that prioritize rebuilding soil carbon concentrations could provide multiple benefits to both the on-farm production system and society. All participating ranches were beef cattle (Bos taurus taurus) operations and fully integrate management goals around grazing intensity, frequency, and duration within an operational grazing management plan. Forty-two percent of the study site ranches comprised native vegetation only, primarily mid and tall warm-season grasses. However, some areas support shortgrass prairie communities. Fifty-eight percent of the properties in the study had a complement of introduced pasture, primarily bermudagrass (Cynodon dactylon (L.) Pers.), and some cropland. Although most of the study site ranches had a diversity of land uses, most of the randomly selected sampling locations were on rangeland due to rangelands comprising most of the total acreage. Around 72.2% of the ran- The dominant soil orders in the represented MLRAs are Mollisols, Alfisols, and Vertisols. The annual average precipitation in this area ranges from 635 to 965 mm. The yearly amount of precipitation can vary widely from year to year, with the predominance occurring as high-intensity, convective thunderstorms during spring and fall. The average Soil Syst. 2022, 6, 38 4 of 16 annual temperature is 14 to 18 • C. The freeze-free period averages 235 days and ranges from 205 to 265 days, respectively [28].

Sampling Design
Soil sampling sites were selected through stratified random sampling with the web application Stratifi [29]. The web app uses an unsupervised classification algorithm, WEKA X-Means [30], to incorporate data on vegetation productivity (Landsat 8 derived indices; 30 m resolution), topography/slope/aspect (National Elevation Dataset; 10 m resolution), and soil properties (gSSURGO 30 m resolution) within a pre-defined study area to define a series of "strata" or areas with similar combinations of the above attributes. The WEKA X-Means algorithm automatically selects the appropriate number of strata based on the variability of input layers within the study area. Stratifi then chooses a series of random sampling sites based on the desired sampling density and the relative size of each stratum.
Initial sampling locations were generated at double the desired density, then onsite verification determined the accessibility of each sampling location. An edge buffer constraint was added to each stratum, and sampling locations were generated as not to exceed a 20 m buffer distance to the edge. The verification process selected points in numerical order until the desired sampling density was met. Sampling density was set at a minimum of five sites per strata and 1 site per 32.38 ha. If a sampling point was inaccessible, a secondary and tertiary sampling protocol was randomly selected within this buffer zone. If the subsequent backup protocols did not satisfy the sampling density, additional randomized sampling locations were generated in the Stratifi application, and the process repeated until the sampling density goal was met. Inaccessible regions with steep slopes, high brush density, or other safety concerns were excluded from strata.

Soil Sampling
Soil sampling was conducted in 2019. In total, 1738 soil samples were collected at multiple depths from 519 identified sampling locations. At each sampling location, soils were sampled with a Giddings™ soil probe (7.62 cm, Giddings Machine Company, Windsor, CO, USA) to a total depth of 90 cm and vertically stratified by depth (15,30,45,60,75, and 90 cm) or until bedrock restricted collection. Samples were transported to the Noble Research Institute's soil laboratory for further processing and analysis. Soil samples were milled with an Agvise soil grinder (<2 mm screen) and dried in an oven at 42 • C for 48 h. Dried soil samples were scanned in the lab with the Our Sci reflectometer (https://our-sci.gitlab.io/manufacturing/ reflectometer-tutorials/ (accessed on 10 March 2022)). The device is a handheld reflectometer designed to measure the reflectance of a material sample at a select set of wavelengths: 370, 395, 420, 530, 605, 650, 730, 850, 880, and 940 nm. To measure the reflectance, a soil sample is prepared in a small glass petri dish or cuvette and clamped into position in front of a series of LEDs at the wavelengths mentioned earlier. These LEDs then sequentially flash onto the soil sample, and a set of photoreceptors measure the reflectance of the sample at each isolated wavelength. Later, the soil samples were sent to a commercial laboratory where the industry standard, the dry combustion method, was used to analyze soil organic carbon [31].

Statistical Analysis
Since all samples were geo-tagged in the field, we were able to extract relevant data for each sample point from a various remote sensing datasets and digital soil maps. We focused on collecting data types that would add potential predictive power to our models for estimating soil C content, such as soil taxonomy, Normalized Differential Vegetation Index (NDVI), and clay content (Table 1). For each point, we identified the most representative soil series as reported by the United States Department of Agriculture's SSURGO [32] and then extracted soil characterization data related to that series. Characterization data included representative estimates of soil organic matter content; soil chemical properties related to the weathering status of soils (pH and cation exchange capacity); estimates of inorganic carbon content (gypsum and CaCO 3 ); relative content of soil textural components (sand, silt, clay); data on soil color as scored on the Munsell color system. NDVI was calculated as the normalized difference between band 8 (NIR, 835.1 nm) and band 4 (red, 664.5 nm) of the Sentinel 2 Multi-spectral Instrument dataset from the European Space Agency. To calculate NDVI at each sampling point, we retrieved Level 1-C Sentinel 2 reflectance data for a bounding box containing the entire sampling area using the ee package [33] in Python to access the Google Earth Engine data catalog. Data were retrieved for all available dates from 1 January 2019 to 31 December 2019. Images for each date were then cloud-masked using the QA60 band from Sentinel 2 for the corresponding date to identify and remove all pixels obscured by clouds. Finally, a 'greenest pixel' composite image of the study area was created by selecting the highest NDVI value across the date range for each pixel. Specific NDVI values for each sample point were then extracted from this composite image by overlaying point coordinates and finding the corresponding NDVI value.
Topographic data were similarly retrieved for each point by accessing the Google Earth Engine data catalog with the ee package in Python. Using the same bounding box, we retrieved elevation data from the USGS National Elevation Dataset for the entire study area. Slope and aspect were then derived from elevation data using the Google Earth Engine ee.terrain.products function. Specific values of elevation, slope, and aspect were then extracted for each sample point by overlaying the coordinates on each final image.
We then developed three models to estimate soil carbon content by the following: (1) using reflectance data collected using the Our Sci reflectometer, herein referred to as "reflectance" models; (2) using data extracted from remote sensing and digital soil maps Soil Syst. 2022, 6, 38 6 of 16 (DSM), herein referred to as "remote" models; (3) reflectance data collected using the Our Sci reflectometer in combination with data extracted from remote sensing and digital soil maps, herein referred to as "full" models. Each of these modeling approaches were calibrated using soil carbon data content from dry combustion elemental analysis as a dependent variable. For each model, 80% of samples were randomly partitioned to create a training dataset for the model, while the remaining 20% were partitioned for testing model predictions. Predictive models were trained with 100 calibration/validation splits to bootstrap each modeling approach and to assess the distribution of possible model outcomes. Models were developed using a Bayesian Additive Regression Tree (BART) approach [34] using the "bartMachine" R package [35]. BART is a machine learning algorithm that employs a Bayesian "sum of trees" approach to generate a best fit predictive model ( Figure 2). We chose to use it as opposed to alternative machine learning or traditional statistical methods, as it is capable of dealing with high-dimensional data but also includes a regularization feature that reduces overfitting. In addition, it allows for estimation of posteriors, which allows us to better assess uncertainty in our predictions, and it also generates statistics on variable importance (predictor inclusion frequency), allowing us to assess the relative importance of different predictor variables in each model type.

Results
The soil organic carbon content of samples ranged from 0.03 to 6.03% and followed a lognormal distribution with a mean of 0.96%. Means and distributions of soil carbon content values were similar across sites, and soil carbon was stratified by depth such that surface soil samples had higher carbon content, and carbon content rapidly declined with On each iteration, we used the generated model to estimate soil carbon content on all samples in the testing partition and compared these estimates to their observed reference method, soil carbon content, as measured in the lab to estimate Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and coefficient of determination (R 2 ). Mean Absolute Error (MAE) is the average measure of errors between paired observations. Root Mean Square Error (RMSE) is a measure of how far a prediction is from a measured reference value. Coefficient of determination (R 2 ) is the proportion of the variation in the dependent variable that can be explained from the predictor variables. Estimating these error statistics across all 100 iterations allowed us to determine how the accuracy of each method changes as the training dataset changes.
In addition to partitioning the dataset into training and testing groups across all sites and soil types, we used the same approach on subsets of data divided by the USDA soil textural classification (i.e., sand, clay, etc.). This approach allowed us to better understand how fundamental differences in soil texture and mineralogy, which are likely to affect a soil sample's reflectance characteristics, might also affect soil carbon estimation.

Results
The soil organic carbon content of samples ranged from 0.03 to 6.03% and followed a lognormal distribution with a mean of 0.96%. Means and distributions of soil carbon content values were similar across sites, and soil carbon was stratified by depth such that surface soil samples had higher carbon content, and carbon content rapidly declined with depth. Most samples were categorized as being medium-to-moderately fine-textured.
When training and testing splits were made across the entire dataset, models with reflectance data only had the lowest accuracy and the greatest bias, followed by models with remote and DSM data only, and then full models combining both data sources ( Table 2). Student's t-tests indicated that these differences in accuracy and bias were all significant (Table 3). Models relying solely on reflectance data explained just over half of the total variability in %SOC (R 2 = 0.54), while models developed with remotely sensed data (R 2 = 0.71) and the combination of both data sources (R 2 = 0.75) explained a greater amount of variability in the dataset (Figure 3). The reduction in accuracy for models using reflectance data only was the greatest for samples with higher soil carbon content, which the model tended to underestimate. In contrast, remote and full models made more accurate predictions across the range of soil carbon content ( Figure 4). Error across depth had a similar pattern for all model types, but distinct patterns across the depth gradient emerged ( Figure 5). Estimates for 0-15 cm samples had the highest error, and below 15 cm, error rapidly declined ( Figure 5). Given that soil carbon content was generally highest in the surface layer, this observation is consistent with the above results. Similarly, when separate models were developed for each texture class, models for medium to coarse texture soils (silt, loam, and sand) generally had lower soil carbon content than estimated soil carbon content, with MAE of about 0.2, and MAE was similar between full and reflectance-only models ( Figure 6). In contrast, models on fine-textured soils (clay), which generally had higher soil carbon content, had lower accuracy, particularly in the reflectance data-only models ( Figure 6). similar pattern for all model types, but distinct patterns across the depth gradient e ( Figure 5).  Analysis of variable importance based on variable inclusion proportion indicated that when reflectance data alone were used, wavelengths in the mid-visible range had the most significant apparent effect on model accuracy (Figure 7). However, when remote sensing data and digital soil map data were introduced, those patterns changed, and "far-red" and near-infrared wavelengths had higher variable inclusion proportion scores relative to other wavelengths (Figure 7b). Furthermore, these analyses showed that several remotely sensed information and digital soil map layers added substantially to model accuracy and were often the most important variables. Soil Syst. 2022, 6, 38 9 of 16 100 training/testing splits, and error bars around points represent the standard deviation across 10 training/testing splits. Circles with color represent the samples for various depths as shown in th legend. Red line represents 1:1 fit. Dotted grey line represents the line of best fit from soils collected from nineteen participating ranches encompassing approximately 17,347 ha in the Southern Grea Plains ecoregion of the United States.      Figure 6). In contrast, models on fine-textured soils (clay), which gen higher soil carbon content, had lower accuracy, particularly in the reflectance models ( Figure 6). Analysis of variable importance based on variable inclusion proportion that when reflectance data alone were used, wavelengths in the mid-visible ran

Discussion
The use of visible and near infrared spectroscopy continues to be a focus area for research investigating methods to reduce laboratory costs, increase the precision of estimations, and reduce variability associated with estimating soil organic carbon concentrations. Cost-effective handheld reflectometers have been suggested as a tool to address these concerns. The Our Sci handheld reflectometer estimated soil organic carbon in this study to precision of approximately +/− 0.3% SOC; however, estimation accuracy was greatly reduced for those samples with carbon concentrations above 2.0%. Further, models relying solely on reflectance data explained just over half of the total variability in %SOC (R 2 = 0.54). This reduced accuracy at higher soil carbon concentrations could suggest that vis-NIR spectroscopy within the wavelength range studied (370-980 nm) is potentially best suited for environments with relatively low concentrations of soil organic carbon. It is also possible that higher SOC levels may be more accurately measured with higher wavelengths, as bands from 1100 to 2400 nm have proved particularly important for SOC calibration in past studies [36,37]. Alternatively, reduced accuracy at higher concentrations may be an artifact of those samples being less represented in the training data, given there were fewer samples in that range.
The reflectance only model, although accurate at lower soil carbon concentrations, was consistently less accurate than the remote model built using exclusively existing geospatial data products as predictors (R 2 = 0.71). The full model, combining the two data

Discussion
The use of visible and near infrared spectroscopy continues to be a focus area for research investigating methods to reduce laboratory costs, increase the precision of estimations, and reduce variability associated with estimating soil organic carbon concentrations. Cost-effective handheld reflectometers have been suggested as a tool to address these concerns. The Our Sci handheld reflectometer estimated soil organic carbon in this study to precision of approximately +/− 0.3% SOC; however, estimation accuracy was greatly reduced for those samples with carbon concentrations above 2.0%. Further, models relying solely on reflectance data explained just over half of the total variability in %SOC (R 2 = 0.54). This reduced accuracy at higher soil carbon concentrations could suggest that vis-NIR spectroscopy within the wavelength range studied (370-980 nm) is potentially best suited for environments with relatively low concentrations of soil organic carbon. It is also possible that higher SOC levels may be more accurately measured with higher wavelengths, as bands from 1100 to 2400 nm have proved particularly important for SOC calibration in past studies [36,37]. Alternatively, reduced accuracy at higher concentrations may be an artifact of those samples being less represented in the training data, given there were fewer samples in that range.
The reflectance only model, although accurate at lower soil carbon concentrations, was consistently less accurate than the remote model built using exclusively existing geospatial data products as predictors (R 2 = 0.71). The full model, combining the two data sources, provided significant, but only modest accuracy improvements (R 2 = 0.75; p < 0.001). This small increase in model accuracy suggests that the Our Sci reflectometer has limited capacity to improve the accuracy of digital soil mapping methods in the studied region.
The relatively poor performance of vis-NIR spectroscopy reported by this study is consistent with wide variation in accuracy reported in other studies across different systems. Soils have been effectively characterized globally based on vis-NIR spectroscopy analysis, but only using far more robust laboratory grade spectroscopy [25]. While some studies have reported specific instances of high-performance carbon estimation via vis-NIR spectroscopy [38,39] researchers increasingly question the method's ability to fully replace laboratory analysis [40]. Subsequently, the high performance of vis-NIR-derived models can be misleading, as it may be a result of overfitting or poor validation techniques [41].
We observed higher accuracy of vis-NIR spectroscopy at depth and in lower carbon sites, however, this pattern has not been observed consistently across other regions, climates, and management systems. While soil depth has a considerable impact on soil organic carbon stocks, data regarding the vertical distribution of the SOC stocks in relation to vegetation and land use are rare [42][43][44][45][46][47]. In contrast, in a temperate, forested ecosystem, Gholizadeh et al. [48] found that in a very high carbon density landscape (mean soil C = 23.54%), in situ spectroscopy performed less well with increasing soil depth. There, SOC prediction accuracy was higher in shallower organic layers with higher concentrations of organic matter. Other studies have found in situ application of vis-NIR to produce models with high performance fits in high-carbon environments, specifically in a high-elevation pastoral landscape (R 2 = 0.77) [49] and in a tropical volcanic soil (R 2 = 0.91) [50]. Allo et al. [50] suggested that better performance in high-carbon environments might be the result of greater variation leading to greater detectability. However, relatively higher performance on low-carbon samples in our study may be because they were more represented in model training datasets.
The availability of accurate covariate data could also play a significant role in model accuracy of digital soil mapping options. In a recent study conducted in Sub-Saharan Africa, Ewing et al. [51] found the Our Sci reflectometer provided sufficient accuracy in models developed that combined reflectance data with covariate data from the African Soil Information Service (AfSIS) database (R 2 = 0.69). However, a model developed from AfSIS covariate data alone provided the poorest agreement with reference laboratory samples (R 2 = 0.04). Our findings are consistent, in that a model developed with reflectance data combined with covariate data from remotely sensed sources provided the greatest accuracy and the least error. However, inconsistencies arise when evaluating the accuracy of the covariate data alone. In our study the full model (R 2 = 0.75), although significant, was not substantially better than a model developed with only remotely sensed covariates (R 2 = 0.71). The addition of NDVI greenness estimations to our remote models in semi-arid environments vs arid environments with less herbaceous biomass production may explain some of the variation. Thus, in semi-arid grazing land environments in the United States, where models developed completely from remotely sensed covariates perform similarly to models developed with the addition of the reflectance data, the modest increase in accuracy does not justify the time, labor, and cost of field sampling if change detection over time is not a concern.
Several recent studies suggest that models built on wider spectra can perform better than those based on solely vis-NIR. Researchers have tested MIR measurements as an alternative to vis-NIR, reporting substantial improvements in estimation accuracy. In a review of published literature, Soriano-Disla et al., [52] investigated the performance of visible, near-, and mid-infrared spectroscopy as an appropriate tool to estimate soil properties including soil carbon concentration. Their review provided evidence to suggest that mid-infrared spectroscopy offered more accurate predictions than Vis-NIR. Further, Riedel et al. [53] investigated the prediction of soil parameters of Vis-NIR and MIR spectroscopy across soil types as part of the Saxon Permanent Soil Monitoring Program in Germany. Ultimately their findings suggested that mid-infrared spectra provided more accurate prediction capabilities for the majority of soil parameters investigated, however, Vis-NIR-based calibration models performed well with coefficient of determination estimates greater than 0.67 for total organic carbon, further suggesting that approaches utilizing the full range of Vis-Soil Syst. 2022, 6, 38 13 of 16 NIR wavelengths (up to 2500 nm) as opposed to the spectral range utilized in this study (370-940 nm) could increase model performance and accuracy.
Digital soil mapping methods that combine local, proximal sensing with data from remote sensing sources, such as those tested in this paper, have been proposed as a means to rapidly map soil properties at reduced cost [54]. Since digital soil mapping often relies on models using environmental covariates that are fixed (e.g., soil texture, topographic features), they have limited use in tracking changes in soil carbon over time. Combining such methods with local, proximal sensing data could provide improved accuracy for change detection at minimal additional cost. However, our results indicate that the OurSci reflectometer may not substantially improve accuracy of such methods in the studied region.
Furthermore, while a suggested use of low-cost, handheld spectroscopy devices for rapid in-field carbon assessments continue to increase in demand, there are additional potential challenges to consider that are related to the technology and the environmental context of interest. The accuracy of infield assessments is often challenged by uncertain and highly variable field conditions. Many biotic and abiotic factors can affect soil reflectance and, ultimately, the spectral signature. Factors that could potentially impact reflectance include quartz content, shadowing, soil particle size, plant residues, and even soil moisture. Sample moisture content can decrease spectroscopy-based model fits [55][56][57][58][59]. Recent work has focused on external parameter orthogonalization [60]. However, the impacts of sample moisture on sample spectra may be non-linear. Cao et al. [60] observed that moisture had a greater impact on the spectra of samples with lower SOC. These vary with substrate physical structure, biochemistry, and temperature. Gholizadeh et al. [48] suggested that a finer scale in situ spectroscopic models, where there is less soil textural and moisture content variation, may perform better than broader scale models [61,62]. The promise of low-cost, handheld spectroscopy tools is to be able to measure SOC concentrations in the field and to reduce laboratory analysis costs. However, in an effort to reduce the environmental variability and address these factors that may affect reflectance, samples in this study were processed and dried in the lab to more aptly measure the reflectometer's direct ability to estimate SOC concentration. Given the extra steps taken to address variability, our results continue to question the ability of the Our Sci reflectometer, measuring a wavelength range of (370-980 nm) at discrete intervals to replace dry combustion laboratory analysis in semi-arid grazing lands.
Ultimately, this study has potentially described the upper limit of accuracy with the Our Sci reflectometer for measuring SOC in semi-arid grazing land soils within the wavelengths described. An attempt was made to reduce as much variability as reasonably possible with the experimental design, then to further reduce sample variability by scanning the samples in the lab after they had been ground and dried. Thus, field measured results would have likely been less accurate. Approaches that utilize the full spectrum of the near infrared and mid infrared spectral range may provide greater accuracies, albeit the instrument costs would greatly increase. Inevitably, the tradeoff of accuracy and cost remain.

Conclusions
As the desire to better understand soils, their dynamic properties, and as ecosystem service market opportunities emerge, there will be an ever-increasing need to define methods and technologies that reduce labor and data acquisition costs. These techniques and technologies will need to be scalable, repeatable, and address temporal changes in dynamic soil properties. Our data suggest that low-cost vis-NIR spectroscopy (370-940 nm) does not add substantially better accuracy to remote/DSM-developed models in the studied region, and so may have limited utility in the estimation of stocks and monitoring of change in carbon over time. However, further refinement is warranted, particularly the testing of similarly low-cost MIR tools. Additional research is needed to investigate advanced remote sensing and other sensor technologies to mitigate the time and cost constraints of standard laboratory testing and high throughput.