1. Introduction
Soil organic carbon (SOC) constitutes the principal fraction of soil organic matter (SOM) and underpins critical soil functions such as nutrient cycling, water retention, and crop productivity, while also sustaining biodiversity and ecosystem resilience [1]. Because SOC is a dynamic reservoir within the global carbon cycle, its accumulation can sequester atmospheric carbon dioxide (CO₂), thereby contributing to climate change mitigation [2,3]. In light of this potential, the European Commission’s Carbon-Farming Initiative under the European Green Deal now offers incentives to land managers for practices that increase SOC stocks, provided that their field measurements are both reliable and cost-effective for robust certification [4,5,6,7].
SOC is conventionally quantified with laboratory-based approaches, including dry-combustion elemental analysis and mid-infrared (MIR) and near-infrared spectroscopy (NIRS). These methods are highly accurate (R² ≥ 0.98), and once samples are dried, homogenized, and ground, they enable non-destructive, high-throughput SOC analysis; MIR spectroscopy, for example, can process samples in ≤1 min [8,9,10]. However, they require extensive sample preparation, homogenization, and calibration against large reference datasets. Laboratory MIR/NIRS protocols minimize spectral interference from moisture, texture, and mineralogy through stringent control of sample moisture content and particle size [11,12,13]. Although these techniques remain the gold standard, their laborious workflows and per-sample cost of approximately EUR 44 limit their feasibility for frequent, large-scale on-farm monitoring [14].
To address these limitations, numerous studies have explored portable in situ sensing platforms that integrate optical, electrochemical, and environmental measurements. Early on-the-go visible/NIR systems showed both potential and limitations for mapping soil clay and SOC [15,16]. Subsequent reviews have documented the progression from rudimentary sensors to sophisticated benchtop and mobile instruments enhanced by machine learning algorithms [17,18]. Field evaluations under disturbance-reduced protocols have reported improved estimates of SOC [19], and multi-sensor probes, coupled with advanced chemometrics, have yielded robust predictions of soil profile properties [20,21]. Recent investigations into miniaturized spectrometers have confirmed their potential for rapid soil property assessment [22], while the application of unsupervised learning to regional Vis-NIR spectral libraries has further enhanced the prediction of organic carbon [23]. Innovations such as moisture- and salinity-correction algorithms [24,25,26] and rapid in situ CO₂-sensor methods [27] continue to expand the field.
The multi-sensor FarmLab device from Stenon integrates visible/NIR reflectance, electrical impedance spectroscopy (EIS), and environmental sensors for soil moisture, temperature, and volatile organic compounds (VOCs) within a handheld spade probe for measurements in the upper 15 cm of soil [28,29]. FarmLab also offers substantial cost savings (approximately EUR 3–4 per measurement versus roughly EUR 44 per laboratory sample), enabling much higher sampling densities at a fraction of the cost. Despite these economic advantages, independent assessments have indicated that its accuracy and precision remain lower than laboratory standards, with residual biases influenced by soil pH and texture [30,31,32]. A recent comparison to another Vis-NIR multi-sensor platform has further highlighted its current limitations [33].
Although the manufacturer reports deployments of the FarmLab device in Germany, Brazil, Kazakhstan, Kyrgyzstan, and California, including over 450 calibration sites in Germany and more than 6000 calibration measurements in Brazil [34,35], no independent field evaluation has yet assessed its performance under temperate European arable conditions. In particular, no published study has compared its SOC estimates to established laboratory methods (acid-treated total organic carbon (TOC-acid), temperature-differentiated TOC (SoliTOC), and total carbon analysis) or examined how its integrated moisture and pH sensors mitigate field-sensor artifacts. Accordingly, the present study aims to (1) quantify the accuracy and precision of FarmLab SOC measurements against these laboratory standards, (2) evaluate the effectiveness of its onboard moisture and pH corrections, and (3) determine its suitability for carbon-farming applications.
3. Results
Standard-TOC ranged from 0.87% to 1.76% (1.17% ± 0.02 SE), and soil pH from 5.7 to 7.6 (6.57 ± 0.05 SE). The skewness and kurtosis were 1.13 and 1.73 for Standard-TOC and 0.20 and −1.07 for pH, respectively. The distribution and spread of these and other variables are shown in Figure 3.
3.1. Descriptive Statistics of SOC Methods
The descriptive statistics for all eight SOC determination methods are summarized in Table 2. Across the three uncorrected FarmLab algorithms (In-field-TOC-1–3), SOC was on average overestimated by +0.24% relative to Standard-TOC (mean bias +0.20–0.27%). Incorporating pH correction (In-field-TOC-4) cut that bias roughly in half (to +0.11%) and reduced the pooled standard deviation from 0.27% to 0.23%. By comparison, the two dry-combustion labs (TC-Gi, TC-Goe) differed from Standard-TOC by only +0.04–0.06% (SD ≈ 0.20%). These results indicate that In-field-TOC-4 is the least biased and most precise in-field algorithm under our conditions.
3.2. Correlation of SOC Error with Soil Properties
We assessed whether the subplotwise SOC measurement error of the baseline FarmLab algorithm (In-field-TOC-1), defined as In-field-TOC-1 SOC minus Standard-TOC (SOC_error), was influenced by soil pH, carbonate content (TIC900), or volumetric soil moisture, all recorded in our dataset. Pearson’s correlation coefficients (n = 100) are presented in Table 3.
The negative correlation between SOC_error and pH (r = −0.39, p < 0.01) indicates that lower-pH soils tend to produce larger positive errors (overestimation by FarmLab), whereas higher-pH soils yield smaller biases. In contrast, no significant relationship was found between SOC_error and soil moisture (r = −0.14, p > 0.05), suggesting that FarmLab’s integrated moisture sensor effectively compensated for moisture-induced spectral artifacts.
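The correlation step described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used in this study: the function names (`pearson_r`, `t_statistic`) and the five sample values are hypothetical, and only the definition of SOC_error and the reported r and n values are taken from the text.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation for two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for testing r = 0 with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

# Illustrative values only (NOT data from this study).
standard_toc = [0.90, 1.05, 1.20, 1.40, 1.70]  # reference, % SOC
in_field_toc = [1.30, 1.35, 1.40, 1.55, 1.75]  # sensor estimate, % SOC
soil_ph = [5.8, 6.2, 6.6, 7.0, 7.4]

# SOC_error as defined in the text: in-field estimate minus reference.
soc_error = [f - s for f, s in zip(in_field_toc, standard_toc)]
r = pearson_r(soc_error, soil_ph)
print(f"r(SOC_error, pH) = {r:.2f}")

# t statistics implied by the correlations reported in the text (n = 100).
t_ph = t_statistic(-0.39, 100)
t_moisture = t_statistic(-0.14, 100)
print(f"t(pH) = {t_ph:.2f}, t(moisture) = {t_moisture:.2f}")
```

With n = 100, the reported r = −0.39 corresponds to |t| ≈ 4.19 and r = −0.14 to |t| ≈ 1.40. The two-sided 5% critical value for 98 degrees of freedom is roughly 1.98, which is consistent with the significance levels stated above.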
A visual comparison of predicted and measured SOC values across the pH gradient (Figure 4) reveals distinct patterns related to soil acidity. The correlation between Standard-TOC and In-field-TOC-1 was r = 0.13, while the pH-corrected In-field-TOC-4 showed a stronger association, with r = 0.39.
In the uncorrected model (Figure 4A), SOC tends to be overestimated at low pH and low SOC levels (below 1.2%), consistent with the negative correlation reported in Table 3. After applying the pH correction (Figure 4B), these deviations are reduced, indicating improved accuracy in more acidic soils.
At higher SOC concentrations (>1.5%), overestimation persists in samples with higher pH. Overall, the highest agreement between predicted and measured SOC is observed at moderate to low pH values (approximately 6.0–6.8) and SOC concentrations between 1 and 1.5%.
3.3. Method Agreement Evaluation Using Regression, Error Metrics, and Bland–Altman Analysis
To assess structural agreement and systematic bias between each in-field algorithm and the Standard-TOC reference, we first conducted Deming regression analyses (Table 4). The intercept and slope of a Deming regression fit quantify location and scale agreement: an ideal method lies exactly on the identity line (intercept = 0, slope = 1). In our comparisons, In-field-TOC-4 came closest to these ideal parameters (intercept = 0.05 ± 0.06%, slope = 0.97 ± 0.04), followed by In-field-TOC-1 (intercept = 0.18 ± 0.07%, slope = 1.10 ± 0.05). The other uncorrected algorithms (In-field-TOC-2 and In-field-TOC-3) exhibited larger intercepts and slopes further from unity, indicating both constant and proportional bias. By contrast, the laboratory methods TC-Gi and TC-Goe yielded intercepts and slopes statistically indistinguishable from (0, 1), confirming strong equivalence between the two dry-combustion laboratories.
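To make the intercept and slope interpretation concrete, the following is a minimal sketch of the closed-form Deming estimator. It is not the implementation used in this study: the function name `deming_fit`, the default error-variance ratio `delta = 1.0` (orthogonal regression), and the sample values are assumptions made for illustration.

```python
import math

def deming_fit(x, y, delta=1.0):
    """Closed-form Deming regression of y on x.

    delta is the assumed ratio of error variances (y-errors / x-errors);
    delta = 1.0 corresponds to orthogonal regression.
    Returns (intercept, slope).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    if sxy == 0:
        raise ValueError("x and y are uncorrelated; slope is undefined")
    d = syy - delta * sxx
    slope = (d + math.sqrt(d * d + 4.0 * delta * sxy * sxy)) / (2.0 * sxy)
    intercept = my - slope * mx
    return intercept, slope

# Illustrative values only (NOT study data): a sensor with a constant
# offset of 0.1% SOC and a 10% proportional overestimation.
reference = [0.9, 1.0, 1.2, 1.5, 1.7]
sensor = [0.1 + 1.1 * v for v in reference]

intercept, slope = deming_fit(reference, sensor)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

In this noise-free example the fit recovers the constant bias (intercept) and the proportional bias (slope) that were built into the synthetic sensor values. The uncertainties reported with the estimates above require an additional step that is not shown here.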
In addition to structural comparison, absolute and squared deviations from the Standard-TOC reference were evaluated by computing the mean absolute error (MAE), root mean square error (RMSE), and Nash–Sutcliffe efficiency (NSE) (Table 5). In-field-TOC-4 yielded lower MAE and RMSE values than In-field-TOC-1, indicating reduced deviation magnitudes. The NSE values for all the in-field methods and one lab replicate (TC-Gi) were negative, reflecting systematic deviations from the Standard-TOC reference across the full range of observations. Only TC-Goe achieved an NSE near zero, consistent with the minimal bias observed in the Bland–Altman analysis.
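The three error metrics can be written out as follows. This is a sketch using the standard textbook definitions, with hypothetical function names and made-up values rather than study data. It also illustrates why a constant bias alone can push the NSE below zero when the reference values span a narrow range.

```python
import math

def mae(pred, obs):
    """Mean absolute error."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)

def rmse(pred, obs):
    """Root mean square error."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def nse(pred, obs):
    """Nash-Sutcliffe efficiency: 1 - SS_residual / SS_total.

    NSE = 1 is a perfect match; NSE < 0 means the predictions are worse
    than simply using the mean of the observations.
    """
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((p - o) ** 2 for p, o in zip(pred, obs))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

# Illustrative values only (NOT study data): predictions track the
# reference perfectly except for a constant +0.3% SOC offset.
obs = [1.0, 1.1, 1.2, 1.3, 1.4]
pred = [o + 0.3 for o in obs]

mae_val, rmse_val, nse_val = mae(pred, obs), rmse(pred, obs), nse(pred, obs)
print(f"MAE = {mae_val:.2f}, RMSE = {rmse_val:.2f}, NSE = {nse_val:.2f}")
```

In this example the predictions are perfectly correlated with the reference, yet the NSE is strongly negative because the squared offset exceeds the small variance of the reference values.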
Next, Bland–Altman analysis was used to quantify the mean bias and 95% limits of agreement (LoAs) between methods (Figure 5; Table 6).
In-field-TOC-1 exhibited a mean positive bias of +0.20% SOC and a wide LoA (−0.35 to +0.75%), whereas In-field-TOC-4 reduced both the bias (+0.11%) and LoA (−0.27 to +0.49%), reflecting improved equivalence. Laboratory replicates TC-Gi and TC-Goe had negligible bias (+0.05%) and narrow LoAs (−0.12 to +0.22%), underscoring their mutual consistency.
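The bias and limits of agreement can be illustrated with the conventional Bland–Altman formula, bias ± 1.96 × SD of the paired differences. The sketch below assumes that this conventional formula applies. The function name `bland_altman` and the sample values are hypothetical and are not taken from the study dataset.

```python
from statistics import mean, stdev

def bland_altman(method, reference, z=1.96):
    """Return (bias, lower LoA, upper LoA) for paired measurements.

    bias is the mean of (method - reference); the limits of agreement
    are bias +/- z * sample SD of the differences.
    """
    diffs = [m - r for m, r in zip(method, reference)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - z * sd, bias + z * sd

# Illustrative values only (NOT study data), in % SOC.
standard_toc = [0.90, 1.05, 1.20, 1.40, 1.70]
in_field = [1.15, 1.30, 1.35, 1.60, 1.85]

bias, lower, upper = bland_altman(in_field, standard_toc)
print(f"bias = {bias:+.3f}, LoA = [{lower:+.3f}, {upper:+.3f}]")
```

If the limits reported above follow this formula, the In-field-TOC-1 interval (−0.35 to +0.75%) is centered on the stated bias of +0.20% and implies an SD of the differences of about 0.55/1.96 ≈ 0.28%.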
Together, these pairwise comparisons demonstrate that the pH-adjusted algorithm (In-field-TOC-4) achieves the best overall alignment with laboratory standards, substantially reducing both constant and proportional errors, while uncorrected in-field methods retain significant biases.
3.4. Inferential Comparison of SOC Methods
The equivalence between each SOC method and the Standard-TOC reference was evaluated using the all.structural.tests function in the eirasBA package with 30,000 bootstrap resamples [43]. Three criteria were evaluated at α = 0.05: structural mean equality (accuracy), variance equality (precision), and bisector agreement (concordance). The results (Table 7) reveal that no method satisfied all three tests simultaneously.
Both laboratory comparisons (TC-Gi vs. TC-Goe and SoliTOC vs. TOC-acid) achieved precision (p > 0.05) and concordance (intercept/slope p > 0.05) but failed the accuracy test (p < 0.01). The baseline FarmLab algorithm (In-field-TOC-1) similarly passed precision and concordance yet exhibited significant bias (accuracy p < 0.0001). The two uncorrected in-field updates (In-field-TOC-2/3) managed to pass only concordance (p > 0.05) but failed in both accuracy and precision (p < 0.0001). Although the pH-corrected model (In-field-TOC-4) met the accuracy criterion (p = 0.3250) and showed concordance (p > 0.05), it failed the precision test (p = 0.0087), indicating unequal variance between FarmLab and the reference measurements.
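The idea behind a bootstrap test of mean equality (the accuracy criterion) can be sketched generically. The code below does not reproduce the eirasBA procedure: it is a plain percentile bootstrap of the mean paired difference, and the function name `bootstrap_mean_diff_ci`, the fixed seed, the default of 2000 resamples (the study used 30,000), and the sample values are all assumptions made for illustration.

```python
import random
from statistics import mean

def bootstrap_mean_diff_ci(method, reference, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean paired difference (method - reference)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = [m - r for m, r in zip(method, reference)]
    n = len(diffs)
    # Resample the paired differences with replacement and store each mean.
    boot_means = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_boot))
    lower = boot_means[int(n_boot * alpha / 2)]
    upper = boot_means[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper

# Illustrative values only (NOT study data), in % SOC.
standard_toc = [0.90, 1.05, 1.20, 1.40, 1.70]
in_field = [1.15, 1.30, 1.35, 1.60, 1.85]

lower, upper = bootstrap_mean_diff_ci(in_field, standard_toc)
print(f"95% CI for mean difference: [{lower:+.3f}, {upper:+.3f}]")
```

Under this simplified scheme, an interval that excludes zero points to a systematic difference between the method and the reference, which is the kind of outcome reported above as a failed accuracy test.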
5. Conclusions
Our first independent field validation of the FarmLab multi-sensor probe under temperate European arable conditions shows that its default model overestimates SOC by +0.20–0.27% (SD 0.25–0.28%), while a simple pH correction halves that bias (+0.11%, SD 0.23%) and moisture effects are effectively neutralized. However, formal equivalence testing confirms that even the pH-corrected algorithm cannot yet match laboratory precision and concordance, and predictive reliability is limited to SOC concentrations between ~1% and ~1.5% under low to moderate pH. Outside this range, the model tends to overpredict lower and underpredict higher SOC values.
Economically, FarmLab’s per-sample cost of ~EUR 3–4 (versus ~EUR 44 for GPS-referenced lab analysis) enables high-density mapping essential for carbon farming MRV. We therefore advocate a hybrid approach: the use of routine, pH-corrected in-field measurements to capture spatial and temporal trends, anchored by periodic laboratory benchmarks to ensure certification-grade accuracy.
Looking forward, improving FarmLab’s performance will depend on expanding calibration across diverse soils, integrating additional sensing technologies to capture mineralogical, texture, and bulk density data, and adopting adaptive, data-driven calibration algorithms—steps that together can elevate low-cost in-field sensing to near-laboratory standards and support scalable, cost-effective soil carbon monitoring.