1. Introduction
Maize harvested as whole-plant corn (WPC) is an essential and widely utilized feed in both dairy and beef production systems [1]. When preserved through ensiling, it provides a major source of energy and digestible nutrients that support ruminant productivity. Its nutritional value is largely defined by constituents such as dry matter, crude protein, starch, and fiber. These components can be quantified through conventional wet chemistry analyses or more rapidly estimated using sensing technologies, most commonly near-infrared (NIR) spectroscopy [2,3]. Conventional wet chemistry methods require specialized equipment, are costly, and are poorly suited to the rapid turnaround needed for strategic storage and ensiling management, as analyses of selected samples may not capture material variability or provide results in a timely manner [4]. In contrast, on-site NIR spectroscopy (OS-NIRS) can be deployed at the farm level to provide a rapid, non-destructive, and cost-effective approach for estimating WPC constituents. Its ability to process large numbers of samples in a short time enables more timely ration adjustments and improved precision in livestock nutrition management [5].
To estimate WPC constituents using NIRS techniques, the material under study is illuminated by the light source of the NIR constituent sensor. In a reflection measurement configuration, the sensor captures light reflected from the sample and determines the optical reflection spectrum over a specific spectral region. Typically, a section of the near-infrared spectral region, between 900 nm and 1700 nm, is covered. As molecules of many relevant constituents, such as moisture, protein, starch, and fiber, have material-specific absorption characteristics within this region, the spectrum carries information about the abundance of these constituents within the material presented to the sensor. The content of each constituent is finally estimated from the spectrum with the help of material- and constituent-specific calibration functions [3,6].
Near-infrared instruments used for WPC analysis are broadly categorized as either lab-scale/benchtop or transportable/on-site systems. Benchtop instruments are laboratory-grade analyzers that typically provide a high level of spectral resolution, stability, and calibration robustness, but they require controlled environments, trained operators, and substantial capital investment [7]. Laboratory-based NIR spectroscopy is typically performed on dried and finely ground WPC, producing a homogeneous sample that minimizes variation due to particle size and moisture, thereby improving spectral consistency and calibration accuracy.
On-site systems are more compact and cost-effective, allowing widespread deployment. These instruments enable more rapid and frequent analyses than laboratory-based systems, but potentially with some compromise in analytical precision and spectral range [8,9]. On-site instruments typically analyze fresh, undried samples, which are inherently more heterogeneous and subject to greater variability in moisture and particle distribution [5]. While this can reduce analytical precision relative to laboratory methods, it eliminates the need for drying and grinding, enabling a far greater number of samples to be scanned rapidly and with minimal preparation [10].
In commercial forage laboratories, stationary NIRS analyses typically begin by splitting the delivered sample, with a portion used for moisture determination and the remainder oven-dried at moderate temperatures (<60 °C) to preserve chemical integrity [5,11,12,13]. The dried material is then ground to a uniform particle size (~1 mm) to produce a homogeneous sample suitable for NIRS scanning. Spectra are compared against calibrations developed from reference wet chemistry analyses, enabling accurate prediction of key nutritional constituents [5,12,14,15].
Although numerous studies have compared the performance of portable on-site NIRS constituent sensors with NIRS measurements from certified laboratories, there has been limited systematic evaluation of measurement precision for both methods [16,17,18]. Precision refers to the closeness of agreement among repeated measurements of the same constituent on a given sample. When replicated samples are analyzed across multiple laboratories, two components of precision are typically considered: repeatability and reproducibility. Repeatability refers to the variability observed when replicate measurements are made under the same conditions within a single laboratory (within-lab error). Reproducibility refers to the variability observed when measurements are made on the same replicated sample under differing conditions across laboratories; it therefore includes both repeatability error and the additional inter-laboratory error [19,20]. Ensuring reproducibility among OS-NIRS sensors is critical in precision agriculture, as consistent performance across devices is needed for reliable, actionable compositional estimates in field-level decision making.
Chopped WPC exhibits considerable heterogeneity, both in anatomical composition and particle-size distribution, which poses significant challenges to obtaining homogeneous samples for analysis [21]. The sampling process typically begins at harvest or during feed-out from the silo, wherein numerous small subsamples are collected, pooled, and thoroughly mixed. These composite samples are then systematically partitioned into subsamples of appropriate size for subsequent analysis. Unfortunately, even under stringent sampling protocols, variability among subsample replicates can introduce error, thereby affecting the comparability of measurements within and between analytical laboratories.
The objectives of this study were to evaluate the intra- and inter-laboratory precision of nutritional composition analyses of WPC, using both commercial analytical forage laboratories and on-site NIRS sensors. Toward this goal, samples were collected and analyzed under typical user conditions to represent practical, on-site application of the technology by producers and nutritionists. Specifically, the study aimed to: (a) quantify the measurement variation in nutritional parameters of replicate WPC samples analyzed across different commercial laboratories and on-site NIRS sensors, and (b) assess the agreement between laboratory analyses and on-site NIRS sensor measurements. We hypothesized that (a) on-site NIR sensors would demonstrate within-laboratory repeatability comparable to that of commercial analytical laboratories, and (b) incorporating on-site NIR sensors would yield inter-laboratory reproducibility similar to that of conventional laboratory analyses. This expectation was based on OS-NIRS systems integrating a larger effective sample volume through a substantially greater number of measurements than is possible with discrete laboratory analyses of limited sample mass.
2. Materials and Methods
2.1. Sample Collection and Analysis
To capture a broad range of compositional and regional variation in WPC, samples were collected and analyzed on-site and in analytical laboratories in the United States and Germany. The goal was to assess the variability in WPC composition measurements that a typical forage producer or ruminant nutritionist might encounter. Three separate experiments were conducted over a three-year period at different locations. In each experiment, three common constituents of WPC were analyzed: protein, starch, and neutral detergent fiber (NDF). Moisture content (% w.b.) was measured only in the second and third experiments.
For Experiments 1 (2021) and 3 (2024), eighteen WPC samples were collected on farms near Arlington WI, USA (43.3380°; −89.3804°). These samples were collected from four diverse farm locations and fields. In 2021, corn at Arlington experienced a warmer-than-normal growing season combined with substantial moisture deficits, particularly from mid-season onward, which likely increased drought stress during grain fill and favored higher fiber concentration and reduced starch deposition in corn silage. In contrast, 2024 featured excess early-season rainfall with generally adequate heat accumulation, supporting strong vegetative growth and kernel set, followed by a drier late summer that likely promoted good starch accumulation and harvest dry-down, conditions generally favorable for higher silage energy density.
For Experiment 2 (2022), six WPC samples were collected on farms near Neuhemsbach, DE (49.5360°, 7.9038°). Drought conditions negatively impacted WPC growth at this location [22]. Samples were collected and analyzed under typical user conditions to represent practical on-site application of the technology. Harvester parameters, including length-of-cut and crop processor roll clearance, were maintained at values consistent with prevailing regional practices.
In all cases, WPC was collected from multiple fields to capture inherent variability. Material was sampled from the transport container after unloading, but prior to deposition in the bunker silo. The collected material was progressively halved several times to create smaller, manageable subsamples while maintaining representative composition. Samples analyzed with the on-site NIRS sensors were immediately brought to the sensors and scanned as described below. Samples designated for analytical laboratories were sealed in plastic bags, and within several hours were placed in a cooler set to approximately 5 °C, and shipped to the laboratories to ensure arrival within one to two days of collection. The number of samples and replicates is shown in Table 1.
In this research, all samples were analyzed using near-infrared reflectance spectroscopy analytical equipment. The commercial laboratories analyzed the WPC samples using benchtop scanning monochromator NIR instruments similar to the widely used FOSS NIRSystem 6500 (1100–2500 nm, 2 nm step, FOSS, Hillerod, Denmark) [23]. The on-site NIRS sensors were the HarvestLab™ 3000 (950–1650 nm, 2 nm step, Deere & Co., Moline, IL, USA). Commercial benchtop instruments are denoted as analytical laboratories (AL), and mobile instruments as on-site NIRS (OS-NIRS) sensors. Each OS-NIRS unit was treated as an independent laboratory for intra- and inter-laboratory analyses; thus, the terms laboratories or labs refer collectively to AL and OS-NIRS sensors.
The OS-NIRS instruments consisted of a sensor body with the diode array spectrometer and a sampling unit with a rotating borosilicate glass bottom dish located above a halogen light source. The system operated with internal black and white references. Care was taken to ensure that the sampling dish was clean and dry prior to loading with a WPC sample. The measurement was repeated three times per replicate with mixing of the subsample in between. Mean values of moisture content (% w.b.) and crude protein, starch, and neutral detergent fiber (NDF) as concentrations of DM were recorded. The mean value of the three measurements was taken as the representative value for each replicate. Moisture content and constituent estimates were determined using the WPC calibration models published by John Deere in the year the studies were conducted.
In Experiment 3, WPC moisture content was measured by residual weight after drying subsamples according to NFTA procedures 2.1.1, 2.1.2, or 2.1.3 [13]. Constituent analysis in Experiments 1 and 3 was conducted using dried and ground samples (<60 °C, 1 mm), and analyzed for protein, starch, and NDF using NIRS Forage and Feed Consortium protocols [24]. In Experiment 2, WPC moisture was determined according to VDLUFA III, 3.1 methods. Constituent analysis was performed using dried and ground samples according to VDLUFA III, 31.2–31.3 methods.
2.2. Statistical Analysis
The statistical calculations follow the approach described in ISO standard 5725-2 [25]. The analysis result of a physical material sample, $y_{SPL}$, is described by:
$$y_{SPL} = m + B_S + B_L + e$$
Here, $m$ is the mean over all analysis results of a constituent in one experiment, $B_S$ is the random deviation from the mean of an individual sample in the experiment, $B_L$ is the laboratory bias, and $e$ denotes a random fluctuation of the analysis of the samples within the laboratory.
In the following statistical analysis, the standard deviations of these three random components are estimated for each experiment. These describe the natural variability of constituent concentrations within an experiment, the systematic variance in laboratory-to-laboratory analysis results, and the within-laboratory variance of analysis results when analyzing identically prepared samples, respectively.
Where appropriate, data were analyzed using three approaches: AL data only, AL combined with OS-NIRS data, and OS-NIRS sensor data alone. In the statistical calculations described below, the subscript i denotes the sample number, j denotes the replicate number for a given sample, and k denotes the laboratory identifier. The evaluations were conducted separately for each of the three experiments.
2.2.1. Assessment of Sample Variability Within an Experiment
The variability of the mean constituent content within the samples of each experiment, $\mathrm{SD}_s$, was calculated by:
$$\mathrm{SD}_s = \mathrm{CF} \cdot \sqrt{\frac{1}{n-1} \sum_{s=1}^{n} \left(\bar{y}_s - m\right)^2}$$
where $n$ denotes the number of samples, $\bar{y}_s$ is the mean across all laboratory mean values $\bar{y}_{sk}$ for sample $s$, $m$ is the mean across all laboratory mean values of the experiment, and CF is a correction factor to adjust for the small number of laboratories. A small sample variability suggests physically similar samples were collected within the experiment, while a large sample variability indicates the samples collected were diverse and cover a relevant range of the variability of analysis results of samples to be expected in the field.
When the degrees of freedom were low (i.e., small number of labs, samples, or replicates), the sample standard deviation would be a biased estimator that systematically underestimates the population standard deviation. This bias arises due to the limited information available to reliably estimate variability, particularly when the sample variance is calculated using a small number of independent observations [26]. To adjust for this, a correction factor (CF) can be applied to the standard deviation calculation, scaling it upward to provide a more accurate estimate of the population value. In this study, a CF was applied when the degrees of freedom were fewer than 10, with the CF values derived from those reported in [27].
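The size of such a correction can be illustrated with a short sketch. It assumes normally distributed data and uses the classical $c_4$ unbiasing constant, so that CF = 1/$c_4$; whether the tabulated values in [27] coincide with this constant is an assumption, and the function names `c4` and `correction_factor` are hypothetical.

```python
import math


def c4(n: int) -> float:
    """Ratio E[sample SD] / true SD for n normal observations (n - 1 denominator)."""
    if n < 2:
        raise ValueError("at least two observations are required")
    # lgamma avoids overflow of the gamma function for larger n
    log_ratio = math.lgamma(n / 2.0) - math.lgamma((n - 1) / 2.0)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)


def correction_factor(n: int) -> float:
    """Assumed CF: multiplier that scales a sample SD upward to remove its bias."""
    return 1.0 / c4(n)


if __name__ == "__main__":
    for n in (2, 3, 5, 10):
        print(n, round(correction_factor(n), 4))
```

Under this assumption the factor shrinks toward 1 as the number of observations grows, which is consistent with applying it only when few degrees of freedom are available.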
To assess the extent of constituent variability captured within each experiment, we calculated the relative range ($RR_E$), defined as the experiment-specific range of a constituent ($\mathrm{Max}_E - \mathrm{Min}_E$) expressed as a percentage of the global range of that constituent observed across all experiments ($\mathrm{Max}_G - \mathrm{Min}_G$):
$$RR_E = \frac{\mathrm{Max}_E - \mathrm{Min}_E}{\mathrm{Max}_G - \mathrm{Min}_G} \times 100\%$$
A high relative range $RR_E$ indicated that an individual experiment encompassed most of the overall constituent variability.
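The two quantities of this subsection can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: it assumes an $n - 1$ denominator for the sample variability, takes the correction factor as a plain argument, and uses made-up starch values; the function names are hypothetical.

```python
import math


def sample_sd(sample_means, cf=1.0):
    """SD of per-sample means around their overall mean, scaled by a correction factor."""
    n = len(sample_means)
    m = sum(sample_means) / n
    return cf * math.sqrt(sum((y - m) ** 2 for y in sample_means) / (n - 1))


def relative_range(experiment_values, all_values):
    """Range within one experiment as a percentage of the range over all experiments."""
    experiment_range = max(experiment_values) - min(experiment_values)
    global_range = max(all_values) - min(all_values)
    return 100.0 * experiment_range / global_range


if __name__ == "__main__":
    starch_one_experiment = [30.0, 32.0, 34.0]          # hypothetical values, % of DM
    starch_all_experiments = [28.0, 30.0, 32.0, 34.0, 38.0]
    print(sample_sd(starch_one_experiment))
    print(relative_range(starch_one_experiment, starch_all_experiments))
```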
2.2.2. Within-Laboratory Repeatability
The pooled standard deviation ($\mathrm{SD}_P$) of the replicate measurements within each sample was calculated to quantify the inherent variability and repeatability associated with measuring replicate subsamples within an individual laboratory:
$$\mathrm{SD}_{P} = \sqrt{\frac{\sum_{i=1}^{n} \sum_{j=1}^{d} \left(y_{ijk} - \bar{y}_{ik}\right)^2}{n\,(d-1)}} \tag{5}$$
where $n$ is the number of samples analyzed by a given laboratory; $d$ is the number of replicate measurements per sample; $y_{ijk}$ is the measurement of the $j$th replicate of sample $i$ in laboratory $k$; and $\bar{y}_{ik}$ is the mean of the $d$ replicates for sample $i$ in laboratory $k$.
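A minimal sketch of the pooled within-laboratory standard deviation for one laboratory follows. It assumes the usual pooling rule (sum of squared deviations from each sample's replicate mean, divided by the total replicate degrees of freedom); the function name and the data are hypothetical.

```python
import math


def pooled_sd(replicates_by_sample):
    """Pooled SD of replicates for one laboratory.

    replicates_by_sample[i] holds the replicate results of sample i.
    """
    sum_sq = 0.0
    dof = 0
    for reps in replicates_by_sample:
        mean = sum(reps) / len(reps)
        sum_sq += sum((y - mean) ** 2 for y in reps)
        dof += len(reps) - 1  # equals n * (d - 1) when every sample has d replicates
    return math.sqrt(sum_sq / dof)


if __name__ == "__main__":
    lab_results = [[10.0, 12.0], [20.0, 24.0]]  # two samples, two replicates each
    print(pooled_sd(lab_results))
```

Accumulating the degrees of freedom per sample keeps the sketch usable when a replicate is missing, at the cost of departing slightly from the balanced design described in the text.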
To evaluate whether within-laboratory repeatability ($\mathrm{SD}_P$) differed among laboratories and sensors, Cochran’s C test was applied as recommended in ISO 5725-2. When the Cochran C test indicated significant heterogeneity among variances ($\alpha = 0.05$), the largest standard deviation was sequentially removed and the test repeated until a subset of variances among the remaining groups satisfied the assumption of homogeneity. For subsets containing only two remaining groups, an F-test was used to assess variance compatibility. Groups whose standard deviations were found to be statistically compatible were considered to have equivalent repeatability and were assigned the same grouping designation. This procedure was repeated independently for all subsets to identify groups of laboratories or sensors with comparable within-laboratory variability. The F-test was conducted as follows:
$$F = \frac{\mathrm{SD}_{P,1}^{2}}{\mathrm{SD}_{P,2}^{2}} \tag{6}$$
where $\mathrm{SD}_{P,1}$ and $\mathrm{SD}_{P,2}$ are the pooled within-laboratory standard deviations (Equation (5)) for the two groups being compared. The calculated F value was compared to the critical value from the F-distribution at $\alpha = 0.05$ using the appropriate numerator and denominator degrees of freedom. If $F > F_{critical}$, repeatability (variance) was concluded to differ significantly between the two groups.
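The grouping procedure can be sketched as follows. Critical values are deliberately left as inputs: `c_critical` is a hypothetical callable returning the Cochran critical value for a given number of groups, and `f_critical` must be looked up for the relevant degrees of freedom. Placing the larger variance in the numerator of the F ratio is an assumption, as is stopping the removal loop once two groups remain.

```python
def cochran_c(sds):
    """Cochran's C statistic: largest variance divided by the sum of all variances."""
    variances = [s ** 2 for s in sds]
    return max(variances) / sum(variances)


def homogeneous_subset(sds, c_critical):
    """Remove the largest SD repeatedly while Cochran's C exceeds its critical value."""
    remaining = sorted(sds)
    while len(remaining) > 2 and cochran_c(remaining) > c_critical(len(remaining)):
        remaining.pop()  # drop the largest remaining standard deviation
    return remaining


def f_ratio(sd_1, sd_2):
    """Variance ratio with the larger variance in the numerator (assumed convention)."""
    larger, smaller = max(sd_1, sd_2), min(sd_1, sd_2)
    return (larger / smaller) ** 2


def variances_differ(sd_1, sd_2, f_critical):
    """True when the variance ratio exceeds the supplied critical value."""
    return f_ratio(sd_1, sd_2) > f_critical


if __name__ == "__main__":
    # 0.7 is a placeholder, not a tabulated critical value.
    print(homogeneous_subset([1.0, 1.1, 0.9, 5.0], lambda groups: 0.7))
    print(f_ratio(1.0, 2.0))
```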
The pooled standard deviation ($\mathrm{SD}_P$) provided an overall measure of repeatability within each lab by combining information from all samples and their replicates. However, this pooled value could obscure differences in variability among individual samples. To quantify the uncertainty associated with the pooled $\mathrm{SD}_P$, we estimated a standard error ($\mathrm{SE}_P$) based on the variability in the replicates across samples. Specifically, for each sample within a lab, we first calculated the standard deviation of the replicate measurements. We then computed the standard deviation of the sample-level standard deviations ($\mathrm{SD}_{SLP}$) and divided it by the square root of the number of samples tested ($n_s$):
$$\mathrm{SE}_P = \frac{\mathrm{SD}_{SLP}}{\sqrt{n_s}} \tag{7}$$
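The two-step calculation described above might look like this in practice. The sketch assumes that both standard deviations use the $n - 1$ denominator (as `statistics.stdev` does); the function name and data are hypothetical.

```python
import math
import statistics


def se_of_pooled_sd(replicates_by_sample):
    """Standard error from the spread of sample-level replicate SDs within one lab."""
    # Step 1: standard deviation of the replicates of each sample
    sample_sds = [statistics.stdev(reps) for reps in replicates_by_sample]
    # Step 2: SD of those sample-level SDs, divided by sqrt(number of samples)
    sd_slp = statistics.stdev(sample_sds)
    return sd_slp / math.sqrt(len(sample_sds))


if __name__ == "__main__":
    lab_results = [[10.0, 12.0], [20.0, 24.0], [30.0, 36.0]]
    print(se_of_pooled_sd(lab_results))
```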
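To quantify the uncertainty inherent in the measurement process when samples were analyzed within individual labs, a pooled standard deviation ($\mathrm{SD}_{CP}$) was calculated by aggregating the residual variation across all replicates, samples, and laboratories:
$$\mathrm{SD}_{CP} = \sqrt{\frac{\sum_{k=1}^{l} \sum_{i=1}^{n} \sum_{j=1}^{d_{ik}} \left(y_{ijk} - \bar{y}_{ik}\right)^2}{\sum_{k=1}^{l} \sum_{i=1}^{n} \left(d_{ik} - 1\right)}} \tag{8}$$
where $y_{ijk}$ is the value of the $j$th replicate for sample $i$ in laboratory $k$; $\bar{y}_{ik}$ is the mean of replicates for sample $i$ in lab $k$; $l$ is the number of laboratories; $n$ is the number of samples per laboratory; and $d_{ik}$ is the number of replicate measurements for sample $i$ in laboratory $k$.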
We applied the same pairwise F-test approach to the pooled within-laboratory standard deviation across all laboratories ($\mathrm{SD}_{CP}$) to determine whether including the OS-NIRS sensor data significantly affected the repeatability estimate. In this case, the variances corresponding to $\mathrm{SD}_{CP}$ calculated with and without OS-NIRS data were compared, using the same variance ratio method and critical values from the F-distribution described above.
A calculation was performed to quantify the precision of the pooled within-laboratory standard deviation ($\mathrm{SD}_{CP}$), enabling comparison of repeatability estimates across experiments. For each experiment, we calculated the standard error of the pooled within-laboratory standard deviation ($\mathrm{SE}_{CP}$) to quantify the uncertainty associated with $\mathrm{SD}_{CP}$. The standard error was computed as:
$$\mathrm{SE}_{CP} = \frac{\mathrm{SD}_{CP}}{\sqrt{2\,df}} \tag{9}$$
where $\mathrm{SD}_{CP}$ is the pooled within-laboratory standard deviation, and $df$ is the degrees of freedom used to calculate $\mathrm{SD}_{CP}$. The degrees of freedom were determined as:
$$df = \sum_{k=1}^{l} \sum_{i=1}^{n} \left(d_{ik} - 1\right) \tag{10}$$
where $l$ is the number of laboratories, $n$ is the number of samples per laboratory, and $d_{ik}$ is the number of replicate measurements for sample $i$ in laboratory $k$.
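A compact sketch of the pooled standard deviation across laboratories, its degrees of freedom, and its standard error is given below. It assumes the common large-sample approximation SE ≈ SD / sqrt(2·df) for the standard error of a standard deviation; the nested-list layout and the function name are hypothetical.

```python
import math


def combined_pooled_sd(data):
    """Pooled within-lab SD over all labs, with degrees of freedom and standard error.

    data[k][i] holds the replicate results of sample i in laboratory k.
    """
    sum_sq = 0.0
    dof = 0
    for lab in data:
        for reps in lab:
            mean = sum(reps) / len(reps)
            sum_sq += sum((y - mean) ** 2 for y in reps)
            dof += len(reps) - 1  # one degree of freedom lost per sample-lab mean
    sd_cp = math.sqrt(sum_sq / dof)
    se_cp = sd_cp / math.sqrt(2 * dof)  # assumed approximation for the SE of an SD
    return sd_cp, dof, se_cp


if __name__ == "__main__":
    two_labs = [
        [[10.0, 12.0], [20.0, 24.0]],  # laboratory 1
        [[11.0, 11.0], [22.0, 26.0]],  # laboratory 2
    ]
    print(combined_pooled_sd(two_labs))
```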
2.2.3. Inter-Laboratory Reproducibility
To characterize variability among laboratories, several metrics were calculated. The first ($\mathrm{SD}_L$) quantified inter-laboratory variability on a sample-by-sample basis and reflected the reproducibility of individual sample results across laboratories. It was based on the deviation of each laboratory’s mean measurement (averaged across replicates) from the overall mean for that sample across all laboratories. This approach isolated the true inter-lab variability by removing within-lab variation from the calculation.
$$\mathrm{SD}_L = \sqrt{\frac{\sum_{i=1}^{n} \sum_{k=1}^{l} \left(\bar{y}_{ik} - \bar{y}_{i}\right)^2}{n\,(l-1)}} \tag{11}$$
where $n$ is the number of samples; $l$ is the number of laboratories; $\bar{y}_{ik}$ is the average of all replicates for sample $i$ measured by laboratory $k$; and $\bar{y}_{i}$ is the average of all replicates across all laboratories for sample $i$. A two-sided 95% confidence interval was obtained by scaling $\mathrm{SD}_L$ by 1.96, providing a range that reflects expected inter-laboratory variation in both directions.
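The sample-by-sample inter-laboratory metric can be sketched as below. The pooling over samples with one degree of freedom lost per sample is an assumption, as is the balanced design under which the mean of laboratory means equals the mean of all replicates; the function names are hypothetical.

```python
import math


def sd_between_labs(lab_means_by_sample):
    """Pooled SD of laboratory means around each sample's mean.

    lab_means_by_sample[i][k] is the replicate-averaged result of sample i in lab k.
    """
    sum_sq = 0.0
    dof = 0
    for lab_means in lab_means_by_sample:
        sample_mean = sum(lab_means) / len(lab_means)
        sum_sq += sum((y - sample_mean) ** 2 for y in lab_means)
        dof += len(lab_means) - 1  # equals n * (l - 1) for a complete design
    return math.sqrt(sum_sq / dof)


def ci95_half_width(sd):
    """Half-width of the two-sided 95% interval obtained by scaling the SD by 1.96."""
    return 1.96 * sd


if __name__ == "__main__":
    means = [[10.0, 12.0, 14.0], [20.0, 20.0, 20.0]]  # two samples, three laboratories
    sd_l = sd_between_labs(means)
    print(sd_l, ci95_half_width(sd_l))
```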
The second metric, $\mathrm{SD}_G$, quantified the extent of systematic differences between laboratories. Specifically, it measured how much each laboratory’s overall mean (averaged across all samples and replicates) deviated from the global mean across all data. Unlike $\mathrm{SD}_L$, which varies by sample, this metric provides a single summary measure of lab-to-lab consistency, reflecting any persistent bias or offset in measurements attributable to individual laboratories:
$$\mathrm{SD}_G = \sqrt{\frac{\sum_{k=1}^{l} \left(\bar{y}_{k} - \bar{y}\right)^2}{l-1}} \tag{12}$$
where $l$ is the number of laboratories; $\bar{y}$ is the grand mean across all laboratories, samples and replicates; and $\bar{y}_{k}$ is the average for laboratory $k$ across all samples and replicates. A two-sided 95% confidence interval was obtained by scaling $\mathrm{SD}_G$ by 1.96, providing a range that reflects expected inter-laboratory variation in both directions.
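A short sketch of the laboratory-bias metric follows, assuming an $l - 1$ denominator and taking the grand mean over every individual result; the function name and data are hypothetical.

```python
import math


def sd_global(values_by_lab):
    """SD of each laboratory's overall mean around the grand mean of all results.

    values_by_lab[k] holds every result (all samples and replicates) of laboratory k.
    """
    lab_means = [sum(values) / len(values) for values in values_by_lab]
    all_values = [y for values in values_by_lab for y in values]
    grand_mean = sum(all_values) / len(all_values)
    n_labs = len(lab_means)
    return math.sqrt(sum((m - grand_mean) ** 2 for m in lab_means) / (n_labs - 1))


if __name__ == "__main__":
    results = [[29.0, 31.0], [30.0, 32.0], [31.0, 33.0]]  # three laboratories
    print(sd_global(results))
```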
To quantify the overall variability expected when samples are measured across laboratories, the inter-laboratory reproducibility standard deviation ($S_R$) was calculated following the ISO 5725 standard [25]. This method separates total variation into within-lab repeatability ($s_r$) and between-lab variability ($\mathrm{SD}_L$). Repeatability was first estimated for each sample–lab pair ($s_{r,ik}$, Equation (13)), then averaged across all samples and labs to obtain $s_r$ (Equation (14)). The total reproducibility (Equation (15)) was computed as the square root of the sum of $s_r^2$ and $\mathrm{SD}_L^2$, from Equation (11). Unlike the approaches used in Equations (11) and (12), which summarize inter-lab differences directly from sample or global means, the ISO method formally incorporates both random and systematic variation to provide a comprehensive estimate of reproducibility.
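The combination described in this paragraph can be sketched as follows. Two assumptions are made: the pair-level repeatability is averaged as a variance (not as a standard deviation), and the between-laboratory term is pooled over samples with one degree of freedom lost per sample. The sketch mirrors the verbal description here rather than the full ISO 5725-2 estimator, and the function name is hypothetical.

```python
import math


def reproducibility_sd(data):
    """Reproducibility SD as sqrt(repeatability variance + between-lab variance).

    data[i][k] holds the replicate results of sample i in laboratory k.
    """
    pair_variances = []
    between_sum_sq = 0.0
    between_dof = 0
    for labs in data:
        lab_means = []
        for reps in labs:
            mean = sum(reps) / len(reps)
            lab_means.append(mean)
            # repeatability variance of one sample-lab pair
            pair_variances.append(sum((y - mean) ** 2 for y in reps) / (len(reps) - 1))
        sample_mean = sum(lab_means) / len(lab_means)
        between_sum_sq += sum((m - sample_mean) ** 2 for m in lab_means)
        between_dof += len(lab_means) - 1
    repeatability_var = sum(pair_variances) / len(pair_variances)
    between_lab_var = between_sum_sq / between_dof
    return math.sqrt(repeatability_var + between_lab_var)


if __name__ == "__main__":
    results = [
        [[10.0, 12.0], [14.0, 16.0]],  # sample 1: laboratory 1, laboratory 2
        [[20.0, 22.0], [20.0, 22.0]],  # sample 2: laboratory 1, laboratory 2
    ]
    print(reproducibility_sd(results))
```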
4. Discussion
The objective of this study was to evaluate intra- and inter-laboratory precision of whole-plant corn (WPC) compositional analyses, with particular emphasis on the inclusion of on-site NIRS (OS-NIRS) sensors alongside commercial analytical laboratories (ALs). Across the three experiments, our results demonstrated that OS-NIRS measurements of moisture, starch, and NDF generally agreed with AL values, though experiment-specific biases were observed, especially under conditions of low starch and high NDF (Experiment 2). This finding aligns with prior reports that calibration robustness is a key determinant of sensor performance, particularly when sample composition deviates from the calibration range [8,9,28]. Nonetheless, the ability of OS-NIRS sensors to generate results comparable to laboratory instruments supports their potential role as complementary tools for forage quality assessment [16,17].
Physical differences in WPC morphology, including the relative proportions of leaves, stems, and kernels and associated variations in tissue structure and hardness, can influence NIR reflectance and compositional predictions, suggesting that local bias adjustments or region-specific calibrations may improve accuracy across diverse germplasm and environments [28]. In addition, the physical state of samples influences NIR reflectance because water in fresh, intact plant tissue produces strong absorption features and the larger surface heterogeneity of unground material alters light scattering, often reducing calibration precision compared to dried, ground samples; this effect has been observed in forage and silage NIR studies and highlights how sample state contributes to variability in predictive performance [29].
Several factors should be considered when interpreting these results. OS-NIRS measurements used a common manufacturer-supplied calibration that was not independently validated with reference analyses; accordingly, results are best interpreted in terms of relative differences and analytical precision, with sensor reproducibility reflecting instrument and sampling effects, while inter-laboratory comparisons also include differences among proprietary calibrations. Because OS-NIRS calibrations are based on laboratory reference values, the sensor can reproduce the mean behavior of repeated laboratory measurements, but variability in the reference method is necessarily reflected in the calibration, so observed differences are not solely attributable to the sensor. The number of experiments was limited and conducted within a limited number of regions, so findings may not capture the full diversity of cropping systems, environmental conditions, or management practices. Regional variability could influence calibration robustness and the transferability of OS-NIRS performance. The results observed in Experiment 2 may reflect compositional extremes and regional forage characteristics, and while local calibration adjustment using reference analyses could improve performance under such conditions, this was beyond the scope of the present study. Differences in maize hybrid genetics and growing conditions were not explicitly controlled in this study and may contribute to variability in spectral response; however, these factors were not expected to materially affect the comparative assessment of OS-NIRS and laboratory precision under practical use conditions. In Experiment 1, inter-laboratory reproducibility was estimated using only two laboratories and therefore represents a limited assessment that should be interpreted cautiously relative to formal ISO 5725 reproducibility evaluations. 
Future multi-region studies incorporating independent reference analyses are needed to assess generalizability and calibration transferability across diverse production systems.
With respect to intra-laboratory repeatability, OS-NIRS sensors performed similarly to, or in some cases better than, ALs. For instance, OS-NIRS provided highly repeatable protein and starch measurements in Experiments 2 and 3, often outperforming several ALs that had greater within-lab variability. These results support our first hypothesis that OS-NIRS would achieve within-laboratory repeatability comparable to ALs, even when analyzing heterogeneous, undried and unground samples [5,10]. The enhanced repeatability of OS-NIRS is particularly significant given the lack of sample drying and grinding, which are standard practices in ALs to minimize heterogeneity [12,14]. This suggests that modern portable OS-NIRS technology has advanced sufficiently to deliver consistent results despite greater sample variability, an outcome also reported in recent inter-comparison studies [18]. The improved repeatability observed for some constituents, and in particular protein, with OS-NIRS may partly reflect the larger effective sample volume integrated across repeated measurements, which can reduce sampling-related variance compared with laboratory analyses based on small subsamples. While not explicitly tested here, this interpretation is consistent with established sampling error theory in near-infrared spectroscopy [30].
In terms of inter-laboratory reproducibility, substantial variability persisted across both ALs and OS-NIRS for starch and NDF, consistent with the inherent heterogeneity of WPC samples [21]. However, the inclusion of OS-NIRS data did not materially degrade reproducibility estimates for most constituents. In some cases, such as NDF in Experiment 2, OS-NIRS inclusion even reduced overall uncertainty. These findings partly confirm our second hypothesis that incorporating OS-NIRS would yield reproducibility metrics similar to those obtained when only AL data were considered. They also reflect broader challenges in achieving high reproducibility across forage laboratories, where differences in calibration sets, instrument configurations, and sample handling contribute to variability [16,20].
Although inclusion of OS-NIRS data occasionally widened confidence intervals for inter-laboratory comparisons, particularly for starch, these effects were modest relative to the overall magnitude of sample variability. Importantly, constituent means derived from combined AL and OS-NIRS analyses were nearly identical to those from ALs alone, with differences well within reproducibility standard deviations. This suggests that OS-NIRS sensors can be integrated into multi-lab networks without introducing systematic bias, provided that calibration models remain current and representative of the full range of expected forage compositions [3,24].
When directly comparing analytical strategies, clear differences emerged. Including OS-NIRS data alongside AL data generally improved intra-laboratory repeatability for protein and sometimes starch, but its effect on inter-laboratory reproducibility was more variable. In contrast, OS-NIRS data used alone provided repeatability metrics on par with, and in some cases superior to, ALs, though reproducibility outcomes were mixed. For example, OS-NIRS-only analysis narrowed confidence intervals for starch and NDF in Experiment 2 but widened them in Experiments 1 and 3. These contrasting patterns indicate that OS-NIRS is best positioned to strengthen within-laboratory consistency, while its effect on cross-laboratory reproducibility depends strongly on constituent type and experimental context.
Overall, our findings highlight the trade-offs inherent in deploying OS-NIRS in practical on-site WPC forage analysis. While OS-NIRS may not fully eliminate inter-laboratory variability, it offers substantial benefits in measurement repeatability, speed, and accessibility at the farm level. This capability enables producers and nutritionists to generate timely, reliable compositional estimates that support strategic storage decisions at the time of harvest, while still maintaining compatibility with commercial laboratory analyses. When comparing the analytical results from the OS-NIRS devices with those from the ALs for the constituents tested, some absolute differences in the typical reproducibility standard deviations of both methods (Table 3) can be expected. Continued efforts to improve calibration transferability and to expand calibration datasets will further enhance the role of OS-NIRS in complementing laboratory-based systems [7,21].