Current Status and Future Opportunities for Grain Protein Prediction Using On- and Off-Combine Sensors: A Synthesis-Analysis of the Literature

Abstract: The spatial information about crop grain protein concentration (GPC) can be an important layer (i.e., a map that can be utilized in a geographic information system) with uses from nutrient management to grain marketing. Recently, on- and off-combine harvester sensors have been developed for creating spatial GPC layers. The quality of these GPC layers, as measured by the coefficient of determination (R²) and the root mean squared error (RMSE) of the relationship between measured and predicted GPC, is affected by different sensing characteristics. The objectives of this synthesis analysis were to (i) contrast GPC prediction R² and RMSE for different sensor types (on-combine, off-combine proximal and remote); (ii) contrast and discuss the best spatial, temporal, and spectral resolutions and features, and the best statistical approach for off-combine sensors; and (iii) review current technology limitations and provide future directions for spatial GPC research and application. On-combine sensors were more accurate than remote sensors in predicting GPC, yet with similar precision. The optimal conditions for creating reliable GPC predictions from off-combine sensors were sensing near anthesis using multiple spectral features that include the blue and green bands, and that are analyzed by complex statistical approaches. We discussed sensor choice in regard to previously identified uses of a GPC layer, and further proposed new uses with remote sensors including same season fertilizer management for increased GPC, and in advance segregated harvest planning related to field prioritization and farm infrastructure. Limitations of the GPC literature were identified and future directions for GPC research were proposed as (i) performing GPC predictive studies on a larger variety of crops and water regimes; (ii) reporting proper GPC ground-truth calibrations; (iii) conducting proper model training, validation, and testing; (iv) reporting model fit metrics that express greater concordance with the ideal predictive model; and (v) implementing and benchmarking one or more uses for a GPC layer.


Introduction
Grain protein concentration (GPC) is a critical trait, especially for crops with quality premium markets, such as wheat (Triticum aestivum L.). Commonly, producers harvest, store, and sell grain with potentially differing GPC, but it is priced at an average GPC level. Since GPC changes as a function of yield level [1][2][3][4][5], soil moisture [2,6,7], nitrogen (N) fertilizer [1,8,9], genotype, environment, and other management factors [6][7][8], even a relatively uniform field with similar yield levels is likely to produce spatially variable GPC. If known, this information could be utilized by farmers for different purposes.
Knowing the magnitude and extent of spatial GPC variability can be used by farmers not only to segregate grain harvest for premium markets [9][10][11][12], but also as decision-
Therefore, the objectives of this study were to conduct a synthesis analysis of current/past scientific literature and summarize the findings related to (i) contrasting GPC prediction R² and RMSE for different sensor types (on-combine, off-combine proximal and remote); (ii) contrasting and discussing, based on R² and RMSE, the best spatial, temporal, and spectral resolutions and features, and the best statistical approach for off-combine sensors; and (iii) reviewing current technologies' limitations and providing future directions for spatial GPC research and application.

Materials and Methods
A synthesis analysis of the literature was conducted to collect and summarize studies reporting on the creation of a spatial GPC layer. The search was performed on Google Scholar, Web of Science, and Web of Knowledge using the terms "on-combine", "remote sensing", "protein", and "prediction", and was last run in November 2021 by two independent searchers. A study was included in the database if it fulfilled the following criteria: (i) it utilized one or more in-field sensor types for GPC prediction, (ii) it collected ground-truth GPC data, and (iii) it reported at least one of R² or RMSE. Studies with the main text in a language other than English were not considered. Based on the title, abstract, and reference lists, a total of 202 studies were downloaded and further screened, and only 84 fulfilled all search criteria and were included in the final set. Each study received a unique entry (i.e., one table row) identification number (id). Studies reporting on more than one sensor and/or crop type were accommodated by allocating more than one data entry (i.e., one table row for each sensor and/or crop type for a given study, distinguished by different letters following the paper entry id). Therefore, a single publication could contribute more than one entry id to our main database. With that, the final data set comprised 105 entries across the 84 selected studies.
Selected papers were then summarized and had different variables extracted, including descriptors related to (i) study specific, and (ii) sensor specific characteristics. Study specific characteristics included crop type, number of site years, ground truth GPC range (maximum-minimum observed GPC), and GPC sensor type (combine, proximal and remote). Sensor specific characteristics included GPC sensor spatial resolution (m), number of days sensed (grouped into classes 1, 2-5, 6-10 and >10 days), best sensing characteristics (timing, number of spectral features, type of spectral features, nonspectral covariables, spectral frequency), and best statistical approach (Table 1). Sensor specific characteristics did not apply to on-combine sensors and were collected only from off-combine studies.
Best timing for sensing was extracted as either the single growth stage (for studies with one day sensed) or the growth stage with the greatest R² (for studies with two or more days sensed) for GPC prediction. Best number of spectral features was characterized as whether a single spectral (only one band or vegetation index) or multiple spectral (more than one band and/or vegetation index) features were used in the greatest R² model reported. Type of spectral feature is a binary response variable (i.e., yes/no) characterized as whether the best spectral feature (bands and/or vegetation indices) included each of the bands blue (400-500 nm), green (500-600 nm), red (600-700 nm), red-edge (700-800 nm), near-infrared (800-1300 nm), and short-wave infrared (1300-1900 nm).
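The yes/no coding of band presence described above can be illustrated with a minimal Python sketch. The band limits come from the text; the function names, the treatment of the shared limits (lower limit inclusive, upper limit exclusive), and the example wavelengths are assumptions made only for illustration.

```python
# Band limits in nm, as listed in the text.
BANDS = {
    "blue": (400, 500),
    "green": (500, 600),
    "red": (600, 700),
    "red_edge": (700, 800),
    "nir": (800, 1300),
    "swir": (1300, 1900),
}


def band_of(wavelength_nm):
    """Return the band name for a wavelength, or None if outside 400-1900 nm."""
    for name, (low, high) in BANDS.items():
        # Assumption: a shared limit (e.g., 500 nm) belongs to the upper band.
        if low <= wavelength_nm < high:
            return name
    # Assumption: the upper limit of the last band (1900 nm) counts as SWIR.
    if wavelength_nm == 1900:
        return "swir"
    return None


def band_flags(wavelengths_nm):
    """Binary (True/False) presence of each band in a spectral feature."""
    present = {band_of(w) for w in wavelengths_nm}
    return {name: name in present for name in BANDS}


# Hypothetical spectral feature built from 550 nm and 860 nm reflectance.
flags = band_flags([550, 860])
```

Here `flags` marks only the green and near-infrared bands as present, which is the kind of per-band indicator later used as an explanatory variable.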
Nonspectral covariables were characterized as whether the best number of spectral features included extra covariables (e.g., temperature, precipitation, gluten category, mechanistic crop modeling) and grouped into single spectral, single spectral + other, multiple spectral, multiple spectral + other. Best spectral frequency was characterized as whether spectral data from a single or multiple dates were used in the greatest R² model reported in the study. Best statistical approach was characterized as whether the statistical analysis used in the greatest R² model was part of the bivariate family (i.e., y ~ x), the multivariate family (i.e., y ~ x1 + x2 + ... + xn), the partial least squares regression (PLSR) family (including PLSR, powered PLSR and N-PLSR), or random forest artificial neural network (RF-ANN).
Table 1. Summary of the selected studies for the synthesis analysis. Entry identification, citation, study and sensor specific characteristics, and grain protein concentration model fit metrics. GPC = grain protein concentration, SS = single spectral, MS = multiple spectral, SD = single date, MD = multiple dates, Bivariatef = bivariate family, Multivariatef = multivariate family, PLSRf = partial least squares regression family, RF-ANN = random forest artificial neural network, Max = maximum, Min = minimum, RMSE = root mean square error.
[103], with Q1 through Q4 representing the first (greatest quality) through fourth (lowest quality) quartiles of the distribution of journal impact factors. Entries absent from the list were classified as one of C = conference proceedings, IR = internal report, PC = personal communication, T = thesis, NA = not available.

When available, both GPC model fit metrics of R² and RMSE were extracted. In case a study reported multiple R² and RMSE values (i.e., for different sites, years, crop growth stages, spectral features, and statistical models), the range of model fit metric values was extracted and summarized for each study. For studies reporting separate model fit metrics for different crops and/or sensor types, one model fit metric range was extracted for each crop and sensor type combination studied.
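As a minimal sketch of this extraction step, the Python function below collapses all metric values reported for one entry into a range plus the two response variables analyzed later (maximum R² and minimum RMSE). The function name, dictionary keys, and numeric values are hypothetical and are not taken from any study in the database.

```python
def summarize_entry(r2_values, rmse_values):
    """Collapse the metrics reported for one entry into ranges and best values."""
    summary = {}
    if r2_values:  # R2 may be missing for some entries
        summary["r2_range"] = (min(r2_values), max(r2_values))
        summary["max_r2"] = max(r2_values)
    if rmse_values:  # RMSE may be missing for some entries
        summary["rmse_range"] = (min(rmse_values), max(rmse_values))
        summary["min_rmse"] = min(rmse_values)
    return summary


# Hypothetical entry reporting three R2 values and two RMSE values (%).
entry = summarize_entry([0.42, 0.61, 0.55], [0.80, 0.65])

# Hypothetical entry that reported R2 only.
r2_only = summarize_entry([0.70], [])
```

An entry that reports only one of the two metrics simply lacks the other keys, which mirrors the different entry counts available for R² and RMSE.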
The response variables maximum R² and minimum RMSE were individually analyzed as a function of the explanatory variables sensor type, spatial resolution, number of days sensed, best timing, number of spectral features, type of spectral features, nonspectral covariables, spectral frequency, and statistical approach. Models were run as fixed effect analysis of variance (ANOVA) using the lm function from the stats package [35]. Linear model assumptions of residual variance homogeneity and outlying residuals were checked using fitted vs. standardized residual plots, and residual normality was checked using quantile-quantile standardized residual plots. Model assumptions were deemed met and models were considered appropriate for inference. All statistical analyses were performed in R [35]. Computer code is available upon request.
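The analyses in this study were run in R with lm. Purely as an illustration of what a fixed effect ANOVA with a single categorical explanatory variable computes, the following Python sketch derives the F statistic from between-group and within-group sums of squares. The function name and the data are hypothetical. The p-value, which would require the F distribution, is intentionally left out.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a dict of group name -> values."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    # Variation of observations around their own group mean.
    ss_within = sum(
        (v - mean(values)) ** 2
        for values in groups.values()
        for v in values
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical maximum R2 values grouped by sensor type.
max_r2_by_sensor = {
    "combine": [0.85, 0.78, 0.74],
    "proximal": [0.70, 0.66, 0.62],
    "remote": [0.60, 0.55, 0.56],
}
f_stat, df_b, df_w = one_way_anova_f(max_r2_by_sensor)
```

A large F relative to the F distribution with (df_between, df_within) degrees of freedom indicates that at least one group mean differs from the others.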
For remote sensors, spatial resolution varied from 0.018 m to 1000 m with a mean of 64 m and median of 1.8 m. For remote and proximal sensors, the number of days sensed was grouped into the categories 1, 2-5, 6-10 and >10 days, with 39, 38, 4 and 10 entries in each category, respectively. A total of 13 different best timings for sensing were reported, whereby anthesis, heading, and grain filling were the most common (28, 13 and 8 entries, respectively, Supplementary Figure S1a). The best number of spectral features was single spectral in 60 entries and multiple spectral in 31 entries. The most frequent type of spectral features was, from most to least: NIR, red, green, red-edge, SWIR, and blue (Supplementary Table S1). The best spectral frequency was single date in 72 entries or multiple dates in 19 entries (Table 1). The best statistical approach was bivariate in 56 entries, multivariate in 12 entries, PLSR family in 18 entries, and RF-ANN in 5 entries (Table 1).
Model fit metrics R² and RMSE ranges were summarized and displayed as the maximum R² and minimum RMSE reported for each crop and sensor type combination studied (Table 1). Model fit R² was reported in 99 entries, ranging from 0.017 to 0.99, with a mean of 0.63 and median of 0.64. Of the 90 R² entries, 76 were calculated on training data and 24 on validation data. Training data based R² ranged from 0.02 to 0.99, with a mean of 0.6 and median of 0.61, and validation data based R² ranged from 0.39 to 0.97, with a mean of 0.73 and median of 0.74. Model fit RMSE was reported in 55 entries, ranging from 0.016% to 1.65%, with a mean of 0.67% and median of 0.64%. Of the 55 RMSE entries, 20 were calculated on training data and 35 on validation data. Training data based RMSE ranged from 0.02% to 1.53%, with a mean of 0.70% and median of 0.64%, and validation data based RMSE ranged from 0.1% to 1.65%, with a mean of 0.66% and median of 0.57%.

Sensor Type
Both on- and off-combine (i.e., proximal and remote) sensor types were evaluated, with on-combine directly sensing the grain during harvest and off-combine sensing the crop canopy during the growing season. The overall distribution of the coefficient of determination (R²) and RMSE were obtained with the goal of synthesizing the knowledge on protein prediction for both on- and off-combine sensor types. Sensor type resulted in significant differences in maximum R² (p = 0.005), with combine and proximal sensing having the greatest R² (0.79 and 0.66 on average), and remote sensing the lowest R² (0.57 on average) (Figure 1). The greatest maximum R² reported for each sensor type was 0.98 (id = 40) for combine, 0.99 (ids = 63, 80) for proximal, and 0.93 (id = 76) for remote sensors (Table 1). Sensor type resulted in similar minimum RMSE (p = 0.4), varying from 0.54% to 0.71% on average (Figure 1). The lowest minimum RMSE reported for each sensor type was 0.28% (id = 40) for combine, 0.02% (id = 63) for proximal, and 0.2% (id = 76) for remote sensors (Table 1). These proximal and remote sensor studies utilized multiple spectral features, single date sensing, and RF-ANN statistical analysis.

Number of Days Sensed
The number of days sensed for proximal and remote sensors resulted in similar maximum R² (p = 0.94) and minimum RMSE (p = 0.47) (Supplementary Figure S1d,e). The number of days sensed with the greatest maximum R² for proximal sensors (ids = 63, 80) was 4 days (2-5 days category), and for remote sensors (id = 76) was 1 day. The number of days sensed with the lowest minimum RMSE for proximal sensors (id = 63) was 4 days (2-5 days category) and for remote sensors (id = 76) was 1 day (Table 1).

Number of Spectral Features
The number of spectral features for proximal and remote sensors used across study specific best models resulted in significant differences in maximum R² (p = 0.02), with models that included multiple spectral features (more than one band and/or vegetation index) having greater maximum R² (0.68 on average) than models including only a single spectral feature (0.57 on average) (Figure 2). The greatest maximum R² observed for single spectral features was 0.99 (id = 80) and with multiple spectral features was 0.99 (id = 63) (Table 1). The number of spectral features resulted in similar minimum RMSE (p = 0.2), varying from 0.64% to 0.81% on average. The lowest minimum RMSE observed for single spectral features was 0.17% (id = 8) and with multiple spectral features was 0.02% (id = 63) (Table 1).
Figure 2. Distribution of (a) maximum R² and (b) minimum root mean squared error (RMSE) by type of the best spectral covariable (single and multiple spectral). Black dot and lines represent the mean ± standard deviation, and n is the number of observations for each best number of spectral features. In panel (a), means followed by the same letter are not significantly different at α = 0.05.

Type of Spectral Features
The spectral bands blue and green were identified as the most relevant features for modeling GPC for proximal and remote sensors and resulted in significant differences in maximum R² and minimum RMSE, whereas other bands had no effect (Figure 3). The inclusion of the blue band resulted in an increased maximum R² (0.76 on average), compared to when this band was not part of the best spectral feature (0.57 on average). The greatest maximum R² observed when the blue band was present was 0.97 (id = 8), and when it was absent was 0.99 (id = 80) (Supplementary Table S1). The inclusion of the green band resulted in a decreased minimum RMSE (0.68% on average) compared to when this band was not part of the best spectral feature (1.1% on average). The lowest minimum RMSE observed when the green band was present was 0.17% (id = 8), and when it was absent was 0.1% (id = 39) (Supplementary Table S1).

Nonspectral Covariables
The number of spectral features (bands and/or vegetation indices), along with the inclusion of other covariables (e.g., weather, gluten category, mechanistic crop modeling) for proximal and remote sensors, resulted in significant differences in maximum R² (p = 0.03, Supplementary Figure S1f). A single spectral feature alone had the lowest maximum R² (0.56 on average) and was significantly different from multiple spectral (0.69 on average). The greatest maximum R² for single spectral was 0.99 (id = 80), for single spectral plus other was 0.85 (id = 69), for multiple spectral was 0.99 (id = 63), and for multiple spectral plus other was 0.77 (id = 31a) (Table 1).
The number of spectral features, along with the inclusion of other covariables, resulted in similar minimum RMSE (p = 0.45, Supplementary Figure S1g) and varied from 0.58% to 0.85%, on average. The lowest minimum RMSE observed for single spectral was 0.17% (id = 8), for single spectral plus other was 0.34% (id = 68), for multiple spectral was 0.02% (id = 63), and for multiple spectral plus other was 0.4% (id = 31a) (Table 1).

Spectral Frequency
The best spectral frequency for proximal and remote sensors resulted in similar maximum R² (p = 0.7, Supplementary Figure S1h), with average maximum R² of 0.61 for single and 0.59 for multiple dates. The greatest maximum R² observed for single date was 0.99 (ids = 63, 80), and for multiple dates was 0.89 (id = 38). The best spectral frequency resulted in similar minimum RMSE (p = 0.1, Supplementary Figure S1i), with average minimum RMSE of 0.66% for single date and 0.92% for multiple dates. The lowest minimum RMSE observed for single date was 0.02% (id = 63), and for multiple dates was 0.4% (id = 1a).

Statistical Approach
The statistical analysis approach for proximal and remote sensors resulted in significant differences in maximum R² (p = 0.007), with model average maximum R² increasing from 0.56 with simpler bivariate to 0.81 with complex RF-ANN models (Figure 4). The greatest maximum R² observed for the bivariate family was 0.99 (id = 80), for the multivariate family was 0.85 (id = 69), for the PLSR family was 0.92 (id = 15), and for RF-ANN was 0.99 (id = 63) (Table 1).
Figure 4. Distribution of (a) maximum R² and (b) minimum root mean squared error (RMSE) by statistical approach (bivariatef = bivariate family, multivariatef = multivariate family, PLSRf = partial least squares regression family, RF-ANN = random forest artificial neural network) utilized to model grain protein concentration. Black dot and lines represent the mean ± standard deviation, and n is the number of observations for each distribution of best statistical approach. Means followed by the same letter are not significantly different at α = 0.05.

Discussion
This synthesis analysis is the first attempt to summarize GPC prediction results across different sensor type studies based on GPC model accuracy and precision. Our work expands on the previous review published by [18] by (i) including both on- and off-combine sensors and comparing their performance in predicting GPC, (ii) identifying the optimal off-combine sensing characteristics for creating a reliable GPC layer, and (iii) highlighting current limitations and laying the groundwork for future developments in GPC prediction.
Our study demonstrated that on-combine sensors are more accurate than remote sensors in predicting GPC, yet with similar precision. This is an important finding as it points to a tradeoff between GPC predictive performance (better with on-combine sensors) versus capacity for within season management, segregated harvest planning, and lower implementation cost (better with remote sensors). However, on-combine sensors can quickly become the "gold standard" for benchmarking in-season predictions of a GPC layer. These points are thoroughly discussed below.
Overall, off-combine GPC prediction accuracy was optimized when sensing near anthesis, using more than one spectral feature that includes the blue band, along with complex statistical approaches. GPC prediction precision was optimized when sensing near anthesis and using a spectral feature that includes the green band. Leaf N content at anthesis is correlated with GPC [46]. Sensing around this time provides information on the potential reservoir for N at canopy scale, an important source for protein formation via post-anthesis N remobilization to the grain [104]. Near anthesis, crop biomass generally surpasses the threshold of 2 leaf area index units above which the red band saturates and becomes insensitive to increasing biomass and chlorophyll levels [105,106]. Under conditions of high biomass and chlorophyll, replacing the red band with the green band in normalized difference vegetation indices has successfully been used to regain sensitivity of sensed data to crop reflectance [105], which can be related to the improved precision when using the green band observed in our study. Using both the green and blue bands in a normalized difference index had great GPC predictive accuracy and precision [41]. The authors attributed the success of combining both bands to their strong relationship with chlorophyll, carotenoids, and leaf N [41]. Further, the blue band has also been negatively correlated with biomass, leaf and plant water content, and GPC [48]. Thus, these bands can potentially contribute to improved GPC predictive power by combining information on both canopy level pigment concentration and water condition, two critical factors driving N uptake, remobilization, and GPC formation.
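The normalized difference indices mentioned above share one general form, (a - b) / (a + b), and differ only in which bands are used. The Python sketch below illustrates that form. The reflectance values are hypothetical, and the band order in the green and blue index is an assumption, since the exact formulation used in [41] is not given here.

```python
def normalized_difference(band_a, band_b):
    """Normalized difference of two reflectance values: (a - b) / (a + b)."""
    total = band_a + band_b
    if total == 0:
        raise ValueError("The sum of the two band reflectances must be non-zero.")
    return (band_a - band_b) / total


# Hypothetical canopy reflectance values (unitless, 0-1) near anthesis.
nir, red, green, blue = 0.45, 0.04, 0.08, 0.03

# Classic red-based index, which tends to saturate at high biomass.
ndvi = normalized_difference(nir, red)

# Green-based variant: the red band is replaced with the green band.
green_ndvi = normalized_difference(nir, green)

# Green and blue index; the band order is assumed for illustration only.
green_blue_index = normalized_difference(green, blue)
```

Because every variant is bounded between -1 and 1, indices built from different band pairs can be compared or combined as multiple spectral features in a single model.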
Our results point to the need to combine multiple vegetation indices with complex statistical analysis to assist in untangling intricate crop reflectance responses that are difficult to predict with fewer spectral features and simpler models. Complex statistical analysis, such as PLSR, RF and ANN, may result in improved model performance at the expense of interpretability. Optimizing predictive accuracy should be the main goal when creating GPC predictions, and only then should the issue of model interpretability be considered [107]. To address the lack of interpretability in GPC prediction from complex models, post hoc interpretability methods could be used to ensure predictions are evaluated within the context of their relevancy in addressing one or more specific uses of this technology [107].
The above guidelines provide a direction for future GPC sensing studies to increase the likelihood of creating accurate and precise GPC predictive models. Such models have been previously proposed to be utilized for (i) next season N fertilizer management, (ii) segregated harvest, and (iii) environmental compliance [18]. With within season remote sensing, we further propose new uses of a spatial GPC layer, including (iv) same season fertilizer management for increased GPC, and (v) in advance segregated harvest planning related to field prioritization and farm infrastructure.
Next season fertilizer N management can be assisted by the use of spatial GPC and grain yield layers to calculate N removal in the previous season, which is then utilized as the variable rate N layer to be applied in the following growing season in a monocropping system. Previous season GPC layer used for variable rate N has been demonstrated to maintain or increase wheat grain yield while maintaining or increasing GPC levels and net returns, compared to a fixed rate N management in semiarid regions [13,15]. Nonetheless, its applicability has neither been tested with different crop rotations nor in wider ranges of water availability, where in season weather drives crop growth, N response and GPC.
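As a rough sketch of the N removal calculation, the Python code below converts co-located grain yield and GPC values into grain N removal per map cell. The function name, the cell values, and the protein-to-N conversion factor of 5.7 (a value commonly used for wheat) are assumptions for illustration. The sketch also assumes yield and GPC are expressed on the same moisture basis.

```python
# Assumption: protein (%) = N (%) x 5.7, a factor commonly used for wheat.
WHEAT_PROTEIN_TO_N = 5.7


def n_removal_kg_ha(yield_kg_ha, gpc_percent, protein_to_n=WHEAT_PROTEIN_TO_N):
    """Grain N removal (kg N/ha) from grain yield (kg/ha) and GPC (%)."""
    protein_kg_ha = yield_kg_ha * gpc_percent / 100.0
    return protein_kg_ha / protein_to_n


# Hypothetical map cells: (grain yield in kg/ha, GPC in %).
cells = [(4000, 12.0), (5200, 10.5), (3100, 14.0)]

# N removed in the previous season, used here as the variable rate N layer.
variable_rate_n = [n_removal_kg_ha(y, gpc) for y, gpc in cells]
```

For the first hypothetical cell, 4000 kg/ha of grain at 12% GPC contains 480 kg/ha of protein, or about 84 kg N/ha under the assumed factor.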
Same season N management for increased GPC can only be assisted by off-combine sensing, mainly via the utilization of remote sensing to improve scalability. For this, crop reflectance from in season sensing is transformed into a GPC layer, which is then used as a benchmark to calculate variable N fertilizer rates that, when applied economically, increase GPC at harvest. This approach would need to consider the yield potential of different zones of the field and the spatially differential relationships between grain yield and GPC [2,5]. This approach can benefit from the feedback from an at harvest on-combine GPC layer, providing the ground-truthing GPC data to improve the in season predictions (postmortem evaluation) and further adjust in season GPC predictive models.
Segregated harvest can be assisted by the use of a GPC layer created by both on- and off-combine sensors. When on-combine sensing is used, grain segregation occurs at the combine level, which requires a mounted GPC sensor and at least two separate grain bins, one for low and another for high protein grain [10]. The added hardware cost will be incurred by the producer, which could be an early barrier to technology adoption; if the sensor cost is instead included in the cost of the machinery, its general adoption is expected to expand significantly over time. Moreover, since on-combine GPC sensing occurs in parallel with harvest, it does not allow for the preharvest assessment of GPC variability (within and across fields) for prioritization and planning purposes.
When off-combine sensing is used, especially if employed via remote sensors, the magnitude and extent of GPC variability across fields can be assessed and used to prioritize highly variable GPC fields for segregated harvest. Within a given field, homogeneous GPC zones can be identified before harvest, and then harvested and stored separately. Finally, the preharvest GPC prediction gives the producer time to plan how to allocate grain storage capacity across different GPC harvest loads. As off-combine remote sensing does not require added hardware for sensing and grain segregation, it will be more cost-effective for a producer and more likely to be adopted. Nonetheless, formal validation should be performed by assessing the level of agreement between the off-combine GPC predictions and the on-combine GPC layer.
An important aspect of GPC zoning is its temporal variability and prediction performance both within and across seasons. In season GPC prediction performance is expected to vary as crops grow, develop, undergo stresses, and senesce [26,84], and this variation can be addressed by sensing fields at model optimized growth stages, such as near anthesis, as indicated by our results. Across season GPC prediction performance is directly affected by the temporal stability and persistence of GPC zones. Depending on crop, soil, weather, and management interactions, GPC zones can change not only in mean GPC value, but also in their spatial extent and location from one season to another [2,10,14,64]. To overcome this issue, GPC prediction maps need to be updated at least once every growing season. Studies are needed to evaluate the temporal stability and persistence of GPC zones both within and across seasons for multiple growing seasons, to elucidate how GPC changes spatially and temporally in the long term.
Environmental compliance can be assisted by the use of a GPC layer by providing information on spatial crop N removal, which, when combined with N inputs (e.g., fertilizer N rate), allows for the calculation of spatial N balance [17]. The larger the N balance, the more opportunity for environmental pollution via nitrate leaching and nitrous oxide emissions [16]. Thus, a spatial N balance layer can be used by the producer (i) to identify under and over fertilized zones and adjust N fertilization rates for the next crop in the rotation, and (ii) to demonstrate improvements in N balance and environmental stewardship to the general public [16].
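The spatial N balance described above can be sketched in Python as N input minus grain N removal per zone, followed by a simple labeling of under and over fertilized zones. The function names, the tolerance of 10 kg N/ha, and the example values are hypothetical and would need to be defined for a given crop and region.

```python
def n_balance(n_input_kg_ha, n_removal_kg_ha):
    """Partial N balance (kg N/ha): N input minus N removed in grain."""
    return n_input_kg_ha - n_removal_kg_ha


def classify_zone(balance_kg_ha, tolerance_kg_ha=10.0):
    """Label a zone from its N balance; the tolerance is a hypothetical value."""
    if balance_kg_ha > tolerance_kg_ha:
        return "over fertilized"
    if balance_kg_ha < -tolerance_kg_ha:
        return "under fertilized"
    return "balanced"


# Hypothetical field with a fixed N rate and three zones of grain N removal.
fertilizer_n_rate = 100.0            # kg N/ha applied uniformly
zone_removal = [60.0, 100.0, 130.0]  # kg N/ha, e.g., from yield and GPC layers

balances = [n_balance(fertilizer_n_rate, removal) for removal in zone_removal]
labels = [classify_zone(b) for b in balances]
```

In this hypothetical field, the first zone carries a surplus of 40 kg N/ha and would be a candidate for a lower N rate in the next crop, while the third zone removed more N than was applied.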
Lastly, spatial GPC adoption for any one or more purposes will depend on farmers' interest in the technology based on its return on investment. A recent survey of soybean farmers from multiple U.S. states identified that >55% of 186 respondents would be interested in investing in technology to assess spatial GPC if a premium level of at least $0.50 was paid per bushel when GPC was above a certain threshold [108]. Farmer interest is expected to increase, as both the direct and indirect benefits of a spatial GPC layer become more evident.
Limitations of the current study and the literature were identified related to (i) crop representativeness, (ii) GPC analysis, (iii) model training, (iv) model fit metric reporting, and (v) model implementation. A large majority of the studies (i.e., 85%) were conducted on wheat. The success of this technology will depend on how it can be applied to different crops and rotations, thus, more studies on a large variety of crop types are necessary.
Reliable ground-truth GPC data are the first step in creating a robust GPC predictive model, but collecting them is a labor and cost intensive task. Most of the studies included in this work utilized laboratory near-infrared equipment to indirectly measure ground-truth GPC, yet very few studies reported on whether this equipment had been properly calibrated with a traditional chemical laboratory N analysis (e.g., Kjeldahl N analysis; [109]). Future studies should ensure that proper equipment calibration is conducted and report on its performance.
The majority of the studies reported model fit metrics computed on the same data used for training the model. This practice likely overestimates model accuracy and precision, building models that fit the training data well, but that have issues predicting new data correctly across different spatiotemporal scales (different fields for the same crop, and same crop but in other seasons). Future research directions should split the data set into training, validation, and testing, using the training and validation sets for creating the model [21,29], and the testing set to evaluate model transfer learning ability to different spatiotemporal scales.
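One way to set up such a split is to assign whole site-years, rather than individual observations, to the training, validation, and testing sets, so that the testing set contains only fields and seasons the model has never seen. The Python sketch below illustrates this idea. The function name, the 60/20/20 fractions, the seed, and the records are hypothetical choices and are not prescribed by the studies reviewed.

```python
import random


def split_by_group(records, group_key, fractions=(0.6, 0.2, 0.2), seed=0):
    """Assign whole groups (e.g., site-years) to train, validation, or test."""
    groups = sorted({record[group_key] for record in records})
    random.Random(seed).shuffle(groups)  # reproducible group order
    n_train = round(fractions[0] * len(groups))
    n_validation = round(fractions[1] * len(groups))
    assignment = {}
    for position, group in enumerate(groups):
        if position < n_train:
            assignment[group] = "train"
        elif position < n_train + n_validation:
            assignment[group] = "validation"
        else:
            assignment[group] = "test"
    split = {"train": [], "validation": [], "test": []}
    for record in records:
        split[assignment[record[group_key]]].append(record)
    return split


# Hypothetical observations: two sampling points in each of five site-years.
records = [
    {"site_year": f"field{i}_2020", "gpc": 11.0 + i + 0.1 * j}
    for i in range(5)
    for j in range(2)
]
split = split_by_group(records, "site_year")
```

Splitting by site-year avoids placing observations from the same field and season on both sides of the split, which would otherwise make the test metrics look better than the model's true ability to transfer across spatiotemporal scales.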
This synthesis focused on R² and RMSE to assess GPC prediction quality, as these were the most commonly reported model fit metrics. Assuming the ideal model would have null intercept, unity slope, and low variability, only RMSE responds as expected (i.e., increasing) as the actual GPC predictive model deviates from the ideal condition. On the other hand, R² does not respond as intercept deviates from null, and in fact increases as slope deviates from unity, which is undesirable with predicted-observed relationships where a slope of unity is the desired outcome. Thus, we recommend future studies also report other model fit metrics, such as the concordance correlation coefficient [110]. This metric has the benefits of (i) responding to deviations from a null intercept and unity slope, (ii) being decomposable into precision and accuracy components, and (iii) the metric itself and its components of precision and accuracy can undergo significance testing [111,112].
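To make the contrast concrete, the Python sketch below computes RMSE, R² (taken here as the squared Pearson correlation between observed and predicted values, which is an assumption about how it is calculated), and Lin's concordance correlation coefficient with its precision (Pearson r) and accuracy (bias correction factor) components. The function name and the GPC values are hypothetical, and the predictions are deliberately offset by a constant 1% to mimic a non-null intercept.

```python
from math import sqrt
from statistics import mean


def fit_metrics(observed, predicted):
    """Return RMSE, R2, CCC, and the CCC precision and accuracy components."""
    n = len(observed)
    mean_obs, mean_pred = mean(observed), mean(predicted)
    # Population (1/n) variances and covariance, as in Lin's definition.
    var_obs = sum((o - mean_obs) ** 2 for o in observed) / n
    var_pred = sum((p - mean_pred) ** 2 for p in predicted) / n
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted)) / n
    rmse = sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    precision = cov / sqrt(var_obs * var_pred)  # Pearson r
    denominator = var_obs + var_pred + (mean_obs - mean_pred) ** 2
    accuracy = 2 * sqrt(var_obs * var_pred) / denominator  # bias correction factor
    ccc = 2 * cov / denominator  # equals precision * accuracy
    return {"rmse": rmse, "r2": precision ** 2, "ccc": ccc,
            "precision": precision, "accuracy": accuracy}


# Hypothetical observed GPC (%) and predictions biased upward by 1%.
observed = [10.0, 11.0, 12.0, 13.0, 14.0]
biased = [value + 1.0 for value in observed]

metrics = fit_metrics(observed, biased)
```

In this example R² stays at 1 despite the constant bias, whereas RMSE rises to 1% and the concordance correlation coefficient drops to 0.8, with the entire loss attributed to the accuracy component.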
Finally, all studies included in this synthesis reported creating a sensor based spatial GPC prediction, yet the majority of the studies lacked demonstrations of its implementation. This synthesis showed that a GPC layer can be created with reasonable accuracy and precision. Future studies should build on this finding, create reliable GPC predictions, implement the predicted GPC layer to any one or more uses, and test its performance against the conventional methods. This benchmarking will be key to elucidate the potential economic value of a spatial GPC layer across various uses and define the future of this technology.

Conclusions
This synthesis gathered and summarized 84 studies from the literature reporting the accuracy and/or precision of grain protein concentration predictive models. We found that on-combine sensors are more accurate than remote sensors, although both have similar precision. This has implications on the tradeoff between model performance versus model scalability and potential for decision-making planning. For off-combine sensors, we identified the most important sensing characteristics to create a reliable grain protein concentration model. Those included sensing near anthesis using multiple spectral features that include the blue and green bands, and that are analyzed by complex statistical approaches.
We contrasted and compared the use of different sensor types for the previously proposed uses of a grain protein concentration layer of (i) next-season N fertilizer management, (ii) segregated harvest, and (iii) environmental compliance. We further proposed two new uses of an in season grain protein concentration layer, namely, (iv) within season nitrogen management for increased grain protein concentration, and (v) in advance segregated harvest planning.
We identified current limitations of the technology and proposed directions to be explored by future research. Those included performing grain protein concentration predictive studies on a larger variety of crops and water regimes; performing and reporting proper grain protein concentration ground-truth calibrations; creating transferable grain protein concentration predictive models by conducting proper model training, validation, and testing; reporting model fit metrics that express greater concordance with the ideal predictive model and that can be partitioned into accuracy and precision; and implementing and benchmarking one or more uses for a grain protein concentration layer. Altogether, these research lines will assist in developing reliable grain protein concentration predictive capacity across diverse crops, environments, and management options, and will further elucidate the potential agronomic, economic, and environmental value of a grain protein concentration layer when implemented across multiple uses in a farming operation.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/rs13245027/s1, Figure S1: Best crop stage for grain protein concentration (GPC) prediction based on maximum R² (a), and distribution of maximum R² and minimum root mean squared error (RMSE) explained by spatial resolution of remote sensors (b,c), number of days sensed by proximal and remote sensors (d,e), best covariable type (f,g), and best spectral frequency (h,i). Here, n is the number of observations for each distribution. Black dot and lines represent the mean ± standard deviation. In panel (f), means followed by the same letter are not significantly different at α = 0.05. Table S1: Summary of the selected studies for the synthesis analysis, including entry identification, citation, crop, and sensor-specific characteristics including sensor type, spatial resolution, the best type of spectral feature and columns identifying whether the best type of spectral feature included or not the bands of blue (400-500 nm), green (500-600 nm), red (600-700 nm), red-edge (RE, 700-800 nm), near-infrared (NIR, 800-1300 nm), and short-wave infrared (1300-1900 nm), and grain protein concentration model fit metrics.
Acknowledgments: This is Contribution no. 22-137-J from the Kansas Agricultural Experiment Station.

Conflicts of Interest:
Author Y.W. is employed by the company John Deere. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The sponsors had no role in the design, execution, interpretation or writing of the study.