Corn Nitrogen Nutrition Index Prediction Improved by Integrating Genetic, Environmental, and Management Factors with Active Canopy Sensing Using Machine Learning

Accurate nitrogen (N) diagnosis early in the growing season across diverse soil, weather, and management conditions is challenging. Strategies using multi-source data are hypothesized to perform significantly better than approaches using crop sensing information alone. The objective of this study was to evaluate, across diverse environments, the potential for integrating genetic (e.g., comparative relative maturity and growing degree units to key developmental growth stages), environmental (e.g., soil and weather), and management (e.g., seeding rate, irrigation, previous crop, and preplant N rate) information with active canopy sensor data for improved corn N nutrition index (NNI) prediction using machine learning methods. Thirteen site-year corn (Zea mays L.) N rate experiments involving eight N treatments conducted in four US Midwest states in 2015 and 2016 were used for this study. A proximal RapidSCAN CS-45 active canopy sensor was used to collect corn canopy reflectance data around the V9 developmental growth stage. The utility of vegetation indices and ancillary data for predicting corn aboveground biomass, plant N concentration, plant N uptake, and NNI was evaluated using singular variable regression and machine learning methods. The results indicated that when the genetic, environmental, and management data were used together with the active canopy sensor data, corn N status indicators could be more reliably predicted either using support vector regression (R2 = 0.74–0.90 for prediction) or random forest regression models (R2 = 0.84–0.93 for prediction), as compared with using the best-performing single vegetation index or using a normalized difference vegetation index (NDVI) and normalized difference red edge (NDRE) together (R2 < 0.30). 
The N diagnostic accuracy based on the NNI was 87% using the data fusion approach with random forest regression (kappa statistic = 0.75), which was better than the result of a support vector regression model using the same inputs. The NDRE index was consistently ranked as the most important variable for predicting all the four corn N status indicators, followed by the preplant N rate. It is concluded that incorporating genetic, environmental, and management information with canopy sensing data can significantly improve in-season corn N status prediction and diagnosis across diverse soil and weather conditions.


Introduction
Proper nitrogen (N) management is critical for optimizing corn (Zea mays L.) yield and quality, farm profitability, and sustainable development [1][2][3][4]. However, crop N management is challenging because of the dynamic nature of N: complex interactions among soil, weather, and management drive N dynamics and create spatial and temporal variability in both soil N supply and crop N demand [2,5]. Mismanaging N can significantly impact food security, environmental sustainability, human health, and climate change [1][2][3]. Precision N management aims to match N supply with crop N demand in both space and time and has the potential to improve N use efficiency and reduce negative environmental impacts [2,4]. Technologies that can reliably and efficiently diagnose crop N status over space and time are urgently needed to guide in-season site-specific N management.
One potential method for assessing corn N status is the N nutrition index (NNI). The NNI is the ratio of the measured plant N concentration (PNC) to an established critical PNC value (Nc), defined as the minimum PNC that can produce maximum aboveground biomass (AGB) [6]. The Nc changes throughout the growing season as the adequate N concentration becomes diluted (i.e., declines) with increasing AGB. The relationship between Nc and AGB can be described by a negative power function commonly referred to as a critical N dilution curve [6]. Such curves have been established for different crops, including corn [7], wheat (Triticum aestivum L.) [8], rice (Oryza sativa L.) [9], and potato (Solanum tuberosum L.) [10]. An established dilution curve allows one to calculate the Nc of a crop at any given AGB and then calculate the NNI from the measured PNC. The NNI indicates whether crop N status is deficient (NNI < 1), optimum (NNI = 1), or surplus (NNI > 1) [6]. In practice, optimum N status is often defined as a range rather than a single value (e.g., 0.95 ≤ NNI ≤ 1.05) [11]. In-season N application rates can be increased if NNI is below the lower threshold (1 or 0.95) and reduced if NNI is above the upper threshold (1 or 1.05). More quantitative in-season N recommendation algorithms have been developed using NNI [12] or plant N uptake (PNU) and critical PNU calculated from AGB and a critical N dilution curve [13], leading to improved N use efficiency over conventional fixed rate and time applications. However, the drawback of this method is that it requires destructive sampling and chemical analysis, which has limited its implementation for on-farm precision N management.
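The NNI calculation and the threshold-based diagnosis described above can be sketched in a few lines. This is a minimal illustration, not the authors' code. The dilution curve coefficients (a = 3.40, b = 0.37, with PNC in percent) are an assumption: they are the values commonly attributed to the corn curve of Plénet and Lemaire [7] and should be checked against that source. The function names and the choice to reject AGB below 1 t ha−1 are also hypothetical.

```python
def critical_n(agb_t_ha, a=3.40, b=0.37):
    """Critical plant N concentration (%) from a power-law dilution curve.

    ASSUMPTION: a and b are the commonly cited corn coefficients
    (PNC expressed in %); verify them against reference [7] before use.
    """
    if agb_t_ha < 1.0:
        # ASSUMPTION: this sketch only applies the curve at AGB >= 1 t/ha.
        raise ValueError("dilution curve applied only for AGB >= 1 t/ha")
    return a * agb_t_ha ** (-b)


def nni(pnc_percent, agb_t_ha):
    """N nutrition index: measured PNC divided by the critical PNC."""
    return pnc_percent / critical_n(agb_t_ha)


def diagnose(nni_value, low=0.95, high=1.05):
    """Classify N status using the 0.95 to 1.05 optimum range."""
    if nni_value < low:
        return "deficient"
    if nni_value > high:
        return "surplus"
    return "optimal"


# Hypothetical plot: 4 t/ha of dry biomass with 1.8% plant N.
example_nni = nni(1.8, 4.0)
example_status = diagnose(example_nni)
```

Because Nc declines as biomass accumulates, the same PNC can indicate deficiency in a small crop and sufficiency in a large one, which is why both AGB and PNC are needed.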
A promising strategy to overcome this method's shortcomings is to use proximal and/or remote sensing technologies to non-destructively estimate NNI [11,13]. Passive sensing technologies are limited by poor weather or poor lighting conditions. Active canopy sensors may therefore be more useful for precision N management, because they provide their own light sources, are not limited by ambient light conditions, and can be used at any time of the day [11,14,15]. These sensors measure reflected light in several wavebands, including red, red edge, and near-infrared. Vegetation indices that combine these measurements, such as the normalized difference vegetation index (NDVI) and the normalized difference red edge (NDRE), are good estimators of AGB, leaf area index, and PNU [11,14,15]. However, predictions of PNC and NNI from commonly used vegetation indices are generally not satisfactory, and more effort is needed to improve their non-destructive prediction, especially across diverse soil, weather, and management conditions [11,15,16].
One approach to improve the prediction of PNC and NNI is to include multiple vegetation indices in the prediction models. Previous studies indicated that a combination of different vegetation indices using stepwise multiple linear regression or machine learning methods could significantly improve the prediction of crop N status indicators [17][18][19]. In general, machine learning models performed better than multiple linear regression models due to their capability to model complicated non-linear relationships. Support vector regression (SVR) and random forest regression (RFR) models generally performed better than other tested models [17,18].
Further improvement could occur by accounting for genetics (e.g., crop varieties), environmental conditions (e.g., soil and weather conditions), and management practices (e.g., preplant N fertilizer application rates, planting density, rotations, irrigation, and tillage practices). With most current studies limited to a single region or province (e.g., a US state), it is difficult to determine how genetic, environmental, and management factors affect the performance of active canopy sensors at a regional level. One approach to overcome the influences of these factors is to use N-rich plots or strips as references, or virtual references (areas of naturally better crop growth in a field), to calculate N sufficiency or N response indices for N status diagnosis or N recommendations [11][20][21][22][23]. However, the location of the reference plots, strips, or areas can influence the diagnosis results or N recommendations, and no consensus has been reached about where to put the references and how many are needed [24,25]. Another approach is to incorporate soil and weather information to improve crop sensor-based N recommendation or the prediction of crop variables [26][27][28][29][30][31]. A recent study found that incorporating soil and weather information improved the performance of a crop sensor-based N recommendation algorithm developed by the University of Missouri and tested across the US Midwest, reducing the difference between sensor-recommended N rates and the economic optimum N rate by about 25 kg ha−1 [26]. Additional improvement was observed when using some machine learning algorithms (e.g., RFR, decision tree, and elastic net) [27][28][29][30].
To date, there has been some effort at predicting crop N status and the economic optimum N rate using environmental and management information [15,27,28,30]. However, studies focusing on NNI prediction using canopy reflectance sensing that incorporates genetic, environmental, and management information have not been reported. Being able to predict NNI would help farmers assess their crop N status and make in-season N management decisions [12,13]. Therefore, the objective of this study was to use different machine learning methods to evaluate, across diverse environments, the potential for integrating genetic, environmental, and management factors with active canopy sensor data for improving predictions of corn N status indicators. A sub-objective was to evaluate the relationship between previously published vegetation indices and corn N status indicators.

Experimental Design
Data were collected through a research collaboration between Corteva Agrisciences and eight US Midwest universities across an array of soil, weather, and management conditions of the US Corn Belt [31]. Although comprehensive data were collected during this project, corn AGB and PNC data were only available at the V9 ± 1 developmental growth stage [32] at 13 site-years from four states (Illinois, Iowa, Missouri, and Nebraska) from the 2015 and 2016 growing seasons. The V9 ± 1 developmental growth stage is the key stage for side-dress N application for most farmers and is a focus of this study. The average plot dimension was 3 m wide and 15 m long [33]. Each N response experiment had 16 N fertilizer treatments, with 4 replications in a randomized complete block design. Eight N application rate treatments (0-315 kg N ha −1 in 45 kg ha −1 increments) applied at the planting stage were used for this analysis. More site characteristic information, data descriptions, and data can be found in previous publications [31,33].

Measured Features
The corn canopy reflectance was measured using the RapidSCAN CS-45 proximal active canopy sensor (Holland Scientific, Lincoln, NE, USA) with wavebands centered at 670, 730, and 780 nm. The sensor data were collected at the V9 ± 1 developmental growth stage. The sensor was held 60 cm above the corn canopy and walked at about 4 km h−1 to collect reflectance data from the two middle rows of each plot, collecting an average of 120 observations per plot. The reflectance data (i.e., red, red edge, and near-infrared) were averaged to represent each plot. Based on previous research and preliminary analysis, 6 vegetation indices (Table 1) were calculated from the averaged reflectance values. The NDVI and NDRE indices were defined as sensor-provided vegetation indices in this study, because the RapidSCAN sensor can automatically calculate these indices.
Note: R_Red is the reflectance at the red wavelength (670 nm), R_RE is the reflectance at the red edge wavelength (730 nm), and R_NIR is the reflectance at the NIR wavelength (780 nm).
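As an illustration of how the two sensor-provided indices follow from the plot-averaged reflectance, the sketch below uses the standard normalized difference definitions of NDVI and NDRE. The function names and the sample readings are hypothetical; the other indices in Table 1 are not reproduced here.

```python
def normalized_difference(a, b):
    """Generic normalized difference (a - b) / (a + b)."""
    total = a + b
    if total == 0:
        raise ValueError("reflectance sum is zero")
    return (a - b) / total


def plot_indices(readings):
    """Average per-plot reflectance first, then compute NDVI and NDRE.

    readings: iterable of (red, red_edge, nir) tuples for one plot,
    i.e. reflectance at 670, 730, and 780 nm.
    """
    readings = list(readings)
    n = len(readings)
    red = sum(r[0] for r in readings) / n
    red_edge = sum(r[1] for r in readings) / n
    nir = sum(r[2] for r in readings) / n
    return {
        "NDVI": normalized_difference(nir, red),
        "NDRE": normalized_difference(nir, red_edge),
    }


# Hypothetical readings for one plot (reflectance as fractions).
sample = [(0.04, 0.24, 0.44), (0.06, 0.26, 0.46)]
indices = plot_indices(sample)
```

Averaging the reflectance before computing the indices mirrors the order of operations described in the text; averaging per-reading indices instead would generally give slightly different values.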
After sensor data collection, six representative corn plants (entire aboveground plant) were taken from the middle two rows of each plot. The plant samples were dried at 60-70 °C to a constant weight, weighed to determine dry biomass, and then ground to pass through a 1 mm sieve. The PNC was determined using the Dumas combustion method with an Elementar Rapid N Cube (Elementar Analysensysteme GmbH, Langenselbold, Germany) by Agvise Laboratory (Northwood, ND, USA). The PNU was calculated by multiplying AGB and PNC. The AGB and PNU are not typically used as corn N status indicators, but because they are used to calculate or are related to PNC and NNI, they are termed "corn N status indicators" in this study.
Hobo U30 automatic weather stations (Onset Computer Corporation, Bourne, MA, USA) adjacent to each trial site [31] recorded precipitation and temperature at 15 min intervals. Irrigation was not part of the precipitation measurement. Weather measurements were used to calculate additional weather variables from the time of planting to the time of sensing, including total precipitation (PPT; ranging from 95 to 426 mm), corn heat units (CHU; ranging from 943 to 1543), growing degree days (GDD; ranging from 407 to 662), Shannon diversity index (SDI; ranging from 0.52 to 0.75), and abundant and well-distributed rainfall (AWDR; ranging from 53 to 251) (see Table 2 for calculations and Table 3 for descriptions of the weather variables).
Note: Y_max and Y_min are the contributions to CHU from the daily maximum (T_max, up to 30 °C) and minimum (T_min) air temperatures in degrees Celsius, respectively; pi is the ratio of daily rainfall to PPT; n is the number of days from planting to sampling.
Soil texture was determined from core samples taken to a depth of 1.2 m. Each diagnostic horizon was analyzed for sand, silt, and clay (pipette method) by the University of Missouri's Soil Health Assessment Center. For this analysis, data were weighted by soil depth to obtain an average value for 0-30 cm [31,33]. Soil texture for the top 30 cm included clay (ranging from 3 to 40%), silt (ranging from 7 to 71%), and sand (ranging from 5 to 91%) (Table 3).
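The rainfall distribution and heat variables described above can be sketched as follows. The Table 2 formulas are not reproduced in this text, so the forms used here are assumptions based on the commonly published definitions: SDI as normalized Shannon entropy of daily rainfall shares, AWDR as PPT multiplied by SDI, and the daily CHU as the mean of a quadratic maximum-temperature term and a linear minimum-temperature term. Function names are hypothetical.

```python
import math


def shannon_diversity_index(daily_rain_mm):
    """SDI = -sum(p_i * ln p_i) / ln(n), with p_i = daily rain / total rain.

    ASSUMPTION: standard normalized form; 1 means perfectly even rainfall,
    0 means all rain fell on a single day.
    """
    n = len(daily_rain_mm)
    total = sum(daily_rain_mm)
    if n < 2 or total <= 0:
        return 0.0
    entropy = -sum((r / total) * math.log(r / total)
                   for r in daily_rain_mm if r > 0)
    return entropy / math.log(n)


def awdr(daily_rain_mm):
    """Abundant and well-distributed rainfall, assumed to be PPT * SDI."""
    return sum(daily_rain_mm) * shannon_diversity_index(daily_rain_mm)


def daily_chu(t_max_c, t_min_c):
    """Daily corn heat units from max/min air temperature (deg C).

    ASSUMPTION: commonly used CHU coefficients; T_max is capped at 30 deg C
    as stated in the note, and negative contributions are set to zero.
    """
    t_max_c = min(t_max_c, 30.0)
    y_max = max(0.0, 3.33 * (t_max_c - 10.0) - 0.084 * (t_max_c - 10.0) ** 2)
    y_min = max(0.0, 1.8 * (t_min_c - 4.4))
    return (y_max + y_min) / 2.0


# Hypothetical four-day records from planting to sensing.
even_rain = [10.0, 10.0, 10.0, 10.0]
single_storm = [40.0, 0.0, 0.0, 0.0]
```

With the same 40 mm total, the evenly spread record gives an AWDR of 40 while the single storm gives 0, which is the distinction between rainfall amount and rainfall distribution that these variables are meant to capture.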

Feature Engineering
The NNI was calculated as the ratio of PNC to Nc, with Nc derived from the critical N dilution curve developed by Plénet and Lemaire [7] for AGB greater than 1 t ha−1 (Equation (1)), where Nc is the critical N concentration and W is the dry AGB in t ha−1.
The first step of the analysis was determining the relationship between single vegetation indices and crop N status indicators (AGB, PNC, PNU, and NNI). Next, machine learning algorithms were used to predict each of the corn N status indicators using vegetation indices alone and together with genetic, environmental, and management data. In the preliminary analysis, including more vegetation indices than the sensor-provided ones (NDVI and NDRE) did little to enhance prediction of corn N status indicators. Therefore, only the sensor-provided vegetation indices were used in the further analysis. The final step was validating the different models for predicting NNI. Model accuracy was based on classifying NNI as deficient, optimal, or surplus.
In these analyses, data (n = 414; 2 missing plots) from the 13 site-years were pooled, with 75% of the data used for model calibration and 25% for prediction. For the first step, single-variable regression models relating each vegetation index to the N status indicators were developed using Microsoft Excel 2016 (Microsoft Corporation, Redmond, WA, USA). For the second step, two machine learning algorithms (SVR and RFR) were used to predict each of the four corn N status indicators. The SVR and RFR models were developed using the scikit-learn Python machine learning library [40,41]. The agreement between the observed and predicted values was evaluated using the coefficient of determination (R2), root mean square error (RMSE), and mean absolute error (MAE) for the prediction dataset. The Gini coefficients were also derived from the RFR models to describe the importance of each variable considered.
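The calibration and prediction workflow described above can be sketched with scikit-learn as follows. This is an illustrative outline only: the data are synthetic, the feature list is a small hypothetical subset, and the hyperparameters, the standardization step for SVR, and the random seed are assumptions rather than the settings used in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the pooled plot data (n = 414); NOT real data.
rng = np.random.default_rng(0)
feature_names = ["NDVI", "NDRE", "preplant_N", "silt", "SDI", "CHU"]
X = rng.uniform(0.0, 1.0, size=(414, len(feature_names)))
y = 0.6 + 0.5 * X[:, 1] + 0.3 * X[:, 2] + rng.normal(0.0, 0.05, 414)

# 75% calibration, 25% prediction, as in the text.
X_cal, X_pred, y_cal, y_obs = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    # SVR is scale-sensitive, so inputs are standardized first (assumption).
    "SVR": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, epsilon=0.05)),
    "RFR": RandomForestRegressor(n_estimators=200, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_cal, y_cal)
    y_hat = model.predict(X_pred)
    results[name] = {
        "R2": r2_score(y_obs, y_hat),
        "RMSE": float(np.sqrt(mean_squared_error(y_obs, y_hat))),
        "MAE": mean_absolute_error(y_obs, y_hat),
    }

# Impurity-based (Gini) importances from the fitted random forest.
importances = dict(zip(feature_names, models["RFR"].feature_importances_))
```

With 414 observations, a 25% hold-out yields 104 prediction samples, which matches the prediction dataset size reported in the study. Note that a purely random split places plots from the same site-year in both subsets; whether that is appropriate depends on how the model is meant to be applied.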
The final step was to evaluate the performance of different modeling approaches to predict NNI. The NNI diagnostic results from the single-variable regression and machine learning models were compared to those based on the NNI that was destructively determined in the prediction dataset (n = 104). The following diagnostic criteria were used: N was deficient, optimal, and surplus when NNI < 0.95, 0.95 ≤ NNI ≤ 1.05, and NNI > 1.05, respectively [11]. The diagnostic accuracy was evaluated using accuracy (%), precision (%), recall (%), F1 score (%) [41], and the kappa statistic [42]. Accuracy was calculated as the number of correctly classified items divided by the total number of items. Precision was calculated as the number of true positives divided by the total number of true positives and false positives. Recall was calculated as the sum of true positives across all classes divided by the sum of true positives and false negatives across all classes. The F1 score is the harmonic mean of precision and recall [41]. The kappa statistic corrects for agreement that occurs by chance and is a more robust measure of the agreement between two classifications [42]. A kappa value of 1 indicates perfect agreement between two categorization systems, while kappa values ≥ 0.60, 0.40-0.60, and < 0.40 indicate satisfactory, moderate, and weak agreement, respectively [42].
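The diagnostic metrics described above can be illustrated with a small standard-library sketch. The class labels and the example diagnoses are hypothetical, and precision and recall are shown here per class; the kappa thresholds follow the interpretation given in the text.

```python
from collections import Counter


def accuracy(observed, predicted):
    """Share of items whose predicted class matches the observed class."""
    return sum(o == p for o, p in zip(observed, predicted)) / len(observed)


def precision_recall_f1(observed, predicted, label):
    """Per-class precision, recall, and F1 (harmonic mean of the two)."""
    pairs = list(zip(observed, predicted))
    tp = sum(o == label and p == label for o, p in pairs)
    fp = sum(o != label and p == label for o, p in pairs)
    fn = sum(o == label and p != label for o, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def cohen_kappa(observed, predicted):
    """Kappa = (p_o - p_e) / (1 - p_e): agreement corrected for chance."""
    n = len(observed)
    p_o = accuracy(observed, predicted)
    obs_counts, pred_counts = Counter(observed), Counter(predicted)
    p_e = sum(obs_counts[c] * pred_counts[c] for c in obs_counts) / n ** 2
    return 1.0 if p_e == 1.0 else (p_o - p_e) / (1.0 - p_e)


def agreement_level(kappa):
    """Interpretation bands used in the text."""
    if kappa >= 0.60:
        return "satisfactory"
    if kappa >= 0.40:
        return "moderate"
    return "weak"


# Hypothetical diagnoses for four plots.
observed = ["deficient", "deficient", "optimal", "surplus"]
predicted = ["deficient", "optimal", "optimal", "surplus"]
kappa = cohen_kappa(observed, predicted)
```

In this toy case three of four plots agree (accuracy 0.75), but because some of that agreement would be expected by chance, kappa is lower (about 0.64).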

Variability of Corn N Status Indicators
Across site-years, AGB had the highest variability, followed by PNU, PNC, and NNI in the calibration dataset (Table 5). The prediction dataset had similar variability (Table 5).

Best Vegetation Indices for Predicting Corn N Status Indicators
The pairwise Pearson correlation coefficients (r) for relationships between vegetation indices and corn N status indicators are presented in Figure 1. The correlation coefficients between AGB or PNC and the vegetation indices were relatively low. NDRE (r = 0.74, p < 0.05) and the difference vegetation index (DVI) (r = 0.73, p < 0.05) were more strongly correlated with PNU than other vegetation indices. The Maccioni index (MACC) (r = 0.73, p < 0.05) and NDRE (r = 0.72, p < 0.05) had the strongest correlations with NNI. The best-performing regression models using a single vegetation index are presented in Table 6. PNU and NNI were best estimated with exponential models using NDRE and MACC, respectively. However, the R2 values for calibration and prediction were all ≤ 0.30. For AGB and PNC, no significant models were identified.

Using Machine Learning for Predicting Corn N Indicators
The performance of the machine learning models developed using the two sensor-provided vegetation indices (NDVI and NDRE) with and without genetic, environmental, and management information is shown in Table 7. No models performed well using the sensor-provided vegetation indices alone.
Table 7. The calibration and prediction results of support vector regression (SVR) and random forest regression (RFR) models using different input variables for predicting corn aboveground biomass, plant N concentration, plant N uptake, and N nutrition index across site-years at around the V9 growth stage. Here, G, E, and M refer to genetic, environmental, and management information, respectively.
When NDVI and NDRE were combined with genetic, environmental, and management information, the prediction results of both machine learning models improved significantly (Table 7). The RFR models performed slightly better during calibration (R2 = 0.93-0.97) than the SVR models (R2 = 0.85-0.95), but both performed similarly with the prediction dataset. Because the RFR models showed improved performance metrics, the remainder of this paper will focus on these results. The scatter plots of measured and predicted corn N status indicators for the prediction dataset, based on the RFR models using sensor-provided vegetation indices and genetic, environmental, and management information, are shown in Figure 2. For RFR models, the Gini coefficient analysis showed that NDRE was consistently the most important variable for predicting all four N status indicators (Figure 3). For AGB, the second, third, and fourth most important variables were all related to hybrid differences (growing degree units (GDUs) to physiological maturity, silk comparative relative maturity, and GDUs to silk), followed by silt, NDVI, and preplant N rate, with Gini coefficients ≥ 0.05. For PNC, silt, sand, preplant N rate, and AWDR were the most important variables, followed by cumulative precipitation, GDD, and clay content (with Gini coefficients ≥ 0.05).
For PNU, NDRE and CHU were the two most important variables. For NNI, NDRE and preplant N rate were the main factors. Previous crop, irrigation, and seeding rate were consistently unimportant for all four N status indicators.

Accuracy of N Status Diagnosis
The machine learning models using NDVI and NDRE together with genetic, environmental, and management information had an overall accuracy of 67-87% (Table 8), with the RFR model performing better than the SVR model.

The Importance of Using Multi-Source Data Fusion for In-Season Corn N Status Prediction
While the RapidSCAN sensor has NDVI and NDRE as sensor-provided vegetation indices, NDRE was consistently identified as the more important variable for in-season N status prediction. This is consistent with what others have reported [14][15][16]. However, some researchers also found that NDVI performed similarly to or slightly better than NDRE for predicting crop AGB, NNI, or yield [15,43,44]. Typically, NDVI performs well at predicting AGB and PNU during the early growth stages before canopy closure, but at moderate to high biomass, the near-infrared reflectance can continue to increase with biomass, while the red reflectance does not change much with biomass. The result is that the NDVI becomes saturated with canopy closure [14]. This is mainly because visible light has low transmittance through leaves and is only influenced by the top layers of the crop canopy, especially after canopy closure, while the near-infrared light has higher transmittance through leaves and can penetrate deeper into the crop canopy [45,46]. In contrast, red edge and near-infrared bands penetrate the crop canopy similarly, so vegetation indices using these bands (e.g., NDRE) can better overcome the saturation problem, and red edge-based vegetation indices have been found to perform better than the NDVI at later growth stages for predicting crop yield, biomass, leaf area index, PNU, etc. [14,16]. While we found NDRE to be more consistent, it was advantageous to include both vegetation indices in the prediction models for them to function across diverse conditions (e.g., differences in crop heights and canopy closure).
When genetic, environmental, and management information was combined with vegetation indices, 74% or more of the variability in corn N status indicators could be explained. This was true for both the calibration and prediction datasets, using either the SVR or the RFR method. The RFR-based NNI prediction model performed the best, with an overall diagnostic accuracy of 87% and a kappa statistic of 0.75. This result was better than most of the results reported by other researchers for small plot research using GreenSeeker sensor information on corn (kappa values of 0.36-0.66) [11], RapidSCAN on rice (kappa values of 0.14-0.56) [16], the Crop Circle Phenom sensor on corn (kappa values of 0.22-0.54) [15], and the canopy fluorescence sensor Multiplex 3 on rice (kappa values of 0.23-0.84) [47]. Those results were all based on small plot experiments at the same site with similar soil and climatic conditions. In a study using unmanned aerial vehicle (UAV) remote sensing to diagnose wheat N status across different fields with variable varieties, soils, and management practices in a village, the kappa statistics were only 0.28 to 0.37 [23].
Compared with previous research, this study was unique in that it incorporated different hybrids, soils, climatic conditions, and management practices across a wide geographic region (four US Midwest states). All these results demonstrated the importance of combining genetic, environmental, and management information with crop sensing data for in-season prediction and the diagnosis of crop N status.

Important Factors for N Indicator Estimation
The amount of fertilizer N applied at planting was identified as the second most important factor for predicting NNI, the third for predicting PNC, the fourth for PNU, and the seventh for predicting AGB in this study. The preplant N rate varied from 0 to 315 kg N ha −1 in this study, which can significantly influence corn N status. This result was supported by a previous study which indicated that the preplant N rate was the most important input variable based on extreme gradient boosting models incorporating Crop Circle Phenom sensor data and management data for predicting corn PNC, PNU, and NNI [15]. The preplant N rate may be less important for a specific commercial field if a uniform preplant N fertilizer rate was applied across the field. However, preplant N fertilizer rates can vary from field to field, and within a commercial field if variable-rate N was applied before planting. Therefore, it will be important for the models predicting crop N status to include preplant N rate information. Such data can be easily obtained from as-applied data.
Soil texture can affect soil water flow, soil organic matter N mineralization, nutrient dynamics, and N availability [48]. It has been found that soil texture has a dominant effect on corn response to N application, with a greater response in fine-textured soils than in medium-textured soils [49]. This study used the percentages of clay, silt, and sand content to represent soil texture to make it easier to incorporate soil texture information in machine learning models. This study indicated all three soil texture variables were among the top 10 important variables for predicting PNC, PNU, and NNI. It has been found that active canopy sensor-based in-season N recommendations were improved similarly when adjusted using either measured or Soil Survey Geographic database (SSURGO) soil texture data (clay content) [26]. Soil texture data can be easily retrieved from the SSURGO database, making it very practical to include such data in N status prediction models.
Weather conditions during the growing season, especially rainfall patterns and temperature accumulation, can influence soil biological activity, soil organic matter decomposition, mineralization, soil N supply and losses, and therefore crop N status and growth [50][51][52]. The Shannon diversity index was identified as the most important climatic variable for predicting both PNU and NNI, while abundant and well-distributed rainfall and total precipitation were identified as the most important climatic variables for predicting PNC and AGB, respectively. High rainfall can cause significant N losses, and thus lead to higher responses to N fertilization; the influence can vary depending on the distribution of rainfall, represented by both the Shannon diversity index and the abundant and well-distributed rainfall measure [50]. Crop growth and development also depends on temperature, with faster growth under warmer conditions and no growth beyond a threshold temperature [51]. The GDD variable has been used to predict crop phenological stages [51], N release from crop residues and amendments [52], and grain yield [53]. It can also guide in-season N, irrigation, and pesticide management [54]. Higher heat accumulation can cause a higher rate of N mineralization and more N volatilization, crop growth, and PNU, and as a result, higher CHU can lead to greater responses to N fertilization [51]. Temperature was comparatively less important than rainfall patterns in this study, with CHU being important for predicting PNU and NNI, and GDD important for PNC and AGB. We conclude that it is important to include these variables to better indicate the timing of in-season sensor data collection and reflect the temperature situation of each site-year [20,54].
Corn hybrids differ in growth, yield components, responses to N fertilization, N uptake efficiency, and N use efficiency [55][56][57][58]. Corn hybrids have different abilities to maintain yield under N or water stress, and hybrid differences can account for one third of grain N concentration variation [5], but it is difficult to account for hybrid differences due to the large number of hybrids on the market [5,58,59]. In this study, we used comparative relative maturity, silk comparative relative maturity, growing degree units to silk, and growing degree units to physiological maturity to represent hybrid differences. Corn hybrid comparative relative maturity ratings are based on hybrid comparisons with maturity checks from grain harvest moisture level and at flowering stage [60]. This gives growers a "relative" idea of how hybrids from the same company will advance through the different reproductive stages (e.g., flowering, grain filling and maturity, etc.) [60]. The results of this study indicated that hybrid information was most important for AGB prediction, with growing degree units to physiological maturity, silk comparative relative maturity, and growing degree units to silk being the second, third, and fourth most important variables, respectively. For other N status indicators, hybrid-related variables were not ranked among the top 10 important variables. In general, comparative relative maturity was less important than other hybrid-related variables. The hybrid data are not provided by all seed companies and are not comparable between companies. It will be important for seed companies to provide such data, derived using the same methodology, for hybrid data to be practically used for N status prediction and improved N management.
Tillage was relatively more important for predicting AGB and PNU than for PNC and NNI. Previous studies indicated that more N fertilizer will be needed in no-till or minimum tillage systems to produce the same level of corn grain yield as conventional tillage systems [61][62][63]. Edalat et al. [63] found that corn leaf N concentrations were significantly higher in minimum tillage systems than in conventional tillage systems, possibly caused by higher water use efficiency or a dilution effect. In addition, NDVI values were found to be higher in conventional tillage plots than in minimum tillage plots, possibly due to differences in plant N nutrition and/or soil conditions [63]. More studies are needed to further evaluate the importance of including tillage information in the prediction models.
In general, the information on previous crop, irrigation, and seeding rate was not important for predicting any of the N status indicators. Others found that in rainfed agriculture, N supply from chemical fertilizers or the legume N of the previous crop can be similar, while the results can be different in irrigated agriculture [64]. The seeding rate is an important factor influencing corn yield because it influences stalk diameter, intercepted photosynthetically active radiation, leaf area index, and lodging, etc. [65]. Yet at times, corn PNC, AGB, and grain yield may not be affected by the seeding rate [66]. The seeding rate in this study had a narrow range (80,000 to 90,000 seeds ha −1 ), all recommended to be in the optimal range for these growing conditions. Irrigation can be an important factor influencing corn yield and responses to N fertilizers in dry areas [67]. However, in this study, irrigation was only used for sites with sandy soils, and the site-years that did not receive irrigation had enough precipitation so that water stress during the growing season was minimal. Thus, irrigation information in this study was not identified as important. Removing the irrigation information from the data did not change the performance of the models for predicting PNC and NNI (data not shown). However, since these data are easily available, keeping them in the models is advisable to make them more stable when they are applied across diverse conditions that include both rainfed and irrigated fields.
Although more detailed soil data are available in this study and could further improve the performance of the models, only variables typically used by and easily available to farmers were included to keep the models practical. Even with this restriction, the results are very encouraging.

Model Selection
The RFR algorithm is a non-parametric statistical technique that combines many regression trees. Each tree is grown from a bootstrap sample, which leaves about one third of the overall sample (the out-of-bag observations) for prediction. Each split of the tree is determined using a random subset of the predictors at each node. The final outcome is the average of the results of all the trees [68]. The random forest can synthesize regression functions from discrete or continuous datasets and can handle complex relationships between predictors in noisy and large datasets [68]. It has been reported that RFR can handle high data dimensionality and multicollinearity, is fast, and is insensitive to over-fitting [69].
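The bootstrap step described above can be illustrated with a short sketch. It is not the implementation used in this study: the function names, sample size, number of trees, and seed are hypothetical choices for illustration. The sketch draws bootstrap samples with replacement and measures the share of observations that are never drawn, which is close to 1/e (roughly 0.37) for large samples and is consistent with the "about one third" left out of each tree.

```python
import random


def oob_fraction(n_samples: int, rng: random.Random) -> float:
    """Share of observations left out of one bootstrap sample (out-of-bag)."""
    # Draw n_samples indices with replacement, as when growing one tree.
    drawn = {rng.randrange(n_samples) for _ in range(n_samples)}
    return 1.0 - len(drawn) / n_samples


def mean_oob_fraction(n_samples: int, n_trees: int, seed: int = 0) -> float:
    """Average out-of-bag share over n_trees independent bootstrap samples."""
    rng = random.Random(seed)
    return sum(oob_fraction(n_samples, rng) for _ in range(n_trees)) / n_trees


if __name__ == "__main__":
    # Hypothetical sizes chosen only for illustration.
    print(round(mean_oob_fraction(n_samples=500, n_trees=200), 3))
```

Because the out-of-bag observations are not seen by the tree grown from that bootstrap sample, they can serve as an internal check on prediction error without a separate hold-out set.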
The SVR algorithm is a nonlinear learning method suited to small samples, with a proven theoretical foundation, and its final decision function is determined by only a few support vectors [70]. The computational complexity depends on the number of support vectors rather than the dimension of the sample space. This not only helps capture key samples and discard many redundant ones, but also makes the algorithm simple and robust [70]. In this study, SVR model performance was quite stable, with smaller differences between calibration and prediction results than for the RFR model.
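The role of support vectors can be sketched with the epsilon-insensitive loss commonly used in SVR. The text does not state the SVR formulation, kernel, or hyperparameters used in this study, so the fixed linear function, the `epsilon` value, and the toy data below are assumptions for illustration only, and no model is actually trained. Observations whose residual lies inside the tube contribute zero loss; only observations on or outside the tube can become support vectors in a fitted model.

```python
from typing import List, Sequence, Tuple


def eps_insensitive_loss(residual: float, epsilon: float) -> float:
    """Zero inside the tube |residual| <= epsilon, linear outside it."""
    return max(0.0, abs(residual) - epsilon)


def outside_tube(
    points: Sequence[Tuple[float, float]],
    slope: float,
    intercept: float,
    epsilon: float,
) -> List[int]:
    """Indices of points whose residual exceeds epsilon for a fixed line."""
    return [
        i
        for i, (x, y) in enumerate(points)
        if eps_insensitive_loss(y - (slope * x + intercept), epsilon) > 0.0
    ]


if __name__ == "__main__":
    # Toy (x, y) pairs; the values are invented for illustration.
    data = [(0.0, 0.1), (1.0, 1.0), (2.0, 2.6), (3.0, 2.9), (4.0, 3.2)]
    print(outside_tube(data, slope=1.0, intercept=0.0, epsilon=0.5))
```

In this toy case only two of the five points fall outside the tube, which mirrors the idea that a small subset of samples shapes the decision function while the rest are redundant.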
Feature selection and feature relevance can benefit the performance and interpretation of machine learning algorithms. Rogers and Gunn [71] indicated that the lack of implicit feature selection within RFR had an adverse effect on the accuracy and efficiency of the algorithm. The randomization in both bagging samples and feature selection can cause the trees in the forest to select uninformative features for node splitting, especially in high-dimensional data [71,72]. The main cause is that, when a tree is grown from the bagged sample data, the subspace of features randomly sampled from thousands of features to split a node is often dominated by uninformative features (or noise), and a tree grown from such a subspace will have low prediction accuracy [72]. The SVR algorithm, which is based on the structural risk minimization principle, can minimize the expected learning error and reduce the problem of over-fitting. Therefore, the selection of models is problem dependent. The results of this study indicated that, with limited input variables and a small dataset, the SVR models were more stable. When multi-source data were used, the differences between SVR and RFR were smaller than when sensor-provided vegetation indices were used alone, and the RFR models were preferred.
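The effect of random feature subspaces in high-dimensional data can be made concrete with a small calculation. This is a sketch under stated assumptions, not an analysis from this study: the feature counts and subset sizes are hypothetical, and features are assumed to be sampled uniformly without replacement at a node. Under those assumptions, the number of informative features in the subset is hypergeometric, and the probability that the subset contains none of them is C(p − k, m) / C(p, m) for p features, k informative features, and subset size m.

```python
from math import comb


def prob_no_informative(
    n_features: int, n_informative: int, subset_size: int
) -> float:
    """Probability that a random feature subset has no informative feature.

    Features are sampled without replacement, so the count of informative
    features in the subset follows a hypergeometric distribution.
    """
    if subset_size > n_features:
        raise ValueError("subset_size cannot exceed n_features")
    n_noise = n_features - n_informative
    return comb(n_noise, subset_size) / comb(n_features, subset_size)


if __name__ == "__main__":
    # Hypothetical settings: a small feature set vs. a high-dimensional one.
    print(prob_no_informative(n_features=20, n_informative=10, subset_size=4))
    print(prob_no_informative(n_features=1000, n_informative=10, subset_size=31))
```

With few features, of which many are informative, a split rarely sees only noise; with thousands of features and few informative ones, most candidate subsets contain no informative feature, which is the situation described in [71,72].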
In summary, this study highlighted the importance of combining genetic, environmental, and management data with crop sensing data using machine learning models to predict and diagnose corn N status during the growing season across diverse growing and management conditions. This information can help growers make in-season side-dress N application decisions [12,13,30]. More data need to be collected to evaluate more advanced machine learning algorithms, such as deep learning models [73,74] and ensembles of models [75], and to determine their potential to further improve in-season crop N status prediction and diagnosis across diverse on-farm conditions. At the same time, studies are also needed to evaluate the potential of developing simpler models that use fewer variables while retaining comparable performance for practical applications.

Conclusions
This study evaluated the potential of using machine learning methods to integrate easily available genetic, environmental, and management information with active canopy sensing data for corn N status prediction and diagnosis across 13 site-year environmental conditions. When genetic, environmental, and management information was used together with the RapidSCAN sensor data, corn AGB, PNC, PNU, and NNI could be reliably predicted (R2 > 0.80) using either SVR or RFR models. It is concluded that incorporating genetic, environmental, and management information with crop sensing data will provide more reliable in-season corn N status prediction and diagnosis across diverse conditions than using canopy sensor data alone. The NDRE index was consistently the most important variable for predicting all four corn N status indicators. The preplant N rate was the second, third, fourth, and seventh most important variable for predicting NNI, PNU, PNC, and AGB, respectively. Information on hybrid differences and soil texture was more important for AGB prediction, while soil texture and weather variables were more important for PNC and NNI predictions. For PNU prediction, weather variables (CHU and SDI) were more important than other variables. Further studies are needed to evaluate the potential of multi-source data fusion strategies for on-farm, in-season N status diagnosis using large datasets from commercial farms and more advanced machine learning methods. Additional effort is needed to develop simpler models that use fewer, easily obtainable variables and have comparable performance for practical on-farm applications.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data can also be obtained from the Dryad repository: https://doi.org/10.5061/dryad.66t1g1k2g (accessed on 11 December 2021).

Conflicts of Interest:
The authors declare no conflict of interest.