Evaluation of Soil Properties, Topographic Metrics, Plant Height, and Unmanned Aerial Vehicle Multispectral Imagery Using Machine Learning Methods to Estimate Canopy Nitrogen Weight in Corn

: Management of nitrogen (N) fertilizers is an important agricultural practice and ﬁeld of research to minimize environmental impacts and the cost of production. To apply N fertilizer at the right rate, time, and place depends on the crop type, desired yield, and ﬁeld conditions. The objective of this study is to use Unmanned Aerial Vehicle (UAV) multispectral imagery, vegetation indices (VI), crop height, ﬁeld topographic metrics, and soil properties to predict canopy nitrogen weight (g/m 2 ) of a corn ﬁeld in southwestern Ontario, Canada. Random Forests (RF) and support vector regression (SVR) models were evaluated for canopy nitrogen weight prediction from 29 variables. RF consistently had better performance than SVR, and the top-performing validation model was RF using 15 selected height, spectral, and topographic variables with an R 2 of 0.73 and Root Mean Square Error (RMSE) of 2.21 g/m 2 . Of the model’s 15 variables, crop height was the most important predictor, followed by 10 VIs, three MicaSense band reﬂectance mosaics (blue, red, and green), and topographic proﬁle curvature. The model information can be used to improve ﬁeld nitrogen prediction, leading to more effective and efﬁcient N fertilizer management.


Introduction
Agriculture is an important industry as the basis of food security, and as a significant aspect of the world economy. However, factors such as rapidly increasing global demand, fluctuations in production due to climate change, and a greater awareness of the negative environmental impact of agriculture on surrounding ecosystems, contribute to an increasing need for more efficient and sustainable farming practices. Especially in Canada, where agriculture is a significant industry, developing agricultural methods to be adaptable and resilient is necessary [1]. This is possible through precision agriculture (PA), a management technique that selectively applies crop farming resources such as fertilizer, water, pesticides, and herbicides based on the plant needs within a field [2][3][4].
One of the main fields of applications of precision agriculture is the management of nitrogen fertilizers [5,6]. Nitrogen is an essential macronutrient to plants, as a major constituent of organic material, enzymic processes, and oxidation-reduction reactions [7]. As such, nitrogen content in above-ground plant tissue is an important indicator of crop health and yield potentials. Several global studies demonstrate that the mean nitrogen recovery Remote Sens. 2021, 13, 3105 2 of 18 efficiency by annual crops was less than 50% of the amount of fertilizer applied [6,8]. Nitrogen is one of the most expensive nutrients to supply, and commercial fertilizers represent a major cost in plant production [9]. Rates of nitrogen fertilizer application depend on the crop type, desired yield, nitrogen present in the soil, and subsequently in the plants [7]. Excess nitrogen can reduce crop yield and can be leached from the soil, contaminating surface and groundwater, leading to harmful effects on human health and ecosystem consequences such as algal blooms and hypoxia in water bodies [10]. The United Nations Food and Agriculture Organization identifies classes of agricultural climate adaptation, one major class being management of field operation inputs including fertilizers [11]. As such, optimizing the management of nitrogen fertilizers is an important field of research as new methods and technology are developed to improve nutrient use efficiency, quality, and crop yield while minimizing significantly negative environmental impacts and cost of production.
Literature does include much research on crop canopy nitrogen retrieval using UAVs, but model parameters are often focused on spectroscopy with the use of vegetation indices and spectral remote sensors [12]. PA incorporates the use of many different types of spatial technologies such as geographic information systems (GIS), precision machinery, and remote sensing imagery to ground-based data collection [13]. In PA, remote sensing imagery is especially useful because it does not require physical or destructive contact with plants to gather valuable crop information. The spectral information provided by the imagery can be transformed into vegetation indices (VIs). VIs are mathematical combinations or transformations of spectral bands that have been widely used in agricultural research because they allow for deriving of specific plant properties such as chlorophyll or nutrient content by taking advantage of the differential spectral properties of plants in the visible and near-infrared wavelengths [14][15][16]. The resulting VI data can then provide timely information for monitoring field conditions and crop health, allowing for the optimal number of resources to be placed where they are needed, when they are needed.
In PA, crop monitoring has largely been conducted using optical satellites [17]. As demand for timely, accurate, and cost-effective data on the earth's surface increased in the last few decades, numerous satellite systems have been launched. Examples of optical satellites in operation include Landsat 8 (since 2013) and Sentinel-2 (since 2015), which have been used in studies on agriculture management [4]. The Landsat program, which began in 1972 with the launch of Landsat 1, is the longest-running program for satellite imagery of the earth [18]. Landsat 8 Operational Land Imager has nine spectral bands including visible, near-infrared, and shortwave infrared, with varying spatial resolutions of 15 to 30 m. Taking more than 700 scenes a day, it has a 16-day revisit time to the same area. Sentinel-2 has 13 spectral bands in the visible, near-infrared, and shortwave infrared with varying spatial resolutions of 10 m, 20 m, and 60 m [19]. With the constellation of twin satellites, the revisit cycle over an area is five days.
Limitations in optical satellite imagery include low spatial sensitivity, as the spatial resolution in the range of meters allows for analysis of larger-scale regional or national areas but is too coarse for small-scale crop fields. The temporal sensitivity can be rather low, as in the case of Landsat 8; within a 16-day revisit time, crops would have changed significantly and valuable information on the different stages of growth would not be obtained. In addition to factors such as geometric distortion, atmospheric distortion, and cloud cover obscuring view of the land, advanced processing expertise may be required to ensure sufficient image quality [20]. In comparison, UAV-based remote sensing can provide lower cost and higher spatial and temporal resolution data for crop management. Individuals with basic training can operate a UAV using programmed routes and collect images with <10 cm resolutions [21]. They can be flown to capture more frequent image data, including monitoring each significant stage of crop growth and offer flexibility in operation for times when weather is most suitable. This makes them ideal for field management conducted in a timely and accurate manner according to the needs of the crop [22]. Compared to satellites, overall UAV-based systems are often lower in cost for data collection and processing. As such, the use of UAV imagery in PA has become a research area of great interest due to its potential for larger environmental and economic impacts [4].
Corn was selected for this study because it is among the most grown crops in Ontario [17]. Recent studies have tested the use of linear regression, Random Forest (RF), and Support Vector Regression (SVR) models in UAV-based canopy nitrogen weight prediction models [23]. Although linear regression is a commonly used method to predict nitrogen, some VIs (e.g., NDVI) may saturate beyond the early growth crop stages and models may have reduced accuracy due to multicollinearity [24]. By contrast, machine learning-based regression methods such as RF and SVR have been found to produce more accurate models compared to classical linear regression methods, as they are unaffected by the assumptions of linear regression [24]. However, Lee et al. (2020) considered only UAV spectral information and canopy nitrogen weight prediction in their study. The nitrogen prediction may be improved if plant physiology, topographic metrics, and soil variables are included in the analysis, given that crop nitrogen highly depends on these variables [7].
To make well-informed fertilization management decisions, knowledge about the plant nutrient supply, health, and several environmental factors such as water availability, soil quality, and micro-topography of a field are key. The objectives of this study include, (i) studying the relationship between the spatial variation of canopy nitrogen weight and factors such as plant height, topographic metrics, soil chemical properties, and soil moisture conditions within a corn field in Southwestern Ontario using multispectral UAV-based imagery; (ii) determining the optimal combination(s) of spectral variable(s), crop plant physiology variables, and/or environmental conditions (soil, water, topographic data) for corn canopy nitrogen estimation and prediction; and (iii) evaluating the temporal variation of nitrogen estimation and prediction during early growth stages of corn using UAV images and select variables.

Study Area and Data Collection
The study site is in Central Elgin, Ontario, Canada ( Figure 1). Fieldwork was conducted during June-July 2020 with an average temperature of 25 • C and high humidity averaging 71%, typical of southwestern Ontario's humid continental climate zone. The study area is situated in a predominantly agricultural area, about 25 km south of London, Ontario's urban center. The corn was planted in early April, began sprouting in early June, and was harvested in late October once the plants were fully dried.
Data was collected from a corn field about 75 ha large, sown with cultivars "DKC48-56RI". Beginning in early June, a total of five sampling dates were collected with at least seven days between each visit as the crop reached different significant growth stages. Corn growth stages were recorded following the Biologische Bundesanstalt, Bundessortenamt und CHemische (BBCH) scale [25]. We acquired and used data during the corn crops' early growth periods (BBCH 0-49) in June for nitrogen estimation, which is especially important because fertilizers used during this time can have the greatest impact on the final quantity and quality of yield [7,26]. During July to August, corn crops reached middle to late growth stages (BBCH 50+) and application of nitrogen fertilizer is not recommended after plants begin tasselling [6]. The plant slows root nitrogen uptake, beginning to translocate nitrogen from vegetation to the grains, and excess fertilizer can leach from the field [27]. Corn plants reached full maturity in early September, and the crop was left to dry in the field before harvest in October. Before selecting sample points, a DJI Phantom 4 Real-Time Kinematics (RTK) UAV was flown over the bare soil of the field. With the UAV connected to an RTK global navigation satellite system (GNSS) base station, images have precise positioning metadata with 1 cm horizontal positioning accuracy and 1.5 cm vertical positioning accuracy. These images were used to create a bare earth digital elevation model (DEM). Using Google Earth, 40 sample points were selected in the field based on the DEM and the UAV imagery ( Figure 2). The sample points had to cover the variation of field topography sloping down from west to east. Considering corn row directions heading north-south, navigating along rows was more efficient compared to against rows. As well, the following factors were considered for sample point placement: the large dimensions of the field (1.2 km × 0.7 km), the intensive labour required, and the time-sensitive nature of in-situ data collection and processing. The sample points were placed in groups of ten spaced 60 m from one another, with a distance between groups at least 200 m apart to include a representative sample distribution of the field. A minimum distance of 50 m from roads was to reduce possible effects of transportation pollution. The sample points were exported from Google Earth to a KML file and downloaded onto mobile devices. During the first fieldwork date using the KML file for navigation, red flags were placed at the sample points for accurate positioning in the following weeks.
At each sampling point, fresh biomass samples were destructively collected by cutting the corn plant at the stem base above ground. The number of plants within a 1 m 2 block around the sample point were counted, and two plants were collected and placed in plastic bags. The average distance between rows was 80 cm. Following fieldwork collection on the same day, the fresh biomass was weighed in grams then fully dried in an oven at 60 °C for 48-72 h. Dry biomass was weighed and leaves at the top of the plant constituting the canopy layer were separated for A&L Canada Laboratories plant analysis using the Laboratory Equipment Company (LECO) FP628 nitrogen determinate combustion method [28]. The process involves grinding biomass leaves into a fine powder, which can be passed through a 1 mm sieve, and the combustion method obtains the leaf nitrogen content percentage. Before selecting sample points, a DJI Phantom 4 Real-Time Kinematics (RTK) UAV was flown over the bare soil of the field. With the UAV connected to an RTK global navigation satellite system (GNSS) base station, images have precise positioning metadata with 1 cm horizontal positioning accuracy and 1.5 cm vertical positioning accuracy. These images were used to create a bare earth digital elevation model (DEM). Using Google Earth, 40 sample points were selected in the field based on the DEM and the UAV imagery ( Figure 2). The sample points had to cover the variation of field topography sloping down from west to east. Considering corn row directions heading north-south, navigating along rows was more efficient compared to against rows. As well, the following factors were considered for sample point placement: The large dimensions of the field (1.2 km × 0.7 km), the intensive labour required, and the time-sensitive nature of in-situ data collection and processing. The sample points were placed in groups of ten spaced 60 m from one another, with a distance between groups at least 200 m apart to include a representative sample distribution of the field. A minimum distance of 50 m from roads was to reduce possible effects of transportation pollution. The sample points were exported from Google Earth to a KML file and downloaded onto mobile devices. During the first fieldwork date using the KML file for navigation, red flags were placed at the sample points for accurate positioning in the following weeks.
At each sampling point, fresh biomass samples were destructively collected by cutting the corn plant at the stem base above ground. The number of plants within a 1 m 2 block around the sample point were counted, and two plants were collected and placed in plastic bags. The average distance between rows was 80 cm. Following fieldwork collection on the same day, the fresh biomass was weighed in grams then fully dried in an oven at 60 • C for 48-72 h. Dry biomass was weighed and leaves at the top of the plant constituting the canopy layer were separated for A&L Canada Laboratories plant analysis using the Laboratory Equipment Company (LECO) FP628 nitrogen determinate combustion method [28]. The process involves grinding biomass leaves into a fine powder, which can be passed through a 1 mm sieve, and the combustion method obtains the leaf nitrogen content percentage. ments of soil moisture were collected using an ML3 ThetaProbe (Delta-T Devices Ltd., England) [29] and averaged. On the first fieldwork date of 8 June, soil samples in the 0-30 cm surface layer were collected and sent to A&L Canada Laboratories for Vittellus soil health analysis [30]. The test results include values of soil nitrate-nitrogen, mineralizable nitrogen, water extracted soil nitrate, water extracted total nitrogen, soil textural class, and A&L's Soil Health Index rating.

UAV Imagery
For this study, two types of UAVs were used: a Da-Jiang Innovations (DJI) (DJI, China) Matrice 100 and a DJI Phantom 4 Real-Time Kinematics (RTK) ( Figure 3). The DJI Phantom 4 RTK was released in 2018, designed for centimeter-accurate horizontal and vertical positioning in images using a complementary metal-oxide-semiconductor (CMOS) sensor [31]. Connected to an RTK global navigation satellite system (GNSS) base station, images with precise positioning metadata can be used to generate 3D point cloud datasets and digital elevation models (DEM). For this study, the DJI Phantom 4 RTK was flown at 30 m altitude as per manufacturer recommendations for optimal performance of the UAV's visual sensor system and RTK base station connection. The image resolution was 0.8 × 0.8 cm, and capture was set to 80% side and 80% front image overlap. Studies indicate that fine resolution (<10 cm) and high image overlap have higher success for mosaicking images together when crop canopy densifies through the season [21]. On every fieldwork date, at each sampling point within a 1 m 2 block, six measurements of plant height in centimeters were taken to calculate an average height. Detailed plant phenology was recorded to determine growth stages according to the Biologische Bundesanstalt, Bundessortenamt und CHemische (BBCH) scale at each sample point, as there can be variation in the field depending on growing conditions [28]. Six measurements of soil moisture were collected using an ML3 ThetaProbe (Delta-T Devices Ltd., England) [29] and averaged. On the first fieldwork date of 8 June, soil samples in the 0-30 cm surface layer were collected and sent to A&L Canada Laboratories for Vittellus soil health analysis [30]. The test results include values of soil nitrate-nitrogen, mineralizable nitrogen, water extracted soil nitrate, water extracted total nitrogen, soil textural class, and A&L's Soil Health Index rating.

UAV Imagery
For this study, two types of UAVs were used: a Da-Jiang Innovations (DJI) (DJI, China) Matrice 100 and a DJI Phantom 4 Real-Time Kinematics (RTK) (Figure 3). The DJI Phantom 4 RTK was released in 2018, designed for centimeter-accurate horizontal and vertical positioning in images using a complementary metal-oxide-semiconductor (CMOS) sensor [31]. Connected to an RTK global navigation satellite system (GNSS) base station, images with precise positioning metadata can be used to generate 3D point cloud datasets and digital elevation models (DEM). For this study, the DJI Phantom 4 RTK was flown at 30 m altitude as per manufacturer recommendations for optimal performance of the UAV's visual sensor system and RTK base station connection. The image resolution was 0.8 × 0.8 cm, and capture was set to 80% side and 80% front image overlap. Studies indicate that fine resolution (<10 cm) and high image overlap have higher success for mosaicking images together when crop canopy densifies through the season [21]. patterns and maintain GPS connection with the controller, the study field was divided into two flight plans. Unfortunately, images from one flight on 26 June were corrupted and only data from half of the field were usable. Images taken on and after 2 July could not be mosaicked in Pix4D due to the software limitations with recognizing and stitching the dense crop canopy at middle/later growth stages. Considering the nitrogen estimation for fertilizer management is most important in the early growth stages, data from 2 July onwards were omitted from the rest of this study.   First released in 2015, the DJI Matrice 100 is a model designed with a customizable aerial platform, ideal for research purposes of attaching small spectral sensors. Including batteries, it weighs 2431 g with a maximum take-off weight with a payload of 3600 g.
DJI Matrice 100 carried a MicaSense RedEdge (MicaSense Inc., Seattle, WA, USA) [32] narrowband multispectral camera. Multispectral imagery acquisition was aimed to be conducted on fieldwork dates before biomass collection. If the weather for the day was not ideal, flights were scheduled as soon as possible after that to maintain consistency with the plant physiology and field conditions ( Table 1). The field flight plan was made in the "Pix4Dcapture" app, part of the Pix4D software suite (Pix4D S.A., Prilly, Switzerland), to cover the whole field in a zigzag pattern (Pix4D Documentation 2020) [33]. Pix4Dcapture has the functions of adding custom UAV and sensor properties to calculate the flight plan's estimated total time, battery usage, and image resolution at selected altitudes. At the study field, wind and gust conditions >60 m altitude were often greater than the UAV's manufacturer-recommended limits. Flight altitude was set at 60 m and based on the MicaSense camera's specifications (image width, sensor width, and focal length) the resulting image resolution was 4 × 4 cm, suitable for the scale of crop spectral analysis [21]. Image capture was set to 80% side and 80% front overlap. To streamline UAV flight patterns and maintain GPS connection with the controller, the study field was divided into two flight plans. Unfortunately, images from one flight on 26 June were corrupted and only data from half of the field were usable. Images taken on and after 2 July could not be mosaicked in Pix4D due to the software limitations with recognizing and stitching the dense crop canopy at middle/later growth stages. Considering the nitrogen estimation for fertilizer management is most important in the early growth stages, data from 2 July onwards were omitted from the rest of this study.

UAV Image Processing
UAV images were processed according to the flowchart of Figure 4. DJI Phantom 4 RTK images, taken in April over the study site's bare soil, were inputted into Pix4Dmapper photogrammetry software to generate a continuous 3D point cloud dataset of the field. QGIS, an open-source geographic information software (GIS), was used to convert the point cloud dataset into a DEM in GeoTiff format [34]. The DEM enabled observation of topographic variation within the field, which can affect plant growth related to landscape shape, soil structure, and water flow [6]. Topographic metrics were computed from the DEM with the System for Automated Geoscientific Analysis (SAGA), free, opensource software for spatial data analysis [35]. Topographic metrics exported in GeoTiff format included slope, aspect, profile curvature, plan curvature, and two topographic wetness indices: (TWI) #1 using a Deterministic 8 algorithm, and TWI #2 using Multiple Flow Direction algorithm.

UAV Image Processing
UAV images were processed according to the flowchart of Figure 4. DJI Phantom 4 RTK images, taken in April over the study site's bare soil, were inputted into Pix4Dmapper photogrammetry software to generate a continuous 3D point cloud dataset of the field. QGIS, an open-source geographic information software (GIS), was used to convert the point cloud dataset into a DEM in GeoTiff format [34]. The DEM enabled observation of topographic variation within the field, which can affect plant growth related to landscape shape, soil structure, and water flow [6]. Topographic metrics were computed from the DEM with the System for Automated Geoscientific Analysis (SAGA), free, open-source software for spatial data analysis [35]. Topographic metrics exported in GeoTiff format included slope, aspect, profile curvature, plan curvature, and two topographic wetness indices: (TWI) #1 using a Deterministic 8 algorithm, and TWI #2 using Multiple Flow Direction algorithm. Multispectral images from the MicaSense camera were processed in Pix4Dmapper to create one orthomosaic image per band. Radiometric calibration of UAV images is important for the quality of image reflectance, taking into consideration the sensor influence and scene illumination. Before each flight, the MicaSense camera was positioned above a MicaSense Calibrated Reflectance Panel to acquire white reference images for each band. In Pix4Dmapper, the sensor settings, properties, and conditions can be obtained from the Exchangeable Image File Format (EXIF) metadata of the images. The white reference images and manufacturer-provided panel reflectance values were inputted in Pix4Dmapper processing options, enabling the software to calibrate and correct images' reflectance for each of the five bands. Pix4Dmapper then uses the Structure from Motion (SfM) algorithm to correct image perspectives to stitch images together [21]. In addition to UAV image geolocation, Pix4Dmapper processing options include georeferencing orthomosaics with ground control points (GCP) to improve the absolute location accuracy. Five GCPs were positioned around the corn field area using black and white checkered boards, with coordinates obtained from a Global Positioning System (GPS) connected with the RTK. The output includes an orthomosaic GeoTiff image file with reflectance values of the entire flight area for each Micasense band. Multispectral images from the MicaSense camera were processed in Pix4Dmapper to create one orthomosaic image per band. Radiometric calibration of UAV images is important for the quality of image reflectance, taking into consideration the sensor influence and scene illumination. Before each flight, the MicaSense camera was positioned above a MicaSense Calibrated Reflectance Panel to acquire white reference images for each band. In Pix4Dmapper, the sensor settings, properties, and conditions can be obtained from the Exchangeable Image File Format (EXIF) metadata of the images. The white reference images and manufacturer-provided panel reflectance values were inputted in Pix4Dmapper processing options, enabling the software to calibrate and correct images' reflectance for each of the five bands. Pix4Dmapper then uses the Structure from Motion (SfM) algorithm to correct image perspectives to stitch images together [21]. In addition to UAV image geolocation, Pix4Dmapper processing options include georeferencing orthomosaics with ground control points (GCP) to improve the absolute location accuracy. Five GCPs were positioned around the corn field area using black and white checkered boards, with coordinates obtained from a Global Positioning System (GPS) connected with the RTK. The output includes an orthomosaic GeoTiff image file with reflectance values of the entire flight area for each Micasense band.

Vegetation Indices
The orthomosaics for each of the five MicaSense bands were exported into ArcGIS to extract crop canopy reflectance values at the sample points. The MicaSense RedEdge camera bands include the following bands #: (1) blue, (2) green, (3) red, (4) red-edge, and (5) near-infrared (NIR) ( Table 2). The orthomosaics were used to compute 11 VIs (Table 3). VIs were already found to be suitable for estimating canopy nitrogen in crops, such as the normalized difference vegetation index (NDVI), green NDVI (GNDVI), and Double-Peak Canopy Nitrogen Index (DCNI) [36]. Some VIs are related to chlorophyll, which was found to be closely related to leaf nitrogen content as the photosynthetic enzyme, rubisco, comprises the largest proportion of nitrogen in leaves [7]. Chlorophyll absorbs more than 70% of blue and red radiation and reflects green and NIR radiation [37]. The VIs were computed with the Raster calculator function of PCI Geomatica Banff and exported as GeoTiff files. Using ArcGIS, the VI values were extracted at each sample point.

Canopy Nitrogen Weight Estimation
Canopy nitrogen is defined by calculating canopy nitrogen weight using the following method [49]: where CNW is the canopy nitrogen weight (g/m 2 ), N plants is the number of plants in the 1 m 2 area over the sampling point, Wd is the dry biomass weight (g/m 2 ), N biomass is the number of plants gathered for biomass at the sampling point, and LNC is the leaf nitrogen content (%). CNW assumes the plants collected for biomass within a 1 m 2 block around the sample point have the same amount of nitrogen. For corn plants, the leaves constitute a majority of dry biomass weight, hence the use of total biomass per area in the formula. Compared to other agronomic parameters, including plant nitrogen concentration (%), plant nitrate content, and Soil Plant Analysis Development (SPAD) readings, canopy nitrogen weight has been found to have greater correlation with spectral data [50].

Canopy Nitrogen Weight Modelling
The modelling approaches in this study include Random Forest (RF) regression and Support Vector Regression (SVR). RF is an ensemble learning method that can be used for classification or regression models of large, nonparametric datasets. The user defines a percentage of the dataset to be randomly selected as training data; 70% is commonly used. Using the training data, the algorithm generates many decision trees to determine the importance of variables in the regression. Decision trees split at nodes depending on the most contributing independent variable to the dependent variable. The remainder of the dataset not used in training is used as validation data, and the output average of the individual trees is used to evaluate the regression model's performance. Advantages of RF include the algorithm not overfitting from the training data, it is quick to compute, and it has relatively high performance in studies [23].
SVR is part of the Support Vector Machine (SVM) learning algorithm, which uses a decision boundary in a hyperplane to split training data into classes based on the data characteristics. The support vectors that are closest to training samples are used to determine the optimal position of the decision boundary using the midpoint of the margin. SVR performs modelling in a high-dimensional space using the hyperplane. For nonlinear data, SVR uses a kernel trick (i.e., Radial Basis Kernel), which places the data in a dimensional space to separate into groups using the radial distance between data points. Advantages of using SVR include its flexibility with nonparametric data, and it has been found to have better modelling capabilities compared to simple linear regression by capturing nonlinearity [50].
The modelling was written in R programming language using R Studio (Version 4.0.3), a free, open-source Integrated Development Environment (IDE) [51]. RF used the "random-Forest" package and SVR used the "e1071" package. For both models, the independent variables were the VIs, Micasense bands, plant physiology variables, topographic metrics, and soil metrics, and the dependent variable was the canopy nitrogen weight. For variable measurements at each sample point, the average of the variable values within a 1 m 2 block was used. Data from 8 June, 15 June, and 24 June were randomly divided into 70% training set and 30% validation set. Dates were selected based on the availability of the dataset including UAV imagery and in-situ ground measurements. The quality of the models was assessed using the coefficient of determination (R 2 ) and Root Mean Square Error (RMSE), calculated using Equations (2) and (3), respectively [52]: where y i is the observed value,ŷ i is the predicted value, and y is the mean of the observed values of the dataset; and: whereŷ i is the predicted canopy nitrogen weight value (g/m 2 ), y i is the observed canopy nitrogen weight value (g/m 2 ), n is the number of observations, and i is the index of summation in increments of one.

Nitrogen Statistics
Canopy nitrogen weight gradually increased in variation from 8 June to 9 July during the early to middle growth stages, then decreased slightly on 15 July ( Figure 5). As the crop developed into later growth stages, a decrease in canopy nitrogen results from the dilution effect, as discussed in [53,54]. An outlier is shown in Figure 5, but as it was consistent throughout the growing stages at the same sample point it is unlikely to be due to errors in measurements.
crop developed into later growth stages, a decrease in canopy nitrogen results from the dilution effect, as discussed in [53,54]. An outlier is shown in Figure 5, but as it was consistent throughout the growing stages at the same sample point it is unlikely to be due to errors in measurements.

Soil Statistics
The Vittellus soil health analysis [30] produced the following mean values: soil nitrate-nitrogen of 72.75 ppm, mineralizable nitrogen of 30.75 ppm, water extracted soil nitrate of 71.38 ppm, and water extracted total nitrogen of 88.75 ppm. The A&L's Soil Health Index rating was in the "Good-High" category, and the soil textural class for the field was predominantly silt loam.

Regression Models with All Parameters
First, all 29 parameters including VIs, MicaSense bands, plant physiology variables, soil metrics, and topographic metrics were used in calibrating the RF and SVR model (Table 4). Single date datasets and combinations of the multi-date dataset were tested to evaluate the temporal effect on the models. From all calibrated models, RF had better performance in comparison to SVR. The best performing RF model was obtained with a combination of all three dates, resulting in R 2 of 0.97 and RMSE of 0.71 g/m 2 . Other RF multidate combinations had high performance close to the best model. Of all the RF models, the lowest R 2 was at 0.92 with an RMSE of 0.20 g/m 2 for 15 June. For SVR, the 15 June model also had the lowest performance. Although SVR RMSE values were low overall but close to the RMSE of the RF models', single date models of 8 June and 15 June had low R 2

Soil Statistics
The Vittellus soil health analysis [30] produced the following mean values: soil nitratenitrogen of 72.75 ppm, mineralizable nitrogen of 30.75 ppm, water extracted soil nitrate of 71.38 ppm, and water extracted total nitrogen of 88.75 ppm. The A&L's Soil Health Index rating was in the "Good-High" category, and the soil textural class for the field was predominantly silt loam.

Regression Models with All Parameters
First, all 29 parameters including VIs, MicaSense bands, plant physiology variables, soil metrics, and topographic metrics were used in calibrating the RF and SVR model (Table 4). Single date datasets and combinations of the multi-date dataset were tested to evaluate the temporal effect on the models. From all calibrated models, RF had better performance in comparison to SVR. The best performing RF model was obtained with a combination of all three dates, resulting in R 2 of 0.97 and RMSE of 0.71 g/m 2 . Other RF multi-date combinations had high performance close to the best model. Of all the RF models, the lowest R 2 was at 0.92 with an RMSE of 0.20 g/m 2 for 15 June. For SVR, the 15 June model also had the lowest performance. Although SVR RMSE values were low overall but close to the RMSE of the RF models', single date models of 8 June and 15 June had low R 2 values at 0.73 and 0.48 respectively. SVR multi-date models had much better performance compared to single-date models, but RMSE values were higher than with the RF multi-date models. Next, the models were applied to the validation datasets (Table 5). RF with the combination of all three dates performed the best out of the models with R 2 of 0.75 and RMSE of 2.29 g/m 2 . Compared to the best model, there was only a small difference in the RF model of 8 June and 24 June with R 2 of 0.74 and RMSE of 2.48 g/m 2 . Single-date models for both RF and SVR had poor results overall. However, the small number of sample point data in single-date models may have resulted in the calibration model not encompassing normal variation in the field data. Overall, multi-date models had better performance compared to single-date models.

Variable Importance Plot
RF modelling can be visualized with a variable importance plot in R Studio using the "varImpPlot()" function ( Figure 6). The more important an explanatory variable is in the prediction of canopy nitrogen weight, the higher its IncNodePurity value is. Using the dataset containing all dates and the 29 variables used in the model, plant height was the most important predictor. The top ten variables were made by 7 from the 11 VIs used and 2 MicaSense band reflectance mosaics (blue and red). Among all the topographic metrics, the profile curvature was the top-performing variable. From the soil metrics, soil moisture was the top-performing variable.
RF modelling can be visualized with a variable importance plot in R Studio using the "varImpPlot()" function ( Figure 6). The more important an explanatory variable is in the prediction of canopy nitrogen weight, the higher its IncNodePurity value is. Using the dataset containing all dates and the 29 variables used in the model, plant height was the most important predictor. The top ten variables were made by 7 from the 11 VIs used and 2 MicaSense band reflectance mosaics (blue and red). Among all the topographic metrics, the profile curvature was the top-performing variable. From the soil metrics, soil moisture was the top-performing variable.  Table 1 for the full name of vegetation indices. N_Weight, plant nitrogen weight; Soil_Nitrate_N, soil nitrate nitrogen (NO3-N); Soil_Miner_N, soil mineralizable nitrogen; Soil_Total_N, water extracted total soil nitrogen; TWI_1, total wetness index #1; TWI_2, total wetness index #2.

Regression Models with Selected Variables
As the best performing model from the calibration and validation datasets was the combination of all three dates using RF, we tested numerous variable combinations with the data from all three dates. Based on Figure 6, we selected the top 6, 10, 15, 18, and 20 variables based on the evaluation of variable importance thresholds. In addition, we considered a separate group containing all spectral variables, because all the VIs and Mica-Sense band reflectance mosaics, except the red-edge band mosaic, were in the top 20 variables. Table 6 displays the statistics of RF and SVR applied to the various combinations of variables from calibration datasets. Overall, RF had better performance than SVR, but there was not a large difference in the R 2 or RMSE values. The best model was RF with the combination of top 20 variables, resulting in a R 2 value of 0.97 and a RMSE of 0.70 g/m 2 . However, in comparison to the best model, there were only small differences in R 2 and RMSE for the RF models.  Table 1 for the full name of vegetation indices. N_Weight, plant nitrogen weight; Soil_Nitrate_N, soil nitrate nitrogen (NO3-N); Soil_Miner_N, soil mineralizable nitrogen; Soil_Total_N, water extracted total soil nitrogen; TWI_1, total wetness index #1; TWI_2, total wetness index #2.

Regression Models with Selected Variables
As the best performing model from the calibration and validation datasets was the combination of all three dates using RF, we tested numerous variable combinations with the data from all three dates. Based on Figure 6, we selected the top 6, 10, 15, 18, and 20 variables based on the evaluation of variable importance thresholds. In addition, we considered a separate group containing all spectral variables, because all the VIs and MicaSense band reflectance mosaics, except the red-edge band mosaic, were in the top 20 variables. Table 6 displays the statistics of RF and SVR applied to the various combinations of variables from calibration datasets. Overall, RF had better performance than SVR, but there was not a large difference in the R 2 or RMSE values. The best model was RF with the combination of top 20 variables, resulting in a R 2 value of 0.97 and a RMSE of 0.70 g/m 2 . However, in comparison to the best model, there were only small differences in R 2 and RMSE for the RF models. The models were then applied to the validation datasets ( Table 7). The RF model using the top 15 variables had the best performance with an R 2 value of 0.73 and an RMSE of 2.21 g/m 2 . Compared to the top-performing model, the RF model using the top 18 variables had the same R 2 value with a slightly higher RMSE. Of the top 15 variables, only the plant height and the profile curvature were non-spectral variables. All RF models had higher R 2 values than SVM, but the RF model has RMSE values that were slightly higher as well. The RF variable importance plot allows identification of the variables that do not affect the model significantly. As found in the models with the top 20 variables, removing low importance variables from a model can improve the results.

Discussion
In this study, RF and SVM regression methods were used to predict canopy nitrogen weight of corn using UAV Micasense individual band reflectance, associated VIs, plant physiology variables, topographic metrics, and soil metrics. The variation of the in-situ canopy nitrogen weight measurements was very low in the earliest growth stage on 8 June and gradually increased until the latest sampling date of 15 July, with a marked decrease afterwards. The increase in canopy nitrogen variation during the early growth stages of BBCH 00-49 can be explained by the leaf growth and stem elongation because the crop biomass increases rapidly during that period. Then, as the plant reaches the BBCH 51 stage that corresponds to the inflorescence emergence and heading, the canopy nitrogen variation decreases because of the dilution effect [6].
The RF and SVR models were first calibrated with all the 29 variables using single and multi-date datasets. With the validation datasets, single-date models had overall poor performance. Combinations of multi-date models led to better results, with the best performance obtained with the RF model. In the variance importance plot of the best RF model, the plant height was the most important predictor out of all variables used. Freeman et al. [55] already found that plant height is a useful variable in identifying nitrogen uptake in corn. Precision agriculture studies have used crop height for phenology, biomass, and yield prediction successfully, and this crop height can be derived from UAV point cloud datasets [56]. Among all the individual MicaSense band reflectances, the reflectance of the red-edge band has the poorest performance. The red-edge region (680-800 nm) represents a sharp change in the canopy reflectance and can provide important details about phenology [47]. Our result agrees with Lee et al.'s [23] work that uses the same Micasense camera. Likely, the narrow 10 mm band range of the red-edge band of the Micasense camera did not capture the change in the region well. This could explain why our results are not in agreement with several other studies that find that the red-edge region is a sensitive indicator of leaf chlorophyll content, because of the high absorption in the red radiation and the high reflectance in the near-infrared region during plant growth stages [57,58]. Overall, most of the soil metrics had little to no effect on the models, but the soil was sampled once at the beginning of the growing season. With the consideration of costs and historical farm operations where recommendations for soil tests are only once a year, this study emphasizes the limitations of the current soil testing practices. Soil metrics results from Mulvaney et al.'s [59] and Tremblay et al.'s [60] studies on a soil-based approach in corn nitrogen management, found that soil tests were useful in their models when field conditions were conducive to soil nitrogen mineralization, crop uptake, and utilization. With different sampling methods, soil metrics may still be useful in models. There is therefore the need to conduct soil tests at different dates to better characterize the soil condition changes, for example, because of fertilizer applications, precipitation patterns, and crop growth [7].
The RF model's variable importance plot allowed selecting groups of top 6, 10, 15, 18, and 20 variables for developing new RF and SVM models. The top 20 variables included plant height, all the 11 VIs used in this study, all the MicaSense band reflectance mosaic but the red-edge one, the soil moisture, and the profile curvature, as well as the topographic wetness indices #1 and #2. The group of top 15 variables that performed best has only the plant height and profile curvature as non-spectral parameters. Considering that topographic metrics were derived from the UAV Phantom 4 RTK imagery along with the possibility of deriving crop height across the field from point cloud data, all variables in the best model can be measured from in-situ, non-destructive, UAV-based data collection [55]. Having all model data that can be collected by remote sensing could be a greater benefit, as common limitations of in-situ studies and subsequent application methods are the intensive labour and high costs required to obtain model input data.
In this study, the final validation of canopy nitrogen models with various combinations of variables indicated RF models had better performance than SVR in terms of R 2 values. This is consistent with results from Liu et al. [61], Zha et al. [62], and Lee et al. [23], with RF yielding better nitrogen content prediction in wheat, rice, and corn crops compared to SVR models. Although SVR had lower RMSE values in comparison to RF, overall RF RMSE values were low as well in the context of nitrogen estimation for g/m 2 . In comparison to the study by Lee et al. [23], the RMSE values of this study's models are much lower, which can be beneficial for fertilizer management recommendations to farmers in general.
In the case of this study, the RF algorithmic method of using many decision trees may better suit the use of numerous variables in regression models. In comparison, using many variables in SVR requires user hyper-tuning and the kernel trick function to separate data into groups, relying on the radial distance between points to be meaningful in the model. Overall, the performance of SVR models was good, but RF models can be considered more useful in terms of ease of use and the quality of results.

Conclusions
This study tested machine learning regression methods to predict corn canopy nitrogen weight using UAV Micasense band reflectance mosaics, associated VIs, plant physiology variables, topographic metrics, and soil metrics. With all 29 variables in RF and SVR models, the combination of all three dates with the RF model produced the best results: The validation model having an R 2 of 0.75 and an RMSE of 2.29 g/m 2 . From the multi-date RF model's variable importance plot, the top 6, 10, 15, 18, and 20 variables were tested in RF and SVR models. The best validation model was the RF model (R 2 value at 0.73 and RMSE at 2.21 g/m 2 ) with the top 15 variables, most of them being spectral variables.
We developed models for estimating canopy nitrogen weight from spectral, plant, soil, and topographic variables using machine learning algorithms, but the resulting models are still empirical, and their applicability can be limited to the dataset on which they were built and validated. This is a common limitation in agricultural research as in-situ measurements often require intensive labour, costs, and variable conditions. Overall, many factors need to be considered to define plant growth conditions such as plant species, soil condition, environmental factors of field topology, moisture supply, weather, and more. There is a need to test the developed models on other datasets to determine their efficacy and to understand their applicability in precision agriculture. Future work can consider using a more deterministic modelling approach, for example, the PROSAIL model [63], as it is less empirical and applies to a high variety of conditions but requires more advanced parameter calibration. The PROSAIL model uses spectral data of leaf and canopy level parameters to retrieve chlorophyll and nitrogen content, with robust results from lab and field studies [63]. Eventually, methods of crop height extraction from RTK UAVs [56], in addition to UAV-derived topographic and spectral variables, can be used to develop a final map for a whole field. The model information can be used to improve field nitrogen prediction, leading to more effective and efficient N fertilizer management.