A Novel Machine Learning Method for Estimating Biomass of Grass Swards Using a Photogrammetric Canopy Height Model , Images and Vegetation Indices Captured by a Drone

Silage is the main feed in milk and ruminant meat production in Northern Europe. Novel drone-based remote sensing technology could be utilized in many phases of silage production, but advanced methods of utilizing these data are still developing. Grass swards are harvested three times in season, and fertilizer is applied similarly three times—once for each harvest when aiming at maximum yields. Timely information of the yield is thus necessary several times in a season for making decisions on harvesting time and rate of fertilizer application. Our objective was to develop and assess a novel machine learning technique for the estimation of canopy height and biomass of grass swards utilizing multispectral photogrammetric camera data. Variation in the studied crop stand was generated using six different nitrogen fertilizer levels and four harvesting dates. The sward was a timothy-meadow fescue mixture dominated by timothy. We extracted various features from the remote sensing data by combining an ultra-high resolution photogrammetric canopy height model (CHM) with a pixel size of 1.0 cm and red, green, blue (RGB) and near-infrared range intensity values and different vegetation indices (VI) extracted from orthophoto mosaics. We compared the performance of multiple linear regression (MLR) and a Random Forest estimator (RF) with different combinations of the CHM, RGB and VI features. The best estimation results with both methods were obtained by combining CHM and VI features and all three feature classes (CHM, RGB and VI features). Both estimators provided equally accurate results. The Pearson correlation coefficients (PCC) and Root Mean Square Errors (RMSEs) of the estimations were at best 0.98 and 0.34 t/ha (12.70%), respectively, for the dry matter yield (DMY) and 0.98 and 1.22 t/ha (11.05%), respectively, for the fresh yield (FY) estimations. Our assessment of the sensitivity of the method with respect to different development stages and different amounts of biomass showed that the use of the machine learning technique that integrated multiple features improved the results in comparison to the simple linear regressions. These results were extremely promising, showing that the proposed multispectral photogrammetric approach can provide accurate biomass estimates of grass swards, and could be developed as a low-cost tool for practical farming applications.


Introduction
Silage is the main feed in ruminant meat and milk production in Northern Europe.In Finland, 23% of the cultivated area is used for silage production, which is approximately 513,500 ha.In the silage production, grass swards are harvested three times each season, and fertilizer is applied for each harvest.Timely information of the yield quality and quantity would be highly valuable for decision-making on harvesting time and rate of fertilizer application for the next harvest.In grass yield, the quantity increases rapidly in the spring growth.Simultaneously the quality decreases, especially in grass digestibility.Harvesting time optimization entails a balance between the highest possible yield quantity and an adequately high digestibility for feeding.
The grass biomass has a strong correlation with canopy heights [1][2][3].Therefore, the grass height is a fundamental parameter of interest when concerning precision management of grazing and silage harvesting.Accurate estimates of grass biophysical variables are important for monitoring vegetation growth and for analysing important physiological parameters during the grass growth cycle [4].In practical farming-particularly in rotational grazing management-physical measurements of grass height and biomass estimation are usually done by using devices such as the rising plate meter, capacitance meter and meter stick [5][6][7].However, these in situ physical measurements are laborious.Furthermore, it can be difficult to characterize the spatial variability due to vegetation growth characteristics by the physical sample collection, which limits their ability to provide robust estimates.Several factors cause variation in the grass swards.In particular, swards are composed of multiple species, each having different growing characteristics and variability in soil conditions and topography within the field [8].
Remote sensing methods, including digital imaging, photogrammetry, hyperspectral imaging, laser scanning and various sensor combinations can work as high-performance alternatives to physical measurement methods.Remote sensing offers a potential for rapid and automatic measurement of large areas with high spatial resolution.Several studies have investigated the use of remote sensing techniques to calculate plant heights for estimating crop parameters.Most of those have been using terrestrial laser scanning [9,10] and mobile laser scanning [8] due to the requirements of high spatial resolution.However, photogrammetric imaging using unmanned aircraft vehicles (UAV or a drone), structure from motion (SFM) techniques and dense image matching are becoming a very interesting tool to collect 3D information of objects due their low cost, efficiency and flexibility.These methods have already been investigated in several studies especially related to grass-to shrub transition zone [3], crops [11], winter barley [12,13], maize [14] and moss beds [15,16].Several studies have also utilized vegetation indices (VI) based on multispectral data [17][18][19][20] or hyperspectral data [21][22][23] to estimate the biomass and canopy height of crops.
Additionally, some studies have integrated drone-based 3D and spectral data to estimate crop parameters.Yue et al. [24] combined crop height information and spectral data from the Cubert UHD185 "Firefly" hyperspectral snapshot sensor (Cubert GmbH, Ulm, Germany) to estimate the biomass of winter wheat, and Bendig et al. [25] combined drone-based 3D data with ground-measured spectrometer data for biomass monitoring of barley.Many ground-based combinations of spectral and 3D data have also been utilized [26][27][28].However, only a few studies have integrated drone-based 3D and spectral data for the estimation of grass quality and quantity.Bareth et al. [29] studied the possibility to utilize drone-based canopy height models (CHMs) in the estimation of grass height and biomass using linear regression.Their results showed that the photogrammetric CHM gave accurate height estimates in the later phases of growth but were not accurate at the beginning of the growth when the canopy was still sparse.The VIs had the opposite performance; they provided good estimates in the early phases of growth but saturated at the later stages of growth with increasing plant height, which is also a known behaviour from earlier studies [30,31].Based on these findings, Bareth et al. [29] developed a Grassland Index (GrassI) which combines the advantages of the CHM and VIs based on RGB to provide accurate estimates for the entire growth season.Possoch et al. [32] utilized a low-cost RGB drone system to calculate the CHM and RGB VIs for estimating biomass for grassland.Their results indicated that the photogrammetric CHM gave the best results for the dry matter yield (DMY) when using linear regression models.
CHMs for the biomass estimation can be created in different ways [1,2].Pittman et al. [8] recommended measuring the digital terrain model (DTM) and digital surface model (DSM) using remote sensing technologies, such as ultrasonic or laser scanner.They studied the estimation of the biomass and canopy height of bermudagrass, alfalfa and the mix (which contained a mixture of both bermudagrass and alfalfa) using a ground-based mobile platform-a golf cart with ultrasonic, laser and spectral sensors.Their comparison of the performance of single-sensor and a combination of three sensors indicated that the use of multisensory systems improved the biomass estimation accuracy of grasslands.
Several studies have evaluated different regression techniques for biomass estimation.Marabel et al. [33] investigated biomass estimation of grasslands using field spectrometer data.They evaluated performance of the support vector machine (SVM) and Partial Least Squares Regression (PLSR).The most accurate model to predict the total biomass was obtained using the PLSR and spectral bands between 916-1120 nm and 1079-1297 nm.Yue et al. [34] compared eight different regression techniques for winter wheat biomass using near-surface spectroscopy.The results of the study showed that PLSR and multivariable linear regression were most suitable when high-accuracy and stable estimates are required from relatively few samples.In addition, Random Forest (RF) introduced by Breiman et al. [35] is highly robust against noise and is best suited to deal with repeated observations involving remote-sensing data that are usually affected by atmosphere, clouds, observation times and sensor noise [34].Additionally, RF's advantages over other methods, such as multiple linear regression (MLR) and Artificial Neural Network (ANN), are high prediction accuracy, feature selection is unnecessary and it is less sensitive to overfitting [36][37][38].RF has shown competitive accuracy compared to other methods estimating the biomass of forests [39][40][41] and agriculture [14,42,43].The majority of biomass estimation studies have utilized other estimators such as linear models and nearest neighbour approaches [3,41].Most of the drone-based biomass estimation studies have been carried out using linear models based on only a few features [18,23,25,28].An RF was used by Liu et al. [43] to estimate the level of rice seeds, by Li et al. [14] to estimate the biomass of maize, and by Turner et al. [16] to predict Antarctic moss health.However, no studies have performed an RF to determine the biomass of grasslands.
Our objective was to develop and assess a machine learning technique for the estimation of canopy height and biomass of grass swards based on a drone-based multispectral photogrammetric approach.In particular, our objective was to study the potential of ultra-high resolution canopy height models (CHMs) and vegetation indices (VIs) extracted from red, green and blue (RGB) and colour infrared (CIR) images.To generate high variation information to study swards, the Natural Resources Institute of Finland (LUKE) established in the Jokioinen test site an experiment using six different nitrogen fertilizer application rates and four harvesting dates.We first evaluated the feasibility of CHMs and VIs separately in the height and biomass estimation by using a simple linear regression technique.We then evaluated the performance of the combination of various height features and VIs in grass quantity estimation using machine learning techniques based on MLR and RF.

Study Area and Reference Measurements
The experiment was conducted at the LUKE research farm, which is located in the municipality of Jokioinen in southwest Finland (approximately 60 • 48 N, 23 • 30 E) (Figure 1).The experiment was established on a second-year silage production field.The grass sward was established in 2015 in spring barley as a companion crop with a timothy/meadow fescue (Phleum pratense and Festuca pratensis) seed mixture at a 25 kg ha −1 sowing rate (65% timothy and 35% meadow fescue on a weight basis).The row width was 12.5 cm.In 2016, the field was managed as a silage production sward.
A uniform and even site of the field was selected for the experiment.The soil type at the test site was clay, and the soil fertility values were as follows: pH 6.2, 377 mg K L −1 soil, 6.6 mg P L −1 soil, 1101 mg Mg L −1 soil, and 2580 mg Ca L −1 soil.The size of the experimental area was approximately 50 m by 20 m.The experimental setup was a split plot design with four replicates.The fertilizer treatment was in the 24 main plots (plot size 12 m × 3 m), and the harvesting time was in the sub-plot.The experiment had a total of 96 plots.Four replicates, 6 nitrogen fertilizer levels (0 kg/ha, 50 kg/ha, 75 kg/ha, 100 kg/ha, 125 kg/ha and 150 kg/ha), and four harvesting/measuring dates (6 June, 15 June, 19 June and 28 June) were used in the primary growth.The fertilizer application was carried out on 10 May 2017 by an experimental surface fertilizer broadcaster (tailor-made model) with a working width of 1.5 m.To maintain the sward free of weeds, a control spraying by Starane XL herbicide (active ingredients: 100 g L −1 fluroxypyr + 2.5 g L −1 florasulam) at the rate of 1.5 L ha −1 was carried out on 24 May by the farm scale sprayer Hardi twin stream 363 MA 1200 EEEC/5 15 HAL with a 12 m spraying boom (HARDI INTERNATIONAL A/S, Norre Alslev, Denmark).The borders between the main plots (fertilizer treatments) were regularly cut by a lawn mower to make the harvesting easy.The net size of the harvested and drone-measured plot was 1.5 m by approximately 2.6 m.After the harvest, the actual length of each harvested plot was measured, and the hectare yield was adjusted accordingly.were taken per plot, and the mean value of the three measurements was used.Height stick measurements were carried out according to Finnish guidelines for producing an estimate of biomass for a grass sward parcel [44].In that method, for each measurement a cluster of grass tillers and leaves is straightened up and the average height of that cluster is measured with a height stick.In this method, individual tillers that are higher than average are ignored in the measurement.We compared the plate meter and the height stick methods in the first dataset.We made five measurements in each fertilization level plots (12 m × 1.5 m) with the plate meter and calculated average height for each plot.The plate meter's bottom stick is placed on ground level but without penetrating the ground, and the plate part free falls down to the sward.While the plate part is going down, it compresses the sward down a few centimetres (Figure 2b).The plate meter works well when the sward is dense and has a height of around 20-30 cm.On the other hand, the plate meter struggles to work when the sward is high and has no flat top structure or when the grass sward is sparse and its growth is poor and non-uniform.In this study, sward heights were 60-70 cm during the last harvesting dates, and the plots without nitrogen input were sparse and had low height, which was not ideal for the plate meter.Additionally, previous studies have received worse correlations with biomass estimation with a plate meter than the height stick [8].The comparisons showed that the height stick provided slightly higher height values than the plate meter; the average difference was 1.46 cm and the RMSE was 1.82 cm.Harvesting was carried out by a Haldrup forage plot harvester (Model GR, HALDRUP GmbH, Ilshofen, Germany).The width of the cutting bar was 1.5 m.The stubble height was from 6 cm to 7 cm.The primary growth was harvested on four dates in June: 6 June was at the very early developmental stage for silage harvesting; 15 June was just prior and 19 June was very close to the targeted silage production developmental stage, and heading was just starting in the swards; and 28 June was clearly after the desired silage harvesting stage.The fresh yield (FY) was measured by the Haldrup forage plot harvester.However, on the first harvest date, due to operation failure in Haldrup scale, the plot yield was collected and measured by weight indoors.A sample was taken from each plot for dry matter yield (DMY) and quality analyses.On the first harvest date, the whole harvest was taken, and, on the later harvest dates, a 1 kg FY sample was taken from the harvest.The samples were chopped into 3-4 cm long pieces by a Wintersteiger (Model Hege 44, Wintersteiger AG, Ried, Austria) sample chopper, and the DMY was determined after 17 h drying at 100 • C in forced air drying ovens.

Remote Sensing Data Acquisition
On the day before each harvest, the reference canopy heights (H ref ) were measured with a height stick and with a height plate on the first date.
Idea of the experimental setup was to generate great variation of DMY, FY and nitrogen amount in the study sward.General guidelines for nitrogen fertilizer application rate is 100 kg/ha for the primary growth in clay soil for commercial grass silage production, and harvesting is targeted at a D-value (organic matter digestibility in dry matter) of around 690 g kg −1 which occurred on this field in Jokioinen around 19 June in 2017.
The start of the growing season was late in 2017 (5 May), and the early summer was cool.The mean monthly temperature was 8.9 • C in May and 12.9 The sward with zero nitrogen application was sparse and weak particularly on the first observation dates.The highest nitrogen application rates-125 and 150 kg ha −1 -produced a dense sward in mid-June which was susceptible to lodging, and this affected the growth of the stand.In addition, the highest nitrogen application rates seemed to increase the share of meadow fescue in the sward, particularly on the latest harvesting dates.Otherwise, the swards were predominantly of timothy.

Reference Field Data and Biomass Sampling
Devices such as the rising plate meter, capacitance meter, and meter stick are examples of devices used for physical measurements of vegetation height and biomass estimation [5,6].We used the height stick to measure the H ref of the grassplots on all the dates (Figure 2a).Three measurements were taken per plot, and the mean value of the three measurements was used.Height stick measurements were carried out according to Finnish guidelines for producing an estimate of biomass for a grass sward parcel [44].In that method, for each measurement a cluster of grass tillers and leaves is straightened up and the average height of that cluster is measured with a height stick.In this method, individual tillers that are higher than average are ignored in the measurement.
measurements were carried out according to Finnish guidelines for producing an estimate of biomass for a grass sward parcel [44].In that method, for each measurement a cluster of grass tillers and leaves is straightened up and the average height of that cluster is measured with a height stick.In this method, individual tillers that are higher than average are ignored in the measurement.
We compared the plate meter and the height stick methods in the first dataset.We made five measurements in each fertilization level plots (12 m × 1.5 m) with the plate meter and calculated average height for each plot.The plate meter's bottom stick is placed on ground level but without penetrating the ground, and the plate part free falls down to the sward.While the plate part is going down, it compresses the sward down a few centimetres (Figure 2b).The plate meter works well when the sward is dense and has a height of around 20-30 cm.On the other hand, the plate meter struggles to work when the sward is high and has no flat top structure or when the grass sward is sparse and its growth is poor and non-uniform.In this study, sward heights were 60-70 cm during the last harvesting dates, and the plots without nitrogen input were sparse and had low height, which was not ideal for the plate meter.Additionally, previous studies have received worse correlations with biomass estimation with a plate meter than the height stick [8].The comparisons showed that the height stick provided slightly higher height values than the plate meter; the average difference was 1.46 cm and the RMSE was 1.82 cm.

Figure 1.
Orthophoto mosaic from the Jokioinen test site from 15 June.In the first replicate (the leftmost column), the fertilizer rates were from 0 to 150 kg/ha (indicated with N0 to N150) in order from top to bottom in this photo.Nitrogen fertilizer application rates were randomized in Replicates 1-4 (in Columns 2-4) as well as the location of the harvesting date for reference harvests.We compared the plate meter and the height stick methods in the first dataset.We made five measurements in each fertilization level plots (12 m × 1.5 m) with the plate meter and calculated average height for each plot.The plate meter's bottom stick is placed on ground level but without penetrating the ground, and the plate part free falls down to the sward.While the plate part is going down, it compresses the sward down a few centimetres (Figure 2b).The plate meter works well when the sward is dense and has a height of around 20-30 cm.On the other hand, the plate meter struggles to work when the sward is high and has no flat top structure or when the grass sward is sparse and its growth is poor and non-uniform.In this study, sward heights were 60-70 cm during the last harvesting dates, and the plots without nitrogen input were sparse and had low height, which was not ideal for the plate meter.Additionally, previous studies have received worse correlations with biomass estimation with a plate meter than the height stick [8].The comparisons showed that the height stick provided slightly higher height values than the plate meter; the average difference was 1.46 cm and the RMSE was 1.82 cm.

Remote Sensing Data Acquisition
The Finnish Geospatial Research Institute's (FGI's) drone, a remotely piloted aircraft system (RPAS), was utilized for collecting the remote sensing datasets.The frame of the FGI drone was the Gryphon Dynamics quadcopter with detachable arms, and it was equipped with Pixhawk autopilot (Computer Vision and Geometry Lab, Zurich, Switzerland) with ArduPilot APM Copter (Version 3.4, Open-source, Raleigh, NC, USA) firmware [45].Endurance of the drone is approximately 25 min with a maximum payload of 2.5 kg.The drone was equipped with a positioning system consisting of an NV08C-CSM L1 Global Navigation Satellite System (GNSS) receiver (NVS Navigation Technologies Ltd., Montlingen, Switzerland), a Vectornav VN-200 IMU (VectorNav Technologies, Dallas, TX, USA) and a Raspberry Pi single-board computer (Raspberry Pi Foundation, Cambridge, United Kingdom).The drone was carrying an RGB digital camera, a Sony A7R (Sony Corporation, Minato, Tokyo, Japan) equipped with a Sony FE 35 mm f/2.8 ZA Carl Zeiss Sonnar T* lens (Sony Corporation, Minato, Tokyo, Japan).Sony A7R has a 35.90 mm by 24.00 mm complementary metal-oxide semiconductor (CMOS) sensor with 36.4 megapixels.The size of raw images is 7360 pixels × 4910 pixels.The camera is triggered to capture images in two-second intervals, and a GNSS receiver is used to record the exact time of each triggering pulse.Furthermore, we calculated Post Processed Kinematic (PPK) GNSS positions for each camera using National Land Survey of Finland (NLS) RINEX service, which offers observation data from FinnRef stations [46], in RTKlib (RTKlib version 2.4.2,Open-source, Raleigh, NC, USA) software rtkpost tool [47].A hyperspectral camera based on a tuneable Fabry Pérot interferometer (FPI) operating in the visible to near-infrared spectral range (500-900 nm) (VTT Technical Research Centre of Finland Ltd, Espoo, Finland) [22] was used to collect the spectral data cubes for each flight.The FPI camera is a lightweight, frame format hyperspectral imager operating in the time-sequential principle collecting spectral bands with 648 by 1024 pixels.In this study, we used it in a multispectral mode to provide multispectral bands in red (central wavelength L0 = 669.0nm; full width at half maximum (FWHM) of 27.0 nm) and the near infrared (NIR; L0 = 804.1 nm, FWHM: 28.3 nm) spectral range.
The flight parameters and conditions are introduced in Table 1.We used flying heights of 30 m and 50 m and a flying speed of 2 m/s.The ground sampling distances (GSD) were 3.9 mm and 6.4 mm for the RGB images and 30 mm and 50 mm for the FPI images with the flying heights of 30 m and 50 m, respectively.These settings resulted in 84-87% and 65-81% forward and side overlaps, respectively, for the RGB and FPI images, which are suitable for the photogrammetric processing of these scenes.In the resulting photogrammetric blocks, the area of interest was captured in the block J2_F1 in more than six images and in other blocks in more than nine images.The reason for the lower numbers of overlapping images in the block J2_F1 was triggering problems of the RGB-camera, however, as the overlaps were appropriate, we decided to use this data.
Five permanent ground reference points were targeted in the corners and centre of the block to be used as the ground control points (GCPs) and checkpoints (CP).The targets were black-painted plywood boards of size 0.5 m by 0.5 m with a white painted circle with a diameter of 0.3 m and they were mounted on wooden pillars.The reference points were measured with the Trimble R10 RTK DGNSS (Trimble Inc., Sunnyvale, CA, USA) with an accuracy within 0.03 m horizontally and 0.04 m vertically [48,49].Additionally, three reflectance panels with nominal reflectance of 0.03, 0.09 and 0.50 were installed in the area to enable transformation of the image digital number values to reflectance.

Remote Sensing Data Processing
Agisoft Photoscan Professional (version 1.3.4,Agisoft, St. Petersburg, Russia) software was used for the photogrammetric processing [50].We followed similar processing workflow in Photoscan as introduced in previous studies by several authors [3,[51][52][53].In the first stage, Photoscan uses SFM to determine the interior (IOP) and exterior orientation parameters (EOP) for each image and to calculate a sparse point cloud.We used the self-calibrating option and included the focal length, principal point coordinates, and radial and tangential lens distortions.In the orientation processing, the high quality setting was selected with 40,000 key points and 4000 tie points per image.In addition, we used PPK processed GNSS coordinates, for each image to preselect image pairs for the orientation process.The reference points (GCPs) were measured on the images manually; each of the GCPs were measured in ten or more images.We used the accuracy settings of ±0.005 m for the GCPs and ±5 m for the camera positions coordinates of the images.All IOPs, EOPs and point coordinates were optimized using "optimize camera alignment" tool in Photoscan [54].After the optimization, an automatic outlier removal was performed using the gradual selection tools of the software based on the re-projection error and reconstruction uncertainty.Additionally, some points were manually removed from the sparse point cloud, particularly points underground and up in the air.Overall, approximately 10% of the worst points were removed during the gradual selection and manual point removing for each dataset.Then the final optimization of the sparse point cloud, IOPs and EOPs was carried out.Next, dense point cloud generation was carried out using the high quality parameter and mild depth filtering.According to our testings and previous studies [3,[51][52][53], these parameters are suitable for flat areas such as grass fields to provide accurate results.The coordinate system in the processing was the ETRS89-TM35FIN.
Even though all datasets were processed using the same parameters in Photoscan, small differences in the flying height and overlaps between images resulted in slightly different processing results (Table 2).In particular, the lower flying height resulted in a smaller GSD and a higher point density.Additionally, the re-projection errors were smaller for the 30 m flights (0.534-0.589) than for the 50 m flights (0.783-1.25).Root Mean Square Errors (RMSEs) of block adjustments were calculated using a leave-one-out cross-validation (LOOCV) method.The LOOCV was carried out by performing the block adjustment five times by using four reference points as GCPs and one reference point as an independent CP.The error between the adjusted coordinate and the reference coordinate of the CP was calculated in each adjustment and finally the LOOCV RMSE was calculated using errors of each CP [55].The RMSEs were 0.5-2.4cm in the X-and Y-coordinates, and 1.0-4.8cm in height.These results indicated that the block adjustments were of good accuracy and the blocks were not deformed.The flights from a 30 m flying height provided slightly better 3D RMSE (2.7-2.9 cm) than the 50 m flying height (2.8-5.0 cm).The RGB orthomosaics were calculated with a GSD of 1.0 cm in the Photoscan using the orthomosaic blending mode.The orthomosaics of the FPI red and NIR bands were calculated with a GSD of 5 cm using FGI in-house C++ software [22].In this case, the orthomosaics were created using the most nadir image parts, and no blending was performed.The orthomosaic DNs were normalized to reflectance values using the empirical line method [56] using the three reflectance panels.We used exponential function for the RGB dataset and linear function for the FPI dataset.
We used both automatic and manual approaches to create the DTM in the Photoscan.The DTM auto was generated using Photoscan's automatic ground point classification procedure.In this procedure, the dense cloud was first divided into cells of a certain size, and the lowest point of each cell was detected.The first approximation of the DTM was calculated using these points.After that, all points of the dense cloud were checked, and a new point was added to the ground class if the point was within a selected distance from the terrain model and if the angle between approximation of the DTM and the line to connect the new point on it were less than the selected angle [3,52,54].Based on our preliminary testing we selected starting parameters for automatic classification of ground points and iteratively selected the most suitable parameters for the ecosystem of this study by visually comparing the classification results to the RGB orthomosaics.For all the datasets, a cell size of 3 m, a maximum angle of 0.5 degree and a maximum distance of 2.0 cm were selected.These parameters are slightly different from the parameters selected by other authors in different ecosystems, for example, Cunliffe et al. [3] used for grass-dominated-shrubs ecosystems the cell-size of 3 m, maximum angle of 3 • and maximum distance of 5.0 cm, and Méndez-Barroso et al. [57] used for classifying ground points in medium density forest sites the cell-size of 10 m, maximum angle of 3 • and maximum distance of 10 cm.The DTM manual was generated using Photoscan's free-form-selection tool to manually select and classify ground points.The classified ground points were used to interpolate DTM for the whole area.The DSM, DTM manual and DTM auto were exported as TIFF images with a 1.0 cm resolution.Finally, we calculated the CHM for both DTM manual and DTM auto by subtracting DTM from DSM for each dataset, using QGIS (version 2.18.14, Open-source, Raleigh, NC, USA) software.

Height Features
We created shapefiles for each plot using a margin of 0.25 m to the plot border to exclude possible border effects.Then, using the CHM and the border shapefile, we calculated the height features, including average height (H mean ), median height (H median ), minimum height (H min ), maximum height (H max ), height standard deviation (H std ), height 50% percentile (H p50 ), height 70% percentile (H p70 ), height 80% percentile (H p80 ) and height 90% percentile (H p90 ) (Table 3), for each plot using Matlab (version 2016b, MathWorks, Natick, MA, USA) software.Table 3. Definitions and formulas of CHM metrics in this study.h i is the height of the ith height value, N is the total number of height values in the plot, Z is the value from the standard normal distribution for the desired percentile (1.282, the 90th percentile) and σ is the standard deviation of the variable.

Index Name Equation
Mean height H mean 1

. Vegetation Indices
We calculated the VIs from the orthomosaics (and, in some cases, also utilizing the CHM) using QGIS (version 2.18.14, Open-source, Raleigh, NC, USA) software.The polygonal shapefile of each plot was used to extract digital numbers (DN) from the red, green, blue and NIR bands and the CHM.Then mean values for each plot were calculated by "zonal statistics" implementation in QGIS.The mean values were used as input values in the VI equations shown in Table 4.
We also introduced a new VI for grass fields, the ExG + CHM.The idea of the index is similar to the GrassI-Index, which aims to compensate for the weaknesses of CHM and VIs at different growth stages of the grass [29].The difference is that we used the ExG, while the GrassI is based on the RGBVI [25].The ExG was introduced by Woebbecke et al. [58], and it is commonly used for vegetation greenness identification.It has been widely used in different studies, such as maize biomass estimation [14] and vegetation fraction mapping for wheat [59].Pearson & Miller [64] MSAVI Modified Soil Adjusted Vegetation Index

Estimation Techniques
Estimation and validation were done using Weka software (version 3.8.1,University of Waikato, Hamilton, New Zealand).Multiple linear regression (MLR) and Random Forest (RF) were used as estimators.The validation of the estimators' performance was done using the LOOCV.In this case, the LOOCV was implemented so that one of the reference measurements was used as an independent reference measurement, and the estimator was trained with other reference measurements.The training and error calculation was repeated for each reference measurement and, afterwards, the statistics were calculated.
The MLR is a regression technique which models the relationship between two or more independent variables (which can be continuous or categorical) and a response variable by fitting a linear equation to observed data.The regression equation is used to calculate the parameters by using the least squares method, in which the sum of the squared errors is minimized.The model selection was performed using backward elimination, where the attribute with the smallest standardized coefficient was removed until no improvements were observed in the Akaike information criterion [67].In Weka, this attribute selection method is called "M5".
RF regression, introduced by Breiman [35], is a data analysis and statistical method that is widely used in machine learning and remote sensing research [37].It is an ensemble learning method for classification, regression and other tasks, which operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees.RF has high accuracy, good tolerance to outliers, and parameter selection.Furthermore, feature selection is unnecessary, and calculations include measures of the feature in the order of importance.This makes use of the full spectral information and combination of multiple variables such as CHM and spectral information.The default parameters of Weka implementations were used, except the number of decision trees to be generated was set to 500, as suggested by Belgiu and Drăguţ [37], instead of default number of 100.
We created seven different feature combinations for the MLR and RF to demonstrate different sensor setups:

Precision Evaluation
The estimation accuracy was quantified using Pearson's Correlation Coefficients (PCC), Root Mean Square Error (RMSE) and Normalized Root Mean Square Error (NRMSE) (Table 5).The prediction RMSEs were calculated using the LOOCV method.
Table 5. Definitions and formulas of precision evaluation metrics used in this study.x is the estimated value, x is the average of x, y is the reference value, y is the average of reference values, and n is the number of samples.

Mosaics and CHMs
The RGB mosaics and the CHMs of each measurement date are presented in Figure 3; the different fertilizer levels and harvested plots for each date are presented in Figure 1.The different qualities of the sward could be visually observed in the orthophotos.Generally, the higher the nitrogen fertilizer level of the plots was, the greener the grass was on all dates.All plots were greenest in the last date (28 June) and, in the middle of growing season (15 June and 19 June), the grass was overall greener than in the first date (6 June).On the last date (28 June), all of the plots with nitrogen fertilizer levels of 125 and 150 kg/ha and half of the plots with a fertilizer level of 100 kg/ha had vigorous stands, and lodging could be observed (Figure 3g,h).The plots without nitrogen input had less growth on each date and were still undergrown during the last harvesting.The plots with nitrogen fertilizer levels of 75 kg/ha and 100 kg/ha appeared visually to have the best quality.The CHMs show that, as the nitrogen level of the plot increases, the height of the grass increases (Figure 3).had vigorous stands, and lodging could be observed (Figure 3g,h).The plots without nitrogen input had less growth on each date and were still undergrown during the last harvesting.The plots with nitrogen fertilizer levels of 75 kg/ha and 100 kg/ha appeared visually to have the best quality.The CHMs show that, as the nitrogen level of the plot increases, the height of the grass increases (Figure 3).

Comparison of Manual and Automatic DTM Extraction Methods
We used the manual DTM from the first flight date captured from the flight altitude of 30 m (J1_F2) as the reference to the manual and automatic DTMs from other dates.The RMSEs were small, 1.05-3.28cm, excluding the second date's automatic DTM with the RMSE of 6.80 cm (Table 6).Other than that different DTM extraction methods resulted in very similar outputs.
Linear regressions of the CHMs based on the manual and automatic DTM to the DMY and FY and the physical height measurements (Href) indicated that both methods provided good correlations.However, the manual DTM provided better PCCs of 0.92-0.94than the automatic DTM that had PCC of 0.88-0.92(Table 7).For all datasets, the highest PCC of 0.92-0.94 between the CHM and physical measurements were obtained using the manual DTM from the first date's dataset with a 30 m flying height.The poorest PCC of 0.83-0.87were obtained with the DTM from the first date with a 50 m flying height.

Comparison of Manual and Automatic DTM Extraction Methods
We used the manual DTM from the first flight date captured from the flight altitude of 30 m (J1_F2) as the reference to the manual and automatic DTMs from other dates.The RMSEs were small, 1.05-3.28cm, excluding the second date's automatic DTM with the RMSE of 6.80 cm (Table 6).Other than that different DTM extraction methods resulted in very similar outputs.
Linear regressions of the CHMs based on the manual and automatic DTM to the DMY and FY and the physical height measurements (H ref ) indicated that both methods provided good correlations.However, the manual DTM provided better PCCs of 0.92-0.94than the automatic DTM that had PCC of 0.88-0.92(Table 7).For all datasets, the highest PCC of 0.92-0.94 between the CHM and physical measurements were obtained using the manual DTM from the first date's dataset with a 30 m flying height.The poorest PCC of 0.83-0.87were obtained with the DTM from the first date with a 50 m flying height.

Regressions Using Individual Features
When combining all the measurement data into a single regression (Figure 4 and Appendix A, Table A4), the best performing features were the indices integrating spectral and CHM features: GrassI max for the DMY (PCC: 0.94); ExG + CHM and GrassI VIs for the FY (PCC: 0.93); and GrassI VIs for the H ref (PCC: 0.96).When concerning the VIs without CHM features, the best performing feature was the MSAVI, giving PCCs of 0.89 for the DMY and 0.92 for the FY; the ExG performed the best for the H ref (PCC: 0.86).Overall, the performance of the VIs utilizing the NIR spectral band were better than the VIs based only on RGB spectral bands.The features based on the CHMs performed better overall than the VIs based only on spectral data; the H p90 , H p80 and H max were the best CHM features with PCCs of 0.91-0.94.
In the following sections, a more detailed analysis of the CHM and VI features is performed with respect to different measurement dates and nitrogen fertilization levels.
the Href (PCC: 0.86).Overall, the performance of the VIs utilizing the NIR spectral band were better than the VIs based only on RGB spectral bands.The features based on the CHMs performed better overall than the VIs based only on spectral data; the Hp90, Hp80 and Hmax were the best CHM features with PCCs of 0.91-0.94.
In the following sections, a more detailed analysis of the CHM and VI features is performed with respect to different measurement dates and nitrogen fertilization levels.

CHM Features
We studied the regressions between the in situ physical reference measurements, including the Href, DMY and FY and the height features Hp90 and Hmax taken from the CHMs (Figure 5 and Table 8).We performed the analysis in different phases of the growing season and with different nitrogen fertilizer levels.The Hp90 and Hmax provided similar correlations to the physical measurements, even though the Hp90 performed slightly better (Table 8 and Appendix A, Table A4).We also presented

CHM Features
We studied the regressions between the in situ physical reference measurements, including the H ref , DMY and FY and the height features H p90 and H max taken from the CHMs (Figure 5 and Table 8).We performed the analysis in different phases of the growing season and with different nitrogen fertilizer levels.The H p90 and H max provided similar correlations to the physical measurements, even though the H p90 performed slightly better (Table 8 and Appendix A, Table A4).We also presented the results of H mean in Table 8, but we did not analyse it further because it had generally poorer performance than the H p90 and the H max .The following analysis is based on the H p90 unless otherwise mentioned.
The overall view of the dataset shows that the photogrammetric CHM provided lower height values than the field measurements by the height stick (H ref ) (Figure 5).The underestimation of CHMs to H ref was largest on the last date; the RMSE was 16.6 cm for the H p90 and 12.2 cm for the H max .On other days, the underestimations (RMSE) varied from 11.8 cm to 13.3 cm for the H p90 and from 6.4 to 7.8 cm for the H max .
the results of Hmean in Table 8, but we did not analyse it further because it had generally poorer performance than the Hp90 and the Hmax.The following analysis is based on the Hp90 unless otherwise mentioned.
The overall view of the dataset shows that the photogrammetric CHM provided lower height values than the field measurements by the height stick (Href) (Figure 5).The underestimation of CHMs to Href was largest on the last date; the RMSE was 16.6 cm for the Hp90 and 12.2 cm for the Hmax.On other days, the underestimations (RMSE) varied from 11.8 cm to 13.3 cm for the Hp90 and from 6.4 to 7.8 cm for the Hmax.
An analysis of the PCCs of regressions with Href on different measurement dates showed that the earliest and latest dates displayed the poorest results (Table 8).The first measurement date provided the worst PCC of 0.78.There was a low correlation because during the first assessment the grass growth had just started and grass was sparse and nonhomogeneous.Therefore, the grass did not provide a homogenous canopy, which reduced the accuracy of the CHM-based estimates.During the growing period in the second and third measurement dates, the PCCs were 0.96.In the last measurement date, when the grass was already overgrown, the PCC was 0.88.When combining all the measurement dates into a single simple regression, the PCC was 0.94 (Figure 4 and Appendix A, Table A4).The regressions of the DMY and FY measurements with respect to the CHM features were consistent with the results of the Href.The best PCCs were obtained on the second and third measurement dates (15 and 19 June) when the canopy was homogeneous; for example, the PCCs were 0.96-0.97for the DMY and 0.95 for the FY.On the first and last dates, the PCCs were 0.85 for the DMY and 0.85 and 0.75, respectively, for the FY.
When comparing regressions with different nitrogen fertilizer levels, the fertilizer level of 0 kg/ha provided the worst correlations: the PCC was 0.68 with the Href, 0.74 with the DMY and 0.64 with the FY (Table 8).In addition, the nitrogen fertilizer level of 150 kg/ha performed poorly in the regressions.The fertilizer levels of 75 and 100 kg/ha were the closest to the ideal values.For these, the PCC was 0.96-0.98 in the regressions with the DMY and the FY, and 0.96-0.99 in the regressions with the Href.The poor sward volume and density with the 0-nitrogen fertilizer levels and the vigorous growth resulting in lodging with the nitrogen fertilizer levels of 150 kg/ha were the major reasons for the poorer results with these plots.
We also calculated regressions between the DMY and FY and the Href measured by the height stick to compare the photogrammetric and ground based measurements (Table 8).The PCCs were better with the CHM-features at the three first growth stages of the silage sward studied, whereas, in the last measurement date with overgrown and lodging grass, the Href outperformed the CHM features.In the analysis with respect to the nitrogen fertilizer level, the CHM features and the Href provided similar results with the fertilizer levels of 0-125 kg/ha; in the largest fertilizer level of 150 kg/ha the Href outperformed the CHM.8).The first measurement date provided the worst PCC of 0.78.There was a low correlation because during the first assessment the grass growth had just started and grass was sparse and nonhomogeneous.Therefore, the grass did not provide a homogenous canopy, which reduced the accuracy of the CHM-based estimates.During the growing period in the second and third measurement dates, the PCCs were 0.96.In the last measurement date, when the grass was already overgrown, the PCC was 0.88.When combining all the measurement dates into a single simple regression, the PCC was 0.94 (Figure 4 and Appendix A, Table A4).The regressions of the DMY and FY measurements with respect to the CHM features were consistent with the results of the H ref .The best PCCs were obtained on the second and third measurement dates (15 and 19 June) when the canopy was homogeneous; for example, the PCCs were 0.96-0.97for the DMY and 0.95 for the FY.On the first and last dates, the PCCs were 0.85 for the DMY and 0.85 and 0.75, respectively, for the FY.
When comparing regressions with different nitrogen fertilizer levels, the fertilizer level of 0 kg/ha provided the worst correlations: the PCC was 0.68 with the H ref , 0.74 with the DMY and 0.64 with the FY (Table 8).In addition, the nitrogen fertilizer level of 150 kg/ha performed poorly in the regressions.The fertilizer levels of 75 and 100 kg/ha were the closest to the ideal values.For these, the PCC was 0.96-0.98 in the regressions with the DMY and the FY, and 0.96-0.99 in the regressions with the H ref .
The poor sward volume and density with the 0-nitrogen fertilizer levels and the vigorous growth resulting in lodging with the nitrogen fertilizer levels of 150 kg/ha were the major reasons for the poorer results with these plots.
We also calculated regressions between the DMY and FY and the H ref measured by the height stick to compare the photogrammetric and ground based measurements (Table 8).The PCCs were better with the CHM-features at the three first growth stages of the silage sward studied, whereas, in the last measurement date with overgrown and lodging grass, the H ref outperformed the CHM features.In the analysis with respect to the nitrogen fertilizer level, the CHM features and the H ref provided similar results with the fertilizer levels of 0-125 kg/ha; in the largest fertilizer level of 150 kg/ha the H ref outperformed the CHM.

VI Features
We performed the sensitivity analysis with the VI features MSAVI, NDVI, ExG, ExG + H p90 and GrassI p90 .The results showed that both the date and the nitrogen application rate affected the correlations between the VI features and the physical measurements.However, especially the date had less impact on certain VIs than on the height features (Table 9 and Appendix A, Tables A1-A4).The best behaving VIs are presented in Table 9.In the analysis of regressions at different dates, the MSAVI provided in most cases the best PCCs of 0.94-0.99 for the DMY and FY, and the ExG + H p90 and the GrassI p90 provided the best PCCs of 0.82-0.97for the H ref .The VIs including height features (ExG + H p90 and GrassI p90 ) were impacted more by the date than the MSAVI and NDVI.With increased grass heights on the last measurement date (28 June), the RGB-based VIs were saturating (Table 9 and Figure 6b).
When concerning different nitrogen fertilizer levels, the VIs including height features (ExG + H p90 and GrassI p90 ) provided the best correlations.The nitrogen fertilizer level of 0 kg/ha performed the worst compared to other dates: the PCC was 0.75-0.87for the DMY, FY and H ref at best.The nitrogen fertilizer levels of 75 and 100 kg/ha provided a PCC of 0.96-0.99 for the best VIs.Overall, the nitrogen fertilizer level had less impact on the PCCs with the best VI features (Table 9) than with the CHM features (Table 8).

Biomass Estimation Using MLR and RF
We used the MLR to estimate the DMY and the FY using the RGB, the VI and the 3D features separately and in different combinations (Table 10).The best results when using the RGB, the VI or the 3D features separately were obtained with the VI features: the PCC and RMSE were 0.96 and 0.44 t/ha (16.7%) for the DMY and 0.91 and 2.94 t/ha (26.6%) for the FY, respectively.Using the 3D features provided slightly worse results: the PCC and RMSE were 0.93 and 0.59 t/ha (22.4%) for the DMY, and 0.92 and 2.79 t/ha (25.27%) for the FY, respectively.The RGB features provided clearly the worst results.Combining the 3D and VI features and all the features (3D, VI, and RGB) provided the

Biomass Estimation Using MLR and RF
We used the MLR to estimate the DMY and the FY using the RGB, the VI and the 3D features separately and in different combinations (Table 10).The best results when using the RGB, the VI or the 3D features separately were obtained with the VI features: the PCC and RMSE were 0.96 and 0.44 t/ha (16.7%) for the DMY and 0.91 and 2.94 t/ha (26.6%) for the FY, respectively.Using the 3D features provided slightly worse results: the PCC and RMSE were 0.93 and 0.59 t/ha (22.4%) for the DMY, and 0.92 and 2.79 t/ha (25.27%) for the FY, respectively.The RGB features provided clearly the worst results.Combining the 3D and VI features and all the features (3D, VI, and RGB) provided the best results; for example, in the case with all the features, the PCC and RMSE were 0.98 and 0.34 t/ha (12.7%) for the DMY and 0.98 and 1.25 t/ha (11.4%) for the FY, respectively.We performed a similar analysis using the RF estimator (Table 10).The best results when using the RGB, the VI or the 3D features individually were obtained using the VIs: the PCC and RMSE were 0.96 and 0.46 t/ha (17.4%) for the DMY and 0.97 and 1.67 t/ha (15.1%) for the FY, respectively.In addition, in the case of the RF, the 3D features provided slightly worse results than the VI features and the RGB features provided the worst results.Similar to MLR, using the combinations of the 3D and VI features and all the features, provided the best results; for example, the 3D and VI features provided the PCC and RMSE 0.97 and 0.37 t/ha (14.1%) for the DMY, and 0.98 and 1.51 t/ha (13.7%) for the FY, respectively.When comparing the two estimators, the RF estimator provided better results than the MLR when using individual RGB and VI features, and their combinations; the results with the VI and 3D features were on the same level.The RF estimator provided slightly worse results than the MLR when combining all the features.Regarding the importance of different features, the 3D features were the most important if they were included in the feature combinations: the percentile heights and ExG + CHM were the most important for the DMY and ExG + CHM, GrassI and the percentile heights for the FY (Table 11 and Appendix B, Tables A5 and A6).Thus, the new ExG + CHM index appeared to be significant in grass biomass estimations.We analysed the sensitivity of the biomass estimation with respect to the date and nitrogen application rate using the RF estimator with the combination of 3D, VI and RGB features.Similar to the analysis with the linear regression in Section 3.2, both the date and the nitrogen fertilization level had an impact on the results (Table 12).When concerning the impact of the measurement date, the results were the worst for the first date: the PCC and RMSE were 0.91 and 0.15 t/ha (13.1%) for the DMY, and 0.95 and 0.48 t/ha (12.1%) for the FY, respectively.The best performance was obtained during the third measurement date: the PCC and RMSE were 0.97 and 0.27 t/ha (9.1%) for the DMY, and 0.98 and 1.30 (10.2%) for the FY, respectively.The results of the second and last measurement dates were almost as good.Regarding the impact of the nitrogen application rate, the best performance for the DMY was achieved with the application rate of 100 kg/ha, which provided the PCC and RMSE of 0.98 and 0.35 t/ha (11.0%), respectively.In the case of the FY, the nitrogen application rates of 50-125 kg/ha performed evenly well giving the PCC and the RMSE of 0.97-0.98 and 0.95-1.71t/ha (11.3-12.0%),respectively.A nitrogen application rate of 150 kg/ha provided slightly worse estimation accuracy, and the nitrogen level 0 kg/ha provide clearly the poorest estimation accuracy.In the first measurement date, the most important features were ExG + H p90 , ExG + H max , GrassI p90 , GrassI max and NIR-based VIs; less important features were the 3D features (Appendix B, Table A6).During the second and third measurement dates, the importance of 3D features increased, even though the most important features were still ExG + H p90 , ExG + H max and GrassI p90 .On the last measurement date, the VI features started to dominate: the most important features were NDVI, OSAVI and MSAVI.Between different nitrogen fertilization levels, the selected features had less variation than different dates (Appendix B, Table A6).

Discussion
We developed and assessed a novel drone-based machine learning technique for estimating the height, fresh yield (FY) and dry matter yield (DMY) of grass swards.Our approach was to derive various features from the multispectral photogrammetric data sets, including the height features from the CHM and the colours and different VIs from orthophotos in the red (R), green (G), blue (B) and near-infrared (NIR) spectral bands.The MLR and RF estimators were trained using high variation timothy/meadow fescues mixture swards dominated by timothy.
Our study was the first to integrate various structural and spectral features from a drone multispectral photogrammetric system using machine learning techniques for the grass sward biomass estimation in the context of silage production.The best results were obtained when combining different height, RGB and VI features.The correlations and RMSEs were at best 0.98 and 0.34 t/ha (12.7%) for the DMY and 0.98 and 1.22 t/ha (11.05%) for the FY, respectively.The MLR and RF provided quite similar results (Table 10).Overall, the most important features for the RF were the new indices ExG + H p90 and ExG + H max (introduced in this study), the height features and the GrassI (Table 11 and Appendix B, Table A5).Additionally, the CHM-features gave better correlations to the DMY and the FY than the regressions with the physical canopy height measurements by the height stick (H ref ) at the three growth stages of the silage sward studied resulting from the three first measurement dates.We obtained the best results on the targeted silage harvesting date (19 June) and just before that (15 June) when the canopy was well-grown and homogeneous.Poorer estimation results were obtained early in the growing season when the sward volume and density was low, as well as after the targeted silage harvesting date when the stand was already heading, and lodging occurred in the most heavily fertilized plots.
In the trial area, we generated the DTM utilizing the paths that were cut down between the sample plots (Figure 1).We evaluated the performance of manual and semi-automatic DTM generation approaches.In the case of the manual DTM, we classified all points of the paths to ground points.The semi-automatic method also classified most of these same points as ground points.Both methods provided similar DTMs and correlations to the biomasses.The results indicated that it was possible to generate an accurate CHM from a single flight's data without the need to collect DTM data separately before or after harvesting.The second date's automatic DTM's worse RMSE of 6.80 cm was caused by the lower number of overlapping images in some parts of the model that was due to some camera triggering problems during the flight.Our plots had the cut paths (Figure 3), which probably improved ground point detection in all cases.However, the grass height in cut paths was 6-7 cm on each date, into which the photogrammetric DSM not penetrate well.Because of this, the interpolated DTMs were above the real ground level.Therefore, the canopy height values based on the CHM were lower than the physical height measurements.The limitation of photogrammetric DSM has been discovered in earlier studies [32,52].One solution for the underestimation could be to use oblique images or combine oblique and nadir images in the DSM generation as suggested in previous studies [3,68].Accurate georeferencing and a non-deformed photogrammetric block are important requirements in the proposed method.We performed the georeferencing using GCPs, which can be considered a laborious approach when aiming for fully automatic procedures and also expensive equipment for performing field measurements are required.In future studies, our objective will be to evaluate approaches that do not require in situ GCPs, particularly our objective is to implement better direct georeferencing process [69] utilizing a more accurate L1/L2 GNSS/IMU receiver and to investigate relative georeferencing of the multitemporal datasets.This improvement would also reduce the overall cost of the complete measurement system.Our results also supported the general expectations that the large image forward and side overlaps of approximately 80% and the self-calibration during the photogrammetric processing provided a non-deformed photogrammetric block.
In the simple linear regressions, the VIs performed well.The best performing individual VIs were the MSAVI for the DMY and FY (PCC: 0.94-0.99 for different dates) and ExG + H p90 for the H ref (PCC: 0.84-0.97for different dates) (Table 9).Overall, the NIR-based VIs gave better results than the RGB-based VIs.Several studies have shown that the NIR spectral range performs better in the estimation of crop biomass than the RGB spectral range [25,28].However, the new ExG + CHM indices, suggested by us, provided good results and outperformed all the RGB-based VIs in regressions with all of the physical measurements, and in addition outperformed the best performing VI (MSAVI) in the regressions with the DMY and the H ref .Similarly, in other studies, combined CHM and RGB based VIs have provided better regressions than the CHM and the RGB based VIs separately [29,32].With increasing grass height, the RGB-based VIs were saturating, which is consistent with previous studies [18,32].However, the saturation had less impact on the new ExG + CHM indices and the GrassI than other RGB-based VIs; and even less impact on the NIR-based VIs (Table 9 and Figure 6a).
All of the CHM features performed overall better than the VIs; H p90 being the best performing CHM feature (Appendix A, Table A4).The simple linear regression results of Possoch et al. [32] indicated also that the grass height was a suitable feature for grass yield estimation; regression of CHM to DMY provided a correlation of 0.80, and the physically measured height to the DMY provided the slightly worse correlation of 0.79.Previous studies have proven in different grasses and crops that the visible spectral range VIs (VI VIS ) perform well at the booting stage and do not perform as well in the other growing stages [25,61,[70][71][72].Wang et al. [42] found that laser scanning derived metrics combined with hyperspectral data can provide better biomass estimates of maize using partial least squares (PLS) regression.Pittman et al. [8] studied estimation of biomass and canopy height in bermudagrass, alfalfa, and the mix (containing a mixture of bermudagrass and alfalfa) using a golf car with mobile ultrasonic, laser and spectral sensors.For single sensor estimations, laser-estimated height measurements had the best correlations to the physically measured canopy heights and to the DMY: the PCC of height regressions was 0.88 for bermudagrass and 0.78 for the mix; correlations to the DMY were 0.88 for bermudagrass and 0.80 for alfalfa.However, combining two or three methods improved correlations of height estimation for bermudagrass giving the PCC of 0.92; and of DMY estimation in all cases, giving the PCC of 0.92 for bermudagrass, 0.83 for alfalfa and 0.89 for the mix.Additionally, combined laser-and ultrasonic-estimated height measurements provided equivalent and/or better estimates when compared to the physical canopy height measurements and plate meter DMY estimation methods.Differences between method-predicted values versus measured DMY were minimal.The average percent error was 11.2% for the differences between predicted values versus forage harvester and quadrat measurements of forage biomass values (1.64 and 4.91 t/ha), except at the lowest measured DMY, where the errors were larger, the average percent error was 89% and the absolute error <0.79 t/ha.With the greatest measured DMY, the average error was 18% and >6.4 t/ha [8].
Moeckel et al. [73] estimated the DMY and FY with a ground-based ultrasonic and spectrometer in grasslands with a heterogeneous sward structure on four dates representing different growth stages.The MPLSR approach resulted in a PCC of 0.69 (0.39-0.89 for date-specific models) for the DMY and a PCC of 0.82 (0.57-0.93 for date-specific models) for the FY.Fricke and Wachendorf [74] had similar results estimating biomass of legume-grass swards with combined ultrasonic and hyperspectral VIs: PCC were 0.91 in common swards and 0.94-0.95for species-specific calibrations for DMY.Marabel et al. [33] studied Support Vector Machine (SVM) and Partial Least Squares Regression (PLSR) for estimating the biomass of grasslands from field spectrometer data.The best results for DMY estimation were obtained using PLSR, and the maximum band depth index derived from the continuum removed reflectance in the absorption features between 916-1120 nm and 1079-1297 nm; the PCC was 0.97 and the RMSE was 71.2 t/ha for the DMY.We can thus conclude that our results with the photogrammetric drone data were in most cases better than the previous results obtained with typically more expensive measurement systems, and even with terrestrial measurements.
To obtain accurate estimations of the grass sward biomass using a low-cost drone-based system, we suggest utilizing a high-resolution RGB camera equipped with a good quality lens with a global shutter and potentially combined with an NIR band, e.g., a colour-infrared modified camera.Sensors such as the Parrot Sequoia [75] are not ideal for 3D reconstruction due to the rolling shutter, but the multispectral data are relevant.Datasets should be captured with high image overlaps at a low flying height to provide ultrahigh spatial resolution and dense point clouds.With the support of an accurate onboard direct georeferencing system the overall system and measurement cost could be reduced to a low level.The image processing and machine learning methods introduced in this study were proven to provide accurate results.We recommend integrating 3D, VI and RGB features using advanced machine learning methods.

Conclusions
Our study developed and assessed a machine learning technique based on multispectral orthophotos and photogrammetric 3D geometric remote sensing data for the estimation of fresh and dry matter yield of grass swards for silage production.Our approach was to extract various features from a remote sensing dataset by combining an ultra-high resolution photogrammetric canopy height model (CHM) with a pixel size of 1.0 cm and red, green, blue and near-infrared range intensity values and different vegetation indices (VI) extracted form orthophoto mosaics.We compared the performance of the Multiple Linear Regression (MLR) and the Random Forest estimator (RF).The best estimation results were obtained by combining the 3D, RGB and VI features.The RF provided similar or better results than the MLR.The Person's correlation coefficient (PCC) and RMSEs were at best 0.98 and 0.34 t/ha (12.70%), respectively, for the dry matter yield combining the 3D, RGB and VI features, and 0.98 and 1.22 t/ha (11.05%), respectively, for the fresh yield combining the 3D and VI features.We also evaluated the sensitivity of the method by creating variability in the swards by using different nitrogen fertilization rates and by repeating the data capture on four dates of the primary growth of the grass sward.Several points may have reduced the accuracy of the estimates: (1) nonhomogeneous sparse stand as a result of early season; (2) nonhomogeneous sparse stand as a results of low fertilizer application levels; (3) the heading of the sward due to a late harvesting date; and (4) lodging caused by high fertilization levels.
Our results are consistent with previous scientific results regarding the impact of date and combination of 3D and RGB features on the height and biomass estimation results.Furthermore, our novel approach integrating machine learning algorithms and the various 3D, spectral and VI features from the ultra-high resolution CHM and orthomosaics provided better results than most of the previous studies.The results were also highly precise in absolute terms.The yield estimation results had an excellent accuracy and outperformed the estimation results based on the measurement on the field using the height stick.Our results showed that the proposed method offers an accurate tool for estimating both the fresh yield and the dry matter yield of grass swards particularly close to the targeted silage harvesting stage.
In the next phases of our research, we will integrate the grass sward quality parameters to the estimation process, including nitrogen content and digestibility.We will also compare the proposed approach to multispectral datasets as well as to hyperspectral data.It is necessary also to integrate efficient direct georeferencing approach to the method to further improve the level of automation and cost of the overall system.An important future research topic will be to assess the generalization potential of the developed methods-in other words, to use the trained estimators in other fields without in situ training data.This is a highly relevant topic to develop efficient remote sensing tools with minimum in situ efforts.
Table A6.The most important features for the Random Forest (RF) (in the order of importance) for different dates and different nitrogen fertilizer levels.DMY: dry matter yield; FY: fresh matter; RGB: Red, Green and Blue spectral features; VI: Vegetation Index features; 3D: CHM 3D features; 0-150: Nitrogen fertilizer levels 0-150 kg/ha.

Figure 1 .Figure 2 .
Figure 1.Orthophoto mosaic from the Jokioinen test site from 15 June.In the first replicate (the leftmost column), the fertilizer rates were from 0 to 150 kg/ha (indicated with N0 to N150) in order from top to bottom in this photo.Nitrogen fertilizer application rates were randomized in Replicates 1-4 (in Columns 2-4) as well as the location of the harvesting date for reference harvests.

Figure 1 .
Figure 1.Orthophoto mosaic from the Jokioinen test site from 15 June.In the first replicate (the leftmost column), the fertilizer rates were from 0 to 150 kg/ha (indicated with N0 to N150) in order from top to bottom in this photo.Nitrogen fertilizer application rates were randomized in Replicates 1-4 (in Columns 2-4) as well as the location of the harvesting date for reference harvests.

Figure 2 .
Figure 2. In situ measurements of grass height by: (a) height stick; and (b) plate meter.

Figure 2 .
Figure 2. In situ measurements of grass height by: (a) height stick; and (b) plate meter.

Table 6 .
Statistics of comparison of manual and semi-automatic DTM from different dates to the DTM30m_man.DTM30m_man/auto: DTM manually/semi-automatically extracted from the 6 June 30 m flight; DTM50m_man/auto: DTM manually/semi-automatically extracted from the 6 June 50 m flight; DTMman: DTM manually extracted from each date's flight; DTMauto: DTM semi-automatically extracted from

Figure 4 .
Figure 4. Pearson Correlation Coefficients (PCC) for individual vegetation index (VI) and canopy height model (CHM) features from regressions to physical measurements of the dry matter yield (DMY), fresh yield (FY) and height (Href).

Figure 4 .
Figure 4. Pearson Correlation Coefficients (PCC) for individual vegetation index (VI) and canopy height model (CHM) features from regressions to physical measurements of the dry matter yield (DMY), fresh yield (FY) and height (H ref ).

Figure 5 .
Figure 5. Simple linear regression of (a) H 90 and H ref and (b) H max and H ref for different dates.H p90 : the 90th percentile value from the CHM; H max : maximum value from the CHM; and H ref : reference height measurement.

Figure 6 .
Figure 6.Simple linear regression of (a) MSAVI and dry matter yield (DMY) and (b) ExG + Hp90 and DMY for different time series.

Figure 6 .
Figure 6.Simple linear regression of (a) MSAVI and dry matter yield (DMY) and (b) ExG + H p90 and DMY for different time series.
• C in June 2017, while long-term averages (1980-2010) are 9.8 • C and 14.0 • C, respectively.The rainfall figures were 13 mm and 101 mm in May and June 2017, and 40 mm and 63 mm as long-term averages (1980-2010), respectively.The development of grass sward in the primary growth in Finland depends on the accumulated temperature sum.

Table 1 .
Datasets with their collection date, time, cloud conditions, sun azimuth, and solar elevation.GNSS: global navigation satellite system; FH: flight height.

Table 2 .
Dataset parameters: Date, FH (Flight Height), N Images (Number of Images), re-projection error, point density, and RMSE (Root Mean Square Error) of X, Y and Z coordinates and 3D.

Table 6 .
Statistics of comparison of manual and semi-automatic DTM from different dates to the DTM 30m_man .DTM 30m_man/auto : DTM manually/semi-automatically extracted from the 6 June 30 m flight; DTM 50m_man/auto : DTM manually/semi-automatically extracted from the 6 June 50 m flight; DTM man : DTM manually extracted from each date's flight; DTM auto : DTM semi-automatically extracted from each date's flight.Mean: mean error; RMSE: Root Mean Square Error; median: median error; min: minimum error; max: maximum error; std: standard deviation.

Table 7 .
Pearson correlation coefficients of the H p90 and the physical dry and fresh biomass and height measurements.DTM 30m_man : DTM manually extracted from the 6 June 30 m flight; DTM 50m_man : DTM manually extracted from the 6 June 50 m flight; DTM man : DTM manually extracted for each date; DTM auto : DTM semi-automatically extracted for each date; DMY: dry matter yield; FY: fresh yield; and H ref : reference height measurement.

Table 8 .
Pearson correlation coefficients for H mean , H max , H p90 and H ref to DMY and FY, and H mean , H max and H p90 to H ref for different dates and different nitrogen fertilizer levels of 0-150 kg/ha.DMY: the dry matter yield; FY: fresh yield; and H ref : reference height measurement.An analysis of the PCCs of regressions with H ref on different measurement dates showed that the earliest and latest dates displayed the poorest results (Table

Table 9 .
Pearson correlation coefficients for VIs and DMY, FY and H ref on different dates and different Nitrogen fertilizer levels (0-150 kg/ha).DMY: dry matter yield; FY: fresh yield; and H ref : reference height measurement.

Table 9 .
Pearson correlation coefficients for VIs and DMY, FY and Href on different dates and different Nitrogen fertilizer levels (0-150 kg/ha).DMY: dry matter yield; FY: fresh yield; and Href: reference height measurement.DateN-Level (kg/

Table 11 .
The most important features for the Random Forest (RF) (in the order of importance).DMY: dry matter yield; FY: fresh yield; RGB: Red, Green and Blue spectral features; VI: Vegetation Index features; 3D: CHM 3D features.