A Practical Application of Unsupervised Machine Learning for Analyzing Plant Image Data Collected Using Unmanned Aircraft Systems

: Unmanned aircraft systems are increasingly used in data-gathering operations for precision agriculture, with compounding beneﬁts. Analytical processing of image data remains a limitation for applications. We implement an unsupervised machine learning technique to e ﬃ ciently analyze aerial image data, resulting in a robust method for estimating plant phenotypes. We test this implementation in three settings: rice ﬁelds, a plant nursery, and row crops of grain sorghum and soybeans. We ﬁnd that unsupervised subpopulation description facilitates accurate plant phenotype estimation without requiring supervised classiﬁcation approaches such as construction of reference data subsets using geographic positioning systems. Speciﬁcally, we apply ﬁnite mixture modeling to discern component probability distributions within mixtures, where components correspond to spatial references (for example, the ground) and measurement targets (plants). Major beneﬁts of this approach are its robustness against ground elevational variations at either large or small scale and its proﬁciency in e ﬃ ciently returning estimates without requiring in-ﬁeld operations other than the vehicle overﬂight. Applications in plant pathosystems where metrics of interest are spectral instead of spatial are a promising future direction.


Introduction
Unmanned aircraft systems (UASs) have been discussed as a cornerstone of precision agriculture [1][2][3][4][5][6], supporting the collection of timely and abundant data at expansive spatial scales through low-altitude remote sensing (LARS). Data collected through LARS are of high resolution and volume and can easily represent spatial scales exceeding the size of current agricultural operations [7,8]. UAS image collection can be utilized when other means would be logistically limited, for example in flooded-field systems such as rice. Thermal and other spectral data can be collected at increasingly high resolution and detail as imaging technology advances [9][10][11][12][13][14][15]. As beyond-line-of-sight applications become feasible, the potential spatial scale of data collected through LARS becomes practically unlimited. The advancement of LARS applications also makes them useful where financial or logistical resources are limited, providing information with a short turnaround time and low cost for small farms or operations in developing countries [2].
Research concerning LARS is increasing, and work is ongoing to translate this research to applications in commercial agriculture [15][16][17][18]. LARS techniques have been used to describe various and maintained until plant maturity. Aerial imagery of rice fields was collected on August 2, 2017 from at 15 m above ground level (AGL). A 245 × 45 m field was divided into six research blocks, each consisting of forty 4.8 × 1.5 m plots wherein ten experimental treatments were each replicated four times. Nitrogen fertilizer (urea) was applied at four time points: pre-planting, pre-flooding, panicle initiation, and late booting (Table S1). All other field operations followed the 2014 Texas Rice Production Guidelines [37].
A private nursery (Amerson's Nursery) in Lamar, SC, USA, was aerially photographed, including a diversity of plants on May 13, 2018. Flights were flown at 15 m AGL. Our effort in this instance was to collect and process image data on a variety of plants to investigate the performance and robustness of an analytical approach. Areas populated by shrubs, trees, and potted plants were photographed for this purpose.
At the Clemson University Edisto Research and Educational Center (REC) in Blackville, SC, USA, a rotational study concerning agriculturally relevant nematodes was underway in 2018, aerial imagery of fields of soybean (Glycine max L.) and grain sorghum (Sorghum bicolor (L.) Moench) was collected on July 23, 2018. Here, our aim was to test approaches for their ability to recover accurate height estimates validated against "ground truth" measurements taken by personnel in the field. Accordingly, rectangular pieces of white cardboard were placed at ground level in these fields to mark locations at which in-field measurements were made on surrounding plants. Five randomly selected plants were measured with a ruler at each of eight such locations (four for grain sorghum and four for soybean), and these measurements were used for comparison with LARS-based estimates of height.
Supplemental Figures S1-S3 show representative imagery from each location.

Unmanned Aircraft Systems
The flight platform used was Dà-Jiāng Innovations (DJI) Phantom 4 (Dà-Jiāng Innovations, Shenzhen, China), with an attached red-green-blue (RGB) camera. The flight platform used a 30 mm 2 CMOS sensor with 12.4 effective megapixels. The viewing angle was 94 • , the electronic shutter speeds were 8-1/8000 s, and the image size was 4000 × 3000 pixels. Raw images were stored in digital negative (DNG) format and developed to joint photographic experts group (JPEG) format for analysis. The system used a GPS and global navigation satellite system (GLONASS) for positioning. Images were stored on a micro secure digital (SD) card. Flights were either structured using the flight planning functions of Pix4Dmapper software version 4.1 (Pix4D, Lausanne, Switzerland) [38] or conducted manually. Flights in Texas were planned, involving a double grid pattern with 75% pairwise overlap in both along-track and cross-track directions between sequentially adjacent photographs. Flights in South Carolina were manually directed, involving alternating forward and reverse passes and targeting 80% pairwise overlap between sequentially adjacent photographs in the along-track direction. Degree of overlap was chosen to conform with imaging practices reported in other studies, so that our data acquisition step would be in line with current practices [7,8].

Image and Data Processing
For each dataset, images were processed using the structure-from-motion algorithm of Pix4Dmapper software, resulting in point clouds. Point clouds comprise points inferred through the algorithm, in which each point is projected onto a three-dimensional space and has a spectral description. The structure-from-motion algorithm uses a single camera to infer three-dimensional structure in the way that multiscopic vision does, but by processing positional variation of the camera through time as equivalent to the physical distance between two or more cameras. Point clouds, as tables, were used directly for analysis. Point clouds were visualized as three-dimensional surface plots (Figure 1) through the G3D procedure (three-dimensional graphics) of the SAS System v9.4 (SAS Institute, Cary, NC, USA) [39] to confirm point cloud processing success, for data subset delineation, and to visually identify potential anomalies or features that could interfere with or be utilized in analysis. Three-dimensional renderings of data involved latitude and longitude for in-field position and height or spectral (red, green, Agronomy 2020, 10, 633 4 of 16 or blue) indices as the third dimension for visualization. Depictions were used to delineate data subsets representing areas of interest-fields, plots, or subplots in the research settings, and areas of the nursery. Random samples of a given size were taken from the dataset for visualization because the resolution of our raw data was much higher than that required for visualizing general features-in other words, we used a reduced-resolution preview of the full-resolution dataset to identify subareas such as plots or field margins, because these areas were easily identifiable without full-resolution depiction. For example, a point cloud representing the rice field at Beaumont, Texas, includes 3.5 million points, and 35 thousand points were randomly selected from the dataset for use in generating visualizations. Full-resolution point clouds were used for analysis.
Agronomy 2020, 10, x FOR PEER REVIEW  4 of 16 were taken from the dataset for visualization because the resolution of our raw data was much higher than that required for visualizing general features-in other words, we used a reduced-resolution preview of the full-resolution dataset to identify subareas such as plots or field margins, because these areas were easily identifiable without full-resolution depiction. For example, a point cloud representing the rice field at Beaumont, Texas, includes 3.5 million points, and 35 thousand points were randomly selected from the dataset for use in generating visualizations. Full-resolution point clouds were used for analysis. Figure 1. Representations of point cloud data. Subsets of point cloud data describing field N-6 are used to fit the wireframe surface plots shown here on the left, depicting the artifactual slope that formed primarily along the x-axis as a result of image processing without global positioning system (GPS) control points. This slope is marginalized by including the x-and y-axes as covariates in an analysis of covariance (ANCOVA), as shown on the right in a plot of residuals after accounting for variation associated with x-and y-axes.
Point clouds were rendered as contour plots using the GCONTOUR procedure of the SAS System v4.3 and visually investigated for quality and representativeness of the depicted area, as compared to images to identify imaged subjects, by eye. Forty plots from the rice research field in Texas were visually identified using a contour map rendered from the point cloud. Treatment information associated with each plot, including timing (date) and amount (kilogram per hectare) of nitrogen addition, was merged with the point cloud dataset by plot for analyzing relationships between treatments and inferred rice plant height. Subsets of point cloud data from Amerson's Nursery were generated on the basis of plant type, following visual identification of areas targeted for analysis, e.g., an area of potted plants, or an area of a given type of shrub. Subsets of point cloud data from the Edisto REC soybean and grain sorghum fields were generated to investigate how the spatial size of an imaged area can be chosen to optimize the accuracy or precision of estimated plant characteristics: square-shaped sample areas of two, four, eight, sixteen, and thirty-two square meters in area were taken from the whole-field point cloud. Each sample was centered on a reference point indicated by a white piece of cardboard placed at ground level in the field at random. This subsetting was done to emulate a researcher randomly choosing a sample location in a field for inclusion in a dataset.
A preliminary finding guided subsequent methods selection. Initial processing of images using Pix4D to generate point clouds resulted in three-dimensional depictions that included artifacts, specifically slopes that are known to not exist in reality, given that the depicted areas are flooded rice fields and thus necessarily level ( Figure 2). Point clouds were rendered as contour plots using the GCONTOUR procedure of the SAS System v4.3 and visually investigated for quality and representativeness of the depicted area, as compared to images to identify imaged subjects, by eye. Forty plots from the rice research field in Texas were visually identified using a contour map rendered from the point cloud. Treatment information associated with each plot, including timing (date) and amount (kilogram per hectare) of nitrogen addition, was merged with the point cloud dataset by plot for analyzing relationships between treatments and inferred rice plant height. Subsets of point cloud data from Amerson's Nursery were generated on the basis of plant type, following visual identification of areas targeted for analysis, e.g., an area of potted plants, or an area of a given type of shrub. Subsets of point cloud data from the Edisto REC soybean and grain sorghum fields were generated to investigate how the spatial size of an imaged area can be chosen to optimize the accuracy or precision of estimated plant characteristics: square-shaped sample areas of two, four, eight, sixteen, and thirty-two square meters in area were taken from the whole-field point cloud. Each sample was centered on a reference point indicated by a white piece of cardboard placed at ground level in the field at random. This subsetting was done to emulate a researcher randomly choosing a sample location in a field for inclusion in a dataset.
A preliminary finding guided subsequent methods selection. Initial processing of images using Pix4D to generate point clouds resulted in three-dimensional depictions that included artifacts, specifically slopes that are known to not exist in reality, given that the depicted areas are flooded rice fields and thus necessarily level ( Figure 2). The use of GPS control points would result in correcting this issue to a degree proportional to how accurately the points provide for estimating terrain surface. However, in this case, the effect is also easily marginalized by including positional variables (latitude and longitude) as covariates in a generalized linear model of height. For a flat surface (not level, but flat), as was the case for the rice fields, this technique works well. However, for any non-flat surface, terrain variation cannot be described by a linear model without concern that variation in plant height is also being described. The fact that (artifactual) terrain height variation in a special case (rice fields known to be both flat and level) could be marginalized without the use of GPS control points or classification of plants vs. ground suggested that a statistical method could be developed to do this in other cases. We sought a plant height estimation method that would be insensitive to terrain height variation, but without having to describe terrain height variation (as is conventionally done when a digital terrain model is developed). We settled on the unsupervised learning technique of finite mixture modeling, and the following methods were chosen for applying the technique.

Mixture Modeling
To investigate the applicability and performance of mixture modeling to infer plant height, we fit mixtures of frequency distributions to height observations (points in the cloud), irrespective of pixel class (ground vs. plant), using the finite mixture model (FMM) procedure of the SAS System v9.4. A FMM is a statistical model that describes variation in a population made up of a finite number of subpopulations, without classifying individuals' subpopulation membership. Equation 1 relates φ, the characteristic function of height, h, varying within a given spatial extent and composed of a number, k, of component Gaussian distributions each, i, with location and dispersion parameters, mean, μ, and variance, σ.
Initially, we compared mixture models with varying numbers of components (k) in order to confirm that k = 2 for images including plant biomass and the ground was an appropriate parameter value. Our expectation was that two subpopulations of points in the cloud would be detectable through this approach: the ground (terrain between plants visible to the UAS), and plant biomass (aboveground plant tissue). In this initial stage, we fit mixture models for k = 1 to k = 5 and confirmed that k = 2 resulted in best fit by minimizing Schwartz's Bayesian Information Criterion (BIC). In a few instances, the minimum BIC corresponded to k = 3, and visual inspection of histograms of height The use of GPS control points would result in correcting this issue to a degree proportional to how accurately the points provide for estimating terrain surface. However, in this case, the effect is also easily marginalized by including positional variables (latitude and longitude) as covariates in a generalized linear model of height. For a flat surface (not level, but flat), as was the case for the rice fields, this technique works well. However, for any non-flat surface, terrain variation cannot be described by a linear model without concern that variation in plant height is also being described. The fact that (artifactual) terrain height variation in a special case (rice fields known to be both flat and level) could be marginalized without the use of GPS control points or classification of plants vs. ground suggested that a statistical method could be developed to do this in other cases. We sought a plant height estimation method that would be insensitive to terrain height variation, but without having to describe terrain height variation (as is conventionally done when a digital terrain model is developed). We settled on the unsupervised learning technique of finite mixture modeling, and the following methods were chosen for applying the technique.

Mixture Modeling
To investigate the applicability and performance of mixture modeling to infer plant height, we fit mixtures of frequency distributions to height observations (points in the cloud), irrespective of pixel class (ground vs. plant), using the finite mixture model (FMM) procedure of the SAS System v9.4. A FMM is a statistical model that describes variation in a population made up of a finite number of subpopulations, without classifying individuals' subpopulation membership. Equation (1) relates ϕ, the characteristic function of height, h, varying within a given spatial extent and composed of a number, k, of component Gaussian distributions each, i, with location and dispersion parameters, mean, µ, and variance, σ.
Initially, we compared mixture models with varying numbers of components (k) in order to confirm that k = 2 for images including plant biomass and the ground was an appropriate parameter value. Our expectation was that two subpopulations of points in the cloud would be detectable through this approach: the ground (terrain between plants visible to the UAS), and plant biomass (aboveground plant tissue). In this initial stage, we fit mixture models for k = 1 to k = 5 and confirmed that k = 2 resulted in best fit by minimizing Schwartz's Bayesian Information Criterion (BIC). In a few instances, the minimum BIC corresponded to k = 3, and visual inspection of histograms of height variation confirmed three distinguishable mixture distribution components. Once we had optimized k for an FMM fit to data from an area of interest, we then proceeded to calculate an average height estimate as the difference between locations (for Gaussian mixture components, µ i , the means) of "ground" and "plants" distributions. Calculating this difference is possible without needing to classify individual Agronomy 2020, 10, 633 6 of 16 pixels as being members of one or another component distribution. For areas in which the optimal value of k was 3, due to some nursery plants being grown in pots, we estimated plant height as the difference between the "pot" and "plants" distributions. Additional comparisons between quantile limits of distribution pairs were calculated, expecting that for important reasons other than average plant height, the distribution of height in a given point cloud should vary. For example, plant row spacing or physical architecture should affect depth of visibility into the canopy ( Figure 3) and this can be addressed by referencing different portions of the "plants" distribution.
variation confirmed three distinguishable mixture distribution components. Once we had optimized k for an FMM fit to data from an area of interest, we then proceeded to calculate an average height estimate as the difference between locations (for Gaussian mixture components, μi, the means) of "ground" and "plants" distributions. Calculating this difference is possible without needing to classify individual pixels as being members of one or another component distribution. For areas in which the optimal value of k was 3, due to some nursery plants being grown in pots, we estimated plant height as the difference between the "pot" and "plants" distributions. Additional comparisons between quantile limits of distribution pairs were calculated, expecting that for important reasons other than average plant height, the distribution of height in a given point cloud should vary. For example, plant row spacing or physical architecture should affect depth of visibility into the canopy ( Figure 3) and this can be addressed by referencing different portions of the "plants" distribution. As the space between plants increases (e.g., due to increasing row spacing), the height of points detected at viewing angle (B) will become lower, and the "plants" distribution in a mixture will have a lower mean value, irrespective of actual plant height. This poses a problem for approaches that use one distributional characteristic (e.g., mean or maximum) for multiple plant spacing patterns or architecture. The problem is solved by fitting distributions to latent classes of data and choosing, empirically or analytically, the optimal parameter or quantile for a given application.
Where ground-truth data were available (soybean and grain sorghum fields in South Carolina), plant heights were estimated and thereafter tested for validity as predictors of actual measurements, with this validation being conducted through a regression approach using the generalized linear model (GLM) procedure of the SAS System v9.4: actual heights were regressed against predicted heights.

Rice Plant Height and Yield-Beaumont, Texas
A generalized linear model describing rice plant height, including treatment number (as a category), latitude, and longitude as independent variables accounted for 85% of height variation for field N-1, and 87% of height variation for field N-6. Table 1 includes coefficients estimated for this model, which marginalizes artifactual slope that arises from image processing and enhances ability to infer differences arising from treatments applied to plots in a randomized complete block design. As the space between plants increases (e.g., due to increasing row spacing), the height of points detected at viewing angle (B) will become lower, and the "plants" distribution in a mixture will have a lower mean value, irrespective of actual plant height. This poses a problem for approaches that use one distributional characteristic (e.g., mean or maximum) for multiple plant spacing patterns or architecture. The problem is solved by fitting distributions to latent classes of data and choosing, empirically or analytically, the optimal parameter or quantile for a given application.
Where ground-truth data were available (soybean and grain sorghum fields in South Carolina), plant heights were estimated and thereafter tested for validity as predictors of actual measurements, with this validation being conducted through a regression approach using the generalized linear model (GLM) procedure of the SAS System v9.4: actual heights were regressed against predicted heights.

Rice Plant Height and Yield-Beaumont, Texas
A generalized linear model describing rice plant height, including treatment number (as a category), latitude, and longitude as independent variables accounted for 85% of height variation for field N-1, and 87% of height variation for field N-6. Table 1 includes coefficients estimated for this model, which marginalizes artifactual slope that arises from image processing and enhances ability to infer differences arising from treatments applied to plots in a randomized complete block design. This technique results in a level field, confirmed by observing the slope of "ground" area not occupied by rice plots, which is known to be level due to being flooded with water. Height estimates calculated as the difference between FMM mixture components ("ground" and "plants" distributions) are also shown in Table 1, confirming that in an idealized situation of level and flat ground (obviating the need to construct a digital surface model or use GPS control points), FMM-derived estimates based on the difference between component means are equivalent to estimates calculated by subtracting a fixed ground position from a plot's average distance from a UAS's sensor. For FMM-based height estimation, a spatial buffer of 0.4 m was added to each plot to ensure adequate representation of the ground in point cloud data.

Multiple Plant Species at Amerson's Nursery-Lamar, SC
Images from Amerson's Nursery were initially processed, as were those from the rice field at Beaumont, TX, using Pix4D to generate three-dimensional point clouds thereafter analyzed using the FMM procedure of the SAS System. Because the use of FMMs proved well suited to inferring heights of rice plots in the earlier stage of this study, we applied the technique to a variety of plant types to test whether multiple mixture components could be reliably detected across plant types. In this instance FMMs could be used reliably to detect the presence of three mixture components where appropriate (Figure 4), representing ground, pots (being the soil surface contained in a pot), and plants

Soybean and Grain Sorghum-Blackville, South Carolina
Images from the Edisto REC at Blackville, SC, were processed as before, first using Pix4D and subsequently analyzing point clouds using SAS. Reported in Table 2 are heights estimated for data subsets corresponding to square areas of four, eight, and sixteen square meters around reference locations indicated by rectangular pieces of white cardboard ( Figure 5).

Soybean and Grain Sorghum-Blackville, South Carolina
Images from the Edisto REC at Blackville, SC, were processed as before, first using Pix4D and subsequently analyzing point clouds using SAS.
Reported in Table 2 are heights estimated for data subsets corresponding to square areas of four, eight, and sixteen square meters around reference locations indicated by rectangular pieces of white cardboard ( Figure 5).  were delineated around a point at which local ground truth measurements were recorded (indicated by a white board placed at ground level). In the second approach (B), 4 m squares were randomly placed to determine whether estimates derived from random samples throughout the field were representative of ground truth data.
Parameter estimates pertaining to one subfield appear in Table 3. These estimated heights were compared with ground measurement to assess the performance of the technique in recovering accurate heights, with the highest correlation being between median point (from point cloud subset) height and ground-truth height (Table 4). When fitting FMMs to data collected from grain sorghum fields with dense canopies, frequently only one mixture component was estimated, preventing calculation of the distance between component means to represent height. Sampling approaches for generating point cloud data subsets used for estimating soybean height. Two approaches were taken. In the first approach (A), concentric subsets (from 2 to 32 m 2 ) were delineated around a point at which local ground truth measurements were recorded (indicated by a white board placed at ground level). In the second approach (B), 4 m squares were randomly placed to determine whether estimates derived from random samples throughout the field were representative of ground truth data.
Parameter estimates pertaining to one subfield appear in Table 3. Table 3. Representative FMM parameter estimates for a soybean field. µ 1 and µ 2 correspond to the locations of "ground" and "plant" distributions, respectively. σ 1 and σ 2 are respective variances. The difference between µ values is calculated to estimate height, irrespective of the absolute value of µ values. Here, negative height values result due to the global average height being set to zero. These estimated heights were compared with ground measurement to assess the performance of the technique in recovering accurate heights, with the highest correlation being between median point (from point cloud subset) height and ground-truth height (Table 4). When fitting FMMs to data collected from grain sorghum fields with dense canopies, frequently only one mixture component was estimated, preventing calculation of the distance between component means to represent height.

Discussion
This study demonstrates the ability of an unsupervised learning technique to robustly generate useful results from LARS data in agricultural settings, with emphasis on efficiency. The approach was to generate three-dimensional point clouds using LARS data collected from different types of agricultural operations and from varying UAS flight patterns, and then to analyze variation without classifying areas or points in the cloud. Implicit in the process of measuring plants for height above ground is differentiation of plants from ground, and many supervised learning analytical approaches address this through classification and subsequent comparison. A conventional approach is to construct the entirety of the "ground" class via linear or other interpolation between GPS control points and treat a spatial subset of point cloud data as the "plants" class. Also common is to construct a surface model from point cloud data, enhancing visualization and summarizing features, but also collapsing variation in a way that prohibits its being analyzed in raw form.
We suggest that a difference between human measurement takers and an aerial sensor can be exploited. Whereas a human on the ground taking a measurement is reliably calibrated (ground position is unlikely to be mistaken), he/she does not record the millions of measurements that are represented in a point cloud assembled from images taken by an aerial sensor. Indeed, the human does not have reason to take millions of measurements because the few taken are likely to be calibrated very accurately. We propose that the aerial imager similarly does not have reason to calibrate relative to ground in the way that the human does because the aerial imager benefits from the generation of a large number of data points. The calibration of measurements derived from aerial imagery can thus be carried out post-measurement, and through a statistical approach. The volume of data thus generated supports unsupervised learning to infer the distance between two components latent in the resultant distribution of heights (Table 2). Results demonstrated that the mean height estimates derived from the FMM are robust to variation in spatial extent being examined (Table 2). Furthermore, the FMM-derived height estimates were well correlated (R 2 = 0.71) with in-field human-measured soybean heights (Table 3). This process can be generalized to situations in which some background, other than the physical ground, is the reference for comparison. For example, comparisons between colors or temperatures can be made. Such situations include spatial variation in spectral profile arising from disease, stress, or other phenomena of practical interest to agricultural producers. Because the approach does not rely on a priori or computational classification, it negates the consequences of classification error. In an unsupervised learning approach, the difference between plant and ground, or healthy and diseased plants, need only be detectable using analyzed metrics, not defined in terms of what underlies the difference. A producer monitoring a crop for the presence of disease can target areas where two classes of spectral profile are detected for investigation. Because the class-detection result of fitting an FMM does not depend on absolute values, differences between classes can be detected more reliably across varying conditions (lighting, weather, etc.) than may be possible through supervised learning approaches that require reliable classification before analysis. Additionally, FMMs are straightforward to estimate, such that fitting of models for this purpose can be easily accomplished using many statistical analysis platforms.
Application of FMMs to the types of problems faced by agricultural producers is fitting because of how the effort-intensive process of classification is (or is not) conducted. Classification of plant vs. ground in this context is not something that directly informs the agricultural producer-the producer can already readily identify ground vs. plant. Instead, classification is required by algorithms that rely on data being classified for supervised learning activities: these algorithms must know the difference between a "ground" point and a "plant" point before inference may be drawn from comparison between classes. Again, this information is not in itself of use to the producer-the producer does not need to know point or pixel class for any purpose outside of the analytical task being undertaken by the relevant algorithm.
The unsupervised machine learning technique used here provides estimates, but with a reduced number of required steps when compared to other approaches that involve the use of GPS control points and their inclusion in analysis. The presence of multiple classes in a dataset is inferred without data needing to be classified. Thus, we expect the approach taken here can be more efficient, simply on the basis of its requiring fewer activities to execute. An approach that involves GPS control points and supervised classification requires, among other things, (1) construction of physical markers to be detected by computer vision algorithms during orthomosaicking; (2) placement of physical markers in target areas, ensuring visibility to aerial cameras by clearing plant biomass or other materials; (3) precisely georeferencing GPS control points by physically visiting them with GPS equipment; (4) incorporating GPS control points into datasets and constructing terrain maps; (5) subsetting point cloud data to exclude pixels associated with actual terrain, so that statistical characterization of plants can be carried out. There are definitely situations in which these steps are required and advantageous-we highlight the most obvious of these, a situation in which the terrain is not visible to the aerial camera and thus the fitting of FMMs is not possible. However, where these effort-intensive steps are not required, efficiency in phenotyping can be gained by taking an alternative approach that does not require the steps. A consultant or scout operating LARS-based data gathering could collect data in the time it takes to conduct the imaging overflights, process data in the time it takes to assemble a point cloud, and then have an estimate of plant phenotype in the field for discussion with the producer. By fitting FMMs to spatial subsets of data, height estimates can be made precisely despite variable topography of terrain ( Figure 6). To construct a model of terrain using GPS control points, those points must be numerous enough to support fitting a terrain model that adequately represents variation in a field. If terrain varies appreciably at small spatial scales, then GPS control points must be distributed densely to account for that variation. However, to compare two latent height classes, those classes need only be each adequately represented by data, and not necessarily identified by class. Height estimates for subareas of a field, sampled randomly or with structure, can be averaged to estimate the overall mean height in a field. As in a Riemann sum, the size of these spatial subsets affects the accuracy of the estimate, practically limited in this case by the minimum size at which mixture distribution components can be reliably differentiated and estimated. We suggest that this approach to LARS data usage combines the well-established practice of random sampling (computing height estimates for multiple data subsets) with the resolution and extent advantages of aerial imagery. Our results demonstrate that small spatial subsets (2 m squares) can be used to generate representative estimates (Table 3).
Additionally, Figure 6 shows how the success of the FMM approach in recovering height estimates means that there is not a need to "level" point clouds or use GPS control points for generating height estimates. Importantly, sources of error or variation that result in a given field not being level can be addressed by the FMM approach regardless of whether those sources are an image-processing artifact, or a real-world sloped or hilly field. Correspondence between GLM-and FMM-derived height estimates (Table 1) confirms that the FMM can implicitly account for terrain variation that is explicitly included as covariates in the GLM. Where this covariate information is unavailable, the FMM approach is more useful. Taken together, these considerations underscore the robustness of the unsupervised learning approach to analyzing image data for practical purposes in agriculture. Agronomy 2020, 10, x FOR PEER REVIEW 12 of 16 Some situations require GPS control points and comparison between a constructed ground and a point cloud of observations. If the ground is not adequately visible to an aerial sensor, the "ground" component of a mixture distribution cannot be estimated. In such situations, calibration for height estimates cannot be accomplished through the technique used here, and a reference surface must be constructed. Thus, the application of this technique is limited to situations in which the ground is visible between crop plants-for example, row crops and early developmental stages of field crops. Situations calling for GPS control points and those well suited to application of FMMs are thus to a degree mutually exclusive. Where plants are dense such that canopy can be reasonably represented by a surface, an FMM attempting to differentiate (obscured) ground from plants will fail to return good estimates, and classification is not likely necessary, because all data in a related point cloud would describe plants. For instance, FMMs fit to data from grain sorghum resulted in only one mixture component, corresponding to the dense canopy of the crop when imaged (Supplemental Figure 4). In this situation, comparison between the plant canopy surface and a ground surface constructed using GPS control points would be required to estimate height. On the other hand, where plants are spaced such that the ground is visible between them and the canopy is discontinuous instead of being like a surface, an FMM is able to address the issue of classification efficiently without having to classify data points ( Figure 6). Were explicit classification attempted for comparison of the "plants" points to a constructed ground surface through a supervised learning approach, classification error would result in commensurate phenotype estimate error. In such an approach, excluding generous buffers from analysis to avoid attempting to classify ambiguous data Figure 6. Topography differences do not affect the difference of µ 1 and µ 2 when estimating height using a finite mixture model (FMM). Height variance changes dependent on both elevation and plant height variation, but this is independent of the difference between µ 1 and µ 2 . Three scenarios are shown as schematics on the left and are accompanied by real FMM output histograms representing each scenario. Imagery from which FMM output was generated is shown at the right. Scenario (A): flat terrain represented by a rice plot; scenario (B): sloped terrain present in a grain sorghum field; scenario (C): the uneven terrain of a soybean field. Some situations require GPS control points and comparison between a constructed ground and a point cloud of observations. If the ground is not adequately visible to an aerial sensor, the "ground" component of a mixture distribution cannot be estimated. In such situations, calibration for height estimates cannot be accomplished through the technique used here, and a reference surface must be constructed. Thus, the application of this technique is limited to situations in which the ground is visible between crop plants-for example, row crops and early developmental stages of field crops. Situations calling for GPS control points and those well suited to application of FMMs are thus to a degree mutually exclusive. Where plants are dense such that canopy can be reasonably represented by a surface, an FMM attempting to differentiate (obscured) ground from plants will fail to return good estimates, and classification is not likely necessary, because all data in a related point cloud would describe plants. For instance, FMMs fit to data from grain sorghum resulted in only one mixture component, corresponding to the dense canopy of the crop when imaged (Supplemental Figure S4). In this situation, comparison between the plant canopy surface and a ground surface constructed using GPS control points would be required to estimate height. On the other hand, where plants are spaced such that the ground is visible between them and the canopy is discontinuous instead of being like a surface, an FMM is able to address the issue of classification efficiently without having to classify data points ( Figure 6). Were explicit classification attempted for comparison of the "plants" points to a constructed ground surface through a supervised learning approach, classification error would result in commensurate phenotype estimate error. In such an approach, excluding generous buffers from analysis to avoid attempting to classify ambiguous data points is not only an additional analysis step with attendant potential for error, but also discards data points that may be informative. Unsupervised learning is more robust in situations to which it is appropriate.
Here, the FMM accounts for overlap between points belonging to each class, without that overlap detrimentally affecting estimates of each class' mean or other distributional parameters. A threshold-based classification method will misclassify at a rate proportional to the overlap between classes, resulting in biased estimates of class means. This bias will increase as a function of the two components' difference in proportional contribution to the mixture, and as a function of the difference between the two components' dispersions. In other words, misclassification is of no consequence to supervised learning-based height estimation if the ground and height classes are equally represented and equally variable in a given dataset. However, as representation and variability differ, misclassification becomes increasingly negative in its impact to such an approach. Again, unsupervised learning accounts for this varying issue more robustly.

Conclusions and Future Work
The analytical method described herein was first explored to estimate rice plot height in the absence of GPS control points. An application of the unsupervised machine learning approach, finite mixture modeling, was developed to reliably estimate plant heights in various terrain configurations. Using imagery data from a nursery, we determined that FMMs could be used to detect latent subpopulations present in the data-specifically, subpopulations that represented the physical ground and plant biomass, allowing expedient estimation of average plant height by calculating the difference between subpopulation mean heights. To validate the application, plant heights were physically measured in the field and compared to the FMM-derived estimates. Correlation between height estimates derived from the FMM approach and measurements taken on the ground was strong (R 2 = 0.96).
Further research should explore additional applications of unsupervised learning to technologically enhanced precision agricultural operations. We suggest that many contemporary applications are utilizing the remarkable resolution and scale of data available through remote sensing, while also involving a constraint that can be overcome: the need for explicit classification or calibration. The need for precise calibration is served by a human taking measurements in a field, and results in measurements that are precise and accurate, so that small sample sizes can be used to reach conclusions. Because LARS data are of such high resolution for an imaged extent, precise calibration (in this case, on the basis of the position of the ground) can be foregone because data are available to support inference on the position of the ground, as we have discussed. We believe that exploiting this type of approach will allow remote sensing and precision agriculture researchers to increase the efficiency of LARS data usage and enable increased focus on new challenges. Additionally, the ability to estimate relevant plant or disease phenotypes in the field without needing to construct a digital terrain model or use GPS control points means that crop managers can harness the benefits of LARS without having to also invest appreciable resources in computation or GPS systems. These investments can in turn be focused where they are required, as in cropping systems that do require GPS-based construction of a reference surface for height or other estimations.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4395/10/5/633/s1, Table S1: Nitrogen application tables for field N-1 and field N-6, Figure S1: Representative imagery of fields, Figure S2: An orthomosaic of Amerson's Nursery in Lamar, SC, USA, Figure S3: Representative images of the Clemson University Edisto REC, Blackville, SC, soybean (A) and grain sorghum (B) fields, Figure S4: Imagery from the grain sorghum field at the Edisto REC in Blackville, SC, and a corresponding single-component mixture distribution fit to the data, demonstrating the inability of the technique to facilitate height estimation in situations of very dense canopy.