Novel Feature-Extraction Methods for the Estimation of Above-Ground Biomass in Rice Crops

Traditional methods to measure spatio-temporal variations in above-ground biomass dynamics (AGBD) predominantly rely on the extraction of several vegetation-index features highly associated with AGBD variations through the phenological crop cycle. This work presents a comprehensive comparison between two different approaches for feature extraction for non-destructive biomass estimation using aerial multispectral imagery. The first method is called GFKuts, an approach that optimally labels the plot canopy based on a Gaussian mixture model, a Montecarlo-based K-means, and a guided image filtering for the extraction of canopy vegetation indices associated with biomass yield. The second method is based on a Graph-Based Data Fusion (GBF) approach that does not depend on calculating vegetation-index image reflectances. Both methods are experimentally tested and compared through rice growth stages: vegetative, reproductive, and ripening. Biomass estimation correlations are calculated and compared against an assembled ground-truth biomass measurements taken by destructive sampling. The proposed GBF-Sm-Bs approach outperformed competing methods by obtaining biomass estimation correlation of 0.995 with R2=0.991 and RMSE=45.358 g. This result increases the precision in the biomass estimation by around 62.43% compared to previous works.


Introduction
Rice is an essential grain to ensure global food security [1], contributing to 20% of food energy requirements worldwide.Major efforts have emerged to develop novel highthroughput methods for enabling precision biomass characterization aimed at improving crop yield, grain quality and crop management [2][3][4].
Monitoring rice biomass at larger crop scales requires remote sensing approaches for precision sampling, mostly based on unmanned aerial vehicle (UAV-driven) multispectral imagery [3][4][5][6].For this purpose, Above-Ground Biomass Estimation (AGBE) methods have recently gained significant traction [7][8][9][10], since machine learning models can be trained by features extracted according to different plant reflectance wavelengths, namely vegetation indices (VIs).These VIs, are computed from multispectral imagery captured by UAVs or satellites.Although the combination of several VIs tends to avoid saturation issues with higher values of accumulated biomass [11], in some cases the estimation of the biomass (based on VI extracted features) requires independent models that must be characterized and calibrated dependent on the crop growth stages.Furthermore, VI features are highly affected by the genotype (rice variety) and external abiotic conditions [5,6,[12][13][14].
Most of the AGBE methods reported in the literature require image segmentation algorithms to extract the VI formulas from the corresponding canopy [3][4][5][6]15].Therefore, the accuracy of the extracted VI-features strictly depends on the quality of the segmentation, where several factors such as image resolution, saturation and radiation noise, play an important role in achieving an accurate correlation between the segmented pixels (VI-features) with the biomass.To overcome this problem, recent approaches based on graphs [4] propose the fusion of different wavelengths images into a single graph, where the eigenvectors can be used as features.As a result, graph-based methods do not require image segmentation or other photogrammetry corrections that can be computational expensive.
In this paper, we compare two approaches for the extraction of relevant features from multispectral aerial imagery, allowing the estimation of above-ground biomass based on training classical machine learning models.In previous work reported in [3,4], we introduced the aforementioned methods.The former method is called GFKuts, which was primarily used as an image segmentation method to optimally segment the crop canopy.The latter was based on graphs data fusion, which instead of using VIs or a mask image to extract features, it use the structural information captured by the graphs.

Related Work
Several approaches are used to address the above-ground biomass estimation (AGBE) problem [3][4][5][6]15].For instance, authors in [15] conducted a comprehensive survey to identify which VIs were suitable for estimating rice biomass as a function of the growth stages of the crop.Furthermore, in [15,16], crop features were extracted by using classical k-means binary clustering for segmentation, where linear multi-variable regression models were applied to each crop stage independently for the biomass characterization.In [3], a more sophisticated method named GFKuts was presented, aimed at combining a Monte Carlo K-means with the well-known GrabCut segmentation method [17,18].Overall, the GFKuts approach is based on three image processing methods: Magic Wand, Intelligent Scissors [19,20], and GrabCut that is grounded on the Graph-Cut method [17,18].In GFKuts, the energy function of the Graph-Cut method was adjusted, to achieve a global optimization for a N-dimensional image through a Gaussian Mixture Model.Also, the proposed GFKuts algorithm is fully automatic [21,22].
In [23], a deep learning approach was used for the estimation of biomass in forage grass crops.Convolutional neural networks were trained with extensive datasets that are computational expensive to process, while over-fitting issues were also observed.Other approaches based on structural information combine multiple feature sources through a fused graph [4,24,25].These methods relies on the approximation of the graphs given by the Nyström extension.In this paper, we extend the original graph method presented in [4], by adding a new sampling technique known as blue-noise sampling.This is combined with a graph signal processing approach to enhance the estimation given by the Nyström extension of the graph in [4].As mentioned, this method uses the eigenvector from the graphs as the features to extract from the multispectral images.As a result, we expect to improve on the estimation of the above-ground biomass, by correlating the estimations against biomass measurements directly obtained from the crop by destructive sampling.

Materials and Methods
The general architecture for UAV-driven remote sensing of above-ground biomass in rice crops is presented in Figure 1.An UAV is equipped with a multispectral camera onboard that captures the reflectance of the light spectrum in four different bands (Green, Red, Red-Edge, and NIR).A set of 1868 (i.e., one image per band) images was acquired for the crop's three growth stages: vegetative, reproductive, and ripening.Further details on the dataset acquisition and crop characteristics can be found in [3].
As mentioned, this paper presents two approaches for feature extraction.The first approach is GFKuts, which is based on the image processing area.This approach is an entirely automatic proposal that integrates a binary classification technique with an op-timization approach based on a Gaussian mixture model, followed by a filtering stage, to extract a mask related to pixels that belong to the crop canopy.The second approach is a graph-based framework that aims to fuse data by taking advantage of the structural relationship between the data captured by independent graphs (i.e., the structural relationship of the pixels of an image with one graph per image channel).The method generates a resultant fused graph with the most relevant information.Unlike GFkuts, this method uses eigenvectors of the fused-graph as features rather than vegetation indices.Lastly, the extracted features obtained from both methods are the input of a regression model to estimate the above-ground biomass.The next subsections explain the features extraction approaches and the regressors.

GFKuts
Figure 2 details the proposed GFKuts approach.The first contribution of GFKuts is to apply a random sampling approach based on Monte Carlo method and integrate these samples into a binary classification method [26].This random arrangement and the binary classification algorithm solve a time-demanding computational iteration problem as they provide an efficient stochastic numerical method based on a photogrammetric image technique used in precision agriculture.An array I = (i 1 , . . ., i n ) of size N, denotes the input image in the standard RGB color space (sRGB).The binary non-supervised classification method implemented in this work is the k-means clustering algorithm that cluster the set of N samples into K groups.The semantic labeling of the groups is random according to the nature of the algorithm.Thus, there is no control of the true class of the label assigned.Given that the results are highly repeatable, the identification of each group is carried out through the characterization of the centroid.This characterization runs only once and operates as long as the conditions of the images do not change (e.g., the type of crop, growing, or environmental conditions).
The Montecarlo sample K-means classification strategy generates a selection of pixels with a uniform random distribution of the spatial coordinates n (x,y) of the image.The result is a subset of sRGB values for each selected pixel N.These values come from the refractive bands of the light spectrum captured by the sensor.The implementation of this first stage is shown in Algorithm 1.The result is a pair of masks that allow us to initialize regions of the image to implement the optimization process.

Algorithm 1: Montecarlo Sampled K-means.
Input: the image I, the number of samples N, and two heuristic values associated with the mean expected radiance of the canopy: C 1 , and the ground: for Each pixel in range (1 . . .n) do Random (x, y) pixel selection from I to P Store sRGB value from P to Feature n Store pixel coordinates I x,y end for Run K-means over Feature to get labeling I n,k=1 and I n,k=2 calculate the Euclidean distance between C 1 and the centroid of the group K = 1 to m1; calculate the Euclidean distance between C 1 and the centroid of the group K = 2 to m2; if m1 < m2 then group K = 1 to canopy group K = 2 to ground else group K = 1 to ground group K = 2 to canopy end if Create a mask T F and set the coordinates in I x,y of each pixel in I n,canopy as the foreground.Create a mask T B and set the coordinates in I x,y of each pixel in I n,ground as the background.
The image segmentation consists of inferring the unknown opacity variables, denoted as α, from the given image data I and the model θ [18].The opacity term α = (α 1 , . . ., α N ) is the image segmentation weighed by each pixel with 0 ≤ α N ≤ 1 for soft segmentation and α N ∈ (0, 1) for hard segmentation, with 1 denoted as foreground and 0 for background.The parameters θ describe image foreground and background distributions modeled by GMMs values, that introduces the covariance parameter k = (k 1 , . . ., k N ) with k n ∈ (1, . . ., K) assigned for each pixel.The "Gibbs" energy function of Equation (1) denoted as E, models a trend of solidity of the objects through the opacity parameter, whose minimum corresponds to an optimal segmentation.
The function U() in Equation ( 1) is explained in Equation ( 2); this function estimate the fitness of the opacity distribution α of the image, given the model θ, in which p() is a Gaussian probability distribution and π() are mixed weights coefficients.
The remaining function V() shown in Equation ( 3) is called the smoothness term.The constant γ is defined empirically as 50 by [18], given images in a 0-255 range.β is a constant that regulates the smoothness term.
The segmentation can be estimated as a global minimum of α parameter over the energy model, as seen in Equation ( 4) After the optimization process, an image filtering stage is introduced.Linear and time-invariant filters are usually used.A powerful approach, is to optimize a quadratic function that directly imposes constraints on the unknown output.The solution is obtained by solving a dispersion matrix encoded with the information from the guide image [27,28].Processing is done around each pixel using a weighted average of the nearby pixels.The processing information is based on the color and intensity of the guide image [27].This proposal is developed with the bilateral filter, starting with the guided filter.The filter can preserve the edges and smooth out small fluctuations [29].This approach is based on explicitly constructing the cores using an orientation image.The output is a linear transform of the guide image.This filter has the edge-preserving and anti-aliasing property such as a bilateral filter, but does not suffer from gradient inversion problems [28,29].
The filter stage has two inputs, a guide image 'I' and an input image 'p'.The output is expressed as a weighted average 'q'.The parameters i and j are the pixel indices, as seen in Equation ( 5) The guide image 'I' is the reference of the filter kernel W ij independent of 'p'.The kernel weights can be explicitly expressed by Equation ( 6) The parameters µ k and σ 2 k are the mean and variance of w k in image 'I' respectively, is a regularization parameter and w is the number of pixels in w k [28].

Graph-Based Data Fusion
Multi-channel input images denoted as X c n ∈ R m×n , with c = 1, . . ., C h and n = 1, . . ., N, where C h is the number of channels of each image and N is the total number of images.Here, the goal is to extract relevant features given by each channel of the images, in order to train a model that captures the biomass behavior across the crop stages.As explained in [4], each channel of images are represented by a corresponding graph, with the purpose of obtaining a single fused-graph from all the channels.The fused-graph is then used as an embedded space of images to extract relevant features.
The graphs used in the approach in [4] are undirected, and they are denoted as a triplet G = (V, E, w), consisting of vertexes V, and edges E ⊂ V × V, and a non-negative weight function w : V × V → [0, ∞).Therefore, a multichannel image X c n can be represented as graph G by defining a vertex set V, an edge set E ⊂ V × V, where each edge represents the relationship between two pixels, and a weight function w that measures the strength of that relationship.Typically w is defined by: where dist(v i , v j ) is the Euclidean distance between the nodes v i and v j associated to the pixels ∈ X c n , and σ > 0 is a scaling parameter.Given the graphs for all channels in the image (X can be defined by combining the weight functions of G c as follows: Therefore, the eigenvectors (u) and eigenvalues (λ λ λ) are computed from W f ∈ R mn×mn by solving the eigen problem of the normalized Laplacian, defined as: where the Laplacian L = I − D −1/2 W f D −1/2 and D is the degree matrix that is diagonal has elements d j = ∑ i w f (v i , v j )).

Nyström Extension
Since the matrix W f ∈ R P×P with P = mn cannot be computed the authors in [4] leveraged the Nyström extension to compute the eigenvectors of W f without computing the complete matrix, while using some samples (n s ) of X c n , as follows: where A ∈ R n s ×n s , B ∈ R n s ×(P−n s ) and C ∈ R (P−n s )×(P−n s ) .This method approximates C by using n s samples from the total number of pixels P ∈ X c n with n s P. Thus, the eigenvectors of the matrix W f can be spanned by eigenvalues and eigenvectors of A, by solving the diagonalization of A (A = U λ λ λU).Hence, the eigenvectors of W f can be spanned by Û = U; B Uλ λ λ −1 .Since the approximated eigenvectors Û are not orthogonal, as explained in [30], they can be obtained with S = A + A − 1 2 BB A − 1 2 .Then, by the diagonalization of S (S = U s λ λ λ s U s ) the final approximated eigenvectors of W are given by: Up to this point we have discussed the approach proposed by authors in [4] in which, they used the common Gaussian kernel-based graph that might not provide the best representation of the intrinsic behavior of the data.In addition, the uniformly spaced distributed sampling method used to extract samples for the Nyström extension can overlook relevant samples that contain structural information about the change detection.Moreover, as defined by [31], the approximation of W f given by the Nyström extension is highly affected by the way samples are selected.Thus, we propose to use a recent graph-based sampling method named as Blue noise sampling [25] alongside a graph using prior smoothness learning, which is based on graph signal representation [24].This is detailed in the following.

Graph Signal Processing: Smoothness Prior
The problem of learning a graph with a prior of smoothness, is to learn the Laplacian matrix (L), so that the signal variation on the resulting graph (Q(L)), is small.Here a small value of Q(L), means that the signal on the graph takes similar values to its neighbors, resulting in edge disconnections from the graph [32].The measure of smoothness of a signal x on a graph is given by: where w ij is the ijth entry of matrix W. To do so, we use the recent approach in [24] for large scale graphs with prior smoothness, where the authors leverage the desired graph sparsity to reduce computational cost.This model needs as inputs the number of neighbors K that are related to the edges per node, and the parameters α and β.Therefore, the minimization problem of [24] can be computed as: where Z is a pairwise distances matrix, α is a log prior constant (> α →> weights in W), β is a W 2 F prior constant (> β → less sparsity in W), and the parameter c encourages adjacency matrices close to a pre-specified adjacency matrix W 0 .To reduce the complexity of Equation ( 11) the authors in [24] fix the parameters α = β = 1 and multiply the pairwise distances matrix (Z) by a parameter θ to make the sparsity analysis simpler.The parameter θ is computed by each column b of Z, and it has upper and lower bounds, as described by Equation (12).
The process is summarized in Algorithm 2, as follows: Algorithm 2: Graph learning with prior smoothness [24].Input: Matrix of distances Z, K edges per node (sparsity level) Output: Graph learned with prior smoothness W s

Step to compute θ
Compute bounds of θ with Equation (12).
Compute θ as a geometric mean between the bounds θ lower and θ upper .
Step to compute adjacency matrix W s .
Compute W s from Equation ( 11) with Z = θZ

Blue-Noise Sampling
Many methods to perform a sampling over graphs can be found in the literature [33] in order to find a suitable sampling set S ⊂ V ∈ G.However, as explained in [34], is desirable to get samples from different structures or objects that are present in the image, that from the graph perspective this structures/objects can be capture by sub-graphs.Therefore, we will focus on the sampling method known as blue-noise sampling since it focus on extracting samples related to sub-graphs.This kind of sampling has been widely used in digital half-toning [35], but has recently been extended to graphs [25].
The blue-noise sampling gives as a result a subset S of vertices V that are as far as possible from each other in terms of the geodesic distances on G.The work in [25] demonstrated that the final subset given by blue-noise sampling leads to an accurate reconstruction of signals that is energy is mostly concentrated at the lowest eigenvectors of the graph Laplacian of G.
In order to assemble all the aforementioned methods into a single approach, we need a graph that contains the most relevant samples (i.e., pixels).To do this, we use a down-sampled version of the image taken from the GFKuts method [3] which contains probabilities of the pixels that belong or do not belong to the crop (see Figure 3).
The down-sampling is conducted to avoid computer memory saturation, by applying a square grid M ∈ R m 25 × n 25 .Secondly, to learn a graph by using the Algorithm 2 of the prior of smoothness, we use, as nodes, the vectorized version (M i ∈ R 625 ) of each square in the image, computing the distances between the nodes (Z ∈ R |V|×|V| ) and apply the prior of smoothness [24].Lastly, we apply the blue-noise sampling over the smoothed graph W s and inject these samples to the graph based fusion algorithm proposed in [4], in order to obtain the eigenvector with the highest eigenvalue of the fused-graph W f .Since the eigenvectors are embedded in a high-dimensional space equal to the resolution of the images (i.e., 960 × 1280), we use the same dimensionality reduction technique used in [4], based on the t distributed stochastic neighbor embedding (t-SNE) [36].Figure 3 summarizes proposed method.The purpose of a support vector machine regression is to solve a linear regression problem by mapping the input data into a high dimensional space (feature space).The following model is considered: where w e are the regression coefficients (weights), φ(x) : R in → R out with out in, and b is the bias term.To obtain the optimal values of W e , a loss function can be defined such as the Laplacian, Huber's, Gaussian or -intensive.We will use the most common loss function that is the -intensive with a regularization parameter C: where ξ * i and ξ i are slack variables with measure deviations larger than .By transforming the optimization problem in Equation ( 14) into a dual problem, the final solution is obtained by using Lagrange, the Karush-Kuhn-Tucker conditions, and the kernel trick [37], giving: where α α α, α α α * are the Lagrange multipliers and K(x) is a Kerrnel function.In this case we use the Radial Basis Function (RBF) Kernel (similar to Equation ( 7)).

A Nonlinear Autoregressive Exogenous (NARX)
An exogenous autoregressive model (NARX) aimed at the identification of non-linear systems involves both current and past values of the impulsive series that models the dynamics of the system.NARMAX (Nonlinear Autoregressive Moving Average with Exogenous Inputs) methods provide models that are transparent, and easily solve many problems such as contour errors in CNC machines, forecasting, network traffic, and prediction of daily solar radiation [38][39][40][41].The NARMAX model is defined in Equation ( 16).Where y(k) is the output system, u(k) is the input system and e(k) is the noise sequence.n y , n u y n e are the maximum lags for the system output, input and noise respectively; F[.] refers to the function and d is a time delay usually set to d = 1.
In this work, the function F[.] has been implemented using artificial neural networks (ANN) with two hidden layers and one output layer.

Experimental Setup
In order to compare the effectiveness of the proposed feature extraction based on graphs, the results were compared with two approaches: (i) the former graph method introduced in [4], namely GBF, and (ii) the GFKuts approach [3].We used the datasets and Ground-Truth reported in [3,4], containing 314 images for the vegetation stage, 82 for the reproductive, and 71 for the ripening.The captured images have a resolution of 960 × 1280 pixels, geo-referenced with the corresponding biomass measurements in grams (g) from the Ground-Truth (further information of the experimental protocol can be found at https://www.protocols.io/view/protocol-bjxskpne,accessed date: 24 June 2021), which was assembled as follows: 1 linear meter of the plants were cut from each plot of the crop and weighted to obtain the fresh biomass.Subsequently, the samples are placed inside an oven at 65 degrees Celsius for 4 days or until a constant weight is reached.This is known as the dry biomass.
Having the imagery dataset and the Ground-Truth, we trained two estimation models: classical SVM-R regression and a robust nonlinear autoregressive network with exogenous inputs (NARX), both accounting for 70% of the dataset for the training, and the remaining 30% for testing and validation.Lastly, the performance of the models was measured in terms of the root mean squared error (RMSE), the linear correlation (r), and the coefficient of determination (R 2 ).Additionally, we conducted several experiments with the SVM regressor considering different kernel functions (i.e., linear, polynomial, and RBF), while the best performance was achieved by the RBF Kernel (similar to the work in [4]).
The algorithms were tested in a server with two Intel(R) Xeon(R) CPUs E5-2650 v4 @ 2.20 GHz, with 24 physical cores, 48 threads of processes, and 252 GB of RAM.Parameters are detailed in Table 1.

Model Parameters
GFKuts [3] k-neighbors = 2 window radius = 60 = 10 −6 -GBF [ 6 show the experimental results obtained from the 3 methods in the following order: (i) GFKuts, (ii) former graph method (GBF), and (iii) proposed graph method with blue-noise and prior smoothness (GBF-Sm-Bs).Numerical data containing the aboveground biomass estimation results for each model are presented in Table 2.According to Table 2, the GKFkuts approach achieved the lowest RMSE = 129.490g, followed by the proposed graph-based model with RMSE = 155.498g.However, by applying the robust Narx regressor, the best performance was achieved by our proposal, with a RMSE = 45.358g. Results were also correlated to the linear fit, presented from Figures 4-6.There we can observe that the Narx regressor improves the fitting to a linear model, in which the proposed method achieved a linear correlation value of r = 0.995.
Contrasting with the previous work from [3], the Narx model enabled a slight improvement of about 3% over the former GFKuts method, due mainly to the configuration of the auto regressive inputs, and the use of vegetative indices with the soft mask generated by the GFKuts approach.Regarding the former graph-based method from [4] (GBF), results were improved by around 62.43% (155.498/249.058)and 45.42% (45.358/99.855)for the SVM and Narx regressors in terms of the RMSE metric, respectively.For the SVM regressor, the GBF and the GBF-Sm-Bs methods achieved a RMSE = 259.058g and RMSE = 155.498g respectively, as observed in Table 2, while for the Narx regressor, the GBF and the GBF-Sm-Bs methods achieved a RMSE = 99.855g and RMSE = 45.358g respectively, as presented in Table 2.This demonstrates the effectiveness of using a structured sampling method combined with graph learning, based on a prior of smoothness, for the extraction of relevant samples to be apply into Nyström extension.
Overall, the proposed GBF-Sm-Bs approach using Narx achieved an improvement in the RMSE metric of about 50%, while also improving the biomass correlations reported from previous works.Part of the improvement achieved with the GBF-Sm-Bs method comes from extracting more relevant features than GFKuts, precisely 89 features.While GFKuts works with only 7 VI-features (per image) highly sensitive to biomass variations, the graph-based method is characterized by the eigenvectors with the highest eigenvalue associated with the fused-graph W f , yielding 89 features per image.

Conclusions
The proposed methods for above-ground biomass estimation, not only enabled the precise characterization of the biomass behavior of the rice crops through the entire phenological cycle, but also worked as both image segmentation and feature extraction techniques, by associating relevant features from the canopy.It is worth mentioning that most of the existing body of work in remote sensing methods for high-throughput biomass estimations based on multispectral imagery, requires dedicated photogrammetry methods for image correction, segmentation and feature extraction.In this study, both GFKuts and the GBF-Sm-Bs methods solved those stages by combining them into one single approach.
According to the findings reported in Table 2, the proposed GBF-Sm-Bs approach obtained an average biomass correlation of 0.995 with R 2 = 0.991 and RMSE = 45.358g, increasing the precision in the estimation by around 45.42% (45.358/99.855),compared to the GBF method, and about 52.27% (45.358/86.769)compared to the GFKuts approach for the Narx regressor.This is a promising result towards GWAS gene characterization (Genomewide Association Study), that requires larger amounts of precise and accurate phenotyping data for the association of gene functions with a specific trait.In this regard, future work is aimed at the inclusion of clustering approaches within the proposed algorithms, to enable the extraction of features according to several plant varieties (genotypes).
As future work, it would be interesting to explore the graph convolutional network, which combines the structural information captured by the graphs with the high level of abstraction given by neural networks.Additionally, a strong approach for segmentation (i.e., different for square windows over the image) could lead to improved results, since the segmented regions are used to generated a graph with smoothness prior.

Figure 3 .
Figure 3. Proposed method based on the three stages: (a) Graph learning with prior smoothness, (b) blue-noise sampling to inject the samples to Nyström extension and (c) the fusion of the the multispectral images to extract features of the crop.