Estimation of Nitrogen in Rice Crops from UAV-Captured Images

: Leaf nitrogen (N) directly correlates to chlorophyll production, affecting crop growth and yield. Farmers use soil plant analysis development (SPAD) devices to calculate the amount of chlorophyll present in plants. However, monitoring large-scale crops using SPAD is prohibitively time-consuming and demanding. This paper presents an unmanned aerial vehicle (UAV) solution for estimating leaf N content in rice crops, from multispectral imagery. Our contribution is twofold: (i) a novel trajectory control strategy to reduce the angular wind-induced perturbations that affect image sampling accuracy during UAV ﬂight, and (ii) machine learning models to estimate the canopy N via vegetation indices (VIs) obtained from the aerial imagery. This approach integrates an image processing algorithm using the GrabCut segmentation method with a guided ﬁltering reﬁnement process, to calculate the VIs according to the plots of interest. Three machine learning methods based on multivariable linear regressions (MLR), support vector machines (SVM), and neural networks (NN), were applied and compared through the entire phonological cycle of the crop: vegetative (V), reproductive (R), and ripening (Ri). Correlations were obtained by comparing our methods against an assembled ground-truth of SPAD measurements. The higher N correlations were achieved with NN: 0.98 (V), 0.94 (R), and 0.89 (Ri). We claim that the proposed UAV stabilization control algorithm signiﬁcantly improves on the N-to-SPAD correlations by minimizing wind perturbations in real-time and reducing the need for ofﬂine image corrections.


Introduction
Farmers use soil plant analysis development (SPAD) devices as a field diagnostic tool to estimate nitrogen (N) content in the plant and to predict grain yield [1,2]. For rice crops, N provides critical insight into plant-growth, crop yield, and biomass accumulation [3]. Furthermore, monitoring N Remote Sens. 2020, 12, 3396 2 of 31 allows farmers to properly adapt crop management practices [4,5]. However, using SPAD devices for crop diagnosis is still time consuming. With the advent of low-cost unmanned aerial vehicles (UAVs), several authors have reported faster and accurate remote sensing tools and methods [6,7] to estimate the N canopy from aerial multispectral imagery [8,9]. One of the most common methods used, relies on sensing the canopy reflectance in both visible (VIS) and near-infrared (NIR) wavelengths, using hyperspectral sensors [10][11][12][13].
Multiple vegetation indices (VI) have been established to associate specific spectral reflectance wavelengths with the crop variables of interest [14,15]. Although these VIs can be used to estimate the N contents in rice plants, there is no single standard set of VIs that works across all crop stages, rice varieties, soils, and atmospheric conditions. Most of the existing body of work associated with the estimation of N contents in plants, has been conducted using multi-variable regressions [5,16], in which linear functions are defined by combining and weighting each VI according to the regression model. This approach is simple, but sometimes inaccurate since the evolution of the crop tends to exhibit highly nonlinear effects that affect the N content over time. Machine and deep learning methods have recently gained traction, in solving these issues. Machine learning has the potential to drive significant breakthroughs in plant phenotyping [17][18][19][20].
For canopy N estimation, training machine learning algorithms requires the precise extraction of VI features from the aerial multispectral imagery. Several authors have applied data fusion methods from different sensors [21][22][23] for applying image mosaicing techniques [24][25][26][27], computing crop surface models based on dense image matching (DIM) techniques [28], or applying photogrammetry methods that are computationally expensive [26,27,29,30]. Other approaches rely on the real-time segmentation and image registration for the extraction of relevant features associated with the leaf/canopy N, including edge detection thresholding, color histograms, and clustering (otsu, K-means, watershed) [31][32][33].
In this work, we improve on the stabilization control of the UAV for acquiring accurate plot imagery, instead of relying on mosaicing or photogrammetry methods. Commercial UAV quadrotors used for the monitoring of the canopy N usually come with a proportional-integral-derivative (PID) waypoint navigation autopilot. The lack of proper UAV angular stabilization in the presence of large wind perturbations limits the accuracy of image registration algorithms, therefore affecting the estimation of canopy N content through image processing. To address this problem, we demonstrate a nonlinear trajectory-tracking controller that enables precise UAV positioning through a stabilization/attitude control loop. We call this method backstepping+ desired angular acceleration function (BS + DAF). This approach incorporates UAV aerodynamics information within the control law to produce roll and pitch acceleration commands to reduce abrupt angular acceleration changes caused by external wind disturbances. This added compliance enables the UAV to hover over the crop plots precisely, which in turn allows for capturing imagery that can be individually processed in real-time.
Here, we combine an autonomous UAV to acquire multispectral imagery from a crop, with machine learning methods as a means to high-throughput nitrogen content estimation. Three major contributions are involved: (i) The development and validation of a novel UAV attitude control called BS+DAF. The UAV captures multispectral imagery with relevant above-ground data that must correlate with the SPAD measurements at the ground-level of the crop. The proposed controller is aimed at maintaining precise angular stabilization during flight by properly rejecting wind disturbances, enabling improvements in the estimations of N. (ii) The application and validation of a segmentation algorithm called GrabCut [34], instead of the traditional edge detection thresholding methods, color histograms, and clustering techniques used in agriculture. The GrabCut approach achieves smooth pixel information with richer detail of the canopy structure, enabling the accurate segmentation of the multispectral imagery and the proper VI-based feature extraction.
Remote Sens. 2020, 12, 3396 3 of 31 (iii) The integration of machine learning methods to process nonlinear N dynamics with the calculations of the VIs during all stages of crop growth. A comprehensive characterization of the crop by designing a ground-truth dataset with several contrasting rice genotypes and accurate direct measurements of leaf nitrogen (training model). Figure 1 details the proposed approach. Our UAV is a quadrotor called the AscTec Pelican (manufactured by Intel's Ascending Technologies (https://robots.ros.org/astec-pelican-and-hummi ngbird/)). This UAV comes with a C++ software development kit (SDK) that enables onboard code development with ease. A high-level Atom Intel processor (HLP) offers enough computing power to run solutions in real-time, whereas a low-level processor (LLP) handles the sensor data fusion and rotor control with an update rate of 1kHz. As shown in Figure 1, we developed a closed-loop attitude controller to drive the low-level rotor's control running in the LLP. This control method depends on the dynamics of the UAV to properly reject wind disturbances and keep the UAV steady. During flight, a dataset of multispectral imagery is precisely collected aiming at the above-ground estimation of nitrogen by using machine learning methods. In previous work [35], our UAV system was tested during two years of in-field experiments with the aim of estimating above ground biomass accumulation. Here, we have extended the capabilities of the system in Figure 1 by developing and integrating novel methods for UAV flight control, multispectral imagery segmentation for VI feature extraction, and their impact on nitrogen estimation using machine learning algorithms.   The UAV was equipped with the Parrot Sequoia multispectral sensor (https://www.parrot.com /business-solutions-us/parrot-professional/parrot-sequoia) fabricated with 4 separate bands: green, red, red-edge, and near-infrared. The camera offers a resolution of 1280 × 960 for each independent spectral sensor, yielding a crop-to-image resolution of 1.33 cm/pixel according to the flying altitude of 20 m. In addition, the multispectral camera comes with a radiometric calibration target that enables offline reflectance calibration across the spectrum of light captured by the camera, and an onboard irradiance sensor designed to correct images for illumination differences, all of which enables its outstanding performance in cloudy conditions, as evaluated in [36]. In our application, this calibration procedure was executed prior to any flight mission of the UAV. Additional sensors such as an IMU, magnetometer, and GPS are also embedded within the sunshine sensor. Figure 2a depicts the mounted camera setup, while Table 1 details the general specifications of our system.

Rice Crops
The crops were designed with 3 spatial repetitions containing 8 contrasting rice genotypes in terms of N concentration, biomass accumulation, and flowering: FED50, MG2, ELWEE, NORUNKAN, IR64, AZUCENA, UPLRI7, and LINE 23. These rice varieties have a phenological cycle ranging between 86-101 days , as detailed in Figure 3b. The experimental design consisted of six months of in-field sampling with three different planting crops. For each experiment, the amount of applied nitrogen and water was modified as follows: Experimental trials were conducted during the dry season. To assess the effects of limited and permanent irrigation on the crops, leaf temperature (MultispeQ (https://www.photosynq.com)), and soil humidity (AquaPro (http://aquaprosensors.com)) were constantly monitored. Fertilizers were applied accordingly in order to maintain the crops through the phenological cycle. Given that, the same amount of fertilizers were applied for the experiments: 60 kgha −1 of P 2 O 5 (diammonium phosphate) and 130 kgha −1 of K 2 O (potassium chloride), as detailed in the experimental protocol available in the Supplementary Materials section. Figure 2 details some characteristics of the crop. As shown in plots (a), (b), the length of a plot was about 2.75 m with a distance between plants of 25 cm and 30 cm between rows. Within each plot, we defined 6 sampled areas with 1 linear meter in length (containing four plants). A ground-truth was defined based on the direct measurements of plant chlorophyll using a SPAD 502 Plus meter (Konica-Minolta) over these sampled areas. Measurements from the crop were obtained during three stages of rice growth: vegetative, reproductive, and ripening. Figure 3b details Table 2 details how the dataset was collected, in which the SPAD value corresponds to the average of 24 measurements conducted over each plot, as depicted in Figure 3b. All ground-truth data are available through the Open Science Framework in the Supplementary Materials section. On the other hand, Figure 3a shows the linear correlation used to relate the measured SPAD value with the leaf-blade N concentration [37].
As mentioned, vegetation indices are widely used to quantify both plant and soil variables by associating certain spectral reflectances that are highly related to variations in canopy chemical components such as nitrogen. From the extensive list of vegetation indices (VIs) available [38][39][40][41], we selected 7 VIs with sufficient experimental evidence and quantitative trait loci (QTL)-based characterization regarding their high correlation with rice nitrogen [42]. Table 3 details the selected VIs. These formulas are applied in different wavelengths for taking into account the changes in canopy color, since several factors affect the spectral reflectances of the crop: solar radiation, plant morphology and color, leaf angles, undergrowth, soil characteristics, and water.
In our system, the parrot sequoia camera has a solar radiation sensor that compensates the light variations in the canopy. The change in the rice canopy color is perhaps the most notable variation during the phenological cycle. In the vegetative stage, the green color is predominant whereas in the reproductive stage, panicle formation starts and thus yellow features appear in the images. In ripening, the maturation of the plants occur while the leaves begin to senesce, turning the yellow color predominant. These changes can be observed in Figure 2c. Furthermore, different wavelengths of light have a different level of plant absorption depending on the leaf composition given by several genetic traits. In particular, the relation between the selected VIs in Table 3 with the photosynthetic activity and canopy structural properties has allowed the association of certain spectral reflectances that are highly related to the physico-chemical canopy N variations in plants, especially the green, red, red-edge, and near infrared bands. The selected VIs exhibit a strong dependence on the NIR reflectance due to leaf chlorophyll absorption, providing an accurate approach to determine the health status of the plants and the canopy N. Most of the existing body of research focused on multispectral-based N estimations [14,21,40,42,43], combine the information provided by several vegetation indices in order to avoid saturation issues. For instance, the NDVI, which is one of the most common VIs used, tends to saturate with dense vegetation. In turn, the NDVI alone is not accurate during the reproductive and ripening stages of rice growth. Here, we combine several VIs across the crop stages, to ensure data on wavelengths located in Remote Sens. 2020, 12, 3396 8 of 31 the red-edge and another spectral reflectances that accurately express the healthy status of the leaves (higher NIR and green-band readings).

UAV Stabilization Control
The estimation of the canopy N requires the precise extraction of Vegetative Index features from the acquired multispectral imagery. Capturing aerial images during UAV hovering that accurately matches with the GPS positions of the SPAD measurements at ground-level is essential.
As previously detailed in Figure 1, three closed-loop controllers are needed to regulate: (i) the X-Y position based on GPS feedback, (ii) the Z altitude based on barometric pressure and laser readings (pointing downwards), and (iii) the φ, θ, ψ attitude based on IMU data. The UAV is constantly subjected to wind disturbances that cause unsteady angular motions and therefore imprecise trajectory tracking. Consequently, aerial imagery captured across the crop with the UAV system is affected. To overcome this issue, we replaced both roll and pitch PID-based controllers by a robust nonlinear backstepping (BS) control.
The classical BS method has several advantages. It explicitly takes into account the nonlinearities of the UAV model defined in Equation (A2) (see Appendix A), and most importantly, allows the incorporation of a virtual control law to regulate angular accelerations. For this, we have derived a desired acceleration function (DAF) for roll and pitch. This enhanced controller is called backstepping+DAF (BS+DAF). Our goal is to use the dynamics equations of motion (EoM) defined in Algorithm A1 within the control law in order to compensate for abrupt angular acceleration changes, concretely in roll and pitch. The DAF terms improve the responsiveness of the control law to external perturbative forces. Appendix B details the control law derivation for roll and pitch.
The BS + DAF control supports on the Lyapunov stability concept that guarantee asymptotic stabilization around the equilibrium points. For our application, we require both roll φ a pitch θ angles to remain in zero while the UAV is hovering above the crop for capturing multispectral imagery, i.e., e φ = φ d − φ → 0 and e θ = θ d − θ → 0. Otherwise, the set-point references for both roll and pitch controllers are defined by the X-Y position controller.
As previously mentioned, high wind-speed perturbations affect the UAV during hovering. Nonetheless, our controller law is sensitive to angular acceleration changes caused by external wind disturbances, thanks to both error dynamics (e 2 ) and the DAF terms (φ d ) in Equation (A18) (Appendix B). Given that, we present how to model the effects of the wind over the UAV aerodynamics, by following the wind effect model (WEM) developed by [44]. Figure 4 details the control architecture and how the WEM module has been incorporated as an external disturbance acting on the UAV body frame.   . Attitude (roll and pitch) control architecture. The UAV has a low-level PID-based controller to drive each rotor. The proposed backstepping+DAF generates the references to the inner loop according to the dynamics of the system. A wind effect model has been adopted from [44] to add disturbances to our model. The WEM defines the relationship between the oblique airflow acting on a propeller and the resulting forces and moments applied onto the UAV. In our quadrotor system, the oblique airflow form and angle of 90 o with the propeller azimuth angle. In terms of aerodynamics, the incoming airflow induces an increase in vertical thrust (T oi ), a side force (F wind ), and a pitching moment (M wind ). The WEM presented by [44] uses blade element momentum theory to determine the aforementioned effects. Here, we are interested in incorporating the effects caused to the thrust and the pitching moment (M wind ) as a function of the magnitude of a given wind speed vector V wind and the rotational speed ω of each propeller driven by the rotor's model. In this regard, the WEM is defined as: The angle β determines the direction of the V wind vector. Depending on the direction of the applied relative wind, certain propellers will be more affected by this external disturbance. Tran et al., in [44,45], propose the use of a weight function that assigns values between 0 to 1, where the maximum value of 1 means the propeller is directly exposed to the wind. In Equation (1), C corresponds to the weighting vector that affects the thrust T generated by each propeller. The parameter l is the distance between non-adjacent propellers and M prop is: Therefore, the scale of the weighting vector C is defined based on the magnitude of the applied wind speed vector and the corresponding direction (the β angle between the relative wind vector pointing towards the UAV body frame). In addition, the rotor's speeds ω are required to calculate the augmented thrusts T and torques τ (weighted by C). Further details on this model can be found in [44,45].
Sections 3.1 and 3.3 present the results regarding the impact of precise UAV positioning on the accurate estimation of the canopy N through the entire phenological cycle.

Multispectral Image Segmentation
Multispectral imagery is evaluated by using a multispectral image segmentation method called GrabCut [34]. The original method is widely used for its ease of implementation and for the excellent results in generating a binary classification; however, it suffers from the drawback of being a semi-manual algorithm. The original GrabCut method requires a manual input during the algorithm iteration in order to properly determine both background and foreground pixel values.
This section introduces a modified version of the GrabCut algorithm that is fully automatic, thanks to an added refinement step using a guided filtering [46] to extract the relevant pixel information associated with the plant's canopy. Our approach solves an optimization problem using an energy function that allows the proper labeling of texture in the multispectral image by using a Gaussian mixture model. As mentioned, we use a refinement process based on a guided-filter by taking into account information from all band channels of the multispectral camera: green, red, red-edge, and near-infrared. The resultant multispectral image-mask includes only relevant pixel information that accurately represents the canopy for the estimation of N.

GrabCut Segmentation
The GrabCut method requires an initial model known as a trimap. This model consists of a partition of the input image into three regions: a foreground T F , a background T B , and a region with pixels that result from the combination of the foreground and background colors T U .
The image is modeled as an array of values z = (z 1 , ..., z N ), and the output segmentation can be a channel of values between 0 and 1 or a hard segmentation with a binary assignment (0 or 1). The segmentation is written as The GrabCut algorithm also requires a model for the distribution of the foreground/background colors and gray levels. This distribution can be represented as a histogram directly assembled from T F and T B , as: θ = {h(z; a), a = 0, 1}. Under this trimap model, the segmentation algorithm determines the values of α from the image z and the distribution model θ. The α values are calculated from an energy minimization function, as: where the sub-function U evaluates the fitness by assigning a low score if the segmentation (α) is consistent with the image z, defined as follows: Instead of using the histogram as the estimator of the probability density function, the algorithm uses a Gaussian mixture model in order to take into account the information coming from all channels.
In Equation (5), the sum set C ∈ 3 × 3 refers to the neighbors pixels in a given window, and the the term β can be calculated according to Equation (6).
By using the global minimum in 7, the image segmentation is estimated as: Algorithm 1 details the original GrabCut method. In order to eliminate the third manual step of the algorithm, a fixed background image T B mask is used during the iteration, and a guided-filter refinement process is applied to achieve an automatic segmentation, as detailed in Algorithm 2.

Algorithm 1: Original GrabCut algorithm.
Step 1: Initialization of T B , T F , T U .
Initialize the trimap T F as the foreground Initialize the trimap T B as the background All remaining pixels are set as a possible foreground pixels T UF Step 2: Iterative minimization. Assign and learn GMM Minimize energy function Estimate segmentation image α IF image α have visible errors GO to Step 3 Step 3: User editing.
Use the segmented image α as the new possible foreground pixels T UF Input pixel hints for T B and T F GO to Step 2 Algorithm 2: Modified GrabCut with guided filter calculation. In the algorithm, E r (I) denotes a function that calculates the image mean over a radius r, the operations . * and ./ denotes the matrix element-wise calculation, and q is the image output.
Step 1: Initialization of T B , T F and T U .
Initialize the trimap T F as the foreground Initialize the trimap T B as the background All remaining pixels are set as a possible foreground pixels T UF Step 2: Iterative minimization.

Assign and learn GMM Minimize energy function Estimate segmentation image α
Step 3:Input image p, input guidance I, radius r, and regularization .
The guided filter (GF) [46] can be defined as a convolutional Bilateral Filter with a faster response due to its O(N) computational complexity. In this regard, the output of each pixel denoted as q can be expressed as a weighted average over the convolutional window W (i, j denote the pixel coordinates): The GF implies that an image can be filtered using the radiance of another image as guidance. We exploited this concept to filter our segmented plot mask created with the GrabCut algorithm, with the aim of refining the segmented image with the original image as guidance. In this regard, the weight used by the GF is given by: where W GF ij depends entirely of the guidance image I. The parameters µ k and σ 2 k are the mean and variance of the guidance image I estimated over a window w k , denotes a regularization parameter and |ω| is the number of pixels in the window w k . Section 3.2 presents the results of applying the proposed modified version of GrabCut, by comparing the method against traditional segmentation techniques such as thresholding, K-means, meanshift, but also against the original semi-manual GrabCut method.

Machine Learning for N Estimations
As detailed in Table 2, the ground-truth for training machine learning (ML) algorithms was defined based on the direct measurements of plant chlorophyll using a SPAD 502 Plus meter (Konica-Minolta) over these sampled areas, as depicted in Figure 1. Datasets contain the measured SPAD value that directly correlates with the leaf-blade N concentrations by following the linear correlation [37] defined in Figure 3a. In this regard, measurements from the crop were obtained during three stages of rice growth: vegetative, reproductive, and ripening, in which 3 trials were conducted per crop stage. These datasets are the result of 10 flights per trial, capturing around 500 images, and yielding a database of 1500 images per stage. Since 3 trials were conducted per crop stage, around 13, 500 images were processed in this work. Figure 3b details the experimental specifications.
The ML methods were trained with a set of images accounting for the 60% of the entire database. For the final estimations of leaf nitrogen, we used the remaining 40% of the database (testing phase of the ML methods). The entire imagery dataset and the ground-truth available in the Supplementary Materials section. Here, we report on the use of classical ML methods based on multi-variable linear regressions (MLR), support vector machines (SVM), and artificial neural networks (NN) for the estimation of the canopy N.
For MLR models, the VIs from Table 3 were combined by following the formula: where the parameter β k changes accordingly to the growth stage of the plant, by weighting each VI independently. In this regard, each crop stage will have an independent MLR model that linearly fits the N content.
With the aim of improving the adaptability of the estimation models by considering several nonlinear effects (dense vegetation, optical properties of the soil, and changes in canopy color), this section presents the estimation results using support vector machines (SVM) and artificial neural networks (NN). SVM models were used with different kernel functions with the aim of determining the proper mathematical function for mapping the input data. Six different kernels based on linear, quadratic, cubic, and Gaussian models will be tested for each crop stage independently.
Contrarily to the MLR and SVM methods, in which N estimation models are determined for each crop stage independently, neural networks enable the combination of the entire dataset (SPAD values and VI extracted features) into a single model that fits for the entire crop phenological stages. Several non-linear training functions are tested with different hidden layers. In addition, we discarded the use of deep-learning methods such as convolutional neural networks (CNN), due to the high computational costs associated to the pooling through lots of hidden layers in order to detect data features. For this application, we use well-known vegetative index features (reported in Table 3) that have been widely used and validated in the specialized literature [14,40,42]. Other image-based features such as color, structure, and morphology do not work well with multispectral imagery of 1.2 mega-pixel in resolution, compared to the 16 mega-pixel in the RGB image. In fact, the main advantage of using VIs (as features for training), relies on having information at different wavelengths, providing key information of the plant health status related to N.
In Section 3.3, we report a comprehensive comparison among multi-variable linear regressions (MLR), support vector machines (SVM), and artificial neural networks (NN) for the estimation of the canopy N. We used three metrics based on the root mean square error (RMSE), Pearson's linear correlation (r), and coefficient of determination (R 2 ), detailed as follows: where n is the total number of samples, N is the measured value (SPAD scale cf. Figure 3a),N is the estimated nitrogen, and N is the mean of the N values. In this application, the Pearson's metric indicates the linear relationship between the estimated value of N VS the measured one (ground-truth).
On the other hand, the R 2 metric is useful since it penalizes the variance of the estimated value from the measured one. As mentioned, the ML models require the calculation of the VI formulas presented in Table 3, in order to determine the input feature vector. Every image captured by the UAV is geo-referenced with the DGPS system with a position accuracy of 35 [cm] (see Table 1). Given that, aerial imagery is registered by matching consecutive frames according to a homography computed with the oriented features from accelerated segment test (FAST) and rotated binary robust independent elementary feature (BRIEF) computer vision techniques [47]. The UAV path is planned by ensuring an image overlapping of 60%, which allows for the precise matching with the GPS coordinates of the ground-level markers to ensure a proper comparison between the aerial-estimations and ground-measurements of plant N. Part of this procedure is described in previous work reported in [48]. In turn, the metrics described in Equation (10) report on the performance of the ML models based on the aerial-ground data matching of canopy N. The aforementioned geo-referencing process is conducted by using the affine transformation, a 1st order polynomial function that relates the GPS latitude and longitude values with the pixel coordinates within the image. This procedure is also detailed in previous work reported in [48]. Figure 5 details the procedure required by the machine-learning models. After registration, images are segmented by using the proposed GrabCut method. Although all pixels in the image are evaluated in Algorithm 2, we use pixel clusters of 10 × 10 to calculate the VI formulas, as shown in Figure 5a. After segmentation, pixels representing the rice canopy are separated from the background, as shown in Figure 5b. All vegetation indices are calculated as the average within each image sub-region.
At canopy-level, several factors affect the spectral reflectances of the crop: solar radiation and weather conditions, plant morphology and color, leaf angles, undergrowth, soil characteristics, and ground water layers. As mentioned in Section 2.1, the multispectral camera comes with an integrated sunshine sensor to compensate light variations in the resultant image. In addition, the GrabCut segmentation method deals with the filtering of undergrowth and other soil noises. Despite that, it remains crucial to validate the accuracy of the selected VIs, since the estimation of N depends on the accuracy and reliability of the extracted features. In this regard, Figure 5c presents several VI calculations in order to analyze the variance of the VI features through the entire phenological cycle of the crop. For this test, we calculated the VI formulas from 360 • random images per crop stage under different environmental conditions. We show the most representative VIs to nitrogen variations: simple ratio (SR), normalized difference vegetation index (NDVI), green normalized difference vegetation index (GNDVI), and soil-adjusted vegetation index (MSAVI). As observed, the low variance exhibited through the entire phenological cycle allows us to use the VI features as inputs to the ML models detailed in Figure 5d.

Results
The experiments reported in this paper were carried out during 2018 in the rice farms of the Center of International Agriculture (CIAT). Trials were conducted during the dry season (June-September), as detailed in Table 1. We conducted 3 trials per crop stage (vegetative, reproductive, and ripening). Figure 6 shows simulation results to evaluate the performance of the proposed controller in terms of wind-disturbance rejection. In plot (a), the desired trajectory (red line) was defined for the UAV to cover the crop while following the trapezoidal velocity profile shown in plot (b). This trajectory profile enables the UAV to hover at certain points (black dots) to capture aerial images. The maximum UAV velocity was set to 1.5 ms −1 .

UAV Control Results
In plot (a), a wind disturbance (V wing = 9 ms −1 ) was added at the starting point of the trajectory, causing the UAV to mismatch the desired path (the V wing vector was applied onto the y + direction). The response of the UAV driven by the BS-DAF attitude controller corresponds to the black line, whereas the PID controller is the green line. As observed, the BS + DAF immediately counteracted the disturbance by generating the corresponding roll command φ depicted in plot (d). This response allowed the UAV to follow the path precisely, maintaining the position error along the y axis near to zero, as shown in plot (c). Under this scenario, our system obtained a maximum tracking error of 2%.
A second wind disturbance was added when the UAV reached 10 m in altitude. In this case, V wing was applied onto the z-direction (z points downwards). As observed, an augmented thrust caused a large altitude tracking error that affected the UAV for both control schemes similarly. Unlike for roll and pitch, the BS+DAF control does not drive the altitude loop. In general, the UAV was able to position in the hovering knot-points accurately (black dots in plot a), maintaining a minimum error with the GPS waypoints. For the rest of the trajectory, the results demonstrate that the backstepping + DAF is accurate and reliable to reject external wind disturbances.

Image Segmentation Results
Algorithms 1 and 2 are applied for each input image in an iterative manner. In Figure 7a, two markers placed as geo-referenced guiding points appear in the RGB image (red dots in the top-right corner). The algorithm provides to the user with the option to manually select these points in order to remove them from the segmentation process. An example of semi-manual GrabCut segmentation with minimal user interaction is shown in Figure 7d. If those points are not manually removed, the corresponding pixels will be classified either in the background of foreground cluster automatically. Subsequently, plots (b) and (c) show how the initial foreground and background are defined. Plot (d) shows an example of applying Algorithm 1 to the input image from (a), whereas plot (e) shows the results when combining the GrabCut with the guided-filter (GF) approach, yielding an automatic segmentation process, as detailed in Algorithm 2. As shown in plot (e), the GF refinement achieves richer detail in the final segmentation. This result can be compared against a traditional Otsu's threshold method shown in plot (f). Although the results from Figure 7e are promising, we implemented a final refinement process based on the so-called GF feathering filtering, in which Algorithm 2 is used with a carefully selected radius r and regularization parameter . The GF feathering filtering enables a faster implementation of Algorithm 2 by avoiding the O(N 2 ) computational restriction of the convolution.
The quality of the proposed segmentation method was tested and compared against the binary mask segmentation of three well-known methods: (i) k-means [49], (ii) mean-shift [50], and (iii) manual threshold over the HSV color representation [51]. The acronyms used in the comparison results are listed in Table 4. By applying the fully automatic segmentation method described in Algorithm 2, we used the precision, recall, and the F1-score to measure the performance of the proposed segmentation listed as GCFauto in Table 4. Figure 8 and Table 5 show the results.
For image pre-processing, the automatic GrabCut method in Algorithm 2 with the guided-filter refinement enabled precise plot segmentation of multispectral imagery with richer detail of the canopy structure and proper elimination of the background cluster. In this regard, Figure 9 shows the final segmentation results. In plot (a), the segmented image is achieved by combining the multispectral image shown in plot (b) and the RGB images from plot (c). In plot (d), the values for the radius and depend on the image size. For this application, those values were experimentally determined as r = 60 and = 0.001 2 . Without this segmentation method, significant image corrections would be needed.   Table 4.

Nitrogen Estimations
In the following, we present a comprehensive comparison among multi-variable linear regressions (MLR), support vector machines (SVM), and artificial neural networks (NN) for the estimation of the canopy N. All these models were trained using the ground-truth (cf. Table 2) containing the direct measurements of leaf chlorophyll based on SPAD readings. In this regard, the estimation results are all given in SPAD scale, in which the linear relationship with nitrogen is defined as: N = 0.079(SPAD) − 0.154, according to the work reported in [37]. Figure 10 shows the estimation results using the MLR. The samples-axis corresponds to the aerial imagery used for the estimation of nitrogen thought the phenological cycle. As previously mentioned, direct SPAD measurements were conducted for selected crop plots in order to assemble the ground-truth dataset. Given that, our system selects those aerial samples matching with the GPS coordinates of the ground measurements. Overall, the MLR achieved an average N correlations of 0.93 (V), 0.89 (V), and 0.82 (Ri). The data was filtered using a Moving Average Filter to reduce signal fluctuations and smooth the estimation result. Table 6 details the numerical data. Since the coefficients for the linear regressions were found and calibrated for each stage independently, we found strong linear relationships in the vegetative stage between the VIs and the N accumulation. Through the ripening stage, the yellow color becomes predominant and parcels cannot be distinguished with ease (as shown in Figure 2c). This changes in the canopy color and dense biomass accumulation tend to saturate the linear relationships between the VIs and the canopy reflectances.  Figure 10. Multi-variable linear regressions (MLR). Numerical data reported in Table 6.

SVM Models
We first tried to find a single SVM model for the entire crop phenological cycle; however, the canopy reflectances and the VIs changed drastically from vegetative through ripening. In this regard, an SVM model was defined for each crop stage independently, as shown in Table 7. In addition, Figure 11 details the estimation results for the N dynamics. Based on these results, the following SVM configurations are selected for our estimation system:

NN Models
Neural networks (NN) with several hidden layers and optimization algorithms were tested. We highlight the results with two and five hidden layers. In Figure 12, the NN with two layers achieved a correlation = 0.964, RSME = 3.315, and R 2 = 0.94. Increasing up to five layers, the NN achieved a correlation = 0.986, RSME = 1.703, and R 2 = 0.97. Several training functions were tested for each crop stage, where the BFGS Quasi-Newton functions achieved accurate results for most of the vegetative stages, whereas the Levenberg-Marquardt function for both reproductive and ripening. The numerical data are reported in Table 8. Based on the results, the following NN configurations are selected for our estimation system:

Máquinas de Soporte Vectorial
En esta etapa del documento se mostrarán las pruebas realizadas a las diferentes SVMs planteadas en el capítulo anterior.
Las pruebas se harán con los

Máquinas de Soporte Vectorial
En esta etapa del documento se mostrarán las pruebas realizadas a las diferentes SVMs planteadas en el capítulo anterior.
Las pruebas se harán con los Una vez se obtienen las máquinas que manifiestan una mayor correlación y sus correspondientes Kernel con base a las tablas, se procede a modificar el parámetro, con el fin de analizar el comportamiento de la máquina con diferentes valores de . Se presentarán los resultados por etapa variando el parámetro en la sección de análisis de resultados.   Figure 12. Artificial neural networks (NN). A 9:1 configuration was used for the two layer NN, whereas a 17:12:9:6:1 configuration for the five layer. In addition, several training functions were tested. Table 8 reports the numerical data.  Figure 13 shows overall N estimation results obtained for each machine learning model. In general, we demonstrated strong correlations between the canopy N and the corresponding vegetative indices. In the case of MLR models, the coefficients for the regressions were determined and independently calibrated for each crop stage. We encountered that the N dynamics have strong linear dependencies with MSAVI, GNDVI, and NDVI; concretely, during vegetative and reproductive stages. Table 9 reports the overall numerical data.
From the ROC curve reported in Figure 13d, the accuracy (ACC) was calculated for each ML model. Neural networks achieved an average ACC = 0.85, improving the estimations of canopy N over the other ML models. In fact, by comparing the N-to-SPAD correlations reported on Table 9, the correlation metric was improved over each crop stage. Table 9 compares the mean N estimations achieved by each ML method against the mean SPAD-based N readings measured at ground level. Mean results are presented for each crop stage: vegetative (V), reproductive (R), and ripening (Ri). The last columns in Table 9 report the mean linear correlations between estimations and measurements. La figura 58 presenta un modelado más acertado si es contrastado contra el modelado de la figura 57, dado que, al tener el parámetro épsilon menor, el híper-plano debe tener una distancia menor a los datos a estudiar, siguiendo la tendencia de una manera más aproximada.
Las siguientes gráficas muestran la unión de todos los datos diferenciando la forma de entrenamiento. La figura 59 muestra el resultado de la estimación con los datos entrenados por separado con las máquinas que mejor correlación y 2 presentaron, mientras que la figura 60 muestra el resultado de la unión de todos los datos y un solo entrenamiento para todos. Dado el resultado encontrado en esta sección, este set de datos fue entrenado con Kernel cuadrático con un valor = 2, parámetros que mostraron el mejor desempeño en el entrenamiento por separado. En la figura 62 se muestra el comportamiento total de las redes neuronales definidas previamente. A su vez en la figura 61 se observan los resultados obtenidos si en vez de realizar una red neuronal por etapa se implementa solo una red para el conjunto de datos total. Esta red neuronal tiene seis neuronas en su capa oculta y su función de aprendizaje es el algoritmo Levenberg-Marquardt, ya que esta fue la red que predominó en el mayor desempeño entre las demás redes implementadas.

Análisis ROC
Una vez conocidas cada una de las máquinas que mayor acercamiento tienen a la realidad para cada una de las técnicas de Aprendizaje de Máquina, se procede a valorar el desempeño de clasificación de las distintas máquinas respecto a las etapas de cultivo. Para esto, se evalúa la capacidad que tiene cada una de las máquinas para discriminar los niveles de nitrógeno estimados según la etapa de cultivo a la que 47 corresponden, ya sea etapa de reproducción o de cosecha. En la figura 63 se puede observar las curvas de desempeño ROC entregadas por cada una de las máquinas.

Figura 63. Curva ROC
Además, en la tabla 23 se observa la tabla de Accuracy, que entrega el porcentaje de exactitud de cada una de las máquinas, indicando cuál de todas posee una mayor probabilidad de clasificar los datos correctamente. La máquina con un mayor nivel de exactitud según esta medición son las redes neuronales. Realizando una comparación entre los datos numéricos y las gráficas (28,29,30), se analiza que; aunque los datos filtrados suavizan como tal la tendencia de los datos, no siguen la referencia de cambios bruscos de los datos groundtruth. Si se fija la atención en la primera pendiente de subida de la figura 28, cabe notar que la estimación sin filtrar sigue esta recta de forma más aproximada que la función filtrada. Esto se remite al hecho de evitar vértices muy marcados por parte del filtro. Sin embargo, si se realiza una estimación poco acertada, el filtro minimizará este error, caso de los sets de datos correspondientes a las etapas de reproducción y cosecha (Figuras 29 y 30 La figura 56 muestra las tendencias de datos superpuestas, tanto la estimada como los datos groundtruth. Esta gráfica muestra la unión de los datos estimados a partir de entrenamientos separados. A partir de esta gráfica se muestra una linealidad en los datos de entrada que pueden ser representados a través del método de regresiones lineales. En la figura 55 se muestra la unión de los datos, buscando los coeficientes de modelamiento todos a la misma vez, por lo que al intentar modelar una función más compleja éstos no presentan una buena correlación, la linealidad entre todo el conjunto de datos no se encuentra tan presente y los datos estimados se presentan muy distantes de los datos de referencia groundtruth, tal como se ve en la tabla 4.  Table 9 reports the numerical data. By comparing the neural networks (NN) against the multi-variable linear regressions (MLR) and support vector machines (SVM), the former estimator achieves higher correlations partly due to the reliability and size of the assembled ground-truth database for training, i.e., reliable N datasets (direct N-leaf measurements) with sufficient variations of the crops during the growth stage (see the experimental protocol in Section 2). In addition, some VIs, such as the simple ratio (SR), tend to saturate as long the the crop grows, but other vegetative indices that behave better with higher biomass and N concentration compensate the effect of the saturated ones. The nonlinear combinations of these VIs enable the NN to achieve accurate estimations.
Furthermore, we tested the effects of the UAV control architecture proposed in Section 2.3, for improving the N estimations by means of precise position tracking during flight. Figure 14  Sample Regions  The paths followed by the UAV are shown for both controllers; one driven by the PID that comes standard with the UAV autopilot, and the other driven by the proposed BS+DAF. Likewise, the star markers represent the GPS points of the images taken while hovering (black color for the PID and red for the BS + DAF). As observed, the attitude stabilization of the UAV seems to be more affected with the PID, since the aerial samples within each hovering area (black star markers) are spread over different positions. Instead, the aerial samples obtained with the BS + DAF (red star markers) tend to be more concentrated within the crop plot of interest. The goal is actually capturing most images in the GPS coordinates that accurately matches with the GPS position of the white markers at ground-level, as depicted in Figure 14a.
Finally, Figure 14c presents the N estimation results achieved under both control flight scenarios. Larger fluctuations, noise, and estimation errors are presented when the ML methods are trained and tested with the imagery dataset captured by the PID-driven UAV. This is mainly due to the imprecise X-Y positioning of the aerial vehicle during hovering (caused by improper attitude stabilization), in which several images capture crop areas that do not necessarily match with the ground-level markers, as detailed in Figure 14b. On the other hand, the proposed BS+DAF controller demonstrated being suitable for the proper regulation of the angular motions of the UAV by compensating external disturbances faster and achieving accurate X-Y positioning. As a result, The N estimations correlate higher with the ground-truth. As shown in the histograms in Figure 14d, the N-to-SPAD correlations grouped higher for the experiments driven by the proposed BS+DAF controller.

Discussion
The results presented in this work demonstrate a significant improvement in leaf nitrogen prediction in rice crops from images obtained via UAVs, validated with high SPAD correlations. These results are thanks to a combination of technological contributions, namely: (i) a novel UAV stabilization control scheme named BS+DAF that enables precise multispectral aerial image acquisition and registration with ground-truth fiducials, even in the presence of strong wind perturbations, (ii) an automated GrabCut image segmentation method, which leads to finer detail of the plant's canopy in the RGN space, as detailed in Figures 5b and 9d, and (iii) the successful application of machine learning methods trained with the vegetation indices (VI) extracted from the segmented multispectral imagery. Figure 5c confirms the increased reliability and accuracy per VI by calculating the data variance at each crop stage.
In a similar manner, Table 9 summarizes the higher nitrogen-to-SPAD correlations achieved with the implemented neural networks, and confirms the significant nonlinear relationships between leaf spectral reflectances (VI) and chlorophyll-based estimated N concentrations. These N estimation results are comparable to other state-of-the-art results presented in the scientific literature. In [52], Figure 7b shows N estimations in rice using linear regression models, for which the authors report: RMSE = 0.212, correlation r = 0.89, and R 2 = 0.803. Our results (tabulated in Tables 8 and 9) reach mean correlations for the vegetative stage (r = 0.986), reproductive (r = 0.9442), and ripening stage (r = 0.89). In [53], Table 6 and Figure 5 present N-status estimations in rice using vegetation indices calculated with aerial UAV imagery. Authors compared several regression models, with a highest R 2 of 0.74 using the LDM method. In our case, Table 8 lists higher R 2 values from the Levenberg-Marquardt method.
Our approach contributes novel solutions to commercial UAV-based phenotyping technologies, particularly to image-based remote sensing applications that adopt photogrammetric geometric image post-processing methods for image correction, and enables crop/plot analysis in real-time.
One important drawback of our solution was the estimation of canopy N for crops with higher biomass. Counter to the expectation of finding a higher N correlation as a function of crop growth, the correlation results decreased in the ripening stage, as depicted in Figure 13e and from the numerical data presented in Table 9. We attribute this to the disproportionately high number of senescent leaves (yellow color with a bandwidth of 570-590 nm) that do not properly match with the VI dominant wavelengths (see Table 3). Consequently, we suggest introducing plant senescence reflectance as an added variable for the discovery of novel vegetative indices to enhance our system.

Conclusions
This paper presents an integrated UAV system with non-invasive image-based methods for the estimation and monitoring of the N dynamics in rice crops. Several challenges were addressed, associated with image segmentation and UAV navigation control, to achieve reliable machine learning training and highly correlated results to SPAD. The proposed BS+DAF attitude controller led to precise UAV way-point tracking, with position errors below 2%. This was key to capture aerial multispectral imagery during hovering that accurately matched with the GPS positions of the SPAD measurements at ground-level. This reduces the need for computationally expensive photogrammetry methods.
Higher N correlations were achieved with neural networks: 0.98 in the vegetative stage, 0.94 in reproductive, and 0.89 in ripening, with an ROC accuracy of 0.85. This is a promising result towards the autonomous estimation of rice canopy nitrogen in real-time. With the advent of small AI-dedicated systems on chip (SoC) and the powerful computing capabilities of our UAV system, we expect upcoming work to achieve real-time computation of the machine learning methods presented in this work.
Several challenges still remain for improving the N estimations, especially when the crop biomass is high. Rice crops commonly have several mixed varieties per small plot areas. Therefore, besides the assessment of the N dynamics as a function of the crop stages, it is also crucial to identify and for deriving aerodynamics equations. We used Blade-Element-Theory computation to calculate both lift and drag coefficients using the FoilSim III simulator provided by NASA [54]. Our UAV has a lift coefficient of C L = 1.6 and a drag coefficient of C D = 0.042, since l/r = 0.7.
Both rotationalω and translationalυ accelerations could be derived from the Newton-Euler formulation, as: being I b ∈ 6x6 the spatial inertia operator calculated at the Center of Mass (CM) of the body frame {b}. It can be expressed as: where J b ∈ 3x3 is the inertial tensor with diag(I xx , I yy , I zz ) being the moments of inertia, m is the mass of the UAV, and U is a 3 × 3 identity operator. Likewise, the term F b ∈ 6x1 in Equation (A2) is the 6D spatial force acting on the CM of {b}. F b contains the effects caused by both inertial (N b ) and aerodynamics (T b ) forces acting on the body frame: In Equation (A4), we have determined an expression that incorporates the thrust produced by each independent rotor (T oi ) ∀oi : 1...4. These aerodynamic terms govern the generation of rolling (τ φ ), pitching (τ θ ) and yawing (τ ψ ) torques at the CM of the UAV, where the term s oi,b = 0.18m is the distance between each rotor to the body frame (see Figure A1a). In addition, T oi depends on the lift (L) and drag (D) forces acting on each propeller, as shown in Figure A1b. It can be written as: where ρ air = 1.20 Kgm 3 is the density of air, ω oi , ∀oi : 1...4 is the rotor speed, A prop = 0.013 m 2 is the propeller transversal area, C L is the lift coefficient, and C D is the drag coefficient. As shown in Figure A1b, we have estimated both values as C L = 1.6 and C D = 0.042, respectively. In this sense, the net vertical thrust (T b ) generated at the CM of the UAV can be calculated as: As observed in Equation (A4), T b governs the generation of the linear forces. The expressions sψ, cψ denote sin(ψ) and cos(ψ), respectively. Finally, the term m = 0.43 Kg is the mass of the UAV and g = 9.81 ms −2 is the gravitational acceleration. In the forthcoming section, we will derive the control strategy to regulate the angular motions precisely. Since our control approach will depend on the UAV model, we introduce the computational steps to calculate the EoM in Algorithm A1.