Remotely Piloted Aircraft and Random Forest in the Evaluation of the Spatial Variability of Foliar Nitrogen in Coffee Crop

: The development of approaches to determine the spatial variability of nitrogen (N) into coffee leaves is essential to increase productivity and reduce production costs and environmental impacts associated with excessive N applications. Thus, this study aimed to assess the potential of the Random Forest (RF) machine learning method applied to vegetation indices (VI) obtained from Remotely Piloted Aircraft (RPA) images to measure the N content in coffee plants. A total of 10 VI were obtained from multispectral images by a camera attached to a rotary-wing RPA. The RGB orthomosaic was used to determine sampling points at the crop area, which were ranked by N levels in the plants as deﬁcient, critical, or sufﬁcient. The chemical analysis of N content in the coffee leaves, as well as the VI values in sample points, were used as input parameters for the image training and its classiﬁcation by the RF. The suggested model has shown global accuracy and a kappa coefﬁcient of up to 0.91 and 0.86, respectively. The best results were achieved using the Green Normalized Difference Vegetation (GNDVI) and Green Optimized Soil Adjusted Vegetation Index (GOSAVI). In addition, the model enabled the evaluation of the spatial distribution of N in the coffee trees, as well as quantiﬁcation of N deﬁciency in the crop for the whole area. The GNDVI and GOSAVI allowed the veriﬁcation that 22% of the entire crop area had plants with N deﬁciency symptoms, which would result in a reduction of 78% in the amount of N applied by the producer. promising approach to map and quantify nitrogen status in a coffee plantation and also indicates the possibility to apply fertilizers in a localized manner. The proposed model allowed the assessment of the N spatial distribution in the coffee leaves and the quantiﬁcation of area presenting N deﬁciency. The model also indicated that the most efﬁcient vegetation indices to evaluate the nitrogen status in coffee plants were GNDVI and GOSAVI. Additionally, the application of the methodology proposed in this study can contribute to a more rational management of N in the crops. manuscript.


Introduction
To succeed in coffee production, good fertility of the soil, along with an intense fertility care program is necessary [1]. Without the proper assessment of soil fertility, nutritional deficiencies will affect the survival and productivity of coffee plants [2,3]. Among the essential nutrients that are necessary for coffee crops, nitrogen (N) is considered the one that limits the development and productivity of the coffee [4]. The adequate administration of N promotes an increase of pairs of leaves and branches per node in the plants, which is immediately related to coffee [3]. Additionally, the N determines the plant settlement and root development, influencing numerous aspects of the plant's health [5].
Meanwhile, the proper management of N in coffee crops is still a challenging task for most producers. Cases of both excessive and deficient N application are problems in coffee production [6]. Excessive application of nitrogen fertilizers is a common practice among small and big producers, being the main cause of low-efficiency fertilization, reducing the quality of the production and lowering the profit for farmers [7,8]. Furthermore, the excess of N can cause susceptibility for plague attacks on the plants [9,10]. Environmental damage has been reported in the literature caused by leaching and volatilization of excessive N. In contrast, N deficiency causes fall of leaves, fewer leafy plants, smaller sized fruits, and drying in branches [11,12].
Consequently, to achieve satisfactory levels of N in coffee plants, producers apply nitrogen fertilizers based on a calendar or samples occasionally collected from soil and leaves for laboratory analysis [5,6]. In extreme cases of nutrient deficiency, symptoms like the yellowing of leaves start to show and are used for reactive decision-making [5,13]. However, these methods are impractical in large areas, requiring a lot of human resources, materials, sample collection, laboratory testing, and data processing [14,15]. Additionally, only a few plants are randomly sampled, making it possible that it does not represent the spatial variability properly across the crops [16]. In the long term, these methods are not suitable for balancing the nitrogen need of the coffee plants, resulting in an unstable production with productiveness deficits [5].
As an alternative to those methods, remote sensing technologies have been used to estimate nitrogen content in coffee plants. For example, [5] used vegetation indices obtained from Sentinel-2 MultiSpectral Instrument (MSI) images and machine learning to model the leaf nitrogen content in coffee bushes. The results have shown precision up to R 2 = 0.78. The study performed by [8] obtained precision up to R 2 = 0.81 while estimating nitrogen levels in coffee plants for different growing stages and field conditions. However, these studies have not explored the potential of Remotely Piloted Aircraft combined with machine learning techniques.
In the past few years, remote sensing based on Remotely Piloted Aircraft (RPA) has rapidly developed, due to its low cost, operation facility, and wide field of view [17,18]. With a nondestructive approach, data from remote sensing obtained using multispectral images from RPAs is commonly used to monitor nitrogen content in vegetation [19][20][21][22]. Still, due to the amount of data that is generated, remote sensing combined with high-resolution images requires a robust technique, such as methods of machine learning [23]. These methodologies, when applied to agriculture, achieve advantages, like the ability to solve nonlinear problems using datasets from various sources [24][25][26] and discover information hidden on the data [27]. There are still few studies in the literature contemplating RPA, management of nitrogen fertilizers, and machine learning [23,[28][29][30]. For coffee production, [31] study used RGB-based vegetation indices from RPA images with Random Forest to monitor the nitrogen status of the plants. However, the models developed in the study were not capable of explaining and predicting the spatial variability of the nitrogen in the coffee plants. Therefore, a solution to these unsatisfactory results is possibly to explore vegetation indices from multispectral images.
Thus, based on the hypothesis that the learning method Random Forest applied to vegetation indices obtained from RPAs can contribute to the more efficient management of the nitrogen content in coffee crops, the objectives of this study were (i) to map the spatial variability of the nitrogen content in the coffee plantations, (ii) quantify the deficiency of nitrogen in coffee plants, and (iii) determine the most efficient vegetation index to predict N content in coffee plants using the Random Forest (RF) machine learning method.

Materials and Methods
The methodology proposed in this study is briefly explained in Figure 1. In the first step, multispectral images of the study area were acquired by a multispectral camera coupled to an RPA. These images were processed, creating the orthomosaics, to then later calculate the vegetation indices. In the second step, the RGB mosaic composition was used to define sampling points in the study area, used for analyzing the chemical content of N in the coffee leaves. Finally, in the third step, values of the vegetation indices and chemical analysis of N in the leaves in each sample point were used as input parameters to calibrate the algorithm Random Forest and classify the images in three N content categories.
The accuracy of the classification was measured by the overall accuracy metrics, kappa coefficient, receiver operating characteristic (ROC) curve, and area under the curve (UAC). The methodology is further detailed in the following topics.

Study Site
The study was carried out in a field located in the commercial farm Bom Jardim, in the city of Santo Antonio do Amparo, state of Minas Gerais, Brazil, geographic coordinates  According to the Köppen-Geiser climate classification, the region has a Cwa climate, humid subtropical, with hot and humid summers and cold and dry winters, with an annual average air temperature of 19.8 • C and average annual total rainfall of 1670 mm [32]. The climatic data [33] of the study area from the period of the analysis is shown in Table 1.

RPA Image Acquisition and Preprocessing
The images were captured using a commercial RPA 3DR Solo (3D Robotics, Berkeley, CA, USA), with four engines (quadcopter), powered by the automatic pilot system 3DR Pixhawk 2 and flight controller APM:Copter. A multispectral camera Parrot Sequoia with a focal distance fixed at 4.0 mm was coupled to the RPA. This camera has an RGB sensor with 16 megapixels resolution (4608 × 3456) and four extra sensors with 1.5 megapixels resolution (1280 × 960) in the spectral bands of green (550 nm BP 40), red (660 nm BP40), red-edge (735 nm BP 10), and near-infrared (790 nm BP 40). This camera is designed to map and monitor vegetation; it includes an upward-facing sunshine sensor, which allows the radiometric calibration of those 4 multispectral bands during image collection. The software Mission Planner from a complete ground station, open-source for RPA automatic piloting systems, was used to plan the flight mission. The flight altitude was fixed at 60 m above ground, and the flight speed was an average of 3 m s −1 . The images were captured every 1 s with a spatial resolution of 2.07 cm and overlapping the frontal position of 80% and lateral position of 75%.
The image processing was made using the software Agisoft PhotoScan ® Professional, version 1.2.4 (Agisoft LLC, St. Petersburg, Russia). This software works in a three-step workflow. The first step was the alignment of the images by identifying corresponding resources. For executing the alignment of the image, the software calculates the parameters that orient the internal and external camera, including the radial nonlinear distortion. For this study, this task was executed with a high precision set. The results of this step are positions of the camera matching each image, representing the parameters for internal calibration, and the 3D coordinates of the sparse cloud of points in the terrain. In the second step, the sparse cloud is georeferenced in a local coordinates system (WGS 84-UTM Zone 23S), and the densified cloud of points is obtained using the heightfield method, which is based on paired depth map computation, resulting in a detailed 3D model. The third step applies a texture to the mesh obtained previously, generating the orthomosaic. This way, orthomosaics were created for each spectral band (green, red, red-edge, and near-infrared).
To minimize the effects of other targets in the spectral response of the coffee plants, such as soil and weeds, the mosaics were differentiated in the software eCognition version 9.0.1 (eCognition Developer, Munich, Germany). Two distinct classes were visually defined: coffee plants and noncoffee plants. From that, the classes were manually sorted for the orthomosaics of each spectral band.

Determining Leaf Nitrogen
Leaf nitrogen analyzes were performed on 10 December 2018. After obtaining the orthomosaics, the nitrogen level in the leaves of the coffee plants was visually evaluated, defining three distinct regions as deficient, critical, and sufficient N content ( Figure 3). For that, we used the orthomosaic of the RGB composition (red, green, and blue) that allows a better perception than the human eye of vegetation. Thus, 10 targeted sampling points were chosen, with 2 sampling points for regions with N content in the plants considered sufficient, 4 points for regions of critical level, and 4 points for deficient regions ( Figure 2). Each sample point was composed of 5 plants, 1 central plant and 2 plants oriented north-south from the central plant. From every plant, 3 leaves were collected from both sides of the planting line in three different canopy heights, totalizing 30 leaves collected for each sample point. After the collection, the leaves were sent to the laboratory of leaf analysis in the University of Lavras to quantify the nitrogen content using the Kjeldahl method. This method has been widely applied to determine nitrogen content, especially in the analysis of plant tissues. The Kjeldahl procedure involves three steps-destruction, distillation, and titration [8]. The leaf diagnosis for mature coffee plants proposed by [34] was used to classify the nitrogen content in the leaves as N-deficient (<2.5%), N-critical (2.5 to 3.0%), and N-sufficient (3.0 to 3.5%).

Vegetation Indices
The vegetation indices (Table 2) were chosen based on their capability to determine the nitrogen content in the coffee crops from remote sensing data [5,7,8].

Random Forest (RF) Classification
The Random Forest (RF) algorithm was used for classifying the N content in the leaves of the coffee plants from the vegetation indices proposed in this study ( Table 2). The modeling was created in R [45], using the R Random Forest package [46]. For the training of the algorithm, the values of the vegetation indexes in the pixels referring to each of the 5 plants of each sampling point were used with the results of the chemical analysis of the N content in the leaves of the coffee plants of these sampling points. RF is one of the most successful classifiers based on learning strategies. The algorithm consists of a collection of three-based classifiers {h (x, Θk), k = 1, . . . }, where x is the input vector and {Θk} are the random vectors distributed identically and independently [47]. Each decision tree was constructed using a bootstrap deterministic algorithm, allowing the remaining data points to validate and issue a unitary vote to the most popular class. The RF uses the procedure of initialization with replacement to increase the diversity of the classifying trees, which allocated pixels to one class, according to the maximum number of votes in the collection [48]. Table 2. Vegetation indices of multispectral images obtained using Remotely Piloted Aircraft (RPA).
To execute the RF for classifying images, two parameters must be defined: the number of trees (ntree), and the number of predictors concerning the maximization of the model (mtry) [49,50]. In this study, the ntree was set to 1000 trees, while the mtry was set to equal to the square root of the total number of input resources.

Accuracy Assessment
To validate the results of the RF classification, a crossed validation approach was used 10 times [51]. This involved the random division of the reference objects in 10 joints of datasets, each one including around 15% of data from each class. In the steps of the assessment, the RF was tuned with 85% of the reference data and then applied to the other 15% left (that were the validation joint of data). This step was repeated ten times. In the end, the results were aggregated to one confusion matrix. The classification performance was assessed based on common statistical measures [52], derived from the confusion matrix. The select statistical measures included the overall accuracy (Equation (1)), kappa coefficient (Equation (2)), receiver operating characteristic (ROC curve), and area under the curve (UAC). The ROC curve was obtained by plotting a graph of sensitivity (true positive rate) versus specificity (false positive rate), and UAC was estimated using the method proposed by [53].
where q is the number of classes, n represents the total number of considered pixels, n ii are the diagonal elements of the confusion matrix, n i+ represents the marginal sum of the rows in the confusion matrix, and n +i is the marginal sum of the columns in the confusion matrix.

Nitrogen Content in Coffee Leaves
First, as described in the Material and Methods section, one flight was performed in the area; then, from the RGB images, the orthomosaic was generated, which allowed the identification and establishment of the leaves sampling points. The results of the statistical descriptive analyses of the chemical analysis of nitrogen content in coffee leaves are shown in Table 3. The areas in the crops with N levels considered sufficient, critical, and deficient are denoted with values within the nutritional scale established by [34]. However, as can be seen in Figure 2, regions with a critical level of N presented plants with a considerable amount of yellowish leaves, which is a symptom of nutritional deficiency. This may have occurred because the plants in this region had leaves with a medium nitrogen content (2.69%), with the content value close to the level considered nutritionally deficient (<2.5%). Table 3. Results of the descriptive statistics of the chemical analysis of N content in plants located in the sampling points, considered as N-sufficient, -critical and -deficient. The high-resolution images captured using RPA showed great potential in differentiating samples to evaluate the nitrogen content in the coffee leaves. According to [13], changes in the nutritional status of the plants have a direct impact on the canopy color of the coffee plantation. The color of leaves deficient in nitrogen is lighter, causing the color of the canopy to be green-yellowish, whereas leaves with sufficient nitrogen content reflect energy more intensely in the green wavelength than in the red wavelength. In contrast, leaves with N deficiency reflect energy intensely both in the green and in the red wavelengths, resulting in leaves with a yellow coloration [8,13].

Overall Accuracy and Kappa Performance
The results of the overall accuracy and the kappa coefficient of the images classified by RF from vegetation indices are shown in Figure 4. In general, the classification has presented a high performance in assessing the nitrogen content of the leaves in coffee plants: values of accuracy and kappa coefficient were from 0.78 to 0.91 and 0.46 to 0.86, respectively. It was possible to notice that vegetation indices that use a combination of the nearinfrared band with bands in the visible spectral region presented better results. Between those indices, the Green Normalized Difference Vegetation (GNDVI) and the Green Optimized Soil Adjusted Vegetation Index (GOSAVI) obtained the greatest values. These results are associated with the larger predominance of plants with sufficient N content in the crops and, consequently, higher chlorophyll content in the leaves. According to [54,55], the reflectance in the green band is greatly affected by the variations in the chlorophyll content, when compared to the red band reflectance. Thus, the spectral indices based on the near-infrared and green bands have proved more sensitive to changes in chlorophyll content than the indices with near-infrared and red bands [56,57]. The GNDVI and GOSAVI have also been reported by other authors as more suitable indices to evaluate N status in vegetation. Similarly to this study, [29] used an approach based on RF and vegetation indices from RPA images to describe the variation of plant N uptake (PNU), and N nutrition index (NNI) for rice crops in the northwest lowlands of China. The results, resembling this study, presented GOSAVI and GNDVI indices as having better performances. In [22], they applied the RF model to vegetation indices obtained from three sensors (RGB, color infrared, and multispectral), coupled to an RPA, for determining the N status in the rice crops, finding the GNDVI to have the better results. Therefore, the results of this study align with the literature and recent research.
For the Medium Resolution Imaging Spectrometer (MERIS) Terrestrial Chlorophyll Index (MTCI) and Normalized Difference Red Edge (NDRE) indices, the bandwidth of the red-edge band that is 4 times smaller than the green and red bands may have contributed to the worst performances among the indices that use the combination of the near-infrared band with visible bands. As demonstrated by [5], the width of the red-edge band can make indices less sensitive to the vegetation characteristics. However, better results were expected from them, since the relation between near-infrared and red-edge spectral bands with the N content in the plants is well established in the literature [5,6,22].
For the indices Excessive Red (EXR), Modified Photochemical Reflectance Index (MPRI), Green-Red Ratio Index (GRRI), and Normalized Different Index (NDI) that use only visible spectral bands, the results had inferior performances. The possible explanation for that is the lack of radiometric correction for RGB images, letting fewer alterations in the lighting conditions influence the precision of the color reproduction [58,59]. However, even with those indices showing worse performances, when compared to indices that combine near-infrared bands to visible bands, their results still express the potential for the application of RGB cameras. These types of cameras can be a viable alternative to evaluate the N status in plants, especially for medium and smaller farmers. RGB cameras are easy to operate and accessible for most researchers. Furthermore, the RGB images can be used right after downloading from the memory card, without preprocessing, for example, for band registry, and to process images, various sophisticated commercial platforms are now available, such as Agisoft Photoscan and Pix4D.
Regarding the RGB indices, the results obtained by this study diverge from the reported by [31]. The Random Forest models based on RGB indices from RPA that were developed by the authors did not succeed in evaluating and predicting the N content of the coffee plants. This may have occurred because the Parot Sequoia camera that was used in the present study presents superior techniques, compared to the RGB camera used by [31], for example, when calibrating the radiometric individual bands. Additionally, the focus of the sampling points in this study may have been crucial for more effective training samples and clearer results.

ROC Curve and AUC
Regarding the analysis of the image classification using the ROC curve and AUC, in general, all the vegetation indices presented excellent performances for assessing the nitrogen content in coffee plants ( Figure 5). It was possible to observe all vegetation indices approximate to the point (0, 100%) in the ROC curve. According to [60], to rate a model as optimum, its analysis has to be as close as possible to the point (0, 100%). Moreover, [61] describes the ROC curve as a description of the relative compensation between benefits (true positives) and costs (false positives) of a classification. Thus, a point located in the upper left corner means a greater number of positive and negative examples were classified correctly, consequently creating a lower cost to the classification.
For the AUC, the values were from 0.817 to 0.934, where greater values came from the indices that combined near-infrared band to visible bands, matching what happened with overall accuracy and kappa coefficient. According to [62], the closer the AUC gets to 1, the better the overall test performance, meaning the AUC= 1 test is the perfectly accurate result. For the approach in this study, the indices GNDVI and GOSAVI had the greatest performances, while the indices MPRI, GRRI, and NDI presented the worse performances.

Mapping and Quantifying N Spatial Distribution in Coffee Leaves
Predictive maps for nitrogen spatial distribution in coffee leaves and quantification in the percentage of those maps, obtained from the classification of images using RF from vegetation indices data, are shown in Figure 6, Figure 7 and Table 4, respectively.   Table 4. Percentage (%) of areas in the coffee plantation that had N content sufficient, critical, and deficient, according to the appraisal maps of the N content spatial distribution in coffee leaves, presented from the best performance to the worst performance. GNDVI and GOSAVI indices had the best performances ( Figure 6), presenting the best class definition. These maps had the largest area at the N-critical level (52%) and the smallest area for deficient N levels (22%). According to these maps, the plants in the central region of the crops showed a greater N deficiency. Similar to these indices, NDVI and SAVI maps indicated greater N deficiency in the central region of the crops, although they also indicated a large number of N-deficient plants in the western region, evinced by the 9% increase in the N-deficient areas, when compared to the GNDVI and GOSAVI maps. Regarding the MTCI and NDRE indices (Figure 6), they both presented the same percentage of N-deficient areas as the GNDVI and the GOSAVI, but they overestimated areas with sufficient N content (31%). Additionally, in these maps, the area with N deficiency extended from the central region to the eastern region of the plantation. The indices that had worse performances were MPRI, GRRI, and NDI (Figure 7). They overestimated the percentage of N-deficient areas and underestimated areas with sufficient N levels (21%) as well as areas with critical N levels (41%). The maps for these indices showed mixed classes, especially between critical and deficient levels of nitrogen, showing deficiency in practically all regions of the crops.
These results confirmed the possibility of managing nitrogen locally using images obtained from RPA. Knowing the spatial distribution of areas with N deficiency becomes essential for managing nitrogen properly in the plantation. Overestimating or underestimating areas with N deficiency can increase production costs, reduce productivity, and cause environmental damage. In this context, the methodology proposed by this study, especially for GNDVI and GOSAVI indices, can promote a 78% reduction in nitrogen fertilizer treatments in the studied area, when compared to the conventional management and application methods. For this, applying fertilizers would be recommended only for plants showing N deficiency, which means 22% of the total area, unlike the conventional application of fertilizer, where producers apply it to the entire area of the plantation.
Based on information previously published by [34,63], it is possible to illustrate how the methodology proposed in this study could prove economically relevant to monitor the N status in coffee plantations. The data used in the simulation were as follows: study area: 1.5 ha; ammonium nitrate price for 50 Kg: $15.00; average amount of N applied by coffee producers: 300 kg ha −1 per year; N concentration in ammonium nitrate: 33%; total ammonium nitrate applied by coffee producers: 833 kg ha −1 per year; and reduction of the demand for N observed in this study, considering results from the GNDVI and GOSAVI indices: 78%.
To that end, the calculation of the total price per hectare of N was 833 kg ha −1 per year × $15.00 (50 Kg) = $249.90 kg ha −1 . Considering the 1.5 ha plantation area, it would have a cost of $374.85 per year for nitrogen fertilization. When considering the actual demand for N for this coffee field, which, in this study, would reduce by 78% of the total, the coffee producer cost could be reduced to $374.85 × 0.22 = $82.47 per year. In conclusion, the coffee producer could save $292.38 during the year with nitrogen fertilization. This was possible, considering the use of N only for plants that showed deficiency. This example is hypothetical. However, it shows the application and importance of carrying out properly the monitoring of N status in coffee plantations, which is especially relevant to minimize the use of N and reduce the costs of nitrogen fertilization, as well as its impacts.
In this study, the learning method of the RF algorithm applied to the vegetation indices from images obtained using a multispectral camera coupled to an RPA was used to evaluate the N content in a coffee plantation. The results proved that this methodology could be used to diagnose the nitrogen status in coffee plants and also orient producers on how to do the application of nitrogen fertilizers at a variable rate. However, other local conditions, such as density, canopy cover, age, species of plant, and spacing can reduce the accuracy of the model suggested. From this, to deal with the problem complexity, a systemic approach that simulates and predicts the impact of applying N fertilizers over space and time, affecting the cultivation yield, is necessary. To validate this relation, more studies using different conditions and plants are needed. In addition, the use of RPAs with greater flight autonomy and different multispectral cameras with better resolution can improve the model efficiency, as well as the size of the monitored area. To administer N fertilizers at a variable rate, the process used in the practical application needs to develop into management zones, pixel-based, and plant-based, to make easy the reduction and viable the application for producers.

Conclusions
The results of this study showed that the machine learning method Random Forest, applied to vegetation indices from multispectral images obtained by RPA, offers a very promising approach to map and quantify nitrogen status in a coffee plantation and also indicates the possibility to apply fertilizers in a localized manner. The proposed model allowed the assessment of the N spatial distribution in the coffee leaves and the quantification of area presenting N deficiency. The model also indicated that the most efficient vegetation indices to evaluate the nitrogen status in coffee plants were GNDVI and GOSAVI. Additionally, the application of the methodology proposed in this study can contribute to a more rational management of N in the crops.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.