Crop Row Detection through UAV Surveys to Optimize On-Farm Irrigation Management

: Climate change and competition among water users are increasingly leading to a reduction of water availability for irrigation; at the same time, traditionally non-irrigated crops require irrigation to achieve high quality standards. In the context of precision agriculture, particular attention is given to the optimization of on-farm irrigation management, based on the knowledge of within-ﬁeld variability of crop and soil properties, to increase crop yield quality and ensure an efﬁcient water use. Unmanned Aerial Vehicle (UAV) imagery is used in precision agriculture to monitor crop variability, but in the case of row-crops, image post-processing is required to separate crop rows from soil background and weeds. This study focuses on the crop row detection and extraction from images acquired through a UAV during the cropping season of 2018. Thresholding algorithms, classiﬁcation algorithms, and Bayesian segmentation are tested and compared on three different crop types, namely grapevine, pear, and tomato, for analyzing the suitability of these methods with respect to the characteristics of each crop. The obtained results are promising, with overall accuracy greater than 90% and producer’s accuracy over 85% for the class “crop canopy”. The methods’ performances vary according to the crop types, input data, and parameters used. Some important outcomes can be pointed out from our study: NIR information does not give any particular added value, and RGB sensors should be preferred to identify crop rows; the presence of shadows in the inter-row distances may affect crop detection on vineyards. Finally, the best methodologies to be adopted for practical applications are discussed.


Introduction
According to the most recent projections presented by the Intergovernmental Panel on Climate Change (IPCC), the variation in precipitation is altering hydrological systems in many agricultural areas of the planet, affecting water resources in terms of both quantity and quality [1]. In particular, climate change is leading to more frequent extreme events (droughts and heat waves, occurring more often and lasting longer), to the alteration of spatial and temporal precipitation (less precipitation concentrated in a few heavy rainfall events), as well as to an increase in air temperatures and crop water needs in many geographical areas.
In Mediterranean countries, freshwater resources are currently highly exploited due to the rapid population growth and intensive water use in agriculture, industry, and tourism activities. In many areas of Europe, including Italy, the effects of climate change are already detectable and have led to the need to irrigate crops and areas traditionally not irrigated [2].
The vineyard variety was Chardonnay, cultivated in rows oriented along the east-west axis, with a distance of plants on the row of 0.8 m and a distance between the rows of 2.4 m, while the row height was about 1.3 m. The plant cover fraction in the phase of maximum development of the canopy was estimated to be about 25% (equivalent to a row width of 0.6 m). The soil, both under the rows and between the rows, was grass-covered with periodic mowing to regulate the excessive development of vegetation [19].

Pear Orchard
The pear orchard was situated in the "Dotti" farm, a research facility of the University of Milan, located in Arcagna locality, Montanaso Lombardo (province of Lodi (LO)).
The orchard covers an area of about 1 ha and is flat, positioned at about 80 m a.s.l. According to the data recorded in the period 1993-2017 at the nearest ARPA agro-meteorological station (located at about 12 km southeast from the experimental site in Cavenago d'Adda), the climate is characterized by two rainy periods, respectively in April and September, while the highest temperatures occur in July, when rain is minimum.
The soil is loam with more clay in deeper horizons. In the orchard, four different pear varieties were cultivated, namely Williams, Abate, Kaiser, and Conference varieties, distributed in 17 rows with an inter-row distance of about 4 m, while the distance between plants on the row was about 1.5 m, depending on the variety; the soil, under the rows and between the rows, was grass-covered with periodic cutting.

Tomato Field
The third site included in the project is an area of about one hectare cultivated with industrial tomato "Pietra Rossa F1", inside the CREA research center in Montanaso Lombardo (LO). The site is characterized by loamy soils with an increase of clay with the depth, while climatic conditions are the same as the ones above described for the pear orchard site. Tomatoes were 0.3 m high at images' acquisition and cultivated in parallel rows of 0.5 m width with a distance between rows of about 1.5 m.

UAV Surveys and Photogrammetric Processing
Two UAV surveys were conducted on each study site, during the agricultural season of 2018. For each site, in the first survey, the Parrot Bebop 2 (Parrot S.A., Paris, France) was used to acquire RGB images, and the Parrot Sequoia camera (Parrot S.A., Paris, France) was mounted on the Parrot Disco (Parrot S.A., Paris, France) to collect multispectral imagery during the second survey. The photogrammetric processing of all the surveys was performed with Pix4Dmapper Pro software (Pix4D S.A., Prilly, Switzerland), Version 4.1.1. The details of the surveys and their processing are described in the following sections.

Vineyard
The RGB survey of the vineyard took place on 28 June 2018, while the multispectral survey, six days later on 4 July. The flight height of the Parrot Bebop 2 used in the RGB survey was set to 40 m Above Ground Level (AGL), while the Parrot Disco flew at 60 m (AGL) during the multispectral survey. The same overlaps among images were fixed for the two surveys: longitudinal overlapping equal to 80% and transversal equal to 70%. An amount of 130 and 560 images were collected during the RGB and multispectral survey, respectively. According to the sensors' characteristics (i.e., Parrot Bebop 2 fisheye camera and Sequoia camera), the final Ground Sample Distance (GSD) of the acquired images was about 0.1 m for both cases. As suggested in [20], the coordinates of nine Ground Control Points (GCPs) were measured with the Global Navigation Satellite System (GNSS) receiver Leica Viva GS14 (Leica Geosystems, Heerbrugg, Switzerland) in Network Real-Time Kinematic (NRTK) mode, to ensure the georeferencing of the photogrammetric products with high accuracy. According to the results of [20], eight GCPs were distributed all around the perimeter of the vineyard, while the ninth target was placed in the middle of the field ( Figure 2). In all the study sites, some of the GCPs were adopted as Check Points (CPs) during the photogrammetric processing, to assess process accuracy. At the end of two independent photogrammetric processing, performed with Pix4Dmapper Pro software, a Digital Surface Model (DSM), an RGB orthophoto, and a four band multispectral orthophoto were produced, with GSD of around 0.1 m, to exploit for the crop row detection. The DSM generated from the multispectral survey had lower quality than the RGB one; therefore, it was not considered in further analysis. Table 1 summarizes the residuals on GCPs after photogrammetric processing, while in Figure 3, DSM and orthophotos of the vineyard are shown.

Pear Orchard
In the pear orchard site, the RGB and multispectral surveys were conducted on 26 June and 2 July, respectively. The characteristics of the flights were the same as the surveys performed on the vineyard site: longitudinal overlap among images equal to 80%, transversal overlap equal to 70%, flight height set at 40 m and 60 m for the multirotor UAV and the fixed-wings UAV, respectively, thus ensuring a GSD of the images of about 0.1 m. 140 images were acquired during the RGB survey, while 540 during the multispectral one. During the UAV flights, seven targets were placed on the terrain to be used as GCPs, well distributed all around the orchard, as shown in Figure 4. As the case of the vineyard, the DSM and the orthophotos were produced in Pix4Dmapper Pro with a spatial resolution of 0.1 m ( Figure 5). The residuals on the GCPs computed after the photogrammetric workflow are reported in Table 2.

Tomato Field
Given the proximity of the two sites, the tomato field was surveyed on the same days as the pear orchard, with the same equipment and flight characteristics. During the RGB survey, 96 images were acquired, as well as 388 images came from the multispectral flight. From these datasets, one DSM and two orthophotos, having a GSD equal to 0.1 m, were generated after bundle block adjustment in Pix4Dmapper Pro. The various photogrammetric products were georeferenced by means of six GCPs, whose center coordinates were measured with GNSS-NRTK on the dates of the surveys. The distribution of the GCPs on the tomato site and their computed residuals are shown in Figure 6 and Table 3, respectively. The photogrammetric products are reported in Figure 7.

Crop Row Detection Methods
Differentiating between crop canopy and background can be very challenging. Moreover, the different crop types considered in this study led to the need for many methods to extract crop rows, each one more suitable for a specific crop type. In the following sections, five different detection methods were proposed; some of them were taken from the existing literature (i.e., classification algorithms and Bayesian segmentation), while two methods, labeled as thresholding algorithms, were developed ad hoc for the purposes of the project.
In order to achieve the best possible crop mask to be used to identify crop rows on orthophotos, Vegetation Indices (VIs) were chosen as the inputs of the detection methods, as already proposed by many authors [15,16,21]. Considering vegetation, most of those indices take into account Red (R) and NIR reflectance bands (ρ λ ): the greater is the difference between ρ R and ρ N IR , the greater is the amount of green and healthy vegetation in that particular pixel. Among all the possible VIs, only those composed of spectral bands that sensors involved in the surveys could provide were used in this study (Table 4).

Normalized Green Channel Brightness
Green Red+Green+Blue [27] To exploit the proposed methods fully, also the DSM and RGB orthophoto were individually used as inputs. This ensured that the methods were still operational, even in the cases where only imagery resulting from UAVs supporting only RGB sensors was available, as for the Parrot Bebop 2.
The assessment of the final results was performed by computing error matrices and classification accuracies (Overall Accuracy (OA), User's Accuracy (UA), and Producer's Accuracy (PA)), on some validation samples manually identified on orthophotos, through visual inspection. In particular, the quality of the crop detection was defined according to the value of PA of the class crop canopy: the greater is the PA, the lower is the probability to omit crop pixels; therefore, to underestimate the detected crop rows.

Thresholding Algorithms
Two algorithms (specifically local maxima extraction and threshold selection) were developed ad hoc for the purpose of the project, by starting from [28], who proposed a method for crop rows' extraction by using as input the 3D point cloud. Both methods were generated in MATLAB R2017b [29] and are based on the concept that high pixel values generally correspond to crop rows. They are illustrated in the following sections.

Local Maxima Extraction
This method aims at generating a binary raster, where non-null values refer to the presence of the crop canopy. First, the input raster (e.g., VI map or DSM) is divided into square cells (macro-cells), then inside each macro-cell, the crop pixels are identified as those corresponding to a percentage of pixels with the highest values. It is a semi-automatic algorithm, where the user has to define the dimensions of the cell and the percentage value.
This method is sensitive to the user's choices: in particular, the macro-cell dimension should be selected in order to include both crop and ground pixels, thus being close to the distance between rows or slightly larger. Macro-cell size should be neither too small nor too big with respect to the distance between rows. If it is too small, a wrong selection of pixels is performed whatever the chosen percentage is. When the cell includes only crop pixels, whatever percentage not equal to 100% causes an underestimation of crop pixels; on the other side, when the cell overlaps only ground pixels, they would be selected as crop pixels, thus producing an over-estimation in the crop mask. The overestimation arises also when the dimension of the macro-cell is too big, because the probability of selecting pixels belonging to the ground increased with the chosen cell size.

Threshold Selection
This method produces a binary crop mask, by selecting as crop pixels all pixels with values higher than a reference value. The challenging part of this algorithm is the definition of this reference value. When both DSM and Digital Terrain Model (DTM) of the area are available, the Canopy Height Model (CHM) could be derived as difference between DSM and DTM, and zero could be considered as the reference value, while in other cases (i.e., VIs as input, no availability of an accurate DTM), it should be determined as described below.
Starting from the input raster, create a smoothed raster with a moving window average filter. Subtract the smoothed raster to the input raster, and define on the differences a threshold to be considered as the reference value, by visual checking of the results with an empirical trial and error approach. Pixels with values greater than the threshold are retained as crop. Even in this algorithm, the user intervention is twofold, choosing the dimensions of the moving window and the value of the threshold, and could cause problems of under-/over-estimation of crop canopy pixels.

Classification Algorithms
Two well-known classification algorithms were exploited in this study, K-means clustering and the Minimum Distance to the Mean (MDM) classifier, to be representative of both unsupervised and supervised classification algorithms. Both methods were applied in QGIS (Version 3.4) [30], to allow users not familiar with programming languages to run the algorithms thanks to a dedicated user-friendly GUI.

K-Means Clustering
This is a well-known algorithm for hard unsupervised thematic classification [31]. The clustering made by K-means is based on the minimization of the objective function f (Ω), defined as the Euclidean distance of samples of a cluster from the respective centroid.
The number of classes (K) are known a priori. Once K is defined, the method consists of three iterative steps. In the first step, for each class K i , a centroid is automatically chosen. The rest of the data are assigned to K clusters based on the minimum distance criterion. The Euclidean distances of each sample from the centroids are computed, and in the second step, the sample is assigned to the cluster for which the computed distance is minimum. In the last step, centroids are re-calculated, and all the samples are re-assigned. This step is iterated until the clustering converges to a stable solution, namely when centroids of clusters do not change meaningfully.
The final configuration is stable and does not depend on the initial position of centroids arbitrarily selected. The initial configuration only influences the number of iterations necessary to reach the convergence.

Minimum Distance to Mean Classifier
This method finds the mean values of all the training sets and classifies all the image pixels according to the class mean they are closest. The process is performed for all image pixels, one at a time. Bounds are determined using statistics derived from the training sets, and the distance used is the Euclidean one.
As for all supervised algorithms, a crucial phase of the MDM classifier is the selection of the training samples. They are used to compute class spectral signatures, therefore must be representative of all the classes. In this study, training samples were defined by visual inspections of the UAV images and grouped in two macro-classes: crop canopy and background, which included weeds, soil, and shadow pixels.

Bayesian Segmentation
This method relies on the Bayesian approach, where any uncertainty can be considered random variables that are fully described by probability distributions [32]. Given the vector of data y and the vector of parameter x, the conditional distribution of parameters is described by the Bayes theorem [33]: where: P(x|y) is called the posterior probability and describes the new level of knowledge of the unknown parameters x given the observed data y. P(y) is a normalization constant used to impose that the sum of P(y|x) for all possible x is equal to one. P(x), instead, represents the prior probability distribution. It describes the knowledge of the unknown parameters x without the contribution of the observed data. P(y|x) is defined as the likelihood and is a function of x. It describes the way in which the a priori knowledge is modified by data and depends on the noise distribution. The terms in Equation (1) can be adapted to match the purpose of this study, the detection of crop rows: the posterior probability is the probability of a pixel to be part of the class crop canopy or background, and the prior probability is defined starting from the mean and standard deviation values, a priori assigned to each class, while the likelihood is described by a Gaussian distribution, in which the parameters are the mean and standard deviation of the two classes: The final goal of Bayesian approach can be identified in finding the optimal parameters x that maximize the posterior probability distribution P(x|y). This is called the Maximum A Posteriori (MAP) estimate [34], and it is defined as: x MAP = arg max x P(x|y) In crop row detection, it consists of assigning a unique class to each pixel of the image, depending on the posterior probabilities estimated for each pixel. In order to obtain outputs less affected by pixel noise, smoothing filters or image adjustment can be applied on the input raster.

Results
Considering all the detection methods, their parameters, and all the possible input rasters, the number of crop masks potentially available is very high. For the vineyard site, 191 outputs were tested and 166 and 104 for the pear orchard and tomato site, respectively. For the sake of brevity, only the best masks, representing each detection method, are reported and compared. An exhaustive analysis of all the tests performed was described in [35].

Vineyard
For the vineyard site, the parameters of each detection method that resulted in the best results are reported in Table 5.   Table 6, the accuracies of the five selected best results are summarized, and the detail of each crop mask is shown in Figure 8.

Pear Orchard
The parameters for the best results of each detection method for the pear orchard are summarized in Table 7. The error matrices were computed starting from a validation set composed by 37 polygons (N pixels = 44,889) for the class crop canopy and 34 polygons (N pixels = 62,378) for the class background. The five selected best results and their respective accuracies are presented in Figure 9 and in Table 8.   Table 9 reports the parameters chosen for each detection method, which gave the best crop mask outputs. The assessment of the results for the tomato site was performed on 52 polygons (N pixels = 81,290) for the class crop canopy and 42 polygons (N pixels = 155,296) for the class background. The accuracy values are presented in Table 10, while Figure 10 shows the crop detection for the five selected methods.

Discussion
As general findings, it can be stated that all the methods tested in this study performed well for crop row detection, with OA close or even greater than 0.9. The vineyard site seemed to be the most challenging (OA values lower than 0.9 for some methods), due to the concurrent presence of weed, bare soil, and shadow in the inter-row distance, while the high contrast between bare soil and crop canopy facilitated the crop detection in the tomato site (OA values always higher than 0.9).
Despite the high accuracy values achieved, the proposed algorithms did not require any particular computational resources, and the calculation time is reasonable with mass-market hardware, even if it varied according to the method. Local Maxima Extraction (LME) and Bayesian Segmentation (BS) overall returned the best outputs in terms of accuracies values, but with different performances in terms of time cost and parameter setting. The first method was faster, and choosing a cell size comparable with the rows' distance, or slightly larger, and a percentage of maxima between 30% and 40% could produce high quality results in all cases. The latter required a high level of a priori knowledge, and parameters had to be ad hoc fine-tuned with a time-consuming trial and error approach. The Threshold Selection (TS) algorithm needs to be run with an accurate DSM, and the definition of the reference value is crucial and sometimes can fail, as demonstrated by the low accuracy values registered in the vineyard site (OA = 0.76), especially on fields characterized by a relevant slope. On hilly fields and non-flat areas, the use of a real CHM is necessary and cannot be bypassed by the creation of a smoothed raster, on which the reference value of terrain height is identified. Classification algorithms are easy to run, especially in the QGIS implementation, and widely used in remote sensing, but cannot reach, in all cases, the same level of accuracy as the other methods. In addition, these algorithms require considerable human intervention, either in the labeling phase, as the case of K-means clustering, or in the delineation of the training samples, as for starting the MDM Classifier.
Regarding input rasters, DSM, RGB orthophotos, and VIs obtained as a combination of RGB bands are the most adopted in the selected methods. Only the cases of Bayesian segmentation on the pear orchard site and classification algorithms for tomato site require NDVI and NDVI jointly with SAVI to obtain the best results. Therefore, NIR information does not give any particular additional value in crop row detection, and RGB sensors can perform accurate canopy extraction, as already demonstrated by other authors [15,28], saving the time and cost of the UAV surveys and processing.
According to crop characteristics, specific considerations can be stressed for each single crop. In the case of a vineyard, it is important to maintain the continuity of the crop row, when detecting the crop canopy. This characteristics was enhanced in the Bayesian segmentation, as shown in Figure 8e, also thanks to the Gaussian filter applied to the input raster before launching the algorithm. The continuity of the vine rows was also guaranteed by using the local maxima extraction algorithm as the detection method (Figure 8a), apart from some rare and sparse pixels. Indeed, the two aforementioned methods registered the highest accuracy values, in particular PA values: 0.97 and 0.95 for BS and LME, respectively, considerably greater than the PA values of the other detection methods. The major issue of detecting vine rows is the presence of shadows, weeds, and bare soil in the inter-row distance. Our results demonstrated that the shadows made the classification methods practically unusable on the vineyard. In Figure 8c,d, it is clearly visible how the pixels at the edges of the shadow areas were detected as crop pixels. Classification algorithms are unable to separate the vine canopy from its shadow on the terrain, resulting in an overestimation of the crop rows (UA values around 0.8).
In the orchard, pear trees are planted quite distant from one another (in our study site, around 1.5 m); therefore, a good detection has to identify single plants rather than rows. In these terms, classification algorithms return the best results, as visible in Figure 9c,d and confirmed by the highest values of UA practically equal to one (Table 8). On the other hand, these detection methods also returned the most noisy outputs and underestimated the presence of pear trees in the orchard, in particular the MDM classifier with a PA equal to 0.68. The height of the trees favored their extraction from the background, also in presence of weeds in the inter-row distance. To exploit these characteristics fully, it is advisable to use the DSM as the input of whatever detection method; in particular, the threshold selection algorithm gave the best outcomes in the pear orchard site, with PA value for the class crop canopy of 0.97.
As already mentioned, the tomato field site had the highest accuracy values for the crop detection, thanks to the regular alternation of bare soil and vegetation canopy. The OA values for all the tested methods were higher than 0.9; the LME algorithm, BS, and K-means also returned PA values above 0.9, while the PA of the TS method was slightly lower than 0.9 due to the use of the DSM as the input for starting the algorithm. At the time of the survey, the plants had a height of 0.3 m, equivalent to a few image GSDs; therefore, the errors in the photogrammetric processing (column height in Table 3) in this specific case affected the results of the canopy detection.
Precision viticulture is already widespread in the world, and recent articles have demonstrated the added value that remote sensing from UAV platforms can give to this sector [36]. Hence, numerous studies can be found in the literature dealing with vine canopy extraction [15,28,37,38]. The results presented in this work had accuracy values similar to those available in the literature. To the best of authors' knowledge, very few studies have already been published related to the detection of pear trees in orchards or tomato canopy, thus hampering the availability of reference values to compare the outputs. The assessment of the reliability of the illustrated results was based on similar case studies present in the literature. The detection of pear plants was performed with accuracy values slightly lower than the results obtained by [39], in two orchards in China, but close to the outcomes of the chestnut tree extraction described in [16]. The high values found for the tomato site were in agreement with the results presented in [17], about the estimation of crop emergence in potatoes.
The potential utility of this study in precision agriculture is high. The methods herein described allowed deriving from UAV imagery vegetation properties specifically related to the characteristics of the crop under investigation. In particular, in the irrigation management context, it must be taken into account that usually, irrigation for row-crops is provided through localized pressurized systems, wetting only the areas near the crop. The soil between rows is usually grass-covered, so analyzing crop maps (for example NDVI or thermal maps) of the field without masking the soil between rows could lead to errors when evaluating the water status or vigor of the crop under study. In Figures 11-13, NDVI maps for the three analyzed study sites are shown: on the left, the original maps, while on the right, the canopy maps generated after the extraction of crop rows. This information could be used in precision agriculture applications for mapping vegetation stress status or to optimize on-farm irrigation management. As an example, in [40], the use of a crop row detection method to delineate Site-Specific Management Zones (SSMZs) maps on a vineyard was described.
Moreover, it is crucial to stress that in the case of row-crops, the importance of using UAV imagery with respect to satellite imagery becomes fundamental. In satellite images, in fact, pixels have larger dimensions, and the operation of extracting crop rows is not possible. This leads to the presence of mixed pixels, for which Vegetation Indices' maps give information about both row and inter-row vegetation, therefore not very usable for agronomic inputs' management [41,42].

Conclusions
This study demonstrates the feasibility to perform crop row detection from high-resolution UAV imagery, for different crop types, including vineyards, orchards, and horticultural crops. DSM, RGB, or multispectral orthophotos can be used as input for the detection methods; in particular, the DSM performs better with crop characterized by high heights (i.e., grapevine and pear), even in the presence of inter-row weed, but it should be avoided to detect horticultural crops (i.e., tomato). Commercial RGB sensors give high accuracy values for crop row detection; therefore, for this purpose, it is not necessary to perform surveys mounting more expensive multispectral cameras, if no additional infrared information is required. Furthermore, in the presence of shadows produced by the crop canopy on the terrain, indices based on the NIR band and classification algorithms could lead to an overestimation of the crop rows.
Although all applied methods need some level of human intervention, among all, the local maxima extraction algorithm, developed ad hoc within this study, allows reaching the best compromise in terms of time-cost, automation, and quality of results. Bayesian segmentation applied on VIs performs better than the other methods in the presence of bare soils, but it depends on a priori information.