Early Detection of Broad-Leaved and Grass Weeds in Wide Row Crops Using Artificial Neural Networks and UAV Imagery

Significant advances in weed mapping from unmanned aerial platforms have been achieved in recent years. Detecting weed locations has made it possible to generate site-specific weed treatments that reduce herbicide use according to weed cover maps. However, the characterization of weed infestations should not be limited to the location of weed stands; it should also distinguish the types of weeds present, in order to allow the best possible choice of herbicide treatment. A first step in this direction is the discrimination between broad-leaved (dicotyledonous) and grass (monocotyledonous) weeds. Considering the advances in weed detection based on images acquired by unmanned aerial vehicles, and the ability of neural networks to solve hard classification problems in remote sensing, these technologies were merged in this study with the aim of exploring their potential for broad-leaved and grass weed detection in wide-row herbaceous crops such as sunflower and cotton. Overall accuracies of around 80% were obtained in both crops, with user's accuracies for broad-leaved and grass weeds of around 75% and 65%, respectively. These results confirm the potential of the presented combination of technologies for improving the characterization of different weed infestations, which would allow the generation of timely and adequate herbicide treatment maps targeted at groups of weeds.


Introduction
Weeds are one of the main causes of crop losses in arable crops worldwide [1]. Traditionally, their control has been addressed through the application of herbicides to the entire crop field, without taking into account that weeds usually have a patchy distribution and that there are weed-free areas [2][3][4][5]. This has led to excessive herbicide consumption, with both economic and environmental consequences. To mitigate both problems, there is a set of guidelines reported in European legislation addressing the Sustainable Use of Pesticides [6,7] which are compatible with the use of site-specific weed management (SSWM) techniques that allow the design and application of herbicide treatments targeting only the areas where weeds proliferate. One of the key components of SSWM is the aim of providing accurate and timely early weed control based on weed infestation maps obtained by proximal (ground) or remote sensing [8].
In recent years there have been major advances in weed detection and different novel technologies have been developed that make it possible to detect weeds in the early postemergence stage from both ground and aerial platforms by means of computerized processing of data [9]. One of the most widely used platforms with the greatest potential for installing sensors for early weed detection has been unmanned aerial vehicles (UAVs) [9][10][11][12]. This is because UAVs have significant advantages over other remote platforms, such as the possibility of flying at low altitudes, providing very high spatial resolution imagery (less than 1 cm per pixel [13]), flying under clouds, using a wide spectral and size range of embedded sensors, and providing the option of obtaining images on demand at almost any time. In comparison with on-ground platforms, the main advantages are that the use of UAVs is less expensive and does not cause soil compaction, and they can fly to muddy or difficult to access areas [14]. Therefore, the analysis of UAV imagery has allowed the generation of localised treatment maps through which it is possible to greatly reduce the area treated in the fields and, consequently, the consumption of herbicides [15]. López-Granados et al. [16] studied different weed management scenarios based on weed threshold, which is the weed infestation level above which a treatment is required, as the baseline to generate herbicide treatment maps, achieving herbicide savings higher than 70%.
An ideal characterization of weed infestations should not be limited to the spatial identification of weed stands. It must also be able to perform an early discrimination between the types of weeds growing in the crop field, in order to allow the best possible choice of herbicide treatment and to avoid the use of a wide-spectrum herbicide. A first step in this direction should be the separation of weeds into the two main groups: broad-leaved and grass weeds. This is a major challenge because crop plants, grass weeds and broad-leaved weeds are at similar phenological stages during early growth, and show similar spectra and appearances. The detection of weeds using images taken by UAVs has been approached in different ways.
One of the most common methodologies for UAV-based weed detection is built on the assumption that plants growing outside the crop line are weeds, so algorithms have been developed that first detect the vegetation, then delineate the crop lines, and finally classify the plants growing outside the lines as weeds [15,17,18]. Other works have focused on detecting weeds by analyzing their spectral properties [19,20]. There has also been work making it possible to detect weeds not only between (outside) but also within the crop lines, by combining the detection of crop lines with the use of machine learning methods [21][22][23]. In a large number of these studies there has been a trend towards segmenting the image into objects. These objects are groups of homogeneous pixels which, in the analysis of very high spatial resolution images, reduce the heterogeneity of the classes to be detected and allow contextual and spatial information to be added to the spectral information contained in the raw UAV images. Therefore, it can be said that these works are framed in the analysis paradigm known as object-oriented image analysis (OBIA), in such a way that the basic information unit for image classification is the object, not the pixel [24].
Artificial neural networks (ANNs) are widely used in remote sensing for solving complicated classification problems [25,26]. One of the main characteristics of this type of model is its learning capacity. A standard neural network consists of many processors, called neurons, that are connected to each other [27]. The input neurons are activated by the information provided by the user; when activated, they process this information and communicate it to the following neurons, thus reaching the desired result, which, in the case of remote sensing, is the classification of an image. The assignment of weights and relationships between the neurons is produced by means of automatic learning, carried out on a set of samples introduced as training data in the design of the model. One of the most widely used types of neural network in remote sensing is the multilayer perceptron (MLP) [28], which has been successfully used in high-resolution satellite imagery for weed detection [29]. In an MLP, neurons are organised into three or more layers: first, an input layer containing the information from the samples to be analyzed, followed by one or more hidden layers, and finally an output layer that produces the desired result.
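As a toy illustration of this layered structure, the forward pass of a minimal MLP can be sketched as follows; the weights and inputs are arbitrary illustrative numbers, not a trained model:

```python
import math

# Minimal MLP forward pass: an input layer, one hidden layer, and an output
# layer. Each neuron computes a weighted sum of the previous layer's outputs
# plus a bias, passed through a sigmoid activation.
def layer(inputs, weights, biases):
    return [1 / (1 + math.exp(-(sum(w * x for w, x in zip(ws, inputs)) + b)))
            for ws, b in zip(weights, biases)]

def mlp_forward(features):
    hidden = layer(features, [[0.5, -0.2], [0.1, 0.4]], [0.0, -0.1])  # hidden layer
    output = layer(hidden, [[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0])    # output layer
    return output  # one score per class

scores = mlp_forward([0.8, 0.3])
print(scores.index(max(scores)))  # index of the winning class -> 0
```

Training then consists of adjusting the weights and biases so that the winning output neuron matches the labeled class of each training sample.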
Discrimination between broad-leaved and grass weeds has been addressed previously by using images taken on the ground or by ultrasonic sensors mounted in front of a tractor [30,31]. However, to our knowledge, this is the first time that the early detection of different groups of weeds in crop fields has been attempted using UAV imagery. Therefore, the aim of this paper is to explore the potential of combining images from UAV, and OBIA and MLP ANN techniques for discriminating between broad-leaved and grass weeds in broad-leaved wide-row crops.

Description of Study Fields and UAV Flights
This study was performed on two different wide-row crops, sunflower and cotton, selecting one field for each crop. Table 1 shows the inter-row spacing in meters, as well as the location and area in hectares, for both fields; the sunflower and cotton crops were grown under rainfed and irrigated conditions, respectively. The fieldwork phase was carried out approximately 3 weeks after sowing. At this stage, both crops had an average height of approximately 15-20 cm (Figure 1) and were naturally infested by different broad-leaved and grass weed species, with the cotton field showing a higher level of weed infestation. A wider variety of broad-leaved than grass weed species was identified in both crops (Table 2).

A quadcopter UAV, model MD4-1000 (microdrones GmbH, Siegen, Germany), was used as the aerial platform to acquire the images. This UAV, with vertical take-off and landing, is battery powered and can be operated manually by radio control or autonomously by means of its global positioning system (GPS) receiver and waypoint navigation system. A low-cost visible-light (RGB: red (R), green (G) and blue (B)) camera, a Sony ILCE-6000 (Sony Corporation, Tokyo, Japan), was attached to the UAV to capture the images. The camera has a 23.5 × 15.6 mm APS-C CMOS sensor capable of acquiring 24-megapixel (6000 × 4000 pixels) images.
One UAV flight per crop was carried out, at the beginning and at the end of May for the sunflower and cotton fields, respectively. The UAV route was adjusted to fly at a 20 m altitude with forward and side overlaps of 74% and 70%, respectively (Figure 2). The flights were carried out at noon to take advantage of the sun's position and thus minimize shadows in the images. The UAV flight and sensor configuration led to a spatial resolution of around 4 mm, which met the requirement of being below 10 mm for RGB sensors established in a previous review on weed detection using UAV imagery [9].
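The ~4 mm figure can be checked against the sensor geometry. The focal length is not stated in the text, so the 20 mm value below is an assumption chosen to illustrate the calculation:

```python
# Ground sample distance (GSD) sketch: GSD = altitude * pixel_pitch / focal_length.
# Sensor width (23.5 mm) and image width (6000 px) come from the text above;
# the 20 mm focal length is an assumed value, not stated in the paper.
def gsd_mm(altitude_m, sensor_width_mm, image_width_px, focal_length_mm):
    pixel_pitch_mm = sensor_width_mm / image_width_px
    return altitude_m * 1000 * pixel_pitch_mm / focal_length_mm

print(round(gsd_mm(20, 23.5, 6000, 20.0), 2))  # ~3.92 mm per pixel
```

With these assumed optics, a 20 m flight altitude yields roughly the 4 mm per pixel resolution reported above.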

Digital Surface Model (DSM) and Orthomosaic Generation
Once the UAV images were acquired for both crops, Agisoft PhotoScan Professional Edition software, version 1.2.4 build 2399 (Agisoft LLC, St. Petersburg, Russia), was used to generate the geomatic products. First, a three-dimensional (3D) point cloud was created by applying the structure-from-motion (SfM) technique. Then, a digital surface model (DSM) providing height information was generated from the 3D point cloud. The final product was an orthomosaic of each whole field, in which every pixel contained RGB information as well as spatial information (Figure 3). All geomatic products were created automatically. However, the manual localization of six ground control points (GCPs) [32,33] was necessary to georeference the geomatic products, with four placed in the corners and two in the center of each field. The GCP coordinates were measured using two GNSS receivers: one was a reference station from the GNSS RAP network of the Institute for Statistics and Cartography of Andalusia (Spain), and the other was a GPS receiver with centimeter accuracy used as a rover (model Trimble R4, Trimble Inc., Sunnyvale, CA, USA). At the beginning of the image processing, the software matched the camera positions and common points across the images, which allowed the refinement of the camera calibration parameters. More information about the PhotoScan functions can be found in [34].

Ground Truth Data
A total of 30 georeferenced 1 × 1 m white sampling frames were placed in both crops. These frames contained broad-leaved weeds, grass weeds, or both. Their placement ensured that the sunflower and cotton fields had an equal chance of being sampled without operator bias [35]. After each frame was placed, it was manually photographed with a conventional camera held perpendicular to the ground. These photos were later used as ground truth data when carrying out the manual classification on the orthomosaics in the image analysis phase, detailed in the following section.

Image Analysis
The workflow developed for the image analysis procedure is summarized in Figure 4. The following sections explain the details of each step of this workflow.

Labeling of the Image Objects
The first step in the image analysis procedure was the segmentation of the image into objects formed by adjacent and spectrally homogeneous pixels. In this work, the multiresolution segmentation algorithm (MRSA) [36] was applied using eCognition Developer 9 software (Trimble GeoSpatial, Munich, Germany). This algorithm is controlled by a set of parameters that must be fixed by the user: the scale parameter, the colour/shape weights, and the smoothness/compactness weights. The scale parameter controls the homogeneity of the pixels included in the objects and is related to their final size (more homogeneous objects lead to smaller sizes). The colour/shape weighting determines whether the segmentation pays more attention to the spectral information or to the shape of the objects. The smoothness/compactness weighting controls whether the created objects are spatially compact or dominated by spectral homogeneity (smoothness). Based on previous experience in the optimization of UAV imagery segmentation for vegetation detection [17,22,37] and on some internal tests, the parameter values were set to 15 for the scale parameter, 0.6/0.4 for colour/shape, and 0.5/0.5 for smoothness/compactness.
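The idea of grouping adjacent, spectrally homogeneous pixels can be illustrated with a drastically simplified region-growing sketch. This is not the eCognition MRSA (which also weighs shape and compactness); the `tolerance` threshold only loosely plays the role of the scale parameter, with larger values yielding fewer, larger objects:

```python
# Simplified homogeneity-driven segmentation sketch (NOT the MRSA): grow a
# region from each unlabeled pixel, absorbing 4-connected neighbours whose
# value stays within `tolerance` of the running region mean.
def segment(image, tolerance):
    rows, cols = len(image), len(image[0])
    labels = [[None] * cols for _ in range(rows)]
    next_label = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0][c0] is not None:
                continue
            region_sum, region_n = 0.0, 0
            stack = [(r0, c0)]
            labels[r0][c0] = next_label
            while stack:
                r, c = stack.pop()
                region_sum += image[r][c]
                region_n += 1
                mean = region_sum / region_n
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and labels[nr][nc] is None
                            and abs(image[nr][nc] - mean) <= tolerance):
                        labels[nr][nc] = next_label
                        stack.append((nr, nc))
            next_label += 1
    return labels

# Two spectrally homogeneous halves become two objects with a tolerance of 10.
img = [[10, 10, 200, 200],
       [10, 10, 200, 200]]
print(segment(img, 10))  # [[0, 0, 1, 1], [0, 0, 1, 1]]
```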
After the image segmentation, the results of which can be seen in Figure 5b, the next step was the manual labeling of the objects inside the reference white frames that were laid out in the fields, as explained in Section 2.3. In this part of the workflow, objects were divided into the following classes: bare soil, shadow, broad-leaved weeds, and grass weeds. The high resolution of the UAV imagery (4 mm, as stated before) allowed discrimination between the classes and, in case of doubt, the field photographs of the reference frames were used to help in the disambiguation process. This step was carried out by a single expert in order to avoid discrepancies in the manual labeling of the samples used to generate the neural network. Figure 5c shows one of the reference frames after the labeling of the image objects. The total number of labeled objects is shown in Table 3. Due to the early stage of crop development, the class with the highest representation in both datasets was bare soil. The number of objects labeled as weeds depended on the natural weed infestation level of each crop: the cotton field suffered a more intense infestation and, consequently, presented a higher number of objects labeled as weeds. In order to feed the neural network with a balanced dataset, the number of labeled objects in each class was reduced to match the number of objects in the least represented class. Consequently, as the class with the fewest objects in the sunflower field was broad-leaved weeds with 635 objects, 635 samples from each class were randomly selected. In the cotton field, the final number of objects in each class was 421, matching the number of grass weed objects, which was the least represented class in that training dataset. A set of object features (Table 4) was then extracted from the labeled objects to feed the neural network.
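The class-balancing step just described can be sketched as follows. Only the 635 broad-leaved object count for sunflower comes from the text; the other class counts and object names are invented placeholders:

```python
import random

# Balance the training set by randomly undersampling every class to the size
# of the least represented one. Counts other than 635 are illustrative only.
def balance(dataset, seed=0):
    rng = random.Random(seed)
    n_min = min(len(objs) for objs in dataset.values())
    return {cls: rng.sample(objs, n_min) for cls, objs in dataset.items()}

sunflower = {
    "bare_soil": [f"soil_{i}" for i in range(2500)],       # placeholder count
    "shadow": [f"shadow_{i}" for i in range(900)],         # placeholder count
    "broad_leaved": [f"broad_{i}" for i in range(635)],    # least represented class
    "grass": [f"grass_{i}" for i in range(800)],           # placeholder count
}
balanced = balance(sunflower)
print({cls: len(objs) for cls, objs in balanced.items()})
# every class reduced to 635 samples
```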
The extracted features were divided into three main categories: spectral, geometric, and textural. The spectral features were derived from the three channels of the RGB sensor and included the normalized band values, some of their statistics, and a set of vegetation indices. The geometric features were related to the shape of the objects created by the MRSA, and also included the height of the objects above the soil. The textural features described the variation of the spectral values inside the objects, and included variables derived from the gray level co-occurrence matrix (GLCM), which is a tabulation of how often different combinations of pixel gray levels occur inside an object [38]. Among the textural features there were also variables related to the gray level difference vector (GLDV) [39], i.e., the sum of the diagonals of the GLCM. The vegetation indices included, among others: excess green, ExG = 2g − r − b [45]; excess red, ExR = 1.4r − g [47]; ExGR = ExG − ExR [48]; color index of vegetation, CIVE = 0.441r − 0.811g + 0.385b + 18.78745 [49]; vegetative index, VEG = g/(r^0.667 · b^0.333) [50]; and the combined index COMB1 = 0.25·ExG + 0.30·ExGR + 0.33·CIVE + 0.12·VEG [46].
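The vegetation indices in this section can be computed directly from the chromatic coordinates, as in this short sketch (r, g and b are the band values normalized by their sum):

```python
# Vegetation indices from the spectral feature set, computed from chromatic
# coordinates r, g, b (each band divided by R + G + B).
def vegetation_indices(R, G, B):
    total = R + G + B
    r, g, b = R / total, G / total, B / total
    exg = 2 * g - r - b                                   # excess green
    exr = 1.4 * r - g                                     # excess red
    exgr = exg - exr                                      # ExG - ExR
    cive = 0.441 * r - 0.811 * g + 0.385 * b + 18.78745   # color index of vegetation
    veg = g / (r ** 0.667 * b ** (1 - 0.667))             # vegetative index
    comb1 = 0.25 * exg + 0.30 * exgr + 0.33 * cive + 0.12 * veg
    return {"ExG": exg, "ExR": exr, "ExGR": exgr,
            "CIVE": cive, "VEG": veg, "COMB1": comb1}

# A green-dominated pixel yields a clearly positive ExG.
print(round(vegetation_indices(60, 140, 55)["ExG"], 3))  # 0.647
```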

Crop Detection
The first step after the manual labeling of the input for the neural network was the automatic detection of the objects belonging to the crop rows. In this step, a previously developed and fully validated automatic OBIA algorithm [15,17,22] was used. This algorithm detects the vegetation (crop and weeds) by applying a thresholding methodology to the ExG values of the segmented objects. Then the algorithm splits the image into strips and, through an iterative process, searches for the strip orientation that best fits the distribution of the vegetation objects in the image. Once this orientation is calculated, and taking into account the distance between the crop rows, the algorithm splits the image into strips representing the crop rows, and all the vegetation objects falling under these strips are classified as crop. More details about the algorithm can be found in the above-referenced works.
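A much simplified version of the orientation search can be sketched as follows. This is not the validated OBIA algorithm itself, which operates on segmented objects and strips; this toy version only scores how well candidate row directions concentrate the vegetation centroids:

```python
import math

# Toy row-orientation search: project vegetation centroids perpendicular to
# each candidate row direction and keep the angle whose projection histogram
# occupies the fewest bins, i.e., the direction along which plants line up.
def row_orientation(points, bin_size=0.5, angles=range(0, 180, 5)):
    best_angle, best_occupied = None, None
    for angle in angles:
        theta = math.radians(angle)
        # Signed distance of each point from a line through the origin at `theta`.
        proj = [y * math.cos(theta) - x * math.sin(theta) for x, y in points]
        occupied = len({int(p // bin_size) for p in proj})
        if best_occupied is None or occupied < best_occupied:
            best_angle, best_occupied = angle, occupied
    return best_angle

# Two rows of plants aligned with the x-axis: expected orientation 0 degrees.
plants = [(x, 0.0) for x in range(10)] + [(x, 3.0) for x in range(10)]
print(row_orientation(plants))  # 0
```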

Artificial Neural Network Creation
Once the crop objects were classified using the automatic OBIA algorithm, the remaining objects were classified as "soil", "shadow", "broad-leaved weed", or "grass weed" using a neural network. The feature values from the manually labeled objects were used for training and validating an MLP neural network in IBM SPSS Statistics software (version 26.0, IBM Corp., Armonk, NY, USA). Of the total manually labeled objects, 60% were used to train the neural network, 20% were used as a test set to track errors during training and avoid overfitting [52], and the remaining 20% were reserved to validate the accuracy of the neural network. The size of an MLP neural network is defined as the size of the input layer × the size of the hidden layer × the size of the output layer. In the parameterization of the neural network, the input layer was formed by the 49 extracted object features, the output layer contained 4 neurons corresponding to the 4 classes sought, and SPSS was configured to optimize the number of neurons in the hidden layer in a range between 1 and 50. Batch training was used in the generation of the neural network; this method updates the synaptic weights of the neurons after all the training data have been passed. The optimization algorithm applied in the batch training was the scaled conjugate gradient, a fully automatic algorithm that does not require the input of parameters by the user [53].
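A rough equivalent of this training setup can be sketched with scikit-learn. Note that scikit-learn's MLPClassifier does not offer the scaled conjugate gradient solver used by SPSS, so 'lbfgs' is used here as a stand-in, and the data are synthetic rather than the paper's object features:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the 49 object features and 4 classes.
X, y = make_classification(n_samples=1000, n_features=49, n_informative=20,
                           n_classes=4, random_state=0)

# 60% training, 20% test (to watch for overfitting during development),
# 20% held out for final validation, mirroring the split in the text.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_test, X_val, y_test, y_val = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

mlp = MLPClassifier(hidden_layer_sizes=(8,),   # 8 hidden neurons, as SPSS selected
                    solver="lbfgs",            # stand-in for scaled conjugate gradient
                    max_iter=2000, random_state=0)
mlp.fit(X_train, y_train)
print(round(mlp.score(X_val, y_val), 2))  # validation accuracy
```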

Validation
The performance of the MLP neural network was assessed using the confusion matrix [54], created from the validation datasets in both crops. Based on the confusion matrix, the overall accuracy (percentage of objects correctly classified) was calculated, as well as the user's accuracy (the complement of the commission error, i.e., the proportion of objects assigned to a class that actually belong to that class) and the producer's accuracy (the complement of the omission error, i.e., the proportion of objects of a class that were correctly classified) for the four classes assigned by the neural network: soil, shadow, broad-leaved weed, and grass weed.
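These metrics can be computed from a confusion matrix as in the sketch below; the matrix values are made up for illustration and are not the paper's results:

```python
# Accuracy metrics from a confusion matrix whose rows are reference (ground
# truth) classes and whose columns are predicted classes.
def accuracy_metrics(matrix):
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    diag = sum(matrix[i][i] for i in range(n))
    overall = diag / total
    # Producer's accuracy: correct / reference total (row sums).
    producers = [matrix[i][i] / sum(matrix[i]) for i in range(n)]
    # User's accuracy: correct / predicted total (column sums).
    users = [matrix[i][i] / sum(matrix[r][i] for r in range(n)) for i in range(n)]
    return overall, producers, users

#            predicted: soil shadow broad grass   (illustrative numbers only)
conf = [[90,  2,  5,  3],   # reference: soil
        [ 1, 95,  2,  2],   # reference: shadow
        [ 4,  1, 76, 19],   # reference: broad-leaved weed
        [ 5,  2, 17, 76]]   # reference: grass weed
overall, producers, users = accuracy_metrics(conf)
print(round(overall * 100, 2))  # 84.25
```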

Results and Discussion
The optimization process of the hidden layer carried out by the software in the development of the neural network led to this layer having eight neurons for both the cotton and sunflower crops. Tables 5 and 6 show the classification results in each of the sample subsets for sunflower and cotton, respectively. The overall accuracy for the sunflower field was 83.64%, whereas for cotton it was slightly lower, at 78.16%. In both cases the accuracy was around 80%, so the obtained accuracies can be considered satisfactory. The final accuracies achieved in the validation dataset were similar to those obtained in the training and test subsets, where values around 80% were also obtained.
Analyzing Tables 5 and 6 in detail, it can be seen that the highest accuracies were obtained in the detection of soil and shadow, whereas the classes with the most confusion between them were the two types of weeds. However, taking into account the difficulty of the task and the similarity of the broad-leaved and grass weed classes, the classification of both groups of weeds was good. The producer's accuracies show that around 75% of the broad-leaved and grass weeds in the sunflower field were correctly classified, meaning that some broad-leaved and grass weed objects were not identified and the procedure therefore underestimated the total area of each weed patch. From the point of view of the user of the classification, 78.81% of the objects classified as broad-leaved weeds actually belonged to this class. According to these results, a treatment map based on the neural network would allow broad-leaved weeds to be treated specifically with a high level of precision. This precision would be lower for grass weeds, since the user's accuracy shows that 67.46% of the objects classified as this type of weed actually corresponded to this type of vegetation. From an agronomic point of view, a higher producer's accuracy would be desirable, even at the cost of a lower user's accuracy, so that weed patches would be less likely to be missed, taking into account that farmers are likely to prefer treating weed-free zones over assuming the risk of allowing weeds to go untreated [55].
In the cotton field, the results of the user's accuracy for weeds were very similar to those obtained for the sunflower field. Consequently, the accuracy of a possible treatment map in the cotton field would also be higher for broad-leaved weeds. The fact that the accuracies obtained for the sunflower crop were slightly higher could be linked to the lower variability of weed species in this crop (Table 2). Therefore, in the creation of the neural network, the group of weed training samples had a higher homogeneity, and it was easier for the software to calculate a set of parameters distinguishing broad-leaved from grass weeds. This is in agreement with Lottes et al. [56], who created a classification scheme to discriminate saltbush and chamomile from "other weeds" in a sugar beet field using UAV imagery, and also reported the heterogeneity of the class "other weeds" as one of the plausible reasons for its lower accuracy in their classification.

Tables 7 and 8 show the 10 most important variables in the creation of the neural network for sunflower and cotton, respectively. Complete tables including the importance of all variables can be consulted in Appendix A. In both crops, brightness was the most important variable, which is probably related to the excellent discrimination of shadows from the rest of the classes. In the sunflower field, excluding brightness, seven of the most important variables were spectral, including HUE, the normalized red band and several vegetation indices. Among these most important variables, only one was of a textural type, the GLDV entropy. For cotton the situation was different: of the 10 most important variables, five were textural, most of them related to the GLCM. The importance of textural features for weed classification using UAV imagery has been previously reported in the scientific literature [9,23,57].
It is also noteworthy that in the sunflower crop the height of the objects above the ground was among the most important variables, which could indicate that the weeds in this field showed a more erect habit than in cotton, making height a determining factor in the classification. The importance of height in weed discrimination was also reported by Zisi et al. [57], whose results improved when this feature was included in their analysis. In the neural network for the sunflower crop, the 10 most important variables had standardized importance values higher than 70%, whereas in the case of cotton only brightness presented an importance higher than 70%, with the others ranging between 66% and 49%. This could indicate that in the case of sunflower more variables were important for the correct classification of objects, whereas in the case of cotton fewer variables were relevant. A feature selection procedure was not carried out in our study, since some authors [21] have reported that feature selection is not a robust methodology when the model is intended to be applied to other crop types or fields.
Comparing Tables 7 and 8, it can be seen that the only variables coinciding within the first 10 were brightness, ExGR, NGRDI, and g. All of these are spectral variables, and two of them are spectral indices. The importance of these variables in the generalization of machine learning models for weed detection in different crops and fields has been previously reported by Veeranampalayam Sivakumar et al. [58], who stated that the addition of vegetation indices in the creation of convolutional neural networks increased the ability of these models to generalize their results to different crops and fields. In the scientific literature there are previous works that, similarly to ours, combine spatial information (detection of crop lines) with advanced classification methods such as random forest [22,23], convolutional neural networks [21], or support vector machines [59] for weed detection in UAV imagery. Some of these works achieved slightly better accuracy metrics, but this is probably because they detected weed patches without distinguishing between different types of weeds. If the distinction between broad-leaved and grass weeds had not been addressed in this work, the overall accuracies obtained would have been 95.03% for sunflower and 88.51% for cotton, values close to the 94.5% overall accuracy achieved by Gao et al. [23] in their approach with no discrimination between different types of weeds. Another important difference between the above-mentioned works and the methodology presented herein is that, in those works, the machine learning methods could be trained without user intervention. This is because the crop rows were detected in a first step, and then the vegetation objects located outside the crop rows were used as training samples for the "weed" class. In the present work, manual classification of the samples had to be performed, as it was necessary to differentiate between broad-leaved and grass weeds.
It is relevant to highlight that our research used a low-cost RGB camera, which demonstrates that standard RGB imagery can effectively distinguish different groups of weeds. This is important because, as highlighted by Hassler and Baysal-Gurel [60], when using a higher spectral resolution sensor (e.g., in multispectral or hyperspectral ranges) the image processing is more complex and usually involves previous calibration and data correction steps. Furthermore, using a multispectral sensor implies the need to choose the optimal number of bands and their wavelengths.
Our results demonstrate the potential not only for applying specific herbicides against broad-leaved or grass weeds, but also for identifying areas that would not require treatment. Both achievements would offer relevant savings in herbicide applications, which could greatly improve the SSWM strategy, with further economic, agronomic and environmental benefits. To the best of the authors' knowledge, this is the first time that the discrimination of broad-leaved and grass weeds has been achieved using UAV imagery. Furthermore, this objective has been accomplished in commercial fields of two important herbaceous crops: sunflower and cotton. As the presented methodology was developed under specific conditions, i.e., in early season with crops having an average height of 15-20 cm, and with image acquisition on sunny days with low wind, future research will try to confirm the potential of the current workflow in other phenological stages and crops, and under different meteorological and lighting conditions.

Conclusions
This study shows that the application of ANNs in an OBIA environment to images taken with a low-cost RGB sensor embedded in a UAV over wide-row herbaceous crops has the potential to discriminate between broad-leaved and grass weeds. To the best of the authors' knowledge, this is the first time that this objective has been addressed. It is also remarkable that the work was carried out in commercial fields with natural weed infestations, which made the achievement of this objective more difficult than if it had been performed under controlled conditions in experimental fields. Future research will address the analysis of more wide-row crop species, such as sugar beet and potato, and the use of more advanced classification methods, such as convolutional neural networks, to explore the discrimination between broad-leaved and grass weed species. Another future objective will be the generation of site-specific treatment maps oriented to the differential treatment of broad-leaved and grass weeds.

Data Availability Statement:
The datasets generated during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.
Appendix A

Table A1. Importance of all the variables included in the artificial neural network for sunflower.