Reference Measurements in Developing UAV Systems for Detecting Pests, Weeds, and Diseases

Abstract: The development of UAV (unmanned aerial vehicle) imaging technologies for precision farming applications is rapid, and new studies are published frequently. In cases where measurements are based on aerial imaging, ground truth or reference data are needed in order to develop reliable applications. However, in several precision farming use cases such as pest, weed, and disease detection, the reference data can be subjective or relatively difficult to capture. Furthermore, the collection of reference data is usually laborious and time consuming. It also appears that it is difficult to develop generalisable solutions for these areas. This review studies previous research related to pest, weed, and disease detection and mapping using UAV imaging in the precision farming context, underpinning the applied reference measurement techniques. The majority of the reviewed studies utilised subjective visual observations of UAV images, and only a few applied in situ measurements. The conclusion of the review is that there is a lack of quantitative and repeatable reference data measurement solutions in the areas of mapping pests, weeds, and diseases. In addition, the results that the studies present should be reflected in the applied references. A future option could be the use of synthetic data as a reference.


Introduction
The principle of precision farming is to treat different parts of agricultural fields according to their specific needs. Remote sensing provides various approaches for detecting differences within fields. Drones, also called unmanned aerial vehicles (UAVs) or unmanned aerial systems (UAS), are becoming an essential part of remote sensing tools in the precision farming context. UAVs have been described as showing unlimited potential in agriculture [1]. While there is great pressure to boost food production [2], these UAV technologies, along with smart farming and new data management strategies such as digital twins [3,4], have the potential to revolutionise agriculture. A digital twin is a collection of digital data representing a physical object. The goal of digital twins is to remove fundamental constraints concerning place, time, and subjective observations. Rather than providing direct decision support, remote sensing data can enrich digital twins [4] in the future. Thus far, remote sensing, and UAV imaging in particular, has been aimed at offering direct decision support in the precision farming context, and new studies are constantly being produced [1].
When considering the measurement techniques and technologies, their accuracy is a fundamental question. To assess the measurement accuracy, or more widely, the measurement quality, the true values of the measured quantities, i.e., ground truth, are needed. The ground truth is assumed to be or validated as true; it is considered the response of the real world, and it is the ideal expected result. The author in [5] studied data quality aspects more thoroughly in precision farming. To provide ground truth for UAV measurements, external and possible independent measurements, later also referred to as reference measurements, are needed. Reference data involve measurements or observations of objects or phenomena of interest. The reference measurements need to be planned with the UAV campaign, and they should often be collected or measured simultaneously, or immediately before or after the flying campaign. Reference data are used to train the remote sensing analysis method, to assess accuracy, and to verify the applied methodology.
Several reviews of UAV applications in agriculture have been published recently. Ref. [1] reviewed platforms, controls, and applications, while the authors in [6,7] focused on the applications and updated an earlier study [8]. The review in [9] had a biodiversity approach. To our knowledge, no review studies focused on the reference data for UAVs in the agricultural context.
There is a wide variety of UAV applications in precision farming, ranging from miniaturised pollinators [10] and UAV spraying [1] to irrigation scheduling [11]. Our focus is on the most common applications that support farm machinery practices during the growing season. In addition, we exclude nitrogen and yield mapping based studies due to the different nature of the reference data. In those studies, the references can be measured from vegetation samples [12][13][14]. Instead, our study focuses on weeds, pests, and diseases in the precision farming context. Pests and diseases cause annual yield losses between 20% and 40% [15]. Furthermore, weed management was found to save billions of dollars [16]. Compared with these precision farming studies, the traditional phenotyping methods used in breeding rely on trained experts to make a visual assessment of crop vigour and other abiotic stresses [17]. However, traditional crop phenotyping methods are comparatively slow, costly, laborious, and not easily applicable over large areas due to the number of varieties to be considered and the frequent requirement of destructive sampling [18]. It is interesting to study practical solutions that are used to collect reference data at the precision farming scale. Integrated pest management is becoming a required action across the European Union (Directive 2009/128/EC), and farmers are therefore expected to use chemical plant protection only when a need for it is recognised, and even then, it should be applied as precisely as possible, and the impacts are expected to be monitored by the farmer. The need for precision farming actions and UAVs for observation and monitoring is therefore becoming of increasing importance.
The articles studied in this review include a plant or crop that is cultivated, and these studies utilised UAV imaging to detect weeds, pests, or diseases. The term "weed" is used as a synonym for invasive/noxious plant species [19], in contrast to alien species. Pests are insects or small animals that are harmful to plants, and diseases are abiotic and biotic plant diseases that harm plants. In the detection applications based on UAV imaging, the attempt is to distinguish weeds individually or in patterns. Similarly, there were also attempts to distinguish the symptoms of pests and diseases in the target plants individually or in patterns. The term "campaign" refers here to a field visit event. On the whole, the aim of this study is to examine practical solutions and the advantages of thorough planning of reference data collection in the context of UAV imaging campaigns. Our main research question therefore is: what kind of reference data are used in UAV imaging studies related to pests, weeds, and diseases?

Methodology
Our bibliographic analysis in the domain under study involved four main phases: (a) the collection of related studies; (b) a detailed review; (c) an analysis of the studies; and (d) a comparison with other agricultural imaging topics. In the first phase, a keyword-based search for journal articles and high-quality conference papers was performed from the web scientific indexing service Web of Science (WoS) core collection. Similar to a review of machine learning in agriculture [20], we made a comprehensive review of selected applications. For search keywords, we used questions that restrict the findings to drones and that presented applications in the area of precision farming. Our search terms were (1) "drone*" or "uav*" or "unmanned" in the title section; (2) "pest*" or "*disease*" or "weed*" in the title section; and (3) "precision farming*" or "precision agri*" or "site specific" in the topic. The topic includes the title, abstract, author keywords, and Keywords Plus. Adding "smart farming" or UAS (unmanned aircraft systems) to the search criteria did not yield more results. The last search was made on 27 October 2020. In total, 41 results were retrieved, of which 36 were accepted for the review: 27 studies concerned weed detection; 3 studied pest detection; and 6 studied disease detection. One of the included papers [21] only had an extended abstract available in English. The excluded studies addressed spreading systems; they were reviews of other topics or were otherwise irrelevant to our topic. There were 6 publications from 2020, 5 from 2019, and 12 from 2018. The earliest publication was from 2012. The eight most cited articles considered weed applications.
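The three keyword groups above can be combined into a single search string. The following is a minimal sketch, assuming the Web of Science advanced-search field tags `TI` (title) and `TS` (topic); the helper name `or_group` is hypothetical, and the exact query syntax used by the authors is not given in the text.

```python
def or_group(field_tag, terms):
    """Join terms with OR inside one field tag, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return f"{field_tag}=({' OR '.join(quoted)})"


# Keyword groups as listed in the methodology.
platform_terms = ["drone*", "uav*", "unmanned"]
target_terms = ["pest*", "*disease*", "weed*"]
context_terms = ["precision farming*", "precision agri*", "site specific"]

# Groups (1) and (2) restrict the title; group (3) restricts the topic.
query = " AND ".join([
    or_group("TI", platform_terms),
    or_group("TI", target_terms),
    or_group("TS", context_terms),
])
print(query)
```

Keeping the groups as separate lists makes it easy to test additions such as "smart farming" without rewriting the whole query.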
In the second phase, the selected 36 papers were analysed one by one while the following questions were considered:

1. What was the research topic, and what were the studied crops and the precision farming scenario that the research targeted?
2. What were the tools and parameters for the imaging of the weeds/pests/diseases?
3. What was the applied reference data, and how were they used?
4. How was the operational timing presented?
5. What were the processing and analysis methods, including data resampling, for different resolutions?
6. What are conclusive procedures for planning an imaging campaign?
7. How were these methods differentiated from other agricultural imaging topics?
For question 7, other relevant research articles were analysed in contrast to the articles included in our core review. We present our findings in Section 3.
As the selected papers addressed precision farming applications, almost all had a future goal of developing a site-specific on-time application for pesticides or herbicides. However, one paper [22] was dedicated to developing phenotyping tools for disease resistance. All studies had a certain technology solution as a target, and they could therefore be rated using the technology readiness level (TRL) [57] classification. Most of the studies had a concept that was tested in a relevant environment and at most in a couple of different fields from a single year. They all therefore fit between the TRLs of 3-6. The study by Rasmussen et al. [33] had the highest TRL of 6; it described procedures for detecting green weeds in preharvest cereals using off-the-shelf UAVs, and the authors suggested a simple model for preharvest weed mapping with separate ICT tools. Studies [42,44] conducted the first outdoor tests with selected tools and therefore presented the lowest TRLs of 3.

UAV Imaging Campaigns
All the imaging campaigns were flown at constant altitudes. Figure 1a presents the main flying altitudes in relation to article publication years, with one main altitude per publication. The highest altitude (excluded from Figure 1) was 400 m [44], and the lowest altitudes were 1 m [34] and 2 m [37]. Three different altitude categories may be identified with equal distribution: (1) close-range imaging at 1-25 m, optimised for spotting detailed information, often from individual images; (2) low-altitude imaging at 25-50 m, currently optimal for optics, especially with multispectral cameras; and (3) high-altitude drone imaging at 70-120 m, optimised for the mapping of large areas. The average flying altitude was 49 m in all studies except the disease studies, for which it was 43 m. There were no trends in flying altitudes in relation to publication years after 2017. Figure 1b shows the relation between the size of the study area and the flying altitude. Many of the studies tested different altitudes [24,26,33,34,45,46,49,50].
Two relatively larger areas can be noted, both for weed mapping in several fields: 18 fields covered 110 hectares in the study by Lambert et al. [32], and eight fields covered 20 hectares in the study by Rasmussen et al. [33]. The latter study also tested several flight altitudes (10, 20, 30, 40, and 50 m) and concluded that 40 m was practical for weed mapping with mature cereals. Most of the studies focused on one or two nearby field plots and had an average study area coverage of five hectares. The imaging areas as a whole emphasise the experimental nature of the studies. The presented field areas represent the reported test sites that were imaged with the main instruments. In some close-range applications, the whole field was not imaged.
Most of the studies produced orthophotos. They were used in almost 70% of cases. The other studies used raw images. However, in the disease studies, these two approaches were evenly split. According to the authors in [33], whether orthomosaics or individual nadir images are used has a minor impact on image analysis. For orthomosaic computation, the commercial Agisoft Photoscan software (Agisoft LLC, St. Petersburg, Russia) was the most applied, while in four studies, the Pix4D Mapper software (Pix4D S.A., Prilly, Switzerland) was used. The latter was the most common in the UAVs for precision agriculture reviews [7]. In our review, the planned image overlaps varied greatly, with the highest being 90% side and forward overlaps from 70 m [30], and the lowest being 30% side and 60% forward overlaps.
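As a rough illustration of what the reported overlaps imply for flight planning, the sketch below converts an overlap fraction into the distance between consecutive exposures or adjacent flight lines. The footprint dimensions and the function name are hypothetical values chosen for illustration; they are not taken from the reviewed studies.

```python
def spacing_from_overlap(footprint_m, overlap):
    """Distance between image centres for a footprint length and an overlap fraction."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in the range [0, 1)")
    return footprint_m * (1.0 - overlap)


# Hypothetical ground footprint of one nadir image (assumed, not from the source).
FOOTPRINT_ALONG_M = 40.0   # along the flight direction
FOOTPRINT_ACROSS_M = 60.0  # across the flight direction

# Overlap extremes reported in the review: 90%/90% and 60% forward / 30% side.
for label, forward, side in [("highest overlap", 0.90, 0.90),
                             ("lowest overlap", 0.60, 0.30)]:
    exposure_gap = spacing_from_overlap(FOOTPRINT_ALONG_M, forward)
    line_gap = spacing_from_overlap(FOOTPRINT_ACROSS_M, side)
    print(f"{label}: exposure every {exposure_gap:.1f} m, "
          f"flight lines {line_gap:.1f} m apart")
```

Under these assumptions, the high-overlap setting needs several times more images and flight lines than the low-overlap one for the same field, which is one reason the planned overlaps are worth reporting.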
The UAV platforms also showed large variation. The most used was the Microdrones MD4-1000 quadcopter (Microdrones, Rome, NY, USA), which was one of the first user-friendly and economic solutions on the market about ten years ago. Other frequently used platforms were the lightweight DJI Phantom drones (DJI, Shenzhen, China), representing the first wave of cheap off-the-shelf UAVs. The other UAVs used were rotorcraft-type multicopters such as the Scanopy Quadcopter (Scanopy, Quincy, France), DJI S800 EVO, DJI Matrice 600 (DJI, Shenzhen, China), HiSYstems Hexa XL (MikroKopter, Moormerland, Germany), Geo-Konzept XR6 (geo-konzept GmbH, Adelschlag, Germany), 3DRobotics SOLO (3DR, Berkeley, California, USA), and Hydra-12 Onyxstar (AltiGator, Waterloo, Belgium), and the fixed-wing SenseFly Ebee (SenseFly SA, Cheseaux-sur-Lausanne, Switzerland) and Tuffwing Mapper (TuffWing LLC, Boerne, Texas, USA). These UAVs can be classified into four categories: (1) lightweight fixed wings; (2) lightweight multicopters with an integrated camera; (3) more customisable multicopter bodies with their own camera setups; (4) commercial medium-sized platforms with custom cameras. Figure 2 illustrates examples of such drones with relevant cameras. These represent a wide variety of available commercial UAVs; only very small multicopters, large fixed wings, and very large multicopters were not represented.
In all the studies, the drone camera was pointed directly at the ground, i.e., in the direction of the nadir. This technique is adopted from aircraft-based aerial imaging. The optimal direction to observe the phenomenon of interest was not covered or questioned in the articles, although several studies analysed the raw images directly. Some studies generated 3D models based on the nadir images.
The dominant camera solutions were RGB (red green blue) cameras. CIR (colour infrared) or NIR (near-infrared) cameras were used in eleven studies, and nine studies used multispectral cameras. Only one study also used a hyperspectral camera (Headwall Nano-Hyperspec, Headwall Photonics Inc., Bolton, MA, USA); this particular study focused on the phylloxera pest on grapevines [42]. The reporting of the wavelengths used in the applications varied. In the reported cases, the applied wavelengths were as follows (with the percentage of how many cases used that wavelength in their analysis): blue 450 nm, 78%; green 560 nm, 100%; red 660 nm, 96%; red-edge 735 nm, 39%; NIR 780 nm, 22%; NIR 800 nm, 9%; and NIR 850 nm, 22%. The review in [7] found that only 20% of the vegetation health studies used RGB images, while the rest applied multispectral images. Those findings differed from ours.
The average ground sample distance (GSD) in the reported studies was 4 cm, and 30% of the studies had a GSD smaller than 1 cm. The coarsest GSD in the data capture phase was 25 cm with grapevine pests [42], and the highest reported resolution was 30,000 pixels per soybean leaf [37], collected with a Sony Exmor RGB camera (Sony, Tokyo, Japan). The imaging resolutions are examined in more detail in the next section.
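The link between flying altitude and GSD can be made concrete with the standard pinhole relation for a nadir image over flat terrain, GSD = altitude × pixel pitch / focal length. The sketch below is illustrative only: the camera parameters are hypothetical and do not describe any specific camera used in the reviewed studies.

```python
def ground_sample_distance(altitude_m, focal_length_mm, sensor_width_mm, image_width_px):
    """Return the GSD in metres per pixel for a nadir image over flat terrain."""
    pixel_pitch_mm = sensor_width_mm / image_width_px
    # The millimetre units of pixel pitch and focal length cancel out.
    return altitude_m * pixel_pitch_mm / focal_length_mm


# Hypothetical camera (assumed values, not from the source).
FOCAL_LENGTH_MM = 8.8
SENSOR_WIDTH_MM = 13.2
IMAGE_WIDTH_PX = 5472

for altitude in (10, 30, 50, 100):
    gsd_cm = 100 * ground_sample_distance(
        altitude, FOCAL_LENGTH_MM, SENSOR_WIDTH_MM, IMAGE_WIDTH_PX)
    print(f"{altitude:>4} m altitude -> GSD of about {gsd_cm:.2f} cm per pixel")
```

Because the relation is linear, doubling the altitude doubles the GSD for a fixed camera, which is why the altitude categories above map directly onto achievable detail.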
The reviewed studies focused on naturally infested fields. Artificial inoculation was used only in the maize phenotyping study [22]. Untreated controls and delayed sprayings were used in some cases, but all the weeds in all cases were naturally infested.

Reference Measurements for Development and Evaluation
Remote Sens. 2021, 13, 1238
We found two main uses of the reference measurements, i.e., to train the classification system and to evaluate the study results. In a typical case, the reference measurements were split randomly into training and validation sets, and no additional reference data were used for evaluation. Three different types of reference data were used: in situ measurements, visual analysis from the image data, and sample plot trials. About half the studied cases included only visual observations. Table 1 presents information about the various reference methods and the image data collected in the articles. They are divided into nine categories based on the crops and the application goal: diseases, pests, weeds in a rice field, weeds in a sunflower field, weeds in a maize field, weeds in a wheat field, weeds in a barley field, weeds in other farm fields, and mixed meadows. The references and imaging approaches used were spread across these categories. The following list provided by Vanegas et al. [42] is an example of the attributes used for grapevine pest detection:
6. Panel (group of four to six grapevines)
9. Digital vigour model (calculated classes 2-5 based on imaging)
10. Multispectral derived indices and bands
11. Hyperspectral derived indices and bands
12. Soil conductivity data
Attributes 1-6 consider the spatial location of the studied unit; attributes 7 and 12 provide external reference information; attribute 8 is a reference based on visual observations; and attributes 9-11 are based on drone imaging data. The total number of reference measurements varied from 12 to 1000. Although in some cases the definition of a single reference measurement is not clear, about half of the studies applied fewer than 100 reference samples. Both extreme values were field observations of weeds in a square [30,33]. Robust machine learning applications may use thousands of reference samples.
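The random split of reference measurements into training and validation sets, described at the start of this section, can be sketched as follows. This is a minimal illustration and not the procedure of any particular reviewed study; the 30% validation share, the fixed seed, and the function name are assumptions.

```python
import random


def split_reference(samples, validation_fraction=0.3, seed=42):
    """Shuffle reference samples and split them into (training, validation) lists."""
    rng = random.Random(seed)   # fixed seed so the split is repeatable
    shuffled = list(samples)    # copy, so the caller's data stays untouched
    rng.shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Example: 100 reference plots identified by index.
training, validation = split_reference(range(100))
print(len(training), "training samples,", len(validation), "validation samples")
```

With fewer than 100 reference samples, as in about half of the reviewed studies, a split like this leaves only a few dozen samples for validation, so the reported accuracies can vary noticeably from one random split to another.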

In Situ Measurements
Two studies applied spectroradiometer measurements on the test sites [51,54,56]. However, all the studies applying multispectral cameras used reference panels at the test site. The spectroradiometers were used to calibrate reflectance. For example, on the day of the flight, Wang et al. [51] used a spectroradiometer to measure calibration panels from five points and averaged the results. The study in [56] used a spectrometer (Unispec Enterprises Inc., Washington, DC, USA) to measure spectral signatures of different weeds and used the data to identify interesting wavelengths. Figure 3 presents an example of the reflectance signatures of weeds in our similar study in 2019.
A GNSS (global navigation satellite system) has typically been used with reference data collection [12][13][14], but only a few studies in this review exploited it. The study presented by Zisi et al. [53] used GNSS positioning and collected large homogeneous weed patches from a meadow, and the study by de Castro et al. [45] measured sunflower heights with a ruler and used a GNSS for positioning.
Three weed studies [24,25,46], all published before 2017, applied 1 m × 1 m reference plots that were semi-randomly placed in the field. The study by Lopez-Granados et al. [24] explained that each of the applied 50 squares was georeferenced with a differential GNSS and photographed to compare the observed weed density with the outputs from the image classification of the weed density estimation. The number of weeds was estimated from the photographs of the squares. According to the study in [25], the weed coverage in the on-ground photographs was determined through the application of a greenness index. Figure 4 presents a similar case at an early growth stage, showing a ground image (Figure 4a) and a small part of an orthophoto originally collected with a DJI Phantom 4 drone at an altitude of 30 m. The images were taken on 7 June in Hyvinkää in Finland in an organic oat field and were automatically processed using the DroneDeploy software (DroneDeploy Inc., San Francisco, CA, USA). In this case, the squares were left in the field while the imaging was carried out.
The study in [26] used a 9 m × 9 m grid square, but collected weed scouting data from four small 0.1 m² counting frames at each centroid per grid square. The study presented by Lambert et al. [32] used 20 m × 20 m plots that were located manually. The ground-truth map was also drawn manually based on the assessment of relative weed densities by three trained observers. With a known location, Vanegas et al. [42] studied pest traps as a reference, and Chivasa et al. [22] estimated diseases visually from plot trials on a scale of 1 to 9. As an example of a simple approach, Kerkech et al. [40] stated that disease spread to all untreated areas and provided reference data for unhealthy grapevines.
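To illustrate how a greenness index can turn an on-ground photograph of a reference square into a coverage estimate, the sketch below uses the excess green index (ExG = 2g - r - b on normalised colour coordinates). The text does not state which index was applied in [25], so the choice of ExG, the threshold of 0.1, and the function names are all assumptions made for illustration.

```python
def excess_green(red, green, blue):
    """Excess green index computed on normalised chromatic coordinates."""
    total = red + green + blue
    if total == 0:
        return 0.0  # black pixel: no colour information
    return (2 * green - red - blue) / total


def green_cover_fraction(pixels, threshold=0.1):
    """Fraction of pixels whose ExG exceeds the (assumed) threshold."""
    pixels = list(pixels)
    if not pixels:
        raise ValueError("no pixels given")
    green_count = sum(1 for r, g, b in pixels if excess_green(r, g, b) > threshold)
    return green_count / len(pixels)


# Toy reference square: 3 vegetation-like pixels and 7 soil-like pixels (RGB, 0-255).
square = [(50, 150, 40)] * 3 + [(120, 100, 80)] * 7
print(f"green cover: {green_cover_fraction(square):.0%}")
```

Note that such an index measures all green vegetation in the square; separating weeds from the crop still requires an additional step such as row detection or manual annotation.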
All the UAV campaigns took place during the growing season, and it is often recommended not to visit the field to avoid crop damage caused by walking and the possible spread of pests and diseases. The authors in [33] avoided this by using sprayer tracks in the late ripening growth stages of barley and wheat. In their visual field studies, two weed scientists scored the number of square metres infected with thistle weed from each side of 10 m wide strips bordered by sprayer tracks. In each of their eight campaigns, four 600 m long strips were scored after being divided into 70 evaluation plots made visible from the air with blue sticks. The area of the evaluation plots ranged from 20 to 1000 m², depending on weed density.

Visual Image Analysis
The majority of the studied articles applied manual visual analysis of UAV images to extract a reference. The methods were typically explained at general level. For example, "images were visually examined, and the plants were annotated" [34], or "the annotator was trained to draw rectangular bounding boxes around weed patches" [38]. Although visual UAV image analysis can be convenient to carry out, it can also be time-consuming. With a 0.54 hectare test site, the study by Huang et al. [28] reported that the manual labelling for 91 images took a total of 60 h. The weed density was obtained by visual judges under the instruction of agronomy experts working at the pixel level. The study in [51] used a graphic pad to digitise and classify infected regions of cotton. Two methods were commonly used to help visual classification and digitising: Image thresholding and crop row detection. Hough transform is identified as one of the most common machine vision methods for crop row detection [58,59]. The thresholds were adjusted interactively: If weed patches were misclassified as crops by the default threshold, the threshold was adjusted to a lower value and vice versa [33]. These visual analyses may always be subjective. Figure 5 shows an example of subjective weed segmentation based on the relative threshold determined visually by an expert. extract a reference. The methods were typically explained at general level. For example, "images were visually examined, and the plants were annotated" [34], or "the annotator was trained to draw rectangular bounding boxes around weed patches" [38]. Although visual UAV image analysis can be convenient to carry out, it can also be time-consuming. With a 0.54 hectare test site, the study by Huang et al. [28] reported that the manual labelling for 91 images took a total of 60 h. The weed density was obtained by visual judges under the instruction of agronomy experts working at the pixel level. 
The study in [51] used a graphic pad to digitise and classify infected regions of cotton.
Two methods were commonly used to help visual classification and digitising: image thresholding and crop row detection. The Hough transform is identified as one of the most common machine vision methods for crop row detection [58,59]. The thresholds were adjusted interactively: if weed patches were misclassified as crops by the default threshold, the threshold was adjusted to a lower value and vice versa [33]. Such visual analyses always carry a risk of subjectivity. Figure 5 shows an example of subjective weed segmentation based on the relative threshold determined visually by an expert. In the cases where UAV images were manually studied, the assumption was that the study targets were visible to the naked eye. This assumption was tested by the authors in [33], who demonstrated that a flight altitude in the range of 10 to 50 m, with a corresponding image resolution in the range of 3-15 mm per pixel, did not influence the detection of Cirsium arvense.
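The interactive threshold adjustment described above can be illustrated with a minimal sketch. It assumes an excess green index (ExG) on chromatic coordinates as the greenness measure, which the reviewed studies do not necessarily use; the function names, pixel values, and thresholds are hypothetical.

```python
def excess_green(r, g, b):
    """Excess green index on chromatic coordinates (assumed greenness measure)."""
    total = r + g + b
    if total == 0:
        return 0.0  # black pixel: no colour information
    return (2 * g - r - b) / total


def vegetation_mask(image, threshold):
    """Return a binary mask that is True where ExG exceeds the threshold."""
    return [[excess_green(*pixel) > threshold for pixel in row] for row in image]


# Hypothetical 2 x 3 RGB patch: ripened crop (yellowish), weeds (green),
# and one faint weed pixel in the top-right corner.
patch = [
    [(200, 180, 90), (60, 140, 50), (90, 110, 80)],
    [(210, 190, 100), (70, 150, 60), (190, 170, 95)],
]

# The default threshold misses the faint weed pixel, so the annotator
# lowers the threshold, as described for the interactive adjustment.
default_mask = vegetation_mask(patch, threshold=0.20)
lowered_mask = vegetation_mask(patch, threshold=0.16)
```

Because the final threshold is chosen by eye, two annotators can produce different masks from the same image, which is the source of subjectivity noted above.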

Timing of the Imaging
The UAV campaigns were carried out mostly in one of the following timeframes: (a) the early growth stage; (b) during herbicide treatments; and (c) the late ripening growth stage. If the growth stages were presented, they were presented in the Zadoks scale [60] for cereals or using BBCH (Biologische Bundesanstalt, Bundessortenamt und Chemische Industrie) [61], which is based on Zadoks but is generalised to the growth stages of mono- and dicotyledonous plants. These scales present the growth stages from 0 to 99, and more generally in 10 principal growth stages: germination (00); leaf development (10); formation of side shoots (20); stem elongation (30); booting, i.e., development of harvestable vegetative plant parts (40); inflorescence emergence (50); flowering (60); development of fruit (70); ripening (80); senescence (90). The early stages (BBCH lower than 30) included only weed studies but covered 60% of them. The authors in [50] found that the early-growth stages of sunflowers (14-16 BBCH) gave the best results for weed mapping. In the early-growth stage, the plant rows are visible; weeds can be disarrayed, and they can be larger and higher than cultivated plants. UAV imaging at the mid-growth stage, just before the precision application, reveals the latest near real-time information. This can be challenging for a few reasons; for example, the weed plants were similar in size or in some cases smaller than the maize plants [25], but for pests and diseases, it can be essential. With weeds, only rice, maize, and sunflower campaigns were conducted in the mid-growth stages. Imaging during the late ripening stage was motivated by the occurrence of visual differences. For example, healthy vines had not yet begun to exhibit leaf discoloration [43], or the crop colour of mature cereals had not yet turned yellow [33].
Figure 6 presents an example of late-growth stage imaging, in which green couch grass (Elymus repens) can be detected from a yellow ripened wheat field. In such cases, the exploitation of a precision farming application can be done by patch-spraying glyphosate before or after harvest, during the next growing season, or by harvesting selectively.
Late campaigns also had some difficulties. The authors in [33] noted that a fraction of the weed shoots appeared under the crop canopy, and they collected references by walking in the field. This was an observation undertaken during a field visit. To select the optimal timing, the studies frequently observed the fields. The authors in [36] exploited satellite imagery from previous years for the preliminary optimisation of weed mapping in an oat field at the ripening stage.
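The relation between a two-digit BBCH code and the principal growth stages listed in this section can be sketched as a simple lookup. The stage names and the early-stage limit (BBCH lower than 30) come from the text; the function names are hypothetical, and no boundaries are assumed for the mid-growth and late stages because the text does not give them.

```python
# Principal growth stages as listed in the text, keyed by the tens digit.
PRINCIPAL_STAGES = {
    0: "germination",
    1: "leaf development",
    2: "formation of side shoots",
    3: "stem elongation",
    4: "booting",
    5: "inflorescence emergence",
    6: "flowering",
    7: "development of fruit",
    8: "ripening",
    9: "senescence",
}


def principal_stage(bbch_code):
    """Map a BBCH code (0-99) to its principal growth stage name."""
    if not 0 <= bbch_code <= 99:
        raise ValueError("BBCH codes range from 0 to 99")
    return PRINCIPAL_STAGES[bbch_code // 10]


def is_early_stage(bbch_code):
    """Early stage as used in this review: BBCH lower than 30."""
    return bbch_code < 30
```

For instance, the sunflower stages 14-16 reported in [50] fall under leaf development and count as early-stage campaigns.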

Data Processing and Analysis Solutions
Tsouros et al. [7], in their related review of precision agriculture UAV applications, listed the three most common image processing methods for analysing UAV imagery for precision agriculture: (1) photogrammetric techniques, (2) machine-learning methods, and (3) vegetation indices calculations. As previously presented in our study, 70% of the studies applied photogrammetric techniques and calculated orthophotos after preliminary calibrations. In several cases, digital surface models (DSM) were also calculated. In some cases, the mosaic data were already resampled to a lower resolution at this stage.
As the next step, object-based image analysis (OBIA) was the most used approach. In OBIA, pixels are grouped into objects based on spectral similarity, shape, or neighbourhood. This required algorithm development, and the methods behind them typically played a key role in each article. For example, in the study by de Castro et al. [45], the OBIA algorithm combined DSM, mosaics, and machine-learning techniques such as random forest (RF). The plant heights from DSM were estimated and used as a feature in the automatic sample selection by the RF classifier. Then, RF randomly selected a class-balanced training set, obtained the optimal feature values, and classified the image, requiring no manual training and removing errors due to a subjective manual task [45]. Several studies applied OBIA [24,25,31,46,48,50,53] as an approach and applied the RF technique [22,23,32,36,45]. In a typical case, the data were randomly divided into training (70-80%) and validation (20-30%) sets. Other methods used techniques such as convolutional neural networks [27,38-40], a K-means support vector machine [51], a multilayer perceptron model combined with automatic relevance determination [54], unsupervised ISODATA (Iterative Self-Organizing Data Analysis Technique) [30], and the supervised Kohonen network and counter-propagation artificial neural network [56]. The authors in [29] referred to LeCun et al. [62] in stating that fully convolutional networks (FCN) are an automatic feature learning algorithm that can address the disadvantages of OBIA approaches [33,63]. The number of training samples needed depends on the algorithm, the number of input variables, and the complexity of the problem [64]. In general, increasing the training sample size also increases the classification accuracy [65]. From the reference data, these methods need information about the class to which each reference sample belongs.
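The random division of reference data into training and validation sets can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name, the default fraction, and the fixed seed are assumptions for the example, not the procedure of any particular reviewed study.

```python
import random


def split_reference(samples, train_fraction=0.7, seed=0):
    """Randomly split labelled reference samples into training and validation sets."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(samples)
    # A seeded generator keeps the split repeatable between runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical usage: 100 reference samples identified by index.
train, validation = split_reference(range(100), train_fraction=0.7)
```

With reference sets as small as those reported in the reviewed studies, a purely random split may leave rare classes underrepresented in the validation set, so a per-class (stratified) split may be worth considering.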
All of the studies aimed to classify imaged data into relevant classes. A methodology based on threshold detection from hue histograms was proposed by the authors in [66], for example. The number of classes in the studies was between two and six and was most often three. Some examples include a two-class study involving a target weed or anything else [53]; a four-class study involving shadow, ground, healthy, symptomatic [40]; and a six-class study involving soil, wheat, and four different weed species [34]. The linkage between the number of classes and the characteristics of the reference data is essential. In some cases, new classes that were not used as a reference, such as shadows or soil, were developed during the classification process to improve the classification quality. In many cases, it was not relevant to interpret classified data at the pixel level. Four different kinds of interpretation were found: (1) a pixel-by-pixel classification [28]; (2) resampled windows such as 3 × 3 pixels [53] up to 64 × 64 pixels [39]; (3) small zones [36] or homogenous clusters [37]; or (4) metric plots ranging from 0.5 m × 0.5 m [46] to 20 m × 20 m [32].
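Interpreting a pixel-level classification in resampled windows, such as the 3 × 3 pixel windows mentioned above, can be sketched as a majority vote over non-overlapping blocks. The majority rule, the function name, and the example labels are assumptions for illustration; the reviewed studies do not necessarily aggregate in this way.

```python
from collections import Counter


def aggregate_to_windows(label_grid, size):
    """Assign each non-overlapping size x size window its most common label.

    Partial windows at the right and bottom edges are dropped. Ties are
    resolved in favour of the label encountered first in the window.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    rows, cols = len(label_grid), len(label_grid[0])
    result = []
    for top in range(0, rows - size + 1, size):
        result_row = []
        for left in range(0, cols - size + 1, size):
            window = [
                label_grid[r][c]
                for r in range(top, top + size)
                for c in range(left, left + size)
            ]
            result_row.append(Counter(window).most_common(1)[0][0])
        result.append(result_row)
    return result


# Hypothetical 3 x 6 pixel classification reduced to two 3 x 3 windows.
pixels = [
    ["crop", "crop", "weed", "soil", "soil", "soil"],
    ["crop", "weed", "weed", "soil", "weed", "soil"],
    ["crop", "crop", "weed", "soil", "soil", "crop"],
]
windows = aggregate_to_windows(pixels, size=3)
```

In this example the left window is labelled as crop although four of its nine pixels are weed, which shows how the size of the interpretation unit affects what the reference data must describe.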
The studies generally reported very good and promising results. However, this is due to the coverage and nature of the reference data, the low number of test fields in each study, the relatively high number of measured variables, and the freedom in applied classification methods. Technically, overfitting is often avoided, but the datasets are so small that no generalisations can be made. Visual image analysis can also become an iterative process in which the analysis ends at the moment when the classification results appear adequate.

General Campaign Planning
This section summarises the workflow in the reviewed papers. The first step is to determine the classification classes that are the purpose of the imaging campaign. The second step is to determine the other classes that will exist, to plan the timing of the campaign so that the classes can be distinguished, and to determine what kind of reference data can usefully be collected in practice. In the next step, the imaging campaign tools and parameters are planned, and the campaigns are carried out. Then, suitable preprocessing, processing, and analysis are carried out. Figure 7 presents a bottom-up demonstration of the campaign planning conducted in the studied articles.
Figure 7. Determined planning process of imaging campaigns: processing of data, data analysis.
The workflows from the imaging data preprocessing to the classification phase were often at the core of the articles, and detailed descriptions were given.

Differences with Other UAV Imaging Applications
UAV-based remote sensing has also been widely employed to estimate crop biophysical parameters (e.g., biomass, yield, leaf area index (LAI), and plant height) and biochemical parameters (e.g., nitrogen content). These studies often involve the laboratory analysis [14,58] of reference data. For example, the reference for biomass or grass yield is usually measured with a "cut and dry" method or a rising plate meter [67]. The rising plate meter is a simple indirect instrument for estimating grass yield, based on the compressed height (CH) of the sward [68]. Cut and dry is a direct method for measuring grass yield, and it involves cutting and weighing a sample of fresh grass from a precisely measured area at a specified cutting height. After weighing, the sample is dried in an oven to determine the amount of dry biomass [67]. Cut and dry is the most common reference data collection method in biomass estimation studies, and it has been used in various recent studies [69-73]. The method provides objective reference data, but the cutting phase can include a lot of variation. In studies where biochemical parameters such as the nitrogen content or digestibility of grass were estimated, the reference samples were analysed in a laboratory, most commonly using the NIRS (near-infrared spectroscopy) technique [69,72,73], providing comparable numerical values.
In the data classification phase, supervised machine-learning methods rely on training samples (Figure 8). The variation in the quality of reference samples needs to cover the studied phenomena completely [74]. Some additional studies applied imaging analysis and machine learning for weeds, pests, or diseases [75-79] without focusing on UAV operations. Their reference data approach was similar to that of the studies in this review. For example, the study by Ebrahimi et al. [75] used a robot arm in a greenhouse to detect pests on a strawberry flower and only used visual observations from the images as a reference.
Our review scope excluded some pest and disease studies because those studies used detailed information about pests or diseases in their title. The authors in [80] reviewed the use of remote sensing technologies in precision pest management that focus on arthropod-induced stress reactions. They listed 10 studies applying drone-based hyperspectral, multispectral, and RGB remote sensing to detect arthropod-induced stress in crops and orchards, including two of the three studies included in our search results [42,58]. These 10 studies included the following species: grape, wheat, onion, canola, cotton, potato, and sorghum. Typically, these studies quantified the symptoms caused by the pests, the number of pests, or both, with two exceptions [80]. In the case of fall armyworm (Spodoptera frugiperda) in wheat fields [71], the reference data were reported to consist of the outbreaks reported by farmers. UAV-based RGB imaging was seen as having the potential to predict the movement and damage caused by this pest [81]. In canola (Brassica napus), soil and plant tissue nutrient analyses were also used as a reference method [82], and they were used in determining the relationship between potassium deficiency and the susceptibility to green peach aphids. The review in [80] also listed 9 studies using orbital sensors, 26 studies using aerial (manned aircraft) sensors, and 75 studies using ground-based hyperspectral or multispectral remote sensors. This indicates that remote sensing technologies are quite widely studied in the detection of insect outbreaks, but only a few reports described the use of UAVs. In contrast with the reference methods that are often used with UAV and aerial studies, ground-based measurements often used controlled infestations, and orbital studies in particular relied mainly on the counting of arthropods [80].
Late blight caused by Phytophthora infestans is regarded as the most important disease of potato (Solanum tuberosum) worldwide and is a threat wherever potatoes are grown [83]. Many studies have therefore focused on utilising UAV-acquired spectral imagery to monitor late blight disease incidence and severity [84-88]. Generally, the aim of the UAV studies on late blight was to detect the disease symptoms and assess their severity to develop an easy yet reliable method for disease detection that could be used in agronomic trials and at farm scale. All the studies were based on rigorously designed experiments set up in experimental contexts [84,85,87,88] or in farmers' fields [86]. In most studies, the visual assessments of the disease symptoms and their severity were used as reference data for the images [85-88]. The assessments were carried out by experts several times during the growing season, and in some studies, the assessments were made according to the known guidelines (EPPO, European and Mediterranean Plant Protection Organization) [87,88]. The authors in [86] explained the ground truth measurements as follows: "Expert visual evaluation of severity of P. infestans under field conditions was done at the plot level and for each of the four image acquisition campaigns. Disease severity was estimated by sampling at random four plants on each plot and computing the average percentage of the disease-infected foliar area." In [84], the authors compared UAV-acquired image data with data collected using a ground-based hyperspectral field spectrometer instead of visual assessments. No artificial inoculation was used in these studies because outbreaks of late blight usually occur spontaneously when no fungicides are used.
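The plot-level severity estimate quoted from [86] (four randomly sampled plants per plot, averaged) can be written as a short sketch. The function name, the input format (one visually scored percentage per plant), and the optional seed are assumptions made for this illustration.

```python
import random
from statistics import mean


def plot_severity(plant_severities, n_plants=4, seed=None):
    """Average the infected foliar area (%) of n randomly sampled plants."""
    plants = list(plant_severities)
    if len(plants) < n_plants:
        raise ValueError("not enough plants in the plot to sample from")
    # Sampling without replacement; a seed makes the draw repeatable.
    sampled = random.Random(seed).sample(plants, n_plants)
    return mean(sampled)
```

Because only four plants represent each plot, the estimate depends on which plants are drawn, and the scores themselves remain visual expert judgements.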
Hyperspectral imaging with remote sensing has shown the potential to detect the symptoms of potato virus Y (PVY) [89,90] and potato blackleg caused by Dickeya and Pectobacterium bacteria [91]. These diseases are especially harmful in seed potato crops, and their management is primarily based on the use of certified pathogen-free seed tubers and the removal of symptomatic plants that can serve as an inoculum source. The identification of the infected plants at an early developmental stage is therefore of utmost importance. The authors in [89] and [90] aimed to distinguish infested plants from healthy plants at individual plant level, using spectral reflectance. Image data were acquired using ground-based systems such as a hand-held field spectrometer [89] or tractor-mounted line-scan cameras [90,91]. Reference data were collected by visually monitoring the disease symptoms several times during the growing season. In addition, the authors in [89] confirmed the visual observations with laboratory analysis, i.e., the presence of PVY using enzyme-linked immunosorbent assay (ELISA) and the identification of the PVY strain using a reverse transcriptase polymerase chain reaction (RT-PCR). The positions of the infected plants were either marked in the field [89] or stored using a real-time kinematic global navigation satellite system (RTK-GNSS) [90,91] that allowed the infested (and healthy) plants to be linked to their acquired images. In contrast with the other disease studies mentioned earlier, the authors in [90,91] used artificial inoculation. Ref. [91] inoculated seed tubers with one of the causal bacteria of potato blackleg (P. carotovorum subsp. brasiliense), whereas Ref. [90] set up the experiment using seed lots that were especially selected for their high level of PVY-infection.
Challenges related to the reference material as well as the need to apply novel imaging technologies can be demonstrated by the case of Fusarium head blight (FHB), which is one of the most important diseases in cereals. FHB can cause devastating yield losses [92], but its largest related problem is the accumulation of harmful mycotoxins, which lead to rejections in the cereal trade. What makes FHB a difficult disease to manage is that it has a range of Fusarium species as causal agents, which allows it to occur under various environmental conditions. It spreads efficiently through air and seeds, and it overwinters in various crop residues [93,94]. It is also difficult to assess the prevalence of FHB in the field. The symptoms can be confused with maturation, especially with crops such as oats, and the disease can be quite unevenly spread in fields because it may spread from infested seeds or overwintered crop residue. Plant breeders must therefore rely on inoculated disease nurseries to get evenly distributed infections, and they also need to include analyses of grain samples [95] to screen resistance to FHB. These analyses, such as the determination of mycotoxin content by enzyme-linked immunosorbent assays (ELISA), are expensive. Alternative methods are therefore being considered, such as NIR spectroscopy for analysing the yield samples, and chlorophyll fluorescence [96], hyperspectral sensors [97], and RGB imaging [98] for making field phenotyping more reliable.
In the field, the time window for hyperspectral imaging was determined to be the grain filling stages [97] in a ground-based study, and this was successfully applied in a study in which FHB was monitored from 60 m above a wheat field [99]. Up to 98% accuracy was achieved by a backpropagation neural network model for ground data where 50 individual plants per plot were classified with a disease severity scale from 0 to 5 from 50 randomly selected plots on the studied field. The authors claimed that the combination of spectral and textural features selected for modelling in their study should be easily applicable to other areas resembling the studied field [99]; such transferability is commonly a constraint for the application of hyperspectral imaging campaigns. The authors in [100] suggested that UAVs carrying hyperspectral sensors could provide valuable information beyond the range of the RGB spectrum, and this may be why there are no reports of UAV applications for RGB cameras detecting FHB, despite the existence of several studies such as [98] that used RGB images to detect FHB from ground level.
The management of FHB requires the integration of several practices such as tillage, crop rotation, cultivar resistance, fungicides, and postharvest practices [94,101]. Fungicide use can be recommended by risk models [102], but not all crops and regions have these models, and infections and mycotoxins can also be formed after the fungicide application. Moreover, at least from the Nordic perspective, the suggested timings of flight campaigns or phenotyping for FHB occur after the time window for spraying has passed. Nevertheless, in addition to plant breeders who would welcome a high-throughput phenotyping solution to replace the currently applied costly and laborious phenotyping tools [103], UAVs could be used to plan harvesting so that the contaminated parts of the field are harvested separately, and the data could be used as a supplementary component to adjust weather-based risk models [100].

Discussion
A total of 36 articles were included in this review, and the articles were published in a wide variety of journals. These studies present recent applications of the UAV-based imaging of weeds, pests, or diseases for precision farming. Our review focused on the ground truth data of these applications. The majority of the applications used visual image analysis as a reference classification, while some of the applications collected in situ data.
We observed that there were no standardised approaches or methods for the reference data collection, and subjective classifications were needed. There is a need to develop traceable methods to access reference data in this area. We recommend that future publications should focus more on a detailed description of the reference data collection and ground truth descriptions related to their work. In particular, subjective observations can have a critical impact on the quality of the results.
Weed, pest, and disease identifications have traditionally been subjective or binary, relying on observations such as "20% infested", "weed existing", and "occurring pests". Observations such as the number of pests can be very time-critical. The studies selected for this review did not involve laboratory analysis of references. Visual observations were often seen to fulfil certain requirements. For example, the weight of the weeds or crops was not measured in sample plots. This was because the existence of weeds was of interest, not the actual amount itself. This also means that there were no human errors related to in situ sample collection and processing. Another remarkable observation was that there were hardly any plot setups such as the artificial contaminations that are typical of phenotyping studies. The majority of the studies concentrated on real growing conditions at a near-field scale. Due to the low technology readiness levels (TRLs), this approach can be challenged.
The imaging campaigns and data analysis were not coherent, except for the nadir imaging direction. In some cases, other directions could reveal the target and present the vertical structure of the plants. The variation between the growing seasons and growth conditions was not studied to a great extent. This is probably because of the low TRLs. There was also variation in GSDs, the actual applied resolution, and the size of the single reference unit. UAV imaging is still a relatively new approach, and the traditional data capture resolutions could be matched in the future.
Pest and disease infestations are often associated with the unfavourable moisture conditions of soil and vegetation, i.e., excess water or drought. In contrast with weed, pest, or disease detection, soil moisture regime-related UAV applications are commonly based on the thermal region of the spectra [104,105], although visible reflectance [106] or multispectral data [107] are also widely used. Thermal imaging was not considered in the reviewed articles. Recently, UAV-based methods that are based on ground-penetrating radar [108] and synthetic aperture radar [109] have also begun to emerge. An accurate knowledge of the moisture conditions could be exploited to improve the efficiency of pest and disease detection by targeting the detection at the parts of the field where pests and diseases are most likely to occur. Thermal imaging can also be indirectly employed to detect the signs of some diseases such as Verticillium wilt in olive [110], which reduces the water flow to the plant and thus induces water stress. However, thermal imaging is sensitive to the environmental conditions during image capture, such as the illumination conditions, canopy architecture, and maturity of crops [111]. Sensors of thermal cameras also have a relatively low pixel resolution compared with the ones used in visible and near-visible imaging.
It is clearly a future challenge to develop automated image analysis and to give timely support for decision making and field actions. Moreover, there is a lack of quantitative reference data in the studied topic. There is especially a need for a more systematic approach to the manual classification of the images, and field measurements are always recommended at some level. One approach is to use simulation or synthetic means of creating reference data. This approach will remove the uncertainty of the ground truth and reduce the need for the manual labelling of the data, but it creates another problem that researchers are trying to overcome, i.e., synthetically created images may look unrealistic to the vision algorithms compared with real images and scenes. This problem is called the reality gap; it is also referred to as the sim2real gap and is an active area of research [112]. It is the major obstacle to the adoption of synthetic reference data creation.
However, recent advances in the computer vision domain have led to the emergence of several studies that attempt to bridge the sim2real gap. These studies either train convolutional neural networks only on synthetically generated data [113] or combine training on synthetic and manually labelled reference data [114,115]. Studies that combine synthetically generated data with manually labelled reference data have shown promising results [115]. In the context of agriculture, the authors in [116] generated synthetic images of Arabidopsis (small flowering plants) from 3D models that can potentially accelerate the field of plant phenotyping. The study in [113] generated synthetic data for different seeds, including barley and wheat, against a black background; the goal was to detect and segment each seed in the pictures. The manual labelling of each seed would take a long time compared to creating the reference data using simulation. The authors claimed a high accuracy and concluded that the same approach could be extended to other crops like rice, oats, and lettuce [113].
This synthetic creation of data is an emerging new field of research that will allow for the reduction of the manual labelling costs of reference data, especially after the sim2real gap is solved [112]. We suggest that simulation training could be the next research direction that opens more possibilities for the use of UAVs in precision farming. However, there are two main barriers to the immediate adoption of synthetic data creation. First, bridging the sim2real gap remains an unsolved problem. Second, there are few ready-made tools that can generate photorealistic synthetic reference data, and it takes labour and time to utilise current simulation engines for this purpose. With more funding and research entering the area, it can be speculated that these two barriers will eventually be crossed, and new possibilities within UAVs and precision farming will become available. In cases where the collection of reference data is laborious or subjective, synthetic data may provide comprehensive results. This ties in with the introduced digital twin [117] concepts. Another and more traditional development direction would be to formalise the future research topic from the reference data perspective, especially acknowledging that data quality could make a difference. Achieving high quality requires the management of spatial, radiometric, and spectral resolution, temporal resolution, cluster accuracy precision, positional accuracy, thematic precision, temporal validity, data completeness, spatial redundancy, readability, accessibility, and consistency [5,118].

Conclusions
The reviewed studies developed straightforward precision applications for a wide variety of crops. The studies applied a wide variety of drone types, used nadir imaging with a preplanned flying pattern, constructed orthophotos, and developed machine learning methods to distinguish the targets. Only a small set of reference data was used, and it was split between learning and validation. The reference data were mostly collected by visual examination of drone images. It was essential to plan the timing of the imaging campaign to fit a suitable growth stage, i.e., when the target is visible. Typically, the mapped data were resampled to larger units before the classification process. Figures 7 and 8 present the baseline for the processing. In contrast to other mapping topics in agriculture, the mapping of pests, weeds, and diseases is challenging because of the subjective nature of the targets. Generally, better control of the references is needed in future works.
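To make concrete why a small validation set limits what a reported accuracy can tell, the following sketch computes a Wilson score interval for a proportion. This is a standard statistical formula, not a method used in the reviewed studies; the function name `wilson_interval` and the sample counts in the usage part are hypothetical.

```python
import math


def wilson_interval(n_correct, n_total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95 %)."""
    if n_total <= 0 or not 0 <= n_correct <= n_total:
        raise ValueError("need 0 <= n_correct <= n_total and n_total > 0")
    p = n_correct / n_total
    denom = 1.0 + z ** 2 / n_total
    centre = (p + z ** 2 / (2 * n_total)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / n_total + z ** 2 / (4 * n_total ** 2)) / denom
    )
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Hypothetical counts: the same 95 % accuracy on a small and a large set.
    for n_correct, n_total in [(19, 20), (1900, 2000)]:
        low, high = wilson_interval(n_correct, n_total)
        print(f"{n_correct}/{n_total}: {low:.3f} to {high:.3f}")
```

With 19 of 20 validation samples correct, the interval spans roughly 0.76 to 0.99, whereas the same 95 % accuracy measured on 2000 samples gives a much narrower interval. A high accuracy from a handful of reference samples is therefore compatible with a considerably weaker classifier.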
The majority of the studied applications used visual image analysis as reference data, and there was a large variation in the resolution of the applied data. The principles of the subjective analysis could be described more thoroughly, and their impact on the results should be evaluated. As such, the role of reference data quality and quantity was bypassed in the studies. As an alternative or as an addition, simulated reference data can be seen as a potential approach for developing adequate image analysis methods. The core starting point is to identify the true quality and quantity of the reference data. Based on our review, we suggest the following main considerations for future imaging campaigns targeting pests, weeds, and diseases:

1. Carefully define the characteristics of the reference data and how they are measured in order to make the process repeatable. The reference data were often not defined accurately.
2. Consider other imaging methods, camera directions, campaign timing, and imaging wavelengths to make the target objects more visible. Such imaging possibilities were not considered in the studies.
3. Focus on reference data quality and quantity. The studies did not focus on these, and the quantity was heavily affected by the convenience factor.
4. Adjust classification methods to make them suitable for the reference data characteristics. There should be reference data for each class.
5. Consider all remote sensing data quality aspects presented in [118]. Due to the nature of feasibility studies, these aspects were not met.
6. Elaborate on the classification results in relation to the quality and quantity of the collected reference data. This is very important. A majority of studies presented overwhelming results because the involved data were so limited.
7. Evaluate the possibility of exploiting synthetic data as reference, at least to some extent. This was not considered in the studies.
8. Adapt general study goals and plans according to the TRL classification. TRLs were not mentioned in the studies, but the adaptation can help define the requirements and can especially give a realistic framework for the customer or for the end user.

Funding: This research received no external funding.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement: The authors confirm that the data supporting the findings of this study are available within the article.