Drones, Deep Learning, and Endangered Plants: A Method for Population-Level Census Using Image Analysis

A census of endangered plant populations is critical to determining their size, spatial distribution, and geographical extent. Traditional, on-the-ground methods for collecting census data are labor-intensive, time-consuming, and expensive. Use of drone imagery coupled with application of rapidly advancing deep learning technology could greatly reduce the effort and cost of collecting and analyzing population-level data across relatively large areas. We used a customization of the YOLOv5 object detection model to identify and count individual dwarf bear poppy (Arctomecon humilis) plants in drone imagery obtained at 40 m altitude. We compared human-based and model-based detection at 40 m on n = 11 test plots for two areas that differed in image quality. The model out-performed human visual poppy detection for precision and recall, and was 1100× faster at inference/evaluation on the test plots. Model inference precision was 0.83, and recall was 0.74, while human evaluation resulted in precision of 0.67, and recall of 0.71. Both model and human performance were better in the area with higher-quality imagery, suggesting that image quality is a primary factor limiting model performance. Evaluation of drone-based census imagery from the 255 ha Webb Hill population with our customized YOLOv5 model was completed in <3 h and provided a reasonable estimate of population size (7414 poppies) with minimal investment of on-the-ground resources.


Introduction
The use of deep learning (AI or artificial intelligence) methodology for object identification is a fast-moving area of research that has only recently been applied to the analysis of UAV (drone) imagery [1]. In this paper we describe an application of the deep learning object detection model YOLOv5 [2] to locate, identify, and enumerate individual plants of a single plant species in its desert habitat. This work represents the next step in our efforts to perform a range-wide census based on drone imagery for the endangered dwarf bear poppy (Arctomecon humilis), an evergreen perennial species endemic to gypsum badlands habitat at the northeastern edge of the Mojave Desert of southwestern Utah, USA [3,4].
All known dwarf bear poppy populations occur in close proximity to a rapidly expanding urban area, St. George, Utah. It is estimated that the species has already suffered extirpation due to urban development over half of its original range [4]. Detailed knowledge of plant abundance and patterns of distribution (i.e., population-level data) is fundamental to understanding the ecology of rare plant species and is especially important for implementing effective conservation measures [5][6][7][8]. According to the US Fish and Wildlife Service, population-level census data are essential for management planning to mitigate further losses in the face of intensive off-road recreational use, urban development, and other anthropogenic threats to the dwarf bear poppy [4].
Arctomecon humilis is sparsely and discontinuously distributed across 3650 ha of fragile and largely inaccessible habitat (Figure 1); see [4] for map. Concerns over disturbance of biological soil crust in this unique environment have precluded the use of traditional on-the-ground census methodologies, which would often not be feasible due to the rugged, steep, and fragile nature of the terrain. Consequently, only 5% of the area has ever been systematically censused, and even rough estimates are unavailable for the Red Bluffs population, which occupies over two-thirds of the total range [4]. This species is a good candidate for a remote sensing approach because it grows in sparsely vegetated habitat (Figure 1), and because of its unique blue-green color and mounding growth form (Figure 2b).
We began our efforts to use drone imagery analysis to census this species in 2018 and have completed census for three of the eight formally recognized populations [4] based on visual analysis of the resulting imagery [3,9]. Relatively inexpensive and easy-to-use drones equipped with high resolution cameras are capable of low-altitude flights over relatively large areas in short time frames. This has made it possible for us to complete imagery acquisition for the species across its entire 3650 ha range. However, a major drawback is the time required for imagery processing and analysis, particularly the visual enumeration of individual plants. To solve this problem, we investigated how drone-acquired census imagery could be analyzed using deep learning models to identify and count the individuals of our species of interest in the imagery.
Many studies have used drones along with deep learning models to collect data in agriculture [10][11][12][13], and there are publications describing this approach for a variety of wild organisms, including plant species in general [14][15][16][17][18][19] and especially invasive species [20][21][22][23]. Interest in using drone imagery as a tool in rare plant conservation is increasing [14], but to our knowledge no published studies to date have successfully applied a deep learning approach to drone-acquired imagery with the goal of enumerating individuals of a rare plant species. Reckling et al. [24] were able to visually identify individuals of their herbaceous target species with the aid of an a priori species distribution model, but they were not successful in their efforts to use AI for species recognition in drone imagery. Our specific objectives in this study were to: (1) report on detection accuracy of the YOLOv5 model trained on a custom drone-based imagery dataset for dwarf bear poppy, (2) compare the accuracy of the YOLOv5 model against a human worker trained in identifying dwarf bear poppy in drone imagery, (3) evaluate the effect of image quality on model accuracy, and (4) present the results of a drone/AI census for one population of the target species (Webb Hill). We emphasize our approach for building a deep learning model in an effort to provide a beginning road map to aid conservation researchers considering the use of AI for drone-based census of other plant species.

Study Area
In the current study, we conducted a drone-based census of the Webb Hill population, which represents roughly 12% of the total suitable habitat for the poppy. We were limited to census of the lands managed by the Bureau of Land Management (BLM) along with some state and county-managed lands for a total of 255 ha. We were unable to census any of the privately-owned land, mainly because these areas were under construction or already built upon. Additionally, we analyzed census and validation flights from across a small subset of the Red Bluffs population as an area of interest (n = 6 flights) for testing the deep learning model against a human worker with imagery that differed in quality from the Webb Hill imagery.

Drone Flights and Imagery Processing
Our protocol for acquiring and processing drone imagery for analysis is described in detail in our earlier papers [3,9]. For the present work, drone flights were made using DJI Phantom 4 Pro V2 drones (SZ DJI Technology Co., Ltd., Shenzhen, China), which were equipped with a stock 20 MP camera (f/2.8-f/11, 84° FOV). Drone census flights at Webb Hill were conducted during the late fall of 2019 (October-November). Flights were made at 40 m above ground level (AGL) with the following settings adjusted within the flight planning app: 70/70 side/front image overlap, ISO 100, "Auto" shutter speed (vs. aperture or shutter priority), and white balance set to "Cloudy". We carried out 36 census flights across the population, capturing 9666 images and covering roughly 255 ha of habitat. Due to poor image quality, imagery captured from two flights (flights 10 and 32) was excluded from analysis. In addition to census flights, we conducted 15 m AGL validation flights across a subset (n = 6 plots) of the areas flown for census in order to validate poppy detection in the census-level imagery.
All drone census and validation flights were carried out by a two-person team, each operating a drone. Both drone operators were Part 107 licensed with the Federal Aviation Administration (FAA) and were authorized to conduct flights with the FAA, BLM, and U.S. Fish and Wildlife Service (USFWS). The total time in the field was 24 h per person for a total of 48 person hours.
The collected images were organized by flight and copied into an in-house imagery storage database. All imagery was processed in Adobe Photoshop (Photoshop CC 2020, Adobe Systems Inc., San Jose, CA, USA) to correct for light and color distortions within the imagery as described in our earlier work [3]. The images were then processed into orthomosaics using Pix4D software (Pix4D S.A., Lausanne, Switzerland) and each orthomosaic was loaded into ArcGIS Pro (ESRI, Redlands, CA, USA) for further analysis. Orthomosaics were used as the basis of the training imagery that was used to build the deep learning model, to conduct the model-versus-human comparison of detection accuracy, and to carry out the full Webb Hill drone-based census.
For poppy detection in drone imagery, we used the "You Only Look Once", or YOLOv5, model, the 5th version of the YOLO family of object detection models [2]. The YOLOv5 model comes pre-trained on the COCO dataset [25], which provides baseline weights for hyperparameter settings, resulting in drastically reduced training times when training on a customized dataset. This version of YOLO was designed to be particularly accessible to people who do not necessarily come from a computer programming background [26]. The YOLOv5 model uses the PyTorch framework (as opposed to Darknet), which makes integrating a local GPU for training and inference (using the trained model to detect objects and predict classes) relatively easy. This was ideal for our purposes.
In simple terms, the YOLO model works by first creating features in the training image (backbone), which are then passed to the next layer; these features are then mixed, matched, and combined in various ways (neck); and then finally, bounding boxes are drawn around predicted objects (Figure 3a; passed features from previous layers), and a class prediction with level of confidence is made for each predicted object (head). The model then runs inference on an internal validation dataset. Based on the results, the model adjusts the hyperparameters and applies the changes during the next epoch. This is why the model is said to "learn". For more detailed information on how YOLOv5, as well as other YOLO models, are designed and function, please see these references [2][27][28][29][30].
We experimented with several different sizes of the YOLOv5 model by training with our dataset on both the local GPU (NVIDIA GeForce GTX 1080) and a cloud-based GPU (Google Colaboratory Pro, Google LLC, Mountain View, CA, USA; Tesla P100) for a relatively short amount of time (≥250 epochs). We selected the YOLOv5 small model for poppy detection due to its relatively fast training times and high detection accuracy. The other model sizes yielded comparable accuracy but trained much more slowly.

Training Imagery Selection
To build a training dataset of poppy images, we first had to ensure the images being fed to the model were confirmed poppies. This may seem obvious, but images of poppies collected from 40 m AGL were often blurred or color-distorted and were sometimes not easy to distinguish from similar-looking plants or from the background (Figure 3a). To mitigate this problem, we used only census (40 m AGL) imagery from areas that also had validation (15 m AGL) imagery. The visual difference in image quality from 40 to 15 m AGL is quite dramatic (Figure 3), and the majority of poppies can be reliably confirmed in the 15 m AGL imagery [9]. Training images were obtained from across all poppy populations for which drone imagery was available. However, the majority of the training images were from the Webb Hill and Red Bluffs populations.

Imagery Annotation and Model Training
The YOLOv5 model requires an input of annotated training images representing the target object class, along with a corresponding comma separated values (csv) file containing bounding box coordinates and class labels, for each image. We chose two target classes to train the model: poppy, specifically non-flowering poppies; and similar vegetation. We found that providing the model with additional objects (other plant or plant-like objects from the same census imagery) that were labelled as not poppies (i.e., similar vegetation) resulted in higher model precision. The source training imagery was from the individual census flight orthomosaics. To use orthomosaics for training, we first had to divide the larger orthomosaic into individual tiff images of 416 × 416 pixels each. We used images of this size to speed up model training, and because this area was sufficiently small to draw bounding boxes closely around our target objects. The model can use images of various sizes, but use of larger-sized images greatly increases the model training time.
We used the Python-based tool LabelImg [31] to annotate all images. Annotation was done by manually drawing bounding boxes around the target objects and labeling each box with its appropriate class. We annotated 389 images, resulting in 1975 total annotations: 755 poppy annotations and 1220 similar vegetation annotations. We uploaded the images into an image processing framework, Roboflow (Roboflow, Inc., Des Moines, IA, USA), to separate the images into training and validation subsets and to perform additional image augmentations. Within Roboflow, the images were randomly separated into 245 training and 144 validation images. Further, each of the 245 training images had three additional "augmentations" performed, resulting in 980 training images for model input. The augmentations were randomly selected from the following five user-selected options: 90° rotate (clockwise, counterclockwise); crop (0% to 39%); saturation (between -29% and 29%); brightness (between -25% and 25%); and exposure (between -8% and 8%). Each specific augmentation was randomly applied within the settings shown in parentheses above. Augmentations were subjectively chosen in order to bolster the number of training images and to give the model a wide variety of possibilities for how poppies could appear in the imagery.
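To make the augmentation settings and the resulting dataset size concrete, the following minimal Python sketch samples one augmentation setting from the five options listed above and reproduces the 245 × (1 + 3) = 980 image count. It is an illustration only: Roboflow's internal sampling procedure is not described here, so the uniform sampling, the one-setting-per-draw behavior, and all names (`AUGMENTATION_RANGES`, `sample_augmentation`, `augmented_dataset_size`) are assumptions, and no image is actually transformed.

```python
import random

# Ranges taken from the text; the dictionary keys are hypothetical names.
AUGMENTATION_RANGES = {
    "crop_pct": (0.0, 39.0),
    "saturation_pct": (-29.0, 29.0),
    "brightness_pct": (-25.0, 25.0),
    "exposure_pct": (-8.0, 8.0),
}
ROTATIONS = ("clockwise", "counterclockwise")


def sample_augmentation(rng):
    """Pick one of the five options and a setting within its range.

    Assumption: options and settings are drawn uniformly at random.
    """
    name = rng.choice(["rotate90", *AUGMENTATION_RANGES])
    if name == "rotate90":
        return name, rng.choice(ROTATIONS)
    low, high = AUGMENTATION_RANGES[name]
    return name, rng.uniform(low, high)


def augmented_dataset_size(n_originals, copies_per_image=3):
    """Each original image is kept and joined by its augmented copies."""
    return n_originals * (1 + copies_per_image)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
print(sample_augmentation(rng))
print(augmented_dataset_size(245))  # 980
```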
Using the YOLOv5s model baseline weights (from the COCO dataset), we trained our custom model for 9000 epochs across five separate training runs (4 runs for 1000 epochs, 1 run for 5000 epochs) taking roughly 10.4 h to complete. The final weights file was saved for inference use or for additional model training. The training was done in the cloud with Google Colab Pro.

A.I. Model vs. Human Poppy Detection
To test the utility of using our customized model for census across relatively large areas, we ran model inference on imagery in test plots that had not been previously used for model training or validation, but for which both census (40 m) and validation (15 m) imagery were available. The same area was also evaluated by a worker trained to visually detect poppies in the imagery. We used precision and recall as our metrics for evaluating the accuracy of both the model inference and human evaluation results. Precision is calculated as the number of correctly marked objects divided by the total number of marked objects (error of commission), whereas recall is the number of correctly marked objects divided by the total number of objects present (error of omission).
We evaluated n = 11 plots from two poppy populations, Webb Hill (n = 5) and an area of interest (AOI, n = 6) at the Red Bluffs population. Each test plot was made up of 48 images of 416 × 416 pixels (approximately 4.78 × 4.78 m) each, representing a contiguous area within the plot. The total area examined across the test plots was approximately 1.21 hectares. All test plots had poppies present in ≥1 image, but poppies were not present in many of the individual images of the respective test plot.
As mentioned previously, the Webb Hill imagery was collected in late fall 2019 (Oct-Nov), while the Red Bluffs imagery was collected in spring 2020 (March) prior to poppy flowering (Figure 2b). We used imagery from two populations to compare the results of model inference and human evaluation on higher quality imagery relative to the lower quality Webb Hill imagery. We expected that the higher quality imagery would yield better evaluation results, but we also wanted to better understand what the optimal imagery capture conditions are for the dwarf bear poppy. In the Webb Hill imagery, poppies were far less conspicuous due to the presence of spent inflorescences. These largely obscured the distinct blue-green poppy foliage, making the plants difficult to separate from similar vegetation or even from the background. In contrast, the Red Bluffs imagery was taken in the spring following a rainstorm, which made the blue-green poppy foliage stand out against the wet and darkened background. Additionally, the previous season's spent inflorescences were mostly no longer present on the plants (Figure 2b).
We used the methods developed by Rominger and Meyer [3] as the basis of our evaluation and validation of the test plots. All the census tiles in each plot were passed through model inference and separately evaluated by the human worker. Worker evaluation was done in Adobe Photoshop, where scale and zoom could be manipulated to closely examine and ultimately mark each plant. Any basic photo software could be used for this type of evaluation, as long as the software has a zoom-in function. After the model and the human worker completed detections on the test plots, each set of results was evaluated and scored against the validation imagery by a second trained worker who did not take part in the test plot evaluations. Each object detected by either the model or by the human worker was checked against the validation imagery to confirm if the identified plants were poppies and to identify any poppies that were missed. Plants were scored as either marked/not confirmed, marked/confirmed, or missed. Detected plants that could not be confirmed in the validation imagery as poppies were scored as marked/not confirmed, which drove the precision metric (confirmed/marked). Poppies that were not detected were scored as missed, which negatively impacted the total recall percentage (confirmed/actual). However, poppies that were <5 cm in diameter were excluded from consideration, regardless of whether they were detected or missed by either the model or worker. This was because poppies <5 cm could not always be reliably identified in the validation imagery. The time required for model inference and for visual evaluation of each plot was also recorded.
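The scoring scheme described above maps directly onto the two metrics. The short Python sketch below shows the calculation from the three scoring categories. The class and function names are hypothetical, the counts in the usage lines are invented for illustration, and pooling counts across plots before dividing is an assumption (plot-level values could instead be averaged).

```python
from dataclasses import dataclass


@dataclass
class PlotScore:
    confirmed: int      # marked and confirmed as poppies in the 15 m imagery
    not_confirmed: int  # marked but not confirmed (errors of commission)
    missed: int         # poppies present but not marked (errors of omission)


def precision(score):
    """Confirmed detections divided by all marked objects."""
    marked = score.confirmed + score.not_confirmed
    return score.confirmed / marked if marked else float("nan")


def recall(score):
    """Confirmed detections divided by all poppies actually present."""
    actual = score.confirmed + score.missed
    return score.confirmed / actual if actual else float("nan")


def pool(scores):
    """Sum the counts over several plots (assumed pooling strategy)."""
    return PlotScore(
        confirmed=sum(s.confirmed for s in scores),
        not_confirmed=sum(s.not_confirmed for s in scores),
        missed=sum(s.missed for s in scores),
    )


# Invented counts for two hypothetical plots.
plots = [PlotScore(40, 10, 10), PlotScore(10, 0, 5)]
pooled = pool(plots)
print(round(precision(pooled), 2), round(recall(pooled), 2))
```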

Model Inference
To accurately perform model inference, the same image size used for model training was required as input for the trained model. We processed our previously generated census-level orthomosaics (n = 36) into individual 416 × 416 px images, resulting in ~147,000 individual images for inference. We developed a customized Python script to automate the process of tiling the orthomosaics into individual tiff files, moving the files to the inference directory, and merging the resulting inference csv files into one csv for each individual flight.
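The tiling step can be sketched as a simple grid computation. The Python example below is not the script used in this study; it only generates the pixel windows that a raster library could then read and write as individual tiff tiles. The function name `tile_windows` is hypothetical, and keeping the narrower partial tiles at the right and bottom edges is an assumption, since edge handling is not specified here.

```python
from typing import Iterator, Tuple

TILE_SIZE = 416  # pixels per side, matching the training image size

# (column offset, row offset, width, height), all in pixels
Window = Tuple[int, int, int, int]


def tile_windows(width: int, height: int, tile: int = TILE_SIZE) -> Iterator[Window]:
    """Yield windows covering a raster of the given pixel size, row by row.

    Assumption: partial tiles at the right and bottom edges are kept and
    clipped to the raster bounds rather than dropped or padded.
    """
    for row_off in range(0, height, tile):
        for col_off in range(0, width, tile):
            yield (
                col_off,
                row_off,
                min(tile, width - col_off),
                min(tile, height - row_off),
            )


# Example with a small hypothetical raster of 1000 x 500 pixels.
windows = list(tile_windows(1000, 500))
print(len(windows))   # 6
print(windows[-1])    # (832, 416, 168, 84)
```

With the approximate tile footprint given in the text (416 px ≈ 4.78 m), each pixel corresponds to roughly 1.15 cm on the ground.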
All 147,000 images were run through inference on the local GPU. The detection threshold was set to 50% confidence, meaning that only objects predicted with >50% confidence were displayed. At this level of confidence, we obtained higher precision, but at the cost of lower recall. If the confidence was set lower, better recall was achieved, but at the cost of lower precision. This tradeoff is unavoidable, so we used a balanced confidence level that worked well for our purposes. When the model detected a poppy, a bounding box was drawn around the poppy (Figure 3a) and a copy of the corresponding image was saved, along with a separate csv file containing coordinates of the bounding box(es), class, and confidence level. From the csv files for each flight, we tabulated the number of detected poppies, as well as the duration of inference, and entered these data into a spreadsheet for further analysis.
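The effect of the confidence threshold on the per-flight counts can be illustrated with a few lines of Python. The csv layout, column names, and sample rows below are hypothetical stand-ins for the merged inference output, not the actual files produced in this study.

```python
import csv
import io
from collections import Counter

# Hypothetical merged inference output; the column names are assumptions.
SAMPLE = """flight,tile,label,confidence
7,tile_0001.tif,poppy,0.91
7,tile_0001.tif,similar_vegetation,0.88
7,tile_0002.tif,poppy,0.42
13,tile_0101.tif,poppy,0.67
"""


def count_poppies(csv_text, threshold=0.5):
    """Count poppy detections per flight above a confidence threshold."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["label"] == "poppy" and float(row["confidence"]) > threshold:
            counts[row["flight"]] += 1
    return counts


# A lower threshold admits more detections (better recall, lower precision).
print(dict(count_poppies(SAMPLE, threshold=0.5)))
print(dict(count_poppies(SAMPLE, threshold=0.4)))
```

In this toy example, lowering the threshold from 0.5 to 0.4 adds the low-confidence detection in flight 7, which is the same precision-recall tradeoff described above.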
We validated the bounding boxes in the census areas that had corresponding validation imagery using the same validation methods described earlier (Section 2.5.1) across n = 6 validation flight areas. For the purpose of estimating true population size, an accounting of missed and misidentified poppies was needed. We calculated a correction factor to apply to the final number of poppies detected by the model that takes these errors into account. The correction factor was calculated as precision multiplied by the inverse of recall. The inverse of recall is a number ≥1, so multiplying by it increased the estimated number of plants, thereby accounting for missed or undetected poppies. Multiplying by precision (which is always ≤1) decreased the total number of estimated poppies, thereby accounting for misidentified detections. The visual validation process took roughly 2 h for each validation plot.
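As a worked example of the correction factor, the Python sketch below applies precision / recall to a raw detection count. The function name is hypothetical. The inputs are the rounded Webb Hill values reported in the Results (6283 marked poppies, precision 0.78, recall 0.66); with these rounded inputs the estimate comes out near 7425, slightly above the reported 7414, a difference that presumably reflects the use of unrounded precision and recall in the original calculation (an assumption on our part).

```python
def corrected_count(marked, precision, recall):
    """Estimate true abundance from a raw detection count.

    Multiplying by precision removes the expected false detections;
    dividing by recall adds back the expected missed plants.
    """
    if not 0.0 <= precision <= 1.0:
        raise ValueError("precision must lie in [0, 1]")
    if not 0.0 < recall <= 1.0:
        raise ValueError("recall must lie in (0, 1]")
    return marked * precision / recall


estimate = corrected_count(marked=6283, precision=0.78, recall=0.66)
print(round(estimate))  # 7425
```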

A.I. Model Accuracy
Model accuracy was evaluated by the precision and recall metrics after training for 9000 epochs. The final custom-trained YOLOv5 model had an average precision of 0.64 and an average recall of 0.55. However, this result reflects both training classes (poppy and similar vegetation). Looking specifically at the poppy class precision (Figure 4a), the model consistently performed at greater than 0.8 at nearly all confidence levels. Poppy class recall was not as high as precision but was still reasonably accurate between 0.5 and 0.8 recall until around 0.85 confidence when it sharply declined (Figure 4b). At 0.5 confidence, recall was still greater than 0.6, which provided sufficient accuracy. We found that these predicted metrics for precision and recall reflected favorable model performance relative to the metrics obtained by visual detection in our earlier work [3].

All Test Plots
In comparing test plot evaluations carried out by a trained worker to the AI model inference results, the AI model had higher precision and recall across all test plots as well as when plots were evaluated by population, with the exception of slightly lower recall across the Red Bluffs plots (Table 1). The model was 1.36× higher in precision than the human worker: model 0.83 precision vs. human 0.67. It was also slightly better at recall: 0.74 recall vs. 0.71 for the trained worker. In measuring inference/evaluation time across each plot as well as all plots pooled, the AI model was far faster than the human worker. Inference by the model for all plots required less than one minute (0.56 min), while the worker evaluation required 657 min (ca. 11.0 h), for an inference/evaluation improvement of over 1100 times by the AI model. All other metrics being equal, these time savings alone would make the drone/AI census method much more advantageous when scaled up to the entire 255 ha census area than either visual detection in the imagery or traditional on-the-ground methods.

Imagery Quality Difference: Webb Hill vs. Red Bluffs Plots
Breaking down the test plot evaluations into their respective populations showed higher precision and recall for both the model and the human worker with the Red Bluffs imagery (Table 1). This result was not surprising, as we knew the Webb Hill imagery quality was not as good as the imagery from Red Bluffs, due to the non-optimal condition of the plants in the late fall season as well as often shadowy conditions due to the short day length. However, even with the lower quality imagery at Webb Hill, the model still had 0.78 precision and 0.66 recall, which was much higher than the human worker (0.64 and 0.58 for precision and recall, respectively). These results underscore the importance of obtaining imagery under optimal field conditions, in terms of both plant phenology and light: poppy detections by both the AI model and the human worker were more accurate when imagery was captured under more favorable conditions.

Webb Hill Census Imagery Analysis
Orthomosaics representing the Webb Hill census imagery covered a total area of 246.2 ha and consisted of 34 flights (accounting for the two flights that were not analyzed) that varied in area covered ranging from 0.28-19.3 ha (Table 2). Processing the imagery into orthomosaics took 4248 min (70.8 h), which is mostly computer runtime rather than worker labor. Once the orthomosaics were tiled into 416 × 416 px images for analysis, total model inference time was 147.2 min (2.45 h), with inference times that were approximately proportional to flight area and that ranged from 0.7 min (flight 13) to 14.8 min (flight 7; Table 2). Of the 147,411 image tiles passed through inference, the model detected and drew bounding boxes on poppies in 4994 individual images ( Figure 5, only 3.4% of total images had poppies detected). The total number of poppies marked in the imagery was 6283, which means that some tiles contained ≥ 1 detected poppy ( Table 2). Most marked poppies (74%) were concentrated in the ten most-populated flight areas, while the 10 least-populated flight areas collectively included only 5.4% of total marked poppies. Much of this difference was due to differences in the areal extent of the flights, but even when area is taken into account, the flights that included 74% of the poppies only accounted for 38% of the area, indicating that poppies were concentrated in these areas. The ten least-populated flight areas occupied 21% of the total area, so that poppies at 5.4% of the total were markedly underrepresented.

Webb Hill Census Imagery Analysis
Orthomosaics representing the Webb Hill census imagery covered a total area of 246.2 ha and consisted of 34 flights (accounting for the two flights that were not analyzed) that varied in area covered ranging from 0.28-19.3 ha (Table 2). Processing the imagery into orthomosaics took 4248 min (70.8 h), which is mostly computer runtime rather than worker labor. Once the orthomosaics were tiled into 416x416 px images for analysis, total model inference time was 147.2 min (2.45 h), with inference times that were approximately proportional to flight area and that ranged from 0.7 min (flight 13) to 14.8 min (flight 7; Table 2). Of the 147,411 image tiles passed through inference, the model detected and drew bounding boxes on poppies in 4994 individual images ( Figure 5, only 3.4% of total images had poppies detected). The total number of poppies marked in the imagery was 6283, which means that some tiles contained ≥ 1 detected poppy (Table 2). Most marked poppies (74%) were concentrated in the ten most-populated flight areas, while the 10 least-populated flight areas collectively included only 5.4% of total marked poppies. Much of this difference was due to differences in the areal extent of the flights, but even when area is taken into account, the flights that included 74% of the poppies only accounted for 38% of the area, indicating that poppies were concentrated in these areas. The ten least-populated flight areas occupied 21% of the total area, so that poppies at 5.4% of the total were markedly underrepresented.  Overall, the density of marked poppies was extremely low (26.5 poppies-ha −1 ). Much of the area, although comprised of gypsum soils, was likely not suitable habitat, resulting in clustering of the poppies in the most favorable areas, an effect observed in previous census evaluations for this species [3]. 
This demonstrates the utility of drone-based census methods as opposed to on-the-ground census, as locating so few poppies scattered over such a large area would be a daunting task on the ground.
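The tile and concentration figures reported above can be checked with a short calculation. The sketch below is illustrative only: it uses the totals given in this section, and the helper name `concentration_ratio` is hypothetical rather than part of the analysis pipeline. A ratio above 1 means a group of flights holds more marked poppies than its share of the census area would predict.

```python
# Totals reported in the text for the Webb Hill census.
TILES_TOTAL = 147_411
TILES_WITH_DETECTIONS = 4_994
POPPIES_MARKED = 6_283


def concentration_ratio(share_of_poppies: float, share_of_area: float) -> float:
    """Ratio of a group's share of marked poppies to its share of census area."""
    return share_of_poppies / share_of_area


tile_hit_rate = TILES_WITH_DETECTIONS / TILES_TOTAL
poppies_per_positive_tile = POPPIES_MARKED / TILES_WITH_DETECTIONS
top_ten = concentration_ratio(0.74, 0.38)      # ten most-populated flights
bottom_ten = concentration_ratio(0.054, 0.21)  # ten least-populated flights

print(f"tiles with detections: {tile_hit_rate:.1%}")
print(f"mean poppies per positive tile: {poppies_per_positive_tile:.2f}")
print(f"ten most-populated flights: {top_ten:.2f}x their area share")
print(f"ten least-populated flights: {bottom_ten:.2f}x their area share")
```

On these figures the ten most-populated flights hold roughly twice the poppies their area share would predict, and the ten least-populated flights hold roughly a quarter.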
When the correction factor based on precision (0.78) and recall (0.66) was applied to the number of marked poppies at Webb Hill, we obtained an estimate of 7414 poppies in the census area in autumn 2019. The total time to complete the census and analysis was 7995 min (~133 h). The on-the-ground/field time was 48 h, or 37% of the total time, with the remaining 63% or 82 h consisting of computer runtime with very little worker time involved.
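A minimal sketch of one plausible form of this correction follows, assuming the marked count is multiplied by precision to discount false positives and divided by recall to account for missed plants. The formula and the function name `corrected_count` are assumptions made for illustration, not a statement of the exact procedure used. With the rounded precision and recall values given above, this form yields roughly 7425, close to but not identical with the reported 7414; the small difference may reflect rounding of the inputs.

```python
def corrected_count(marked: int, precision: float, recall: float) -> float:
    """Adjust a raw detection count for false positives and missed plants.

    Assumed form: marked * precision estimates the true detections, and
    dividing by recall scales that up for plants the model missed.
    """
    if not (0 < precision <= 1 and 0 < recall <= 1):
        raise ValueError("precision and recall must be in (0, 1]")
    return marked * precision / recall


# Values reported in the text for the Webb Hill census.
estimate = corrected_count(6283, precision=0.78, recall=0.66)
print(round(estimate))
```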

Discussion
In this paper, we present evidence that we have solved a major limitation of the previous drone-based census methodology that this work was built upon [3]. By incorporating the use of an AI model to detect poppies in drone imagery, we have essentially eliminated the bottleneck of visual imagery evaluation. We also showed that the AI model performed better than a trained worker in both precision and recall. Finally, we presented the results of the drone-based Webb Hill census and AI poppy detection and enumeration to show that the AI approach is feasible at the population level. Our results show that we have developed a viable census method worth investigating for additional plant species.

Limitations of Drone Census and AI Evaluation Methods
There are some limitations to using drones and AI models for poppy census. One of our biggest issues with detection accuracy was differences in image quality. Using imagery taken from 40 m AGL was problematic, especially when captured in unfavorable seasons. We did not anticipate that spent inflorescences on the poppies would obscure the blue/green poppy foliage (Figure 5). In fact, we thought the inflorescences would make the poppies even more distinct in the imagery. Additionally, imagery quality varied even between flights (Table 2), mainly because the drone did not maintain a consistent 40 m AGL. We flew the Webb Hill population in late 2019, well before we began methods development to use an AI-based object detector. Ideally, we would have flown test plots and used that imagery directly with a trained AI model. The results could then have guided us in mission planning in terms of how many validation plots were needed, or even to a conclusion that census flights would need to be flown at lower altitude. Imagery quality is the key to AI model detection success, and we learned the hard way that some of our imagery was not adequate for higher precision and recall results. By greatly reducing the time needed for image analysis, AI will potentially make it possible to better optimize conditions for image acquisition, as there will be more time to spend in the field and more opportunity to choose the best seasonal window for image acquisition.
There will always be a lower detection limit, whether using deep learning technology or human workers. We set the size threshold for poppies based on how well we could reliably verify them in validation imagery. We settled on excluding poppies <5 cm in diameter, as poppies this small are not big enough to flower, which is a metric for determining if a seedling has recruited into the population. We know that there are many occasions when poppies can be detected and verified reliably when they are <5 cm in diameter; however, both precision and recall are greatly improved when poppies below this size threshold are excluded.
The final limitation discussed here is imagery processing time. Though most imagery processing time is computer runtime rather than worker labor, imagery processing is still a very time-consuming process. We processed 33 census flights that required 72.9 h of computer runtime for the Webb Hill census, an average of about 2 h of computer runtime per flight (Table 2) using a relatively high-powered workstation. This process can likely be dramatically improved by processing with a supercomputer either in the cloud or locally. To use drone-based plant census methods for larger populations, processing times for producing orthomosaics from raw imagery will need to be greatly improved.

Advantages of Drone Census and AI Evaluation Methods
There are, however, many advantages to using drones for plant census. One of the big advantages is the sheer amount of data that can be gleaned from imagery in addition to counts of individuals of the target species. Processing imagery into orthomosaics using Pix4D also allows for the creation of digital elevation models (DEMs), from which additional environmental variables (such as slope, aspect, and hydrology features) can be extracted and analyzed for use in other models (e.g., a fine-scale species distribution model (SDM)). The CSV file of detected poppy bounding boxes allows the poppies to be plotted in mapping software, where geospatial tools can be employed to examine clustering patterns, distance relationships, and relationships with environmental variables. The amount of information that can be extracted from drone imagery is staggering.
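As a rough illustration of how detected bounding boxes might be placed on a map, the sketch below converts a box center within a tile to map coordinates. It assumes a north-up orthomosaic with a known top-left corner and pixel size, a known tile position in the orthomosaic grid, and box centers normalized to the tile as in YOLO label files. The `GeoRef` class, the function name, and all numeric values (including the 0.01 m pixel size) are hypothetical and are not taken from the census data or its CSV layout.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GeoRef:
    """North-up georeference for an orthomosaic (hypothetical values)."""
    origin_x: float    # map x of the top-left corner, in metres
    origin_y: float    # map y of the top-left corner, in metres
    pixel_size: float  # ground size of one pixel, in metres


def box_center_to_map(
    geo: GeoRef,
    tile_col: int,
    tile_row: int,
    x_center: float,
    y_center: float,
    tile_px: int = 416,
) -> Tuple[float, float]:
    """Convert a normalized box center within a tile to map x, y.

    tile_col / tile_row index the tile within the orthomosaic grid;
    x_center / y_center are fractions (0-1) of the tile width / height.
    """
    px = (tile_col + x_center) * tile_px  # pixel column in the orthomosaic
    py = (tile_row + y_center) * tile_px  # pixel row in the orthomosaic
    map_x = geo.origin_x + px * geo.pixel_size
    map_y = geo.origin_y - py * geo.pixel_size  # image rows increase southward
    return map_x, map_y


# Hypothetical georeference and detection, for illustration only.
geo = GeoRef(origin_x=270_000.0, origin_y=4_105_000.0, pixel_size=0.01)
x, y = box_center_to_map(geo, tile_col=3, tile_row=2, x_center=0.5, y_center=0.25)
print(x, y)
```

Points produced this way could then be loaded into mapping software alongside DEM-derived layers; a rotated or reprojected orthomosaic would need a full affine transform rather than this simplified north-up case.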
Our work in this project has focused on analyzing census-level data to get counts of individuals in the population. However, deep learning could also be used to classify individuals within the target species and to measure other types of information. This could make it possible to develop a method for large-scale monitoring and demographic studies, similar to our earlier demographic study for the dwarf bear poppy [15], but conducted much more efficiently and over even larger areas. For example, with imagery taken during the flowering season, poppies could be sub-classified by flowering class (i.e., flowering, non-flowering) and size. If a minimal on-the-ground component is added to collect flowering and fruit data along with imagery obtained during the same time period, the number of seeds produced could be calculated across an entire population [15]. Until very recently, this kind of large-scale demographic data was thought to be virtually impossible to obtain.
Traditional on-the-ground census methods are often not feasible for population-level data collection over large areas. This lack of feasibility usually comes down to time and cost. Obviously, drones can cover more habitat in a given time frame than on-the-ground workers, but drone census coupled with AI detection methods also potentially addresses a well-documented issue in plant census, survey, and monitoring studies, namely observer error [31–34]. Plant species with low abundance, such as dwarf bear poppy, are particularly prone to observer error, primarily as observer failure to detect the plant (false absence or error of omission). This was true even when decoy plants with readily distinguishable morphology or phenological stage (including plants in full bloom) were deployed [31]. In our method, we systematically exclude poppies that are <5 cm in diameter due to difficulty in confirming their identification. In contrast, even skilled observers on the ground often fail to detect individuals of the target species [32], with failure increasing with increased size of search area [31] and longer time spent searching [34]. Drone/AI-based plant census methods have a detection threshold that is objective, measurable, and subject to modification as needed, unlike the omission error in on-the-ground census methods that attempt to detect every plant.
One of our main objectives when setting out to do this work was to design innovative methods that are also inexpensive, as we were thinking in terms of utility for land managers or researchers attempting to collect population-level plant data. Relative to funding for animal species research, plant conservation research is notoriously underfunded [35], which often makes the cost of data collection the biggest driver of decisions as to what data can be collected and over how large an area. Our total time to complete a 246-ha population-level census, including drone flights, imagery processing, validation, and inference, was 133 h, with most of that time accounted for in computer processing. Even the 48 h spent conducting drone flights was mostly flight time, with some time for workers to navigate to different flight areas. We do not have data available to directly compare this to on-the-ground census of poppy habitat. However, Zhang et al. [35] conducted species-richness surveys across 356 quarter-hectare plots in boreal forest habitat with 12 observers (1 observer/plot) and found that the average survey time per plot was 82 min, ranging from 20 to 194 min per plot depending on density. Species-richness surveys are generally more time consuming than counting individuals of one species; however, this example shows that the potential time invested in on-the-ground survey efforts over larger areas can be high, and ultimately very expensive, relative to drone/AI census methods.
The ability to analyze drone imagery using deep learning methodology has reduced the time investment for plant census by orders of magnitude relative to human visual analysis. The method presented here can be used to collect and analyze an enormous amount of data over a relatively large area. The method is especially suited to species with distinctive morphology in sparsely vegetated habitat. Many edaphically restricted species of conservation concern, particularly those found in semiarid and arid environments, meet these criteria. We think that with a well-designed training program and an initial investment in a drone and a computer with software capable of processing and visualizing imagery, our methods could readily be incorporated by conservation botanists as well as land managers and their contractors.

Conclusions
The next steps in developing our improved drone-based population-level census methodology for more widespread use are: (1) complete image processing and AI analysis of our range-wide census imagery for dwarf bear poppy, (2) develop a procedure to project census points onto maps to examine spatial patterns of distribution and environmental correlates, (3) incorporate these spatial data to build a species distribution model that will be used to identify potential areas for the establishment of new dwarf bear poppy populations, and (4) apply our methodology to additional species of conservation concern. We have been invited by our funding partners to try drone-based census on two additional rare plant species, which gives us the opportunity to test the applicability of our method to species that have different morphologies and occupy different habitats.

Data Availability Statement:
The data presented in this study are available as part of this.