3.1. Quality of Input Data
An experiment was performed to confirm the consistency of object detection accuracy with the labeled data used in the study. From the 300 labeled images of each flight (the union of the training and testing sets of Models 1–4 in Table 2), training and testing images were randomly chosen in a 2-to-1 split, and mAP was used as the measure of detection accuracy. This experiment was repeated 30 times for the labeled image set of each of Models 1–4 to obtain the average and standard deviation of mAP. Average mAP values of 85.1, 87.0, 78.3, and 72.3%, with standard deviations of 1.9, 2.7, 3.9, and 5.7%, were observed for the datasets of 15, 16, 18, and 24 April (the input datasets of Models 1–4), respectively (Figure 8). For the 15 to 18 April datasets (Models 1–3), average mAP values were over 78% with standard deviations less than 4%, so object detection performance in this period was considered consistent compared to the results from the 24 April data (Model 4). However, a decreasing trend of mAP was observed for data collected at later dates: the 24 April dataset produced the lowest mAP of 72.3%, with the largest standard deviation of 5.7%.
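The repeated random-split procedure described above can be sketched as follows. This is a minimal illustration, not the code used in the study: `evaluate` is a hypothetical callback standing in for YOLOv3 training and mAP evaluation, the function name and seed are assumptions, and the use of the sample standard deviation is also an assumption, since the text does not state which estimator was used.

```python
import random
import statistics


def repeated_split_map(image_ids, evaluate, repeats=30, train_fraction=2 / 3, seed=0):
    """Repeat a random train/test split and summarise the resulting mAP values.

    `evaluate(train_ids, test_ids)` is a hypothetical callback that trains a
    detector on `train_ids` and returns its mAP (in percent) on `test_ids`.
    """
    rng = random.Random(seed)
    n_train = round(len(image_ids) * train_fraction)  # 200 of 300 for a 2-to-1 split
    scores = []
    for _ in range(repeats):
        shuffled = list(image_ids)
        rng.shuffle(shuffled)
        train_ids, test_ids = shuffled[:n_train], shuffled[n_train:]
        scores.append(evaluate(train_ids, test_ids))
    # Sample standard deviation (assumption; the estimator is not stated in the text).
    return statistics.mean(scores), statistics.stdev(scores)
```

With 300 labeled images per flight, each repetition would train on 200 images and test on the remaining 100.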
The decrease in mAP over time could be attributed to the fact that separating individual plants becomes challenging as they grow (Figure 9). Immediately after emergence, plant appearance in UAS images is generally uniform and gaps between adjacent plants are well defined; over time, the gaps tend to be covered by leaves, resulting in an almost continuous canopy. At that point, it becomes difficult not only to delineate plant boundaries but also to determine the number of plants. Additionally, target objects located near the image boundary can cause inconsistency during labeling and training, since the image may contain only part of a plant near its edge, and the shape of such a plant might therefore differ from that of intact plants in the interior of the image. Furthermore, germination time and developmental phase vary within the field due to microclimate or nutrition, resulting in variance in plant size.
Even though care was taken to label images consistently, pinpointing the exact location and number of plants in connected or clustered canopy areas was challenging. The data show that labeling performance in the input training data was adequate when plants were well separated, and decreased where plants were clustered or canopies overlapped (Figure 9). In cases where visual inspection fails to label data correctly, it may be unreasonable to expect a machine learning model to outperform human inference. In this regard, the timing of data acquisition appears to be very important for object detection-based plant counting. For the datasets of this study, acquisition from 15 to 18 April (DAP 10 to 13) was effective in ensuring an object detection mAP over 78% (Figure 8). It should be noted that the current dataset contains images of four different plant densities, from 3 to 12 plants per linear meter of row, and the effective period for UAS data collection may be longer when the maximum plant density of the target plot is lower than 12 plants per linear meter of row.
3.2. Object Detection Accuracy Using YOLOv3
When the training and testing images were both chosen from the dataset of the same flight, the mAP and F1 score of the YOLOv3 model were higher than 78% and 0.77, respectively, except for those derived from the 24 April data (Table 3). Training and testing data from 15, 16, and 18 April resulted in higher accuracy, whether they were collected from a single date or multiple dates. In particular, the accuracy of Models 5 and 6 was comparable to that of Models 1 and 2. This suggests that a similar level of object detection accuracy can be achieved with increased diversity in soil background, illumination, or plant shape in the training data.
The numbers of training images used for Models 1, 5, and 6 were 200, 400, and 600, and the corresponding object detection accuracies were 88, 86, and 84% when the models were applied to testing data of their own dates (left column in Table 3). Comparing the object detection accuracy of YOLOv3 models trained with between 10 and 10,000 training images would provide a more reliable guide to the appropriate number of training images; however, manual labeling alone would require more than 60 h, excluding model training and interpretation of results. Although this paper does not rigorously investigate the relationship between the number of input images and plant detection accuracy, semi-automatic labeling methods based on Canopeo and object detection algorithms could help improve the efficiency of labeling and make it feasible to investigate the effect of the number of training images.
As the reference plant count is available per 1 m² for 15 April, the object detection performance of Models 1–10 was evaluated using the F1 score and mAP when each model was tested on the 15 April data. Models 1, 5, 6, and 7 contained 15 April images in their training sets, and these models showed an mAP over 80%. The highest accuracy was obtained from Model 1, which was assumed to be due to the consistency between training and testing data. A similar or slightly lower accuracy was observed for Models 5–7, which include training data from other dates. The mAP over 80% of Models 1 and 5–7 on the 15 April data indicates that detection accuracy can be maximized when the training data include images from 15 April. By contrast, mAP declined from approximately 80% to 50% when 15 April images were not included in the training dataset. The models derived from single-date input (Models 2–4) exhibited an abrupt decrease in accuracy; Model 4, for example, was trained with 24 April data and showed an mAP of 5.51% when tested on the 15 April data. It is encouraging, however, that using multi-date labeled images as a training dataset could limit the decrease in detection accuracy. Within Models 5–10, mAP was greater than 53% when 15 April data were not used for training (Models 8–10), and above 79% when 15 April data were used for training (Models 5–7). Both the use of 15 April data in the training stage and the use of multi-date labeled images affect the final plant count results; this effect is discussed in Section 3.5.
Among the UAS data in this study, the background soil surface on 18 April was relatively darker, with less apparent soil crusting. Because of this difference in background brightness, YOLOv3 model performance was also evaluated on that date. Similar to the previous results, models trained with 18 April data demonstrated higher accuracy when applied to the 18 April testing data. All models containing 18 April data in their training dataset showed mAP and F1 scores over 77% and 0.70, respectively. By contrast, mAP and F1 scores ranged from 43% to 67% and from 0.42 to 0.59 when the training data did not contain the 18 April dataset. The combined effect of the absence of 18 April training data and the difference in background brightness might have decreased the overall detection accuracy. However, Model 5, trained with multi-date labeled images including 15 and 16 April, exhibited a higher mAP of 66%. This suggests that low object detection accuracy due to the absence of comparable training data could be compensated for by increasing the amount of training data, even when the object or background information does not match well. Although a reference plant count for 18 April is unavailable, the expected plant counting error of Model 5 is estimated in Section 3.5.
The mAP and F1 scores of the YOLOv3 models on the 15 and 18 April testing datasets were mostly higher than 43% and 0.42, respectively. Although the 43–88% mAP achieved in this study appears low in comparison with a perfect detection rate of 100% mAP, it is higher than the mAP of other object detection algorithms, e.g., SSD, Faster R-CNN, and RetinaNet, which ranges between 28% and 38% [37]. It should be noted that object detection accuracy in mAP is not a measure of plant count error, and the final detection rate may increase through the redundancy created by multiple overlapping images. The overall trend between mAP and final plant count error is discussed in Section 3.5.
The accuracy of object detection on the 15 and 18 April testing data indicates that YOLOv3-based object detection performs adequately even when plant size, shape, or soil background differ between training and testing datasets. It was also shown that YOLOv3 models trained with multiple datasets exhibited stable performance in terms of mAP and F1 score. However, substantial differences in plant morphological appearance between training and testing data should be accounted for to avoid a reduction in detection accuracy, such as that noted for Model 4 on the 15 April testing data.
3.4. Determination of Optimal Parameters for Screening and Clustering Plant Centers
In an ideal case, plant centers obtained from multiple overlapping images would be projected onto the exact geographic location of the plants, with minimal location error. However, errors in the photogrammetric parameters (IO and EO) and the ground surface model can cause dispersion of the projected centers, even after these errors have been optimized by the SfM process (Figure 11b and Figure 12b). This study used a screening approach to remove plant centers likely to increase the dispersion of projected plant centers, and to find the processing parameters that minimize plant count error.
A small estimation error of a plant center in raw image space can translate into a larger distance error in projected geographic space, particularly when the detected plant is far from the fiducial point. For this reason, radial distance thresholds of 820, 1320, 1980, 2650, and 3300 pixels, approximately equivalent to 25%, 40%, 60%, 80%, and 100% of the one-half diagonal pixel distance of the raw image, were used to filter out any points outside the corresponding circular radius in the later stages. Another controlling parameter was the class confidence score from the YOLOv3 model. It was assumed that the dispersion in geographic space could be reduced by screening out plant centers with lower confidence scores. The threshold values used for the class confidence score were 25%, 50%, and 75%.
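The two screening criteria can be illustrated with a short sketch. It assumes that the fiducial point coincides with a known pixel location passed in as `center`, that detections are `(x, y, confidence)` tuples in raw image coordinates, and that both thresholds are inclusive; none of these details are stated in the text, and the function names are hypothetical. Note that exact percentages of the 3300-pixel half diagonal give 825 and 2640 pixels, so the reported 820 and 2650 appear to be rounded values.

```python
import math

HALF_DIAGONAL_PX = 3300  # one-half diagonal pixel distance reported in the text


def radial_thresholds(percentages=(25, 40, 60, 80, 100)):
    """Exact radial thresholds (pixels) for the given percentages of the half diagonal."""
    return [HALF_DIAGONAL_PX * p // 100 for p in percentages]


def screen_detections(detections, center, max_radius_px, min_confidence):
    """Keep detections close to the fiducial point and with sufficient confidence.

    detections: iterable of (x, y, confidence) in raw image pixel coordinates.
    center: (cx, cy) pixel location assumed to be the fiducial point.
    """
    cx, cy = center
    kept = []
    for x, y, confidence in detections:
        radius = math.hypot(x - cx, y - cy)
        if radius <= max_radius_px and confidence >= min_confidence:
            kept.append((x, y, confidence))
    return kept
```

For example, `screen_detections(dets, center, 1980, 0.25)` would apply the 1980-pixel radial threshold together with the 25% class confidence threshold.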
After screening plant centers using both criteria, an agglomerative clustering algorithm was applied to the remaining plant centers [51,52]. The clustering algorithm was run with 7 different threshold values of the distance between clusters, i.e., 0.01, 0.02, 0.04, 0.05, 0.10, 0.15, and 0.20 m². Afterwards, grid-wise plant counts were summarized for every combination of screening factors (radial distance and class confidence score), threshold of distance between two adjacent clusters (Equation (1)), and input YOLOv3 model (Table 3). This resulted in a total of 1050 instances (5 radial distance thresholds, 3 class confidence score thresholds, 7 cluster distance thresholds, and 10 YOLOv3 models: 5 × 3 × 7 × 10 = 1050). Root mean square error (RMSE) and R² between manual and UAS-based counts were calculated for each instance to find the optimal processing parameters.
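A minimal sketch of threshold-based agglomerative clustering of projected plant centers is given below. It assumes that the cluster distance is the standard Ward merge cost, i.e., the increase in within-cluster sum of squares, n_a n_b / (n_a + n_b) × ‖c_a − c_b‖², which has units of m² when coordinates are in meters. Whether this matches Equation (1) exactly is an assumption, and library implementations may scale the Ward distance differently. The function names are hypothetical, and the naive pairwise search is chosen for readability rather than speed.

```python
def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged.

    Each cluster is a tuple (n_points, centroid_x, centroid_y), coordinates in meters.
    """
    na, ax, ay = a
    nb, bx, by = b
    return na * nb / (na + nb) * ((ax - bx) ** 2 + (ay - by) ** 2)


def merge(a, b):
    """Merge two clusters, returning the combined size and size-weighted centroid."""
    na, ax, ay = a
    nb, bx, by = b
    n = na + nb
    return (n, (na * ax + nb * bx) / n, (na * ay + nb * by) / n)


def cluster_plant_centers(points, threshold_m2):
    """Greedily merge the cheapest pair until the cheapest merge exceeds the threshold.

    points: list of (x, y) projected plant centers in meters.
    Returns a list of (n_points, centroid_x, centroid_y); its length is the plant count.
    """
    clusters = [(1, x, y) for x, y in points]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost(clusters[i], clusters[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        if cost > threshold_m2:
            break  # every remaining merge would exceed the distance threshold
        merged = merge(clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters
```

Under these assumptions, the number of clusters returned for the centers falling within a grid cell would serve as the plant count for that cell.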
Instances were separated into 5 groups according to the 5 radial distance threshold values for each model, and the average RMSE was calculated. Average RMSE values of 2.9, 3.0, 4.2, 5.3, and 5.6 plants per linear meter of row were observed when the radial distance threshold was 820, 1320, 1980, 2650, and 3300 pixels, respectively. Figure 11a illustrates the average RMSE grouped by both radial distance threshold and YOLOv3 model (Models 1–10). Projected ground centers tended to spread more widely when their image coordinates were far from the fiducial center (Figure 11b). Therefore, only results with radial distance thresholds of 820, 1320, and 1980 pixels, a total of 630 cases (3 radial distance thresholds, 3 class confidence score thresholds, 7 cluster distance thresholds, and 10 YOLOv3 models: 3 × 3 × 7 × 10 = 630), were used to identify the best-performing parameters.
The 630 selected cases were separated into 3 groups by class confidence thresholds of 25%, 50%, and 75%, and the average RMSE was assessed for Models 1–10. When plant centers were screened with a 75% class confidence threshold, larger variance in average RMSE was observed, depending on which YOLOv3 model was used (Figure 12a). This could be because the class confidence score can change abruptly when the appearance of a plant deviates from that of a well-separated single plant. For example, the upper three plants in Figure 12b show a more concentrated canopy around the main stem, which was the common plant morphological appearance in the 15 April data. Most of the plants with this concentrated canopy structure had a class confidence score over 75%. By contrast, the lowermost plant in the same figure (Figure 12b) showed two smaller canopy segments around its main stem, and its class confidence score was mostly 50–75%. There were many instances in the study area in which the class confidence scores of valid plant centers were 25–75%. For this reason, only cases with a class confidence score threshold of 25% or 50% were retained to minimize the loss of input plant centers, leaving 420 cases in the search.
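The successive narrowing of the parameter search (1050, then 630, then 420 cases) can be checked by enumerating the grid directly. The variable names below are illustrative; the threshold values are those listed in the text.

```python
from itertools import product

RADIAL_PX = (820, 1320, 1980, 2650, 3300)                  # radial distance thresholds (pixels)
CONFIDENCE = (0.25, 0.50, 0.75)                            # class confidence thresholds
CLUSTER_DIST = (0.01, 0.02, 0.04, 0.05, 0.10, 0.15, 0.20)  # cluster distance thresholds (m²)
MODELS = tuple(range(1, 11))                               # YOLOv3 Models 1-10

all_cases = list(product(RADIAL_PX, CONFIDENCE, CLUSTER_DIST, MODELS))
# Drop the two largest radial thresholds, then the 75% confidence threshold.
after_radial = [case for case in all_cases if case[0] <= 1980]
after_confidence = [case for case in after_radial if case[1] <= 0.50]

print(len(all_cases), len(after_radial), len(after_confidence))  # prints: 1050 630 420
```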
Average RMSE as a function of the distance threshold between clusters showed minor differences between input models, except when the threshold was 0.01 or 0.02 m² (Figure 13a). It was also evident that cluster distance thresholds of 0.05 and 0.10 m² produced smaller errors than the others. Although the distance defined by the Ward minimum variance method (Equation (1)) does not directly translate into a tangible distance concept, it is meaningful that an optimum range for this parameter could be identified. The effect of the cluster distance threshold on clustering results can be visually assessed in Figure 13b. The number of plants derived by the proposed clustering method with a cluster distance threshold of 0.10 m² exactly matched the plant count derived by visual assessment (Figure 13b).
The comparative analysis between processing parameters and plant counting performance revealed the applicable range of screening and clustering parameters. As one of the goals of this study is to suggest a plant counting procedure that works with general input UAS data, the average RMSE of plant count was assessed using Models 1–10, which were trained with different training and testing datasets, and the processing parameters with the minimum average RMSE were taken as optimal. Average RMSE values derived from Models 1–10 when the selected screening and clustering parameters were applied are shown in Table 4. The average RMSE ranged from 1.02 to 2.83 plants per linear meter of row. The lowest average RMSE across Models 1–10 was 1.02 plants per linear meter of row, obtained when the class confidence score, radial distance, and cluster distance thresholds were set to 25%, 1980 pixels, and 0.10 m², respectively. Keeping a larger number of object centers by using a lower minimum class confidence score was considered effective in minimizing plant count error when the distance threshold for clustering was 0.05 or 0.10 m².
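The selection rule described above, averaging RMSE over all models for each parameter combination and taking the minimum, can be sketched as follows. The record layout and function name are assumptions for illustration, and the numbers in the usage example are placeholders, not results from this study.

```python
from collections import defaultdict
from statistics import mean


def best_parameters(results):
    """Return the parameter combination with the lowest model-averaged RMSE.

    results: iterable of (confidence, radial_px, cluster_dist_m2, model_id, rmse).
    """
    rmse_by_params = defaultdict(list)
    for confidence, radial_px, cluster_dist, _model_id, rmse in results:
        rmse_by_params[(confidence, radial_px, cluster_dist)].append(rmse)
    averages = {params: mean(values) for params, values in rmse_by_params.items()}
    best = min(averages, key=averages.get)
    return best, averages[best]


# Placeholder values for two parameter sets and two models (not data from the study).
example = [
    (0.25, 1980, 0.10, 1, 0.5),
    (0.25, 1980, 0.10, 2, 1.5),
    (0.50, 1320, 0.05, 1, 1.0),
    (0.50, 1320, 0.05, 2, 2.0),
]
params, avg_rmse = best_parameters(example)
```

Averaging over models before taking the minimum favors parameters that work across differently trained detectors, rather than parameters tuned to a single model.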
The three main hyperparameters used in the study should be adjusted when the proposed approach is applied with a different flight altitude or camera, or to a different target crop. Although an in-depth investigation of hyperparameter tuning under different conditions was not conducted in this study, general recommendations can be drawn. The maximum radial distance from the fiducial center is related to the quality of lens calibration, and should be decreased when a poorly calibrated or lower-quality lens is used. As lens distortion error increases, a smaller area around the fiducial center will provide reliable accuracy for the projection of image coordinates onto the ground surface. The minimum class confidence score is related to the quality of the object detection results. A threshold value of 25% is appropriate for minimizing the omission rate when the majority of detected objects have a low class confidence score due to small object size or diversity of object shape. However, a higher threshold value can be used if the target object is more distinctive and the resulting class confidence scores are generally higher. The maximum distance between adjacent clusters should be determined by considering the ground projection errors. The location accuracy of the projected plant centers can improve as flight altitude is lowered or when target plants are more regularly shaped; in such cases, a smaller value for this parameter can be used.
3.5. Accuracy of Plant Counting
Assuming the average error was optimized by the steps discussed previously, RMSE and R² values for Models 1–10 were examined (Table 5). The majority of RMSE and R² values ranged between 0.50 and 1.14 plants per linear meter of row and between 0.82 and 0.97, respectively, and Models 1, 3, 5, 6, 7, and 8 showed the best results, with RMSE values ranging from 0.50 to 0.85 plants per linear meter of row. It should be noted that the RMSE values of multiple-date input models (Models 5–10) were generally smaller than those of single-date input models (Models 1–4), a trend similar to that observed for object detection accuracy (Figure 14). Therefore, for practical applications, the accuracy of the deep learning-based plant counting method can likely be improved by including training data that capture variability in plant size, shape, and background.
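For reference, the two accuracy measures can be computed as in the sketch below. RMSE follows its standard definition; R² is computed here as the squared Pearson correlation between reference and UAS-based counts, which is an assumption, since the text does not state which definition of R² was used.

```python
import math


def rmse(reference, estimated):
    """Root mean square error between reference and estimated plant counts."""
    squared = [(r - e) ** 2 for r, e in zip(reference, estimated)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(reference, estimated):
    """Squared Pearson correlation (assumed definition of R² for this sketch)."""
    n = len(reference)
    mean_r = sum(reference) / n
    mean_e = sum(estimated) / n
    s_re = sum((r - mean_r) * (e - mean_e) for r, e in zip(reference, estimated))
    s_rr = sum((r - mean_r) ** 2 for r in reference)
    s_ee = sum((e - mean_e) ** 2 for e in estimated)
    return s_re ** 2 / (s_rr * s_ee)
```

Under this definition, a constant over- or underestimate leaves R² unchanged but raises RMSE, which is one reason to consider the two measures together.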
However, Model 4 (trained with 24 April data) resulted in an extremely large error of 3.11 plants per linear meter of row (Table 5), and Models 9 and 10, which also used 24 April data during training, produced slightly higher RMSE values of 1.06 and 1.09 plants per linear meter of row. The higher RMSE of Models 4, 9, and 10 can be largely attributed to differences in plant morphological characteristics between the training and test datasets, as previously discussed.
Except for Model 4, an RMSE of plant count of 0.50–1.14 plants per linear meter of row was achieved by all YOLOv3 models when mAP was over 45% (Figure 14). From the inverse correlation between object detection accuracy and RMSE of plant count, it can also be presumed that the expected plant count error (RMSE) is 0.5–1.2 when object detection accuracy (mAP) is 50–90%, and that it falls below 1.0 when mAP is over 80%.
The RMSE ranged from 0.50 to 0.60 plants per linear meter of row when 15 April data were used in training (Models 1 and 5–7), and from 0.81 to 1.14 plants per linear meter of row when they were not (Models 2, 3, and 8). Therefore, it should be restated that training data should be collected from at least two different dates to obtain optimal results using the proposed method. In the following paragraphs, the best results derived from two YOLOv3 models trained with 15 April data (Models 1 and 5) and from a model trained without them (Model 3) are presented to discuss the effect of the availability of training data acquired on the date of inference (15 April).
From the plant count results in Table 5, Models 1, 5, 6, and 7 were considered effective based on their lower RMSE, ranging from 0.50 to 0.60 plants per linear meter of row. Figure 15a–d compares the reference plant count with the optimal UAS-based plant count derived from Models 1 and 5. Although Model 5 resulted in the lowest RMSE, Models 1, 6, and 7 also produced almost identical trends, as shown in Figure 15a. When assessing the relationship between the reference plant count and the UAS-based plant count, it was evident that most of the data points were distributed along the 1:1 line on the scatter plot (Figure 15a,c) when plant density was 0–14 plants per linear meter of row. Previous research indicates that yield potential can be drastically reduced when plant density is below five plants per linear meter of row [5]. It is therefore encouraging that the proposed methodology performs well under these conditions, improving the chances of identifying areas prone to yield penalties due to inadequate plant population. For 82% of all 1 m² grids, reference and UAS-based plant counts were identical, and 99% of the estimates showed an absolute error of less than or equal to two (Figure 15b,d).
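The grid-level agreement statistics quoted above (share of grids with identical counts and share within an absolute error of two) can be computed as in this short sketch. The function name is hypothetical, and the counts in the usage line are placeholders rather than data from the study.

```python
def agreement_summary(reference, estimated, tolerance=2):
    """Return (fraction of exact matches, fraction within `tolerance`) per grid."""
    errors = [abs(r - e) for r, e in zip(reference, estimated)]
    n = len(errors)
    exact = sum(1 for err in errors if err == 0) / n
    within = sum(1 for err in errors if err <= tolerance) / n
    return exact, within


# Placeholder counts for four grids (not data from the study).
exact, within = agreement_summary([5, 5, 5, 5], [5, 5, 6, 8])
```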
Nevertheless, the accuracy of the proposed method was lower when the reference plant count was greater than 10 plants per linear meter of row, based on the ratio of correct to incorrect results (Figure 15a,c). The larger variance in this region can be associated with the increase in canopy overlap, as well as the limited number of input data points (i.e., their less frequent appearance in the training data, Figure 1b). To overcome this problem with the object detection approach, more data with a plant count greater than or equal to 10 are required. Alternatively, a more sophisticated process exploiting detailed object skeletal structure [25], or a feature extraction and classification method [23], can be used to determine the exact number of plants and their locations.
The best plant count results without 15 April training data were achieved by Model 3. Although the data are generally distributed around the 1:1 line, larger variance was observed throughout the entire range, with a slight tendency toward underestimation (Figure 15e). This result indicates the necessity of training data from earlier dates, when plants do not severely overlap each other, and the effectiveness of multi-date training data for obtaining a lower RMSE of plant count at the cotton seedling stage.
The implementation of the proposed method, with diverse input training data and an appropriate set of hyperparameters, resulted in an RMSE of plant counting of less than 0.6 plants per linear meter of row in the initial growth stages of cotton. While this study focuses on optimizing the three major hyperparameters (class confidence score, maximum radial distance, and cluster distance threshold) to minimize the RMSE of plant count, the use of a novel object detection algorithm capable of detecting small objects and separating overlapping objects could greatly improve the performance of image-based plant counting.