Author Contributions
Conceptualization, W.Y. and M.D.; methodology, W.Y., W.Z., and M.D.; software, W.Y.; validation, W.Y. and W.Z.; formal analysis, W.Y.; investigation, W.Y.; resources, M.D.; data curation, W.Y.; writing—original draft preparation, W.Y. and M.D.; writing—review and editing, W.Y. and M.D.; visualization, W.Y.; supervision, M.D.; project administration, M.D. All authors have read and agreed to the published version of the manuscript.
Figure 1.
Approach used in this study to evaluate bale detection methods (image processing, Faster R-CNN, and YOLO) in UAV imagery and to predict bale geolocation using photogrammetry.
Figure 2.
Examples of bales imaged at 20 MP (5472 × 3648 pixels) and an altitude of 61 m above ground level yielding a pixel resolution of 1.08 × 1.65 cm: top left—field of corn stover residue containing six bales and one partial bale; top right—field of corn stover residue containing three bales and one building; bottom left—field of corn stover residue containing six bales; bottom right—field of soybean residue containing four bales.
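The reported pixel resolution follows from the flight altitude and camera geometry via the ground sample distance (GSD). Below is a minimal sketch of that calculation; the sensor width and focal length are assumed values for a generic 20 MP one-inch-sensor camera, not specifications taken from the paper.

```python
# Ground sample distance (GSD) from flight altitude and camera geometry.
# The sensor width and focal length below are assumptions for a generic
# 20 MP one-inch sensor, not values stated in the paper.
def ground_sample_distance_cm(altitude_m, sensor_width_mm, focal_mm, image_width_px):
    """Across-track GSD in centimetres per pixel."""
    gsd_m = (altitude_m * sensor_width_mm / 1000.0) / ((focal_mm / 1000.0) * image_width_px)
    return gsd_m * 100.0

# Assumed camera: 13.2 mm sensor width, 8.8 mm focal length.
print(ground_sample_distance_cm(61.0, 13.2, 8.8, 5472))  # ~1.67 cm/pixel
```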
Figure 3.
Examples of annotated images with LabelMe [27]: left—a road annotated in red and four bales in blue/yellow/green/pink; right—a truck highlighted in red, with yellow/green marking annotated buildings.
Figure 4.
Pipeline to detect bales in the field using image processing: the image is converted to grayscale, blurred to remove noise, histogram-equalized to remap pixel values to the 0–255 range, binarized with Otsu's threshold, and finally cleaned with erosion and dilation to remove residual noise.
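For illustration, a minimal OpenCV sketch of this pipeline using the parameter values reported later in Table 5 might look as follows; the Gaussian kernel size is an assumption, since the printed value in Table 5 is ambiguous.

```python
import cv2
import numpy as np

# Minimal sketch of the Figure 4 pipeline. Parameters follow Table 5; the
# Gaussian kernel size is reconstructed from an ambiguously printed value.
def segment_bales(image_bgr):
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    blurred = gray
    for _ in range(2):                       # 2 Gaussian blur iterations
        blurred = cv2.GaussianBlur(blurred, (45, 3), 0)
    equalized = cv2.equalizeHist(blurred)    # remap intensities to 0-255
    _, binary = cv2.threshold(
        equalized, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU
    )
    binary = cv2.erode(binary, np.ones((20, 20), np.uint8), iterations=1)
    binary = cv2.dilate(binary, np.ones((25, 25), np.uint8), iterations=2)
    return binary  # binary mask: bales vs. background
```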
Figure 5.
Sample picture from a field of corn stover residue processed through each step of the pipeline (a–f) described in Figure 4. The input image contains only one biomass bale, in the top right part of the figure. The output image is a binary mask that segments the bale from the background.
Figure 6.
Image processing, Faster R-CNN, and YOLOv3 detection outputs from a sample image of field 0: (a) the original 20 MP sample image; (b) the output of the image processing pipeline; (c,e) the outputs of Faster R-CNN and YOLOv3 on the 1 MP sample image, respectively; (d,f) the outputs of Faster R-CNN and YOLOv3 on the 20 MP sample image, respectively.
Figure 7.
The final pipeline of the mapping framework: (1) image data are collected with a UAV; (2) a trained and tuned YOLOv3 model detects bales and returns their image coordinates; (3) image coordinates are converted to latitude and longitude using the drone's position and pose; (4) multiple observations of the same bale in different images are reconciled using DBSCAN; (5) the final coordinates of each bale are predicted.
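Step (3) is the photogrammetric georeferencing. The excerpt does not give the exact formulas, so the following is only a sketch of a common flat-earth, nadir-camera approximation; all names and conventions here are illustrative assumptions.

```python
import math

# Sketch of step (3): project a detection's pixel coordinates to latitude/
# longitude under a flat-earth, nadir-camera approximation. yaw_deg is the
# drone heading measured clockwise from north; gsd_m is the ground sample
# distance in metres per pixel. Illustrative only, not the paper's code.
def pixel_to_latlon(px, py, img_w, img_h, drone_lat, drone_lon, yaw_deg, gsd_m):
    # Ground-plane offset from the image centre, in metres.
    dx = (px - img_w / 2.0) * gsd_m   # +x: right in the image
    up = (img_h / 2.0 - py) * gsd_m   # +up: toward the top of the image
    # Rotate image axes into north/east using the drone heading.
    yaw = math.radians(yaw_deg)
    east = dx * math.cos(yaw) + up * math.sin(yaw)
    north = -dx * math.sin(yaw) + up * math.cos(yaw)
    # Small-angle conversion from metres to degrees (~111,320 m per degree).
    lat = drone_lat + north / 111_320.0
    lon = drone_lon + east / (111_320.0 * math.cos(math.radians(drone_lat)))
    return lat, lon
```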
Figure 8.
Field 0: (a) black dots are the predicted bale locations and red dots are the surveyed ground truth coordinates; (b) DBSCAN clustering of the detected bales, in which some isolated detections are rejected as noise; (c) overlay of the orthomosaic with the colored DBSCAN clusters.
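The reconciliation step can be sketched with scikit-learn's DBSCAN as below; the eps and min_samples values and the centroid rule are illustrative assumptions, since the paper's exact settings are not given in this excerpt.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Sketch of steps (4)-(5): merge repeated detections of the same bale
# across overlapping images. detections is an (n, 2) array of (lat, lon)
# in degrees; eps/min_samples are illustrative, not the paper's settings.
def reconcile_detections(detections, eps_deg=3e-5, min_samples=3):
    labels = DBSCAN(eps=eps_deg, min_samples=min_samples).fit_predict(detections)
    bales = []
    for label in set(labels):
        if label == -1:          # isolated detections rejected as noise
            continue
        cluster = detections[labels == label]
        bales.append(cluster.mean(axis=0))  # final coordinate: cluster centroid
    return np.array(bales)
```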
Figure 9.
The black line is the linear regression of the predicted bale location on the surveyed ground truth location; the red dashed line shows the 45° slope as a reference: left—predicted latitude versus actual latitude (y = 1.000629x, R² = 1, F = 8.7 × 10⁷, p-value < 0.01); right—predicted longitude versus actual longitude (y = 1.001x, R² = 1, F = 1.14 × 10⁸, p-value < 0.01).
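The y = kx form of these fits is a no-intercept regression; a short numpy sketch of that slope estimate follows, with synthetic placeholder coordinates rather than the paper's data.

```python
import numpy as np

# No-intercept least squares, matching the reported y = kx fits:
# k = sum(x*y) / sum(x*x). The arrays below are synthetic placeholders.
actual = np.array([41.70001, 41.70013, 41.70027, 41.70042])     # surveyed
predicted = np.array([41.70002, 41.70012, 41.70029, 41.70041])  # pipeline output

k = (actual @ predicted) / (actual @ actual)  # slope through the origin
print(f"slope = {k:.6f}")  # ~1.0 when predictions track the survey
```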
Table 1.
Number of instances annotated using LabelImg for the COCO and YOLO datasets.
| Images | Bales | Buildings | Streets | Trucks |
|---|---|---|---|---|
| 300 | 783 | 22 | 72 | 17 |
Table 2.
Dataset specifications for training machine learning algorithms.
| Dataset | Resolution | Train | Validation | Test |
|---|---|---|---|---|
| High Res | 5472 × 3648 | 243 | 27 | 30 |
| Low Res | 1080 × 720 | 243 | 27 | 30 |
Table 3.
YOLOv3 detector network architecture.
| Repeats | Type | Filters | Size | Output |
|---|---|---|---|---|
| | Convolutional | 32 | 3 × 3 | 256 × 256 |
| | Convolutional | 64 | 3 × 3 / 2 | 128 × 128 |
| | Convolutional | 32 | 1 × 1 | |
| ×1 | Convolutional | 64 | 3 × 3 | |
| | Residual | | | 128 × 128 |
| | Convolutional | 128 | 3 × 3 / 2 | 64 × 64 |
| | Convolutional | 64 | 1 × 1 | |
| ×2 | Convolutional | 128 | 3 × 3 | |
| | Residual | | | 64 × 64 |
| | Convolutional | 256 | 3 × 3 / 2 | 32 × 32 |
| | Convolutional | 128 | 1 × 1 | |
| ×8 | Convolutional | 256 | 3 × 3 | |
| | Residual | | | 32 × 32 |
| | Convolutional | 512 | 3 × 3 / 2 | 16 × 16 |
| | Convolutional | 256 | 1 × 1 | |
| ×8 | Convolutional | 512 | 3 × 3 | |
| | Residual | | | 16 × 16 |
| | Convolutional | 1024 | 3 × 3 / 2 | 8 × 8 |
| | Convolutional | 512 | 1 × 1 | |
| ×4 | Convolutional | 1024 | 3 × 3 | |
| | Residual | | | 8 × 8 |
| | Avgpool | | Global | |
| | Connected | | 1000 | |
| | Softmax | | | |
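Each "Residual" row in this table is a shortcut connection over the preceding 1 × 1 and 3 × 3 convolution pair, and the Repeats column gives how many times the block is stacked. A minimal PyTorch sketch of that block structure (an illustration, not the paper's implementation):

```python
import torch
import torch.nn as nn

# One Darknet-53 residual block from Table 3: a 1x1 convolution halves the
# channels, a 3x3 convolution restores them, and the input is added back
# (the "Residual" row). Illustrative, not the paper's code.
class DarknetResidual(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels // 2, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels // 2),
            nn.LeakyReLU(0.1),
            nn.Conv2d(channels // 2, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.LeakyReLU(0.1),
        )

    def forward(self, x):
        return x + self.block(x)  # shortcut connection

# Example: the x8 stage with 256 filters on a 32 x 32 feature map.
stage = nn.Sequential(*[DarknetResidual(256) for _ in range(8)])
out = stage(torch.randn(1, 256, 32, 32))  # shape preserved: (1, 256, 32, 32)
```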
Table 4.
Google Colab specifications of the machine used to implement the detection algorithms.
| CPU | Memory | GPU | CUDA Version |
|---|---|---|---|
| Intel Xeon 2.20 GHz | 16 GB | Tesla P100-16GB | 10.1 |
Table 5.
Image processing parameters that yielded the best contrast between the bales and ground.
| Gaussian Size | Gaussian Iterations | Erosion Kernel | Dilation Kernel | Erosion Iterations | Dilation Iterations |
|---|---|---|---|---|---|
| (45, 3) | 2 | (20, 20) | (25, 25) | 1 | 2 |
Table 6.
Performance of image processing, Faster R-CNN, and YOLOv3 on in-field detection of biomass bales.
| Method | Precision | Recall | F1 | mAP | Inference Time (s) |
|---|---|---|---|---|---|
| Image Processing | 0.681 | 0.878 | 0.767 | - | 9.1 |
| Faster R-CNN (Low Res) | 0.823 | 0.902 | 0.860 | 0.802 | 0.597 |
| Faster R-CNN (High Res) | 0.845 | 0.895 | 0.869 | 0.808 | 0.627 |
| YOLOv3 (Low Res) | 0.790 | 0.967 | 0.883 | 0.958 | 0.377 |
| YOLOv3 (High Res) | 0.801 | 0.988 | 0.889 | 0.965 | 0.400 |
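As a reminder of how these columns relate, precision, recall, and F1 derive from true positive (TP), false positive (FP), and false negative (FN) counts; the counts in the example below are illustrative, not the paper's.

```python
# Precision, recall, and F1 from detection counts, as reported in Table 6.
def detection_metrics(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Illustrative counts only: 80 TP, 20 FP, 3 FN.
print(detection_metrics(80, 20, 3))  # ~(0.800, 0.964, 0.874)
```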
Table 7.
Optimal YOLOv3 hyperparameters determined for in-field detection of biomass bales.
| Hyperparameter | Initial | Final |
|---|---|---|
| GIoU | 3.54 | 5.21 |
| cls | 37.4 | 41.4 |
| cls_pw | 1.0 | 1.6 |
| obj | 49.5 | 49.5 |
| obj_pw | 1.0 | 1.46 |
| iou_t | 0.225 | 0.166 |
| lr | 0.006 | 0.009 |
| momentum (SGD) | 0.937 | 0.881 |
| weight_decay | 0.0005 | 0.0003 |
| fl_gamma | 0.4 | 0.0 |
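These names mirror the hyperparameter dictionary used by common YOLOv3 training code (e.g., the ultralytics implementation); as a sketch, and assuming that convention, the tuned values would be expressed as:

```python
# Tuned Table 7 values written as a YOLOv3 hyperparameter dictionary,
# assuming the ultralytics naming convention (an assumption, not confirmed
# by the paper). "iou_t" is the IoU threshold for assigning targets.
hyp_tuned = {
    "giou": 5.21,            # GIoU loss gain
    "cls": 41.4,             # classification loss gain
    "cls_pw": 1.6,           # classification BCE positive weight
    "obj": 49.5,             # objectness loss gain
    "obj_pw": 1.46,          # objectness BCE positive weight
    "iou_t": 0.166,          # IoU threshold for target assignment
    "lr0": 0.009,            # initial learning rate
    "momentum": 0.881,       # SGD momentum
    "weight_decay": 0.0003,  # optimizer weight decay
    "fl_gamma": 0.0,         # focal loss gamma (0 = disabled)
}
```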
Table 8.
Bale detection performance of YOLOv3 before and after tuning the hyperparameters.
| Method | Precision | Recall | F1 | mAP |
|---|---|---|---|---|
| No Tuning | 0.801 | 0.988 | 0.889 | 0.965 |
| Tuned | 0.895 | 1.000 | 0.988 | 0.945 |
Table 9.
Bale localization performance in three fields using the proposed method compared with an RTK–GNSS ground survey.
| Field | RMSE (m) | RMSE_LAT (m) | RMSE_LON (m) |
|---|---|---|---|
| 0 | 2.67 | 1.33 | 2.32 |
| 1 | 2.48 | 0.88 | 2.32 |
| 2 | 2.13 | 1.52 | 2.60 |
| Average | 2.43 | 1.24 | 2.41 |
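The source table also listed these errors in degrees, but those values did not survive extraction; the metre values can be related back to degrees with the standard small-angle conversion, sketched below (the field latitude used here is an assumption for illustration).

```python
import math

# Convert an RMSE expressed in metres to degrees of latitude/longitude
# at a given latitude (small-angle approximation, ~111,320 m per degree).
def rmse_m_to_deg(rmse_lat_m, rmse_lon_m, lat_deg):
    m_per_deg_lat = 111_320.0
    m_per_deg_lon = 111_320.0 * math.cos(math.radians(lat_deg))
    return rmse_lat_m / m_per_deg_lat, rmse_lon_m / m_per_deg_lon

# Illustrative: the averaged Table 9 errors at an assumed 42 N latitude.
print(rmse_m_to_deg(1.24, 2.41, 42.0))  # ~(1.11e-05, 2.91e-05) degrees
```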