Article

Chestnut Burr Segmentation for Yield Estimation Using UAV-Based Imagery and Deep Learning

1 Engineering Department, School of Science and Technology, University of Trás-os-Montes e Alto Douro, 5000-801 Vila Real, Portugal
2 Centre for Robotics in Industry and Intelligent Systems (CRIIS), Institute for Systems and Computer Engineering, Technology and Science (INESC-TEC), 4200-465 Porto, Portugal
3 Computer Science Department, Engineering School, Polytech Annecy Chambery, University Savoie Mont Blanc, 74000 Annecy, France
4 ALGORITMI Research Centre, University of Minho, 4800-058 Guimarães, Portugal
5 Centre for the Research and Technology of Agro-Environmental and Biological Sciences, University of Trás-os-Montes e Alto Douro, 5000-801 Vila Real, Portugal
6 Institute for Innovation, Capacity Building and Sustainability of Agri-Food Production, University of Trás-os-Montes e Alto Douro, 5000-801 Vila Real, Portugal
* Author to whom correspondence should be addressed.
Drones 2024, 8(10), 541; https://doi.org/10.3390/drones8100541
Submission received: 17 August 2024 / Revised: 19 September 2024 / Accepted: 25 September 2024 / Published: 1 October 2024

Abstract
Precision agriculture (PA) has advanced agricultural practices, offering new opportunities for crop management and yield optimization. The use of unmanned aerial vehicles (UAVs) in PA enables high-resolution data acquisition, which has been adopted across different agricultural sectors. However, its application for decision support in chestnut plantations remains under-represented. This study presents the initial development of a methodology for segmenting chestnut burrs from UAV-based imagery to estimate chestnut tree productivity from point cloud data. Deep learning (DL) architectures, including U-Net, LinkNet, and PSPNet, were employed for chestnut burr segmentation in UAV images captured at a 30 m flight height, with YOLOv8m trained for comparison. Two datasets were used to train and evaluate the models: one newly introduced in this study and an existing dataset. U-Net demonstrated the best performance, achieving an F1-score of 0.56 and a counting accuracy of 0.71 on the proposed dataset when a combination of both datasets was used during training. The primary challenge encountered was that burrs often grow in clusters, leading to unified regions in segmentation and making object detection potentially more suitable for counting. Nevertheless, the results show that DL architectures can generate masks for point cloud segmentation, supporting precise chestnut tree production estimation in future studies.

1. Introduction

Maximizing crop yields while minimizing economic and environmental losses is a perpetual challenge in agriculture [1]. Historically, these tasks were primarily conducted through manual monitoring methods [2], often requiring significant financial and human resources. Consequently, agricultural practices frequently relied on the excessive use of chemical products to maintain crop health and productivity [3].
The paradigm of precision agriculture (PA) relies on a variety of technologies to improve the efficiency, productivity, and sustainability of agricultural operations [4]. These technologies include sensors for data collection from diverse sources [5,6], Internet of Things (IoT) devices to enable real-time communication and information dissemination [7], field mapping technologies that enable georeferenced data acquisition using GNSS (Global Navigation Satellite Systems) receivers along with geographical information systems (GIS) [8], remote sensing data to monitor crop health and its temporal variability [9], and data processing and analysis techniques, such as artificial intelligence (AI), to assist in prediction and decision making [10]. The application of these technologies creates new opportunities to observe, measure, and respond to inter- and intra-field variability and yield improvements [11], simplifying agricultural decision-making, particularly in regions with high agricultural production [12]. In the context of remote sensing data, the advent of unmanned aerial vehicles (UAVs) has enabled the acquisition of high spatial and temporal resolution data [13] from various sensors for a range of PA applications [14].
For instance, the chestnut (Castanea sativa Mill.) harvest in northern Portugal is both intensive and economically indispensable [15]. In this context, remote sensing platforms are valuable sources of data for decision support tools in chestnut groves, yet this crop has been under-represented compared to research on other orchard trees [11]. Previous studies within this area have employed aerial imagery to assess diseases [16,17,18], visually evaluate its phytosanitary status [19], estimate biomass from LIDAR data [20], monitor canopy thinning responses [21], and classify chestnut stands [22] using satellite imagery. UAV data have been used to monitor the decline of chestnut trees [23,24], automatically detect and extract individual tree geometrical parameters [25], estimate pruning wood biomass [26], and estimate phytosanitary issues using machine learning [27]. Machine learning techniques, particularly deep learning (DL), have gained prominence in PA applications, where they are applied to diverse tasks such as crop, water, and soil management [28].
In the field of fruit detection, DL techniques have been applied to various fruits, such as tomatoes [29,30], apples [31,32,33], mangoes [34,35], citrus fruits [36], litchis [37], bananas [38], and melons [39]. Rahnemoonfar and Sheppard [29] used convolutional neural networks (CNNs) trained with synthetic data and tested on real tomato images, achieving an accuracy of 91%. Additionally, combinations of RGB images and corresponding depth images (RGB-D) have been used for fruit detection [30,40]. Afonso et al. [30] used a Mask R-CNN [41] with a ResNeXt-101 [42] backbone to detect tomatoes in RGB-D images captured at night, mitigating the effects of illumination variation and achieving an F1-score of 0.9, while Mengoli et al. [40] further used RGB-D data to estimate apple size. For apple counting, several DL approaches have been explored by Gao et al. [31], Häni et al. [32], and Apolo-Apolo et al. [33]. Gao et al. [31] developed a method using YOLOv4-tiny [43] combined with a spatial channel reliability discriminative correlation filter to process orchard videos, achieving a counting accuracy of 91.49%. Häni et al. [32] applied U-Net [44] for segmentation and Faster R-CNN for apple detection, followed by ResNet-50 [45] for counting, obtaining an accuracy exceeding 80%. Apolo-Apolo et al. [33] used Faster R-CNN on UAV images acquired at 10 m to detect and count apples on individual trees, employing linear regression to estimate total tree production based on the visible fruit.
Koirala et al. [34] compared Faster R-CNN, YOLOv2 [46], and YOLOv3 [47], introducing a new architecture, “MangoYOLO”, for real-time mango detection in images acquired at night. MangoYOLO achieved the best balance between performance and inference time among the tested models, with an F1-score of 0.97 and an inference time of 15 ms. Similarly, Xiong et al. [35] used YOLOv2 to count green mangoes in orchards with a UAV flying between 1.5 and 2 m, instead of the agricultural vehicle used by Koirala et al. [34]. This approach achieved an estimation error rate of 1.1%. Apolo-Apolo et al. [36] developed a method to detect, count, and estimate the size of citrus fruits on individual trees using Faster R-CNN on UAV images captured at heights of five to six meters over 20 trees, achieving an average standard error of 7.22%. Similarly, Xiong et al. [37] applied a modified YOLOv5 to estimate litchi fruit production from UAV images captured at a height of three to five meters, achieving an average precision of 72.6%. Neupane et al. [38] employed Faster R-CNN to count bananas in UAV images captured at heights of 40, 50, and 60 m, using linear contrast stretching, synthetic color transformation, and the triangular greenness index, achieving accuracy rates between 75.8% and 96.4%. Kalantar et al. [39] used UAV images taken at 15 m and applied RetinaNet [48] to estimate melon counts and weight. After detection, the authors applied the Chan–Vese active contour algorithm and an ellipse fitting method via principal component analysis, estimating melon weight with a regression model and achieving an F1-score of approximately 0.9 in detection and a weight estimation error of 3%.
DL methodologies have also proven effective in addressing various challenges in chestnut crop management, such as tree detection [49], chestnut quality assessment [50], and origin classification [51]. Additionally, methodologies for counting burrs on chestnut trees have been developed [52,53,54]. Adão et al. [52] segmented ground-based RGB images of chestnut trees into individual patches and classified them using the Xception model [55], reaching an Intersection over Union (IoU) of 0.54. Arakawa et al. [53] used YOLOv4 to detect and count chestnut burrs in UAV images captured at heights of 12 to 15 m, training the models on 7866 burrs across 500 images. Validation against ground-truth assessments showed an IoU of 84.27 and an RMSE of 6.3 per tree. Comba et al. [54] used U-Net to segment chestnut burrs in UAV-based multispectral imagery captured at 20 m height, achieving an accuracy of 92.30%.
Ensuring precise control over crop harvesting is a critical aspect of modern agricultural practices. The development of automated monitoring and yield estimation methods offers significant opportunities for farmers to optimize their operations. However, certain limitations remain in DL-based methods for fruit detection and yield estimation, including sensitivity to varying lighting conditions (e.g., data acquired at night [30]), altitude inconsistencies during UAV flights (e.g., data collected too close to trees and/or via manual UAV operation [37]), and challenges posed by occluded or clustered fruits [33]. Addressing these limitations is crucial for improving the reliability and accuracy of such systems.
The primary objective of this study is to advance chestnut plantation management by reducing the time and effort required to assess chestnut yield, thereby improving operational efficiency for farmers. To achieve this, a method for segmenting chestnut burrs using DL techniques applied to UAV imagery automatically acquired at typical photogrammetric flight heights and under different illumination conditions is presented and evaluated. Furthermore, this study introduces a methodology for estimating chestnut tree yield from point cloud data obtained through photogrammetric processing of UAV imagery, as has been demonstrated for other crops [56,57]. By leveraging the potential of UAV data and DL techniques, this study aims to improve chestnut tree management, monitor agricultural productivity, and promote sustainability, while reducing the manual workload for farmers.

2. Material and Methods

2.1. Proposed Methodology

The methodology for chestnut tree yield estimation integrates point cloud data obtained from UAV-based imagery with DL-based segmentation techniques. An overview of the proposed methodology is presented in Figure 1.
In the initial phase, high-resolution UAV imagery of a chestnut orchard is acquired, ensuring consistent flight height, camera angle, and high image overlap. The UAV imagery undergoes thorough photogrammetric processing to produce a dense point cloud. DL-based segmentation models are then trained on annotated UAV images, and their performance is evaluated through the calculation of specific metrics, chestnut burr counting within the imagery, and comparison with similar studies, thereby validating the quality of the results for the subsequent stages of the proposed methodology.
In the second phase of the methodology, the trained models are applied to all acquired images, resulting in segmented images representing the detected chestnut burrs, which are then projected into the 3D point cloud. This involves four key steps: frustum culling, 3D mapping, occlusion testing, and space-based optimization, following the approach by Jurado-Rodríguez et al. [58]. Frustum culling eliminates points beyond the camera view. During the subsequent 3D mapping phase, a camera model is applied to project the coordinates of selected points in 3D space onto the 2D image. An occlusion test is then performed using spatial clustering to discard occluded points. Finally, space-based optimization is executed to classify the remaining unclassified pixels. These steps culminate in a fully labeled point cloud used to estimate chestnut yield by modeling the relationship between chestnut burr area, chestnut count, and their weight.
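To make the 3D mapping step concrete, the sketch below projects point cloud coordinates into an image with a simplified pinhole camera model and a crude frustum test. It is a minimal Python illustration under assumed inputs (intrinsics K and pose R, t), not the implementation of Jurado-Rodríguez et al. [58], and it omits lens distortion, the occlusion test, and the space-based optimization.

```python
import numpy as np

def project_points(points_3d, K, R, t, image_size):
    """Minimal pinhole projection of 3D points into an image (the '3D mapping' step).

    points_3d: (N, 3) world coordinates; K: 3x3 intrinsics; R, t: world-to-camera pose.
    Returns pixel coordinates and the indices of the points that fall inside the frame.
    """
    cam = (R @ points_3d.T + t.reshape(3, 1)).T        # world -> camera coordinates
    in_front = cam[:, 2] > 0                           # crude frustum test: keep points in front of the camera
    idx = np.flatnonzero(in_front)
    uv = (K @ cam[in_front].T).T
    uv = uv[:, :2] / uv[:, 2:3]                        # perspective divide
    w, h = image_size
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    return uv[inside], idx[inside]
```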
This methodology faces two main challenges: automating the segmentation of chestnut burrs in the UAV images and of chestnut trees in the point cloud. Counting chestnut burrs in the point cloud improves accuracy by preventing duplicate counts from overlapping images.
This study focuses on evaluating the first two steps of the methodology (Figure 1), with three DL-based segmentation architectures tested for chestnut burr counting in planar UAV images. The models were trained on two datasets, individually and as a combined set, to assess their performance in the yield estimation task.

2.2. Proposed Dataset

2.2.1. UAV Data Acquisition

The UAV data used in this study were acquired from a chestnut plantation in northeastern Portugal, covering an area of approximately 0.55 hectares. A multi-rotor Phantom 4 Pro V2.0 equipped with a 1-inch CMOS sensor for RGB imagery acquisition at a resolution of 20 megapixels was used. The sensor was mounted on a 3-axis electronic gimbal. The flight mission was conducted at a flight height of 30 m relative to the take-off point, which was located adjacent to the chestnut plantation. A total of 601 images were captured with 90% longitudinal overlap and 80% lateral overlap. The plantation terrain is relatively flat, with no significant topographic variations that could impact the flight height. The camera was angled at −65°, and the mission followed a double-grid flight configuration, allowing us to obtain georeferenced photographs from multiple perspectives of the chestnut trees. The UAV data collection took place on 8 September 2020, during the early fruiting stage of the trees. In addition to the chestnut trees, the plantation surroundings included bare soil, shrubs, and Pinus pinaster Aiton trees.

2.2.2. Dataset Annotation

Each UAV image obtained during the flight was divided into 288 patches, each with a size of 608 × 608 pixels (Figure 2). To facilitate image annotation, various thresholds were tested across different color spaces, including RGB, HSV (hue, saturation, and value), and CIELAB (L*a*b*). Among these, the L*a*b* color space produced the best results and was therefore selected to assist the annotation process. The L*a*b* color space is composed of three axes: the L* axis, representing lightness, ranges from 0 to 100, while the a* and b* axes correspond to Hering's opponent theory of color. This theory describes color according to how the human retina perceives it, in terms of red-versus-green and yellow-versus-blue opponent attributes, which are not defined by fixed boundaries [59]. Pixels with values in the ranges L* ∈ [50; 100], a* ∈ [−40; 0], and b* ∈ [40; 50] were identified as potential representations of chestnut burrs.
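A minimal sketch of this thresholding step is given below, using scikit-image for the CIELAB conversion; it assumes the a* range reads [−40; 0] and only produces candidate masks that still require manual correction.

```python
import numpy as np
from skimage import io, color

def burr_candidate_mask(image_path):
    """Candidate burr mask from CIELAB thresholds (ranges taken from the annotation procedure)."""
    rgb = io.imread(image_path)[..., :3]            # drop any alpha channel
    lab = color.rgb2lab(rgb)                        # L* in [0, 100], a*/b* roughly in [-128, 127]
    L, a, b = lab[..., 0], lab[..., 1], lab[..., 2]
    mask = (L >= 50) & (L <= 100) & (a >= -40) & (a <= 0) & (b >= 40) & (b <= 50)
    return mask.astype(np.uint8)
```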
Examples of pixel separation for chestnut burr identification are presented in Figure 3. The color thresholding in the L * a * b * color space successfully differentiates burrs from chestnut leaves and soil in healthy trees (Figure 3b). However, this approach is less effective in trees with phytosanitary issues (Figure 3a), where some leaves are misclassified as chestnut burrs due to color similarities.
After the establishment of a color threshold approach, 189 UAV images were randomly selected, and their masks were improved/corrected using the open-source data labeling platform Label Studio (HumanSignal, Inc., San Francisco, CA, USA). This process ensured that the dataset included a variety of cases, such as images with and without burrs, images containing leaves from trees with phytosanitary issues, and images with leaves from other tree species. This diverse representation helps the dataset cover the range of scenarios that can be found in practice. The annotated images were then used to train DL-based segmentation models. These images were randomly split into training (66%), validation (17%), and test (17%) subsets.
To enable the use of this dataset for training object detection architectures, an automated annotation transformation was performed for the entire dataset. Each image was manually verified to ensure that all visible chestnut burrs were annotated. Each region containing burrs was enclosed with a bounding box, even if the region included more than one burr. This means that a single bounding box may contain multiple burrs (Figure 4). While this approach may introduce ambiguity into the object detection task, given that the annotation was not performed by an expert, it was necessary for performance comparisons with other studies and datasets, and this uncertainty was therefore considered acceptable. In the remainder of the text, this dataset will be referred to as Dataset 1.
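The conversion from segmentation masks to detection annotations can be sketched as follows, with one bounding box per connected region, so a single box may enclose several touching burrs; the function name is illustrative.

```python
from skimage import measure

def mask_to_boxes(binary_mask):
    """One bounding box per connected burr region (a box may contain several touching burrs)."""
    labeled = measure.label(binary_mask, connectivity=1)
    boxes = []
    for region in measure.regionprops(labeled):
        min_row, min_col, max_row, max_col = region.bbox
        boxes.append((min_col, min_row, max_col, max_row))   # (x_min, y_min, x_max, y_max)
    return boxes
```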

2.2.3. Dataset 2

Arakawa et al. [53] presented a dataset (hereinafter referred to as Dataset 2) for detecting burrs of two chestnut varieties (Castanea crenata Siebold et Zucc. and Castanea mollissima Blume). The images were acquired in a 9.4 ha chestnut grove at the Gifu Prefectural Institute of Agricultural Research (GRIAT; Nakatsugawa, Gifu, Japan; 35°29′ N, 137°28′ E; Cfa in the Köppen-Geiger classification) in 2021. A DJI Mavic Mini (DJI Technology, Shenzhen, China), equipped with a 1/2.3″ onboard camera sensor, was used for manual UAV imagery acquisition. The flights were conducted at a height of 12 to 15 m above the ground, and the data were collected one or two months before harvest on a cloudy day to minimize sunlight interference.
After acquisition, the images were divided into 416 × 416-pixel patches, totaling 500 images, which were annotated using the LabelImg software (v1.8.1.). The dataset provided by the authors was then randomly split into training (70%), validation (15%), and test (15%) sets.
To test the counting of chestnut burrs, 53 images were resized to 608 × 608 pixels. The authors conducted two counts: a manual count and an in situ count. The manual count, performed using ImageJ’s cell counter plugin, recorded 4234 burrs, while the in situ count by a specialized technician resulted in 6753 burrs.
An annotation transformation was applied to adapt the dataset for segmentation model training. This approach consisted of drawing a circle inside each bounding box to represent burrs. Although this annotation introduces some uncertainty into the segmentation process, since it was not performed by an expert, it was deemed acceptable for comparing the performance of segmentation models across different datasets.
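Conversely, the transformation applied to Dataset 2 can be sketched as below, inscribing a filled circle in each bounding box to approximate the burr region; the box format (x_min, y_min, x_max, y_max) and the image size arguments are assumptions for illustration.

```python
import numpy as np
import cv2

def boxes_to_circle_mask(boxes, height, width):
    """Approximate segmentation mask: inscribe a filled circle in each bounding box."""
    mask = np.zeros((height, width), dtype=np.uint8)
    for x_min, y_min, x_max, y_max in boxes:
        cx, cy = (x_min + x_max) // 2, (y_min + y_max) // 2
        radius = max(1, min(x_max - x_min, y_max - y_min) // 2)
        cv2.circle(mask, (int(cx), int(cy)), int(radius), color=1, thickness=-1)
    return mask
```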

2.3. Models and Training

Considering the aim of the proposed methodology for chestnut yield estimation in 3D point clouds, three different DL-based segmentation architectures were tested: U-Net [44], LinkNet [60], and PSPNet [61]. In addition, given that Arakawa et al. [53] evaluated their dataset using YOLO architectures, YOLOv8m [62] was included in the experiments for comparison. YOLOv8m was trained as an object detector and therefore cannot generate segmented images; it was included solely for comparison purposes. Testing different YOLO models or hyperparameters is beyond the scope of this study.
U-Net was selected for its proven effectiveness, even with small datasets [63]. LinkNet was chosen for its similar architecture to U-Net, but with fewer parameters. PSPNet was selected as a compromise between parameter count and performance due to its use of Spatial Pyramid Pooling. Figure 5 presents a general overview of each architecture.
The Python library Segmentation Models [64] was used to implement these models. All three architectures used SE-ResNet-50 as the backbone. For U-Net and LinkNet, five up-sampling blocks were employed in the expansive path, each composed of an upsampling layer (which increases the feature maps' resolution, creating sparse feature maps) followed by two convolutional layers (responsible for filling the sparse feature maps). The connection between the contracting and expansive paths was made using the final convolution activation of each SE-ResNet-50 block and the corresponding operation in the expansive path. For an input of dimension (H, H, 3), the first up-sampling block in the expansive path was connected, via concatenation (U-Net) or addition (LinkNet), to the backbone block with an output dimension of (H/16, H/16). The remaining four blocks follow the same logic, using feature maps from blocks with output dimensions of H/8, H/4, H/2, and H, respectively. PSPNet used a down-sampling factor of eight and 512 filters in the spatial pooling convolution layer.
Batch normalization [65] was applied between blocks during training. U-Net and LinkNet used input sizes of 320 × 320 pixels, while PSPNet required 336 × 336 pixels due to its need for input sizes divisible by 6 × the down-sampling factor (48 in this case). The sigmoid activation function was used for all models, with a fixed threshold of 0.5. The use of a larger decoder increases the number of parameters of U-Net (34.60 million) and LinkNet (30.80 million) compared to PSPNet, which, despite its larger input size, has four million parameters, approximately eight times fewer than U-Net.
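A sketch of how the three architectures could be instantiated with the Segmentation Models library [64] is given below; the arguments shown follow the library's public API, and the encoder_freeze flag is included only to reflect the frozen-backbone phase described next, not a configuration confirmed by the authors.

```python
import segmentation_models as sm

BACKBONE = "seresnet50"   # SE-ResNet-50 backbone used for all three architectures

# U-Net and LinkNet take 320x320 inputs; PSPNet needs inputs divisible by
# 6 x the down-sampling factor, hence 336x336 with downsample_factor=8.
unet = sm.Unet(BACKBONE, input_shape=(320, 320, 3), classes=1,
               activation="sigmoid", encoder_weights="imagenet", encoder_freeze=True)
linknet = sm.Linknet(BACKBONE, input_shape=(320, 320, 3), classes=1,
                     activation="sigmoid", encoder_weights="imagenet", encoder_freeze=True)
pspnet = sm.PSPNet(BACKBONE, input_shape=(336, 336, 3), classes=1,
                   activation="sigmoid", encoder_weights="imagenet", encoder_freeze=True,
                   downsample_factor=8, psp_conv_filters=512)
```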
The training followed the fine-tuning strategy described by Chollet [66], with two phases: initially, the backbone's weights were frozen and only the decoder was trained, followed by full training after unfreezing the weights. Each phase involved 50 epochs of training with the AdamW optimizer; the initial learning rate was 1 × 10⁻³ in the first phase and was decreased to 1 × 10⁻⁴ in the second. A Reduce Learning Rate on Plateau schedule was also applied, lowering the learning rate by a factor of 0.95 with a patience of two epochs.
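The two-phase fine-tuning could then look like the following sketch, where the AdamW optimizer is assumed to be available from a recent TensorFlow/Keras release and the data loaders, loss, and metrics are placeholders passed in by the caller.

```python
import tensorflow as tf

def train_two_phase(model, train_data, val_data, total_loss, metrics):
    """Phase 1: decoder only (encoder built frozen), lr 1e-3; phase 2: all weights, lr 1e-4."""
    reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(factor=0.95, patience=2)

    # Phase 1: the encoder is assumed frozen via encoder_freeze=True at model construction
    model.compile(optimizer=tf.keras.optimizers.AdamW(learning_rate=1e-3),
                  loss=total_loss, metrics=metrics)
    model.fit(train_data, validation_data=val_data, epochs=50, callbacks=[reduce_lr])

    # Phase 2: unfreeze every layer and continue training with a lower learning rate
    for layer in model.layers:
        layer.trainable = True
    model.compile(optimizer=tf.keras.optimizers.AdamW(learning_rate=1e-4),
                  loss=total_loss, metrics=metrics)
    model.fit(train_data, validation_data=val_data, epochs=50, callbacks=[reduce_lr])
```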
The loss function was a combination of Dice Loss (1) and Focal Loss (2), defined as in (3), where $p_t$ represents the predicted probability for the ground-truth class, $\alpha$ and $\gamma$ are the Focal Loss parameters (set to 0.25 and 2 by default), $y$ is the ground truth, $\epsilon$ is a small value to avoid division by zero, and $\hat{p}$ is the predicted value.

$$\mathrm{DICE}(y, \hat{p}) = 1 - \frac{2\,y\,\hat{p} + \epsilon}{y + \hat{p} + \epsilon} \tag{1}$$

$$\mathrm{FL} = -\alpha_t\,(1 - p_t)^{\gamma}\,\log(p_t) \tag{2}$$

$$\text{Total Loss} = \mathrm{FL} + \mathrm{DICE} \tag{3}$$
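In code, this combination corresponds to adding the Dice and binary Focal losses shipped with the Segmentation Models library; a minimal sketch is shown below, together with the threshold-0.5 metrics mentioned earlier.

```python
import segmentation_models as sm

# Dice Loss + Focal Loss, with the Focal Loss parameters stated in the text (alpha = 0.25, gamma = 2)
dice_loss = sm.losses.DiceLoss()
focal_loss = sm.losses.BinaryFocalLoss(alpha=0.25, gamma=2.0)
total_loss = dice_loss + focal_loss   # Segmentation Models losses support addition

metrics = [sm.metrics.IOUScore(threshold=0.5), sm.metrics.FScore(threshold=0.5),
           sm.metrics.Precision(threshold=0.5), sm.metrics.Recall(threshold=0.5)]
```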
Online data augmentation was applied during training using the Python library Albumentations [67], including random crops, flips, Gaussian noise, solarization, and blur. This process aims to increase the diversity in the data.
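An Albumentations pipeline matching the listed transforms could be sketched as follows; the probabilities and the 320 × 320 crop size are illustrative choices, not values reported in the study.

```python
import albumentations as A

augmentations = A.Compose([
    A.RandomCrop(height=320, width=320, p=1.0),   # crop from the 608x608 patch to the model input size
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.GaussNoise(p=0.3),
    A.Solarize(p=0.2),
    A.Blur(blur_limit=3, p=0.2),
])

# Applied jointly to image and mask so the annotation stays aligned:
# augmented = augmentations(image=image, mask=mask)
```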
The Precision, Recall, IoU, and F1-Score metrics were used to evaluate the segmentation models’ performance.
For YOLOv8m, training followed hyperparameters similar to those used for the segmentation models. The input size was 320 × 320 pixels, Adam was used as the optimizer, and the learning rate started at 1 × 10⁻³. The IoU threshold was set to 0.5, and the model was trained for 100 epochs with a batch size of 48. Online data augmentation was also applied, and the pre-trained weights from the MS COCO dataset [68] were used as initial weights. Approximately 25 million parameters were trained using the official Ultralytics implementation [62].
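Training YOLOv8m with the official Ultralytics package can be sketched as below; the dataset YAML path is a placeholder, and only documented training arguments are used.

```python
from ultralytics import YOLO

model = YOLO("yolov8m.pt")                 # MS COCO pre-trained weights as the starting point
model.train(
    data="chestnut_burrs.yaml",            # placeholder dataset description (train/val paths, class names)
    imgsz=320,
    epochs=100,
    batch=48,
    optimizer="Adam",
    lr0=1e-3,
)
metrics = model.val()                      # evaluation on the validation split
```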
A workstation equipped with two Intel Xeon E5-2680 CPUs, 128 GB of RAM, and an Nvidia Quadro M4000 GPU with 8 GB of VRAM was used to train the segmentation models. For the YOLOv8 experiments, a workstation with an Intel Core i7-13700KF processor, 48 GB of RAM, and an Nvidia GeForce RTX 3060 GPU with 12 GB of VRAM was used.

2.4. Burr Counting in UAV Images

To evaluate chestnut burr counting, 21 images were selected from the 189 annotated images, forming a subset referred to as the count dataset. An expert manually counted the burrs in these images, resulting in a total of 1322 burrs distributed across the images. Each image was then processed by the trained models to generate segmentation masks. To prevent the unification of burrs in the segmentation masks, an erosion operation was applied using a 3²-pixel elliptical kernel. The number of pixels in each segmented region was calculated using the regionprops function from the scikit-image Python library, assuming 1-connectivity for connected pixels in the binary images.
The mode, mean, and median of the pixel counts for each burr were computed to quantify the average size of a burr in an image. For each image in the count dataset, the total number of burr pixels was summed and divided by these statistical values to estimate the number of burrs. The same approach was used for Dataset 2, which was evaluated in Arakawa et al. [53]. The manual count of 4234 burrs was used as the reference for this dataset.
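The counting procedure can be sketched as follows, assuming a binary prediction mask as input, a 3 × 3 elliptical kernel as the reading of the 3²-pixel kernel, and the mode computed from a histogram of region areas.

```python
import numpy as np
import cv2
from skimage import measure

def estimate_burr_count(pred_mask):
    """Erode the mask, label connected regions, then divide the total burr-pixel count
    by the mean, median, and mode of the region sizes to estimate the number of burrs."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))  # small elliptical kernel, as in the text
    eroded = cv2.erode(pred_mask.astype(np.uint8), kernel)
    labeled = measure.label(eroded, connectivity=1)
    sizes = np.array([r.area for r in measure.regionprops(labeled)], dtype=int)
    if sizes.size == 0:
        return {"regions": 0}
    total_pixels = sizes.sum()
    mode_size = np.bincount(sizes).argmax()            # most frequent region size
    return {
        "regions": len(sizes),
        "count_mean": total_pixels / sizes.mean(),
        "count_median": total_pixels / np.median(sizes),
        "count_mode": total_pixels / max(mode_size, 1),
    }
```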

3. Results

The overall results for the different segmentation architectures are summarized in Table 1. Among the evaluated models, U-Net demonstrated the best overall performance across all datasets. Nevertheless, LinkNet showed results comparable to U-Net on Dataset 2 and the merged datasets, slightly outperforming U-Net in Recall. In contrast, PSPNet showed the lowest performance across all datasets. When examining performance on the individual datasets, the models performed worst on Dataset 1, with U-Net achieving an IoU of 0.39, whereas on Dataset 2 the U-Net model achieved a higher IoU of 0.62.
Segmentation results for models trained on Dataset 1, Dataset 2, and the merged dataset are presented in Figure 6 and Figure 7. In Figure 6, a significant reduction in the burr region is observed when the merged dataset is used for training, leading to more uniform segmentation in terms of shape. However, this improvement is not observed in Figure 7, where merging the datasets did not result in any significant changes in segmentation performance.
Table 2 presents the results of the segmentation experiments in which the test sets were switched. When only Dataset 2 was used for training, none of the segmentation models achieved an IoU greater than 0.1 when tested on Dataset 1. On the other hand, when the Dataset 2 test set was evaluated using the model trained on Dataset 1, U-Net achieved an IoU of 0.27, indicating that models can leverage Dataset 1 for better performance on the Dataset 2 test set. In contrast, training on the merged datasets led to poorer segmentation performance compared to training on Dataset 1 alone across all models, with the smallest IoU difference being 0.10 for PSPNet. For Dataset 2, the results barely changed, with the largest IoU difference being 0.02 for LinkNet.
In terms of computational costs, training U-Net, LinkNet, PSPNet, and YOLOv8m on the merged dataset (the largest dataset) took 96, 98, 40, and 9 minutes, respectively. It is noteworthy that the segmentation models used early stopping, halting before completing 50 epochs, while YOLOv8m, trained without early stopping, benefited from a more modern GPU with more VRAM.
The mean, mode, median, and the number of predicted regions for each model trained on both the individual and merged datasets are presented in Table 3 and Table 4, respectively. For Dataset 1, the highest count accuracy was achieved using U-Net trained on the merged dataset combined with the mode value. The count error was 28.51%, which is lower than the best performance achieved by models trained solely on Dataset 1 (U-Net + median), which had a count error of 54.00%.
When analyzing the results for Dataset 1, it is clear that merging the datasets improved the models' count accuracy, due to the reduction in region size. Moreover, Table 3 and Table 4 present the standard deviation (std) of the number of pixels in the identified regions for the individual and merged datasets. Merging the datasets resulted in segmentation with more uniformly sized regions, with an std lower than the average (e.g., U-Net, mean = 52, std = 38). On the other hand, training the segmentation models exclusively on the proposed dataset produced regions with less uniform sizes (e.g., U-Net, mean = 174, std = 179).
The results for the YOLOv8m model are shown in Table 5. The performance of YOLOv8m was comparable to that of the segmentation models on all datasets. The difference between the F1-scores, specifically the disparity of 0.08 between U-Net trained on the merged dataset (F1-score = 0.51) and YOLOv8m (F1-score = 0.43) for Dataset 1 (Table 1), is related to the approximation made by transforming the segmentation annotations into object detection annotations. This transformation empirically introduced less uncertainty in the reverse direction, resulting in a similar F1-score for both models trained on Dataset 2 only (U-Net = 0.76, YOLOv8m = 0.78) and when merging the datasets (U-Net = 0.75, YOLOv8m = 0.77). It is important to note that the training of the YOLO model is not a replica of the experiments carried out by Arakawa et al. [53], since the data, models, and parameterization are different. Training the YOLOv8m model on Dataset 1 resulted in an F1-score of 0.34 when tested on the Dataset 2 test set. When trained on Dataset 2, the model was able to detect burrs on the Dataset 1 test set, but with a lower F1-score of 0.15. Merging the datasets led to slight changes in the results. For Dataset 1, the biggest differences were seen in precision, with a difference of 0.03, while for Dataset 2, the biggest difference was 0.03 for precision, recall, and F1-score. Additionally, testing the model on 608 × 608-pixel images, similar to the setup used by Arakawa et al. [53], helped to reduce the counting error, producing results closer to the ground truth.

4. Discussion

The best performance in terms of metrics and burr counting on Dataset 1 among the segmentation models was achieved by U-Net. Considering the differences between the segmentation architectures, this result aligns with expectations, given that U-Net’s U-shaped architecture is specifically designed for applications requiring high-precision segmentation. Although LinkNet has a similar architecture and produced results close to those of U-Net, the findings suggest that the concatenation of feature maps, as implemented in U-Net, is more effective for burr segmentation than the summation approach used in LinkNet. The architectural differences are further highlighted by the distinctly different results obtained with PSPNet. As observed in Figure 6 and Figure 7, PSPNet maintained the shape of the segmented regions less effectively than the U-shaped architectures across both datasets. Moreover, U-Net and LinkNet demonstrated a superior ability to differentiate between burrs and diseased leaves compared to PSPNet. However, it is important to note that U-Net and LinkNet are composed of over 30 million parameters, while PSPNet has only 4 million, which partially explains the superior performance of the models with more parameters. For the purposes of this study, where the goal is to generate masks for projecting point cloud data to count burrs in chestnut trees, the preservation of shape is crucial, making U-Net’s results the most suitable.
Another observation is that segmentation models trained only on Dataset 1 tend to merge nearby chestnut burrs into a single cluster of pixels. This behavior may be attributed to the high density of chestnut burrs in Dataset 1. The presence of clusters or closely spaced regions allows the model to merge these areas during training to minimize loss. This is reflected in the precision and recall metrics (Table 1), where recall was higher than precision in the models trained with Dataset 1 and the merged dataset, indicating a higher presence of false positives. The small dataset size exacerbates this issue, and increasing the number of annotated images and using weighted loss functions could mitigate this effect.
Segmentation performance on Dataset 2 was superior to that on Dataset 1. The higher spatial resolution of Dataset 2 (imagery captured between 12 and 15 m) likely contributed to this, as the burrs appeared larger in the resulting images. However, the resolution used in Dataset 1 (30 m flight height) is more suitable for practical UAV flight campaigns in Castanea sativa plantations in European regions such as northeastern Portugal, where trees are typically managed to be taller and larger. Additionally, obstacles such as taller trees (e.g., Pinus pinaster forests) in the surroundings of the plantations can influence the flight height chosen to avoid potential equipment damage.
Merging the datasets considerably reduced the size of segmented regions, which led to a decrease in classification performance (in terms of metrics) on Dataset 1. This reduction is related to the varying appearance of burrs under different lighting conditions. Castanea crenata and Castanea mollissima Blume burrs (Dataset 2) in images acquired at 15 m on cloudy days are green, while Castanea sativa burrs (Dataset 1) in images acquired at 30 m on a sunny day are green in the center but tend to be yellow at the edges. As Dataset 2 contains more than four times the number of images as Dataset 1, the models began to prioritize the central, green regions of Dataset 1 as burrs, leading to a decrease in segmentation quality. However, the more separated regions proved beneficial for counting, and the segmentation of trees suffering from phytosanitary problems also benefited from this phenomenon. This is evident in the third row of Figure 6, where merging the data reduced the number of false positives in trees with phytosanitary problems.
Regarding classification errors, segmentation proved more challenging in certain cases. Green regions with an elliptical shape and tree leaves affected by phytosanitary issues were often misclassified as burrs, particularly in models trained solely on Dataset 1 (Figure 6, rows 3 and 4). Merging the datasets improved model performance, increasing U-Net’s precision from 0.51 to 0.84. False negatives, on the other hand, were observed near shadow edges or in small isolated burrs, with the models trained on merged datasets performing worse in these cases, as reflected by the drop in recall from 0.72 to 0.30. Further exploration of the hyperparameters, such as testing thresholds other than 0.50, and increasing image resolution and sample size could improve segmentation accuracy.
Counting burrs in Dataset 1 was directly improved by merging the datasets, while for Dataset 2, the opposite effect was observed. The reduction and uniformity of segmented regions in Dataset 1 allowed for the use of the mode value in burr counting, resulting in a count closer to the ground truth. On the other hand, for Dataset 2, using only its data for counting led to better results. It is important to note that the number of regions in Dataset 2 was close to the ground truth, but given the high level of occlusion in the clusters, this number should be lower. Although manual counting by an expert was used as the ground truth, an in situ count of 6753 burrs by an expert revealed significant occlusion in the manual count. The hypothesis is that segmentation models can capture small parts of occluded burrs that might be misclassified as leaves during manual counting. An example of such a case can be seen in Figure 8.
The proposed approach faces another challenge with chestnut burr occlusion. Segmented regions representing parts of different chestnut burrs may only be counted as a single burr, due to the count being based on the size of the regions. Since chestnuts frequently grow in clusters of burrs, undercounting may be attributed to this factor. However, the ability to identify small chestnut burr parts is advantageous over detection-based architectures, which may miss occluded burrs entirely. In addition, it is important to note that the resulting segmentation will be projected into a point cloud for yield estimation. By using the projection of multiple images and the 3D volume of segmented chestnut burrs, this problem can be addressed, as demonstrated in other studies [56,58,69,70].
The results obtained by the YOLOv8m model indicate that when trained under similar conditions as the segmentation models, it performed poorly. However, using larger images for counting produced results comparable to those achieved with the proposed approach on a more curated dataset, such as Dataset 2 (Table 5).
Moreover, according to Miranda et al. [71], a disadvantage of semantic segmentation is the lack of a direct method for counting fruits. Other studies have explored various approaches to count fruits after segmentation. For example, Chen et al. [72] used a second convolutional neural network to count apples and oranges in different regions post-segmentation, achieving an accuracy of 0.97. Compared to the proposed approach, this method requires greater annotation effort, as it involves labeling binary regions for counting. Kestur et al. [73] and Bargoti and Underwood [74] used segmentation contours to count mangoes and apples, achieving counting accuracies of 0.73 and 0.89, respectively. While similar to the proposed method, counting only the segmented regions would result in poor counts in the chestnut tree scenario due to the high number of burr clusters. In UAV images, burrs represent smaller regions of pixels, making segmentation more challenging. Other studies that counted various fruits using object detection methods [30,31,32,34,36] mostly used YOLO variants and Faster R-CNN, reporting performances between 0.72 and 0.97. The approach proposed in this study, which relies on a statistical measure to define the number of pixels comprising a burr, achieved a counting accuracy of 0.71 on Dataset 1, a performance comparable to that of these studies.
In the context of chestnut trees, Arakawa et al. [53] and Comba et al. [54] proposed methods for estimating production and segmenting chestnut burrs using UAV aerial imagery. Arakawa et al. [53] reported an RMSE of 6.30 for the burr count in their dataset. Among the segmentation models in this study, PSPNet demonstrated the best performance on the same dataset, with an RMSE of 50.54 when the number of regions was not used for counting. When considering the number of regions, the RMSE decreased to 2.29. YOLOv8m, trained with similar hyperparameters as the segmentation models (optimizer, learning rate, and image size), achieved an RMSE of 43.24, which differs significantly from the result obtained by YOLOv4 in Arakawa et al. [53]. The image size used in this study was constrained by hardware limitations and was much smaller than that used by Arakawa et al. [53]. By adjusting the image size of the counting dataset to match theirs during inference, YOLOv8m, without retraining, outperformed the previous results, achieving an RMSE of 4.62 when merging the datasets, counting 4038 burrs. A similar experiment on Dataset 1 reduced the RMSE from 50.15 to 18.30, obtaining a count of 1080 burrs. This indicates that YOLOv8m was only able to surpass the proposed approach when larger images were used for counting. The impact of image size on the performance of the segmentation models should be evaluated in the future.
Comba et al. [54] reported a segmentation accuracy of 92.30%; however, their study did not include burr counting. The authors used multispectral images acquired at a flight height of 20 m, with the network trained on images of 16 × 16 pixels and nine bands, resulting in burrs larger than those in Dataset 1 but smaller than those in Dataset 2. Although their performance is superior to that presented in this study, the data used are different, as UAV multispectral sensors can acquire information in the near-infrared and visible parts of the electromagnetic spectrum [75]. Nevertheless, this study obtained better results than those of Adão et al. [52], who used ground-based photographs and achieved an IoU of 0.54.
Given the importance of data quality in deep learning-based approaches, it is important to highlight the differences between the datasets used by Arakawa et al. [53] and Comba et al. [54] and Dataset 1 from this study. Table 6 outlines the characteristics of each dataset. The dataset from Arakawa et al. [53] (Dataset 2 in this study) has more samples, larger image sizes, and higher spatial resolution. In contrast, Comba et al. [54] used multispectral rather than RGB images, but still with higher resolution. Additionally, only Dataset 1 contains samples of chestnut trees with phytosanitary problems. As a result, Dataset 1 is better suited for training models with greater generalization capacity and is more challenging to classify compared to the datasets used in similar studies. The results of cross-testing on different test sets (Table 2 and Table 5) support this statement.

5. Conclusions

This study demonstrates the potential of DL architectures for segmenting chestnut burrs from UAV-acquired aerial images, serving as a preliminary step toward productivity estimation in chestnut groves. Among the evaluated models, U-Net proved to be the most effective, achieving an F1-score of 0.56 and a counting accuracy of 0.71. The results indicate that U-Net effectively preserves the shape of burr regions and differentiates between burrs and leaves from both healthy and diseased trees. Despite the inherent challenge of burr clustering, which often leads to the unification of regions during segmentation, this study shows that DL models can successfully generate masks that are crucial for accurate yield estimation through subsequent 3D point cloud segmentation.
Despite the promising results, some limitations were identified, including reduced segmentation quality when merging images from different datasets and the models’ tendency to unify closely spaced burrs into single regions. This suggests that future research should focus on increasing dataset size and diversity, enhancing model generalization, and exploring advanced techniques to better manage burr clustering.
Regarding counting accuracy, the results suggest that object detection architectures, such as YOLOv8m, are more suitable for counting burrs in UAV images of chestnut trees when larger images are used. However, for the proposed methodology, using pixel-based segmentation rather than bounding boxes is essential for accurately obtaining chestnut burr regions in point clouds.
The broader implications of this study have the potential to improve PA practices, particularly in chestnut groves. By demonstrating the feasibility of using UAV-based remote sensing data in combination with DL models, this study can serve as the groundwork for developing robust decision support tools to enhance crop management and yield estimation in chestnut trees. Additionally, the full implementation of this methodology will enable more accurate productivity estimation for entire groves, providing farmers with better-informed decisions to optimize their harvest management. Future research should also refine the process by leveraging the capabilities of DL models to segment the chestnut trees themselves and, if necessary, integrate additional data types such as multispectral imagery. Validation of this approach across different environmental conditions should be carried out to achieve more accurate and reliable productivity assessments. Continuous UAV data acquisition could enable the development of a comprehensive database for each chestnut tree within the plantation, encompassing various parameters beyond yield. With regular updates from UAV flight surveys, this database could also support temporal assessments, allowing for the ongoing monitoring and analysis of tree health and productivity over time.

Author Contributions

Conceptualization, G.A.C., J.J.S., and L.P.; methodology, G.A.C., J.S., J.J.S., A.C. and L.P.; software, G.A.C. and J.S.; validation, G.A.C. and J.S.; formal analysis, G.A.C.; investigation, J.J.S. and L.P.; resources, J.J.S.; data curation, G.A.C. and J.S.; writing—original draft preparation, G.A.C., J.S. and L.P.; writing—review and editing, J.J.S., A.C. and L.P.; visualization, G.A.C. and J.S.; supervision, A.C. and J.J.S.; project administration, J.J.S., A.C. and L.P.; funding acquisition, J.J.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data will be made available upon reasonable request.

Acknowledgments

The authors would like to acknowledge the Portuguese Foundation for Science and Technology (FCT) for financial support through national funds to projects UIDB/04033/2020 (https://doi.org/10.54499/UIDB/04033/2020), LA/P/0126/2020 (https://doi.org/10.54499/LA/P/0126/2020), and UIDB/00319/2020 (https://doi.org/10.54499/UIDB/00319/2020); and through Gabriel Carneiro’s doctoral scholarship (PRT/BD/154883/2023).

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Zaffaroni, M.; Bevacqua, D. Maximize crop production and environmental sustainability: Insights from an ecophysiological model of plant-pest interactions and multi-criteria decision analysis. Eur. J. Agron. 2022, 139, 126571. [Google Scholar] [CrossRef]
  2. Zhu, Q. Integrated Environment Monitoring and Data Management in Agriculture. In Encyclopedia of Smart Agriculture Technologies; Zhang, Q., Ed.; Springer International Publishing: Cham, Switzerland, 2022; pp. 1–12. [Google Scholar] [CrossRef]
  3. Zhang, J.; Nie, J.; Cao, W.; Gao, Y.; Lu, Y.; Liao, Y. Long-term green manuring to substitute partial chemical fertilizer simultaneously improving crop productivity and soil quality in a double-rice cropping system. Eur. J. Agron. 2023, 142, 126641. [Google Scholar] [CrossRef]
  4. Nowak, B. Precision agriculture: Where do we stand? A review of the adoption of precision agriculture technologies on field crops farms in developed countries. Agric. Res. 2021, 10, 515–522. [Google Scholar] [CrossRef]
  5. Yin, H.; Cao, Y.; Marelli, B.; Zeng, X.; Mason, A.J.; Cao, C. Soil sensors and plant wearables for smart and precision agriculture. Adv. Mater. 2021, 33, 2007764. [Google Scholar] [CrossRef]
  6. Cimtay, Y.; Özbay, B.; Yilmaz, G.; Bozdemir, E. A new vegetation index in short-wave infrared region of electromagnetic spectrum. IEEE Access 2021, 9, 148535–148545. [Google Scholar] [CrossRef]
  7. Khanna, A.; Kaur, S. Evolution of Internet of Things (IoT) and its significant impact in the field of Precision Agriculture. Comput. Electron. Agric. 2019, 157, 218–231. [Google Scholar] [CrossRef]
  8. Kahvecı, M. Contribution of GNSS in precision agriculture. In Proceedings of the 2017 8th International Conference on Recent Advances in Space Technologies (RAST), Istanbul, Turkey, 19–22 June 2017; pp. 513–516. [Google Scholar]
  9. Sishodia, R.P.; Ray, R.L.; Singh, S.K. Applications of remote sensing in precision agriculture: A review. Remote Sens. 2020, 12, 3136. [Google Scholar] [CrossRef]
  10. Zhang, P.; Guo, Z.; Ullah, S.; Melagraki, G.; Afantitis, A.; Lynch, I. Nanotechnology and artificial intelligence to enable sustainable and precision agriculture. Nat. Plants 2021, 7, 864–876. [Google Scholar] [CrossRef]
  11. Pádua, L.; Adão, T.; Hruška, J.; Sousa, J.J.; Peres, E.; Morais, R.; Sousa, A. Very high resolution aerial data to support multi-temporal precision agriculture information management. Procedia Comput. Sci. 2017, 121, 407–414. [Google Scholar] [CrossRef]
  12. Zhang, C.; Kovacs, J.M. The application of small unmanned aerial systems for precision agriculture: A review. Precis. Agric. 2012, 13, 693–712. [Google Scholar] [CrossRef]
  13. Bendig, J.; Bolten, A.; Bareth, G. UAV-based imaging for multi-temporal, very high resolution crop surface models to monitor crop growth variability. In Unmanned Aerial Vehicles (UAVs) for Multi-Temporal Crop Surface Modelling; Schweizerbart Science Publishers: Stuttgart, Germany, 2013; p. 44. [Google Scholar] [CrossRef]
  14. Radoglou-Grammatikis, P.; Sarigiannidis, P.; Lagkas, T.; Moscholios, I. A compilation of UAV applications for precision agriculture. Comput. Netw. 2020, 172, 107148. [Google Scholar] [CrossRef]
  15. Martins, L.M.; Castro, J.P.; Bento, R.; Sousa, J.J. Monitorização da condição fitossanitária do castanheiro por fotografia aérea obtida com aeronave não tripulada. Rev. Ciências Agrárias 2015, 38, 184–190. [Google Scholar]
  16. Vannini, A.; Vettraino, A.; Fabi, A.; Montaghi, A.; Valentini, R.; Belli, C. Monitoring ink disease of chestnut with the airborne multispectral system ASPIS. In Proceedings of the III International Chestnut Congress 693, Chaves, Portugal, 20–23 October 2004; pp. 529–534. [Google Scholar]
  17. Martins, L.; Castro, J.; Macedo, W.; Marques, C.; Abreu, C. Assessment of the spread of chestnut ink disease using remote sensing and geostatistical methods. Eur. J. Plant Pathol. 2007, 119, 159–164. [Google Scholar] [CrossRef]
  18. Castro, J.; Azevedo, J.; Martins, L. Temporal analysis of sweet chestnut decline in northeastern Portugal using geostatistical tools. In Proceedings of the I European Congress on Chestnut-Castanea 2009 866, Cuneo-Torino, Italy, 13–16 October 2009; pp. 405–410. [Google Scholar]
  19. Martins, L.; Castro, J.P.; Macedo, F.; Marques, C.; Abreu, C.G. Índices espectrais em fotografia aérea de infravermelho próximo na monitorização da doença tinta do castanheiro. In Proceedings of the 5º Congresso Florestal Nacional, SPCF-Sociedade Portuguesa de Ciências Florestais, Instituto Politécnico de Viseu, Viseu, Portugal, 16 May 2005. [Google Scholar]
  20. Montagnoli, A.; Fusco, S.; Terzaghi, M.; Kirschbaum, A.; Pflugmacher, D.; Cohen, W.B.; Scippa, G.S.; Chiatante, D. Estimating forest aboveground biomass by low density lidar data in mixed broad-leaved forests in the Italian Pre-Alps. For. Ecosyst. 2015, 2, 1–9. [Google Scholar] [CrossRef]
  21. Prada, M.; Cabo, C.; Hernández-Clemente, R.; Hornero, A.; Majada, J.; Martínez-Alonso, C. Assessing canopy responses to thinnings for sweet chestnut coppice with time-series vegetation indices derived from landsat-8 and sentinel-2 imagery. Remote Sens. 2020, 12, 3068. [Google Scholar] [CrossRef]
  22. Marchetti, F.; Waske, B.; Arbelo, M.; Moreno-Ruíz, J.A.; Alonso-Benito, A. Mapping Chestnut stands using bi-temporal VHR data. Remote Sens. 2019, 11, 2560. [Google Scholar] [CrossRef]
  23. Martins, L.; Castro, J.; Bento, R.; Sousa, J. Chestnut health monitoring by aerial photographs obtained by unnamed aerial vehicle. Rev. Ciências Agrárias 2015, 38, 184–190. [Google Scholar]
  24. Pádua, L.; Hruška, J.; Bessa, J.; Adão, T.; Martins, L.M.; Gonçalves, J.A.; Peres, E.; Sousa, A.M.; Castro, J.P.; Sousa, J.J. Multi-temporal analysis of forestry and coastal environments using UASs. Remote Sens. 2017, 10, 24. [Google Scholar] [CrossRef]
  25. Marques, P.; Pádua, L.; Adão, T.; Hruška, J.; Peres, E.; Sousa, A.; Sousa, J.J. UAV-based automatic detection and monitoring of chestnut trees. Remote Sens. 2019, 11, 855. [Google Scholar] [CrossRef]
  26. Di Gennaro, S.F.; Nati, C.; Dainelli, R.; Pastonchi, L.; Berton, A.; Toscano, P.; Matese, A. An automatic UAV based segmentation approach for pruning biomass estimation in irregularly spaced chestnut orchards. Forests 2020, 11, 308. [Google Scholar] [CrossRef]
  27. Pádua, L.; Marques, P.; Martins, L.; Sousa, A.; Peres, E.; Sousa, J.J. Monitoring of chestnut trees using machine learning techniques applied to UAV-based multispectral data. Remote Sens. 2020, 12, 3032. [Google Scholar] [CrossRef]
  28. Albahar, M. A survey on deep learning and its impact on agriculture: Challenges and opportunities. Agriculture 2023, 13, 540. [Google Scholar] [CrossRef]
  29. Rahnemoonfar, M.; Sheppard, C. Deep Count: Fruit Counting Based on Deep Simulated Learning. Sensors 2017, 17, 905. [Google Scholar] [CrossRef]
  30. Afonso, M.; Fonteijn, H.; Fiorentin, F.S.; Lensink, D.; Mooij, M.; Faber, N.; Polder, G.; Wehrens, R. Tomato Fruit Detection and Counting in Greenhouses Using Deep Learning. Front. Plant Sci. 2020, 11, 571299. [Google Scholar] [CrossRef]
  31. Gao, F.; Fang, W.; Sun, X.; Wu, Z.; Zhao, G.; Li, G.; Li, R.; Fu, L.; Zhang, Q. A novel apple fruit detection and counting methodology based on deep learning and trunk tracking in modern orchard. Comput. Electron. Agric. 2022, 197, 107000. [Google Scholar] [CrossRef]
  32. Häni, N.; Roy, P.; Isler, V. A Comparative Study of Fruit Detection and Counting Methods for Yield Mapping in Apple Orchards. J. Field Robot. 2020, 37, 263–282. [Google Scholar] [CrossRef]
  33. Apolo-Apolo, O.E.; Pérez-Ruiz, M.; Martínez-Guanter, J.; Valente, J. A Cloud-Based Environment for Generating Yield Estimation Maps From Apple Orchards Using UAV Imagery and a Deep Learning Technique. Front. Plant Sci. 2020, 11, 1086. [Google Scholar] [CrossRef]
  34. Koirala, A.; Walsh, K.B.; Wang, Z.; McCarthy, C. Deep learning for real-time fruit detection and orchard fruit load estimation: Benchmarking of ‘MangoYOLO’. Precis. Agric. 2019, 20, 1107–1135. [Google Scholar] [CrossRef]
  35. Xiong, J.; Liu, Z.; Chen, S.; Liu, B.; Zheng, Z.; Zhong, Z.; Yang, Z.; Peng, H. Visual detection of green mangoes by an unmanned aerial vehicle in orchards based on a deep learning method. Biosyst. Eng. 2020, 194, 261–272. [Google Scholar] [CrossRef]
  36. Apolo-Apolo, O.E.; Martínez-Guanter, J.; Egea, G.; Raja, P.; Pérez-Ruiz, M. Deep learning techniques for estimation of the yield and size of citrus fruits using a UAV. Eur. J. Agron. 2020, 115, 126030. [Google Scholar] [CrossRef]
  37. Xiong, Z.; Wang, L.; Zhao, Y.; Lan, Y. Precision Detection of Dense Litchi Fruit in UAV Images Based on Improved YOLOv5 Model. Remote Sens. 2023, 15, 4017. [Google Scholar] [CrossRef]
  38. Neupane, B.; Horanont, T.; Hung, N.D. Deep learning based banana plant detection and counting using high-resolution red-green-blue (RGB) images collected from unmanned aerial vehicle (UAV). PLoS ONE 2019, 14, e0223906. [Google Scholar] [CrossRef] [PubMed]
  39. Kalantar, A.; Edan, Y.; Gur, A.; Klapp, I. A deep learning system for single and overall weight estimation of melons using unmanned aerial vehicle images. Comput. Electron. Agric. 2020, 178, 105748. [Google Scholar] [CrossRef]
  40. Mengoli, D.; Bortolotti, G.; Piani, M.; Manfrini, L. On-line real-time fruit size estimation using a depth-camera sensor. In Proceedings of the 2022 IEEE Workshop on Metrology for Agriculture and Forestry (MetroAgriFor), Perugia, Italy, 3–5 November 2022; pp. 86–90. [Google Scholar]
  41. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 42, 386–397. [Google Scholar] [CrossRef]
  42. Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated Residual Transformations for Deep Neural Networks. arXiv 2017, arXiv:1611.05431. [Google Scholar] [CrossRef]
  43. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934. [Google Scholar] [CrossRef]
  44. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015; Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2015; pp. 234–241. [Google Scholar] [CrossRef]
  45. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 770–778, ISBN 9781467388504. [Google Scholar] [CrossRef]
  46. Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. arXiv 2016, arXiv:1612.08242. [Google Scholar] [CrossRef]
  47. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
  48. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. arXiv 2018, arXiv:1708.02002. [Google Scholar] [CrossRef]
  49. Sun, Y.; Hao, Z.; Guo, Z.; Liu, Z.; Huang, J. Detection and Mapping of Chestnut Using Deep Learning from High-Resolution UAV-Based RGB Imagery. Remote Sens. 2023, 15, 4923. [Google Scholar] [CrossRef]
  50. Zhong, Q.; Zhang, H.; Tang, S.; Li, P.; Lin, C.; Zhang, L.; Zhong, N. Feasibility Study of Combining Hyperspectral Imaging with Deep Learning for Chestnut-Quality Detection. Foods 2023, 12, 2089. [Google Scholar] [CrossRef] [PubMed]
  51. Li, X.; Jiang, H.; Jiang, X.; Shi, M. Identification of Geographical Origin of Chinese Chestnuts Using Hyperspectral Imaging with 1D-CNN Algorithm. Agriculture 2021, 11, 1274. [Google Scholar] [CrossRef]
  52. Adão, T.; Pádua, L.; Pinho, T.M.; Hruška, J.; Sousa, A.; Sousa, J.J.; Morais, R.; Peres, E. Multi-Purpose Chestnut Clusters Detection Using Deep Learning: A Preliminary Approach. ISPRS Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2019, 4238, 1–7. [Google Scholar] [CrossRef]
  53. Arakawa, T.; Tanaka, T.S.T.; Kamio, S. Detection of on-tree chestnut fruits using deep learning and RGB unmanned aerial vehicle imagery for estimation of yield and fruit load. Agron. J. 2024, 116, 973–981. [Google Scholar] [CrossRef]
  54. Comba, L.; Biglia, A.; Sopegno, A.; Grella, M.; Dicembrini, E.; Ricauda Aimonino, D.; Gay, P. Convolutional Neural Network Based Detection of Chestnut Burrs in UAV Aerial Imagery. In AIIA 2022: Biosystems Engineering Towards the Green Deal; Ferro, V., Giordano, G., Orlando, S., Vallone, M., Cascone, G., Porto, S.M.C., Eds.; Lecture Notes in Civil Engineering; Springer: Cham, Switzerland, 2023; pp. 501–508. [Google Scholar] [CrossRef]
  55. Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1800–1807. [Google Scholar] [CrossRef]
  56. Torres-Sánchez, J.; Mesas-Carrascosa, F.J.; Santesteban, L.G.; Jiménez-Brenes, F.M.; Oneka, O.; Villa-Llop, A.; Loidi, M.; López-Granados, F. Grape Cluster Detection Using UAV Photogrammetric Point Clouds as a Low-Cost Tool for Yield Forecasting in Vineyards. Sensors 2021, 21, 3083. [Google Scholar] [CrossRef] [PubMed]
  57. Wu, G.; Li, B.; Zhu, Q.; Huang, M.; Guo, Y. Using color and 3D geometry features to segment fruit point cloud and improve fruit recognition accuracy. Comput. Electron. Agric. 2020, 174, 105475. [Google Scholar] [CrossRef]
  58. Jurado-Rodríguez, D.; Jurado, J.M.; Pádua, L.; Neto, A.; Muñoz-Salinas, R.; Sousa, J.J. Semantic segmentation of 3D car parts using UAV-based images. Comput. Graph. 2022, 107, 93–103. [Google Scholar] [CrossRef]
  59. Weatherall, I.L.; Coombs, B.D. Skin Color Measurements in Terms of CIELAB Color Space Values. J. Investig. Dermatol. 1992, 99, 468–473. [Google Scholar] [CrossRef]
  60. Chaurasia, A.; Culurciello, E. LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation. In Proceedings of the 2017 IEEE Visual Communications and Image Processing (VCIP), St. Petersburg, FL, USA, 10–13 December 2017; pp. 1–4. [Google Scholar] [CrossRef]
  61. Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid Scene Parsing Network. arXiv 2017, arXiv:1612.01105. [Google Scholar] [CrossRef]
  62. Jocher, G.; Chaurasia, A.; Qiu, J. Ultralytics YOLO. 2023. Available online: https://github.com/ultralytics/ultralytics (accessed on 8 January 2024).
  63. Bardis, M.; Houshyar, R.; Chantaduly, C.; Ushinsky, A.; Glavis-Bloom, J.; Shaver, M.; Chow, D.; Uchio, E.; Chang, P. Deep Learning with Limited Data: Organ Segmentation Performance by U-Net. Electronics 2020, 9, 1199. [Google Scholar] [CrossRef]
  64. Iakubovskii, P. Segmentation Models. 2024. Available online: https://github.com/qubvel/segmentation_models (accessed on 5 January 2024).
  65. Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv 2015, arXiv:1502.03167. [Google Scholar] [CrossRef]
  66. Chollet, F. Deep Learning with Python; Manning Publications Company: Shelter Island, NY, USA, 2017. [Google Scholar]
  67. Buslaev, A.; Parinov, A.; Khvedchenya, E.; Iglovikov, V.I.; Kalinin, A.A. Albumentations: Fast and flexible image augmentations. Information 2020, 11, 125. [Google Scholar] [CrossRef]
  68. Lin, T.Y.; Maire, M.; Belongie, S.J.; Bourdev, L.D.; Girshick, R.B.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. arXiv 2014, arXiv:1405.0312. [Google Scholar] [CrossRef]
  69. Yu, T.; Hu, C.; Xie, Y.; Liu, J.; Li, P. Mature pomegranate fruit detection and location combining improved F-PointNet with 3D point cloud clustering in orchard. Comput. Electron. Agric. 2022, 200, 107233. [Google Scholar] [CrossRef]
  70. Gené-Mola, J.; Sanz-Cortiella, R.; Rosell-Polo, J.R.; Morros, J.R.; Ruiz-Hidalgo, J.; Vilaplana, V.; Gregorio, E. Fruit detection and 3D location using instance segmentation neural networks and structure-from-motion photogrammetry. Comput. Electron. Agric. 2020, 169, 105165. [Google Scholar] [CrossRef]
  71. Miranda, J.C.; Gené-Mola, J.; Zude-Sasse, M.; Tsoulias, N.; Escolà, A.; Arnó, J.; Rosell-Polo, J.R.; Sanz-Cortiella, R.; Martínez-Casasnovas, J.A.; Gregorio, E. Fruit sizing using AI: A review of methods and challenges. Postharvest Biol. Technol. 2023, 206, 112587. [Google Scholar] [CrossRef]
  72. Chen, S.W.; Shivakumar, S.S.; Dcunha, S.; Das, J.; Okon, E.; Qu, C.; Taylor, C.J.; Kumar, V. Counting Apples and Oranges With Deep Learning: A Data-Driven Approach. IEEE Robot. Autom. Lett. 2017, 2, 781–788. [Google Scholar] [CrossRef]
  73. Kestur, R.; Meduri, A.; Narasipura, O. MangoNet: A deep semantic segmentation architecture for a method to detect and count mangoes in an open orchard. Eng. Appl. Artif. Intell. 2019, 77, 59–69. [Google Scholar] [CrossRef]
  74. Bargoti, S.; Underwood, J.P. Image Segmentation for Fruit Detection and Yield Estimation in Apple Orchards. J. Field Robot. 2017, 34, 1039–1060. [Google Scholar] [CrossRef]
  75. Olson, D.; Anderson, J. Review on unmanned aerial vehicles, remote sensors, imagery processing, and their applications in agriculture. Agron. J. 2021, 113, 971–992. [Google Scholar] [CrossRef]
Figure 1. Methodological pipeline for chestnut yield estimation from UAV imagery, including data acquisition and processing, imagery segmentation, and point cloud processing.
Figure 2. Example of a UAV image of the chestnut grove and the same image split into 48 patches.
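As a rough illustration of the patch-splitting step shown in Figure 2, the sketch below crops an image into non-overlapping tiles. The 320 × 320 patch size and the example image resolution are assumptions chosen only so that 48 patches result; they are not values taken from the acquisition setup.

```python
# Minimal sketch of splitting a UAV image into fixed-size patches (illustrative only).
import numpy as np

def split_into_patches(image: np.ndarray, patch_h: int = 320, patch_w: int = 320):
    """Return a list of non-overlapping patches cropped from `image` (H x W x C)."""
    patches = []
    h, w = image.shape[:2]
    for top in range(0, h - patch_h + 1, patch_h):
        for left in range(0, w - patch_w + 1, patch_w):
            patches.append(image[top:top + patch_h, left:left + patch_w])
    return patches

# Example: a hypothetical 2560 x 1920 image yields an 8 x 6 grid of 320 x 320 patches.
dummy = np.zeros((1920, 2560, 3), dtype=np.uint8)
print(len(split_into_patches(dummy)))  # 48
```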
Figure 3. Examples of masks obtained using the threshold approach. Each row represents a sample: the original image (a), the resulting mask (b), and the overlay visualization (c). The threshold method was applied to chestnut trees with phytosanitary issues (first row) and to healthy chestnut trees (second row).
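The threshold approach illustrated in Figure 3 can be sketched as a simple colour threshold. The snippet below assumes thresholding in the CIELAB colour space [59] and uses placeholder channel bounds; the actual colour handling and threshold values used in the study are not reproduced here.

```python
# Minimal sketch of a colour-threshold mask generator; the CIELAB ranges are
# illustrative placeholders, not the values used in the study.
import cv2
import numpy as np

def threshold_burr_mask(image_bgr: np.ndarray) -> np.ndarray:
    """Return a binary mask (uint8, 0/255) of pixels falling inside a CIELAB range."""
    lab = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2LAB)
    lower = np.array([120, 110, 140], dtype=np.uint8)  # placeholder L*, a*, b* lower bounds
    upper = np.array([255, 135, 200], dtype=np.uint8)  # placeholder upper bounds
    mask = cv2.inRange(lab, lower, upper)
    # Small morphological opening to drop isolated noisy pixels (kernel size is arbitrary).
    kernel = np.ones((3, 3), np.uint8)
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
```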
Figure 4. Examples of the transformation applied to the proposed dataset to make it suitable for training object detection models. Red bounding boxes represent areas of chestnut burrs.
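The transformation illustrated in Figure 4 amounts to deriving one bounding box per annotated burr region. A minimal sketch, assuming connected components of the binary mask approximate the labelled regions and that a small hypothetical `min_area` filter discards noise:

```python
# Minimal sketch of converting a binary burr mask into bounding boxes so that the
# segmentation dataset can also be used to train an object detector.
import cv2
import numpy as np

def mask_to_boxes(mask: np.ndarray, min_area: int = 10):
    """Return a list of (x, y, w, h) boxes, one per connected region above `min_area` pixels."""
    num_labels, _, stats, _ = cv2.connectedComponentsWithStats(mask.astype(np.uint8), connectivity=8)
    boxes = []
    for label in range(1, num_labels):  # label 0 is the background
        x, y, w, h, area = stats[label]
        if area >= min_area:
            boxes.append((int(x), int(y), int(w), int(h)))
    return boxes
```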
Figure 5. General overview of the architectures of the selected segmentation models (LinkNet, U-Net, and PSPNet).
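For orientation, the three architectures shown in Figure 5 can be instantiated with the Segmentation Models library [64]. The backbone, pretrained weights, loss, and metric below are illustrative assumptions, not the training configuration used in the study:

```python
# Minimal sketch of building U-Net, LinkNet, and PSPNet with segmentation_models;
# the ResNet-34 backbone and the loss/metric choices are assumptions for illustration.
import segmentation_models as sm

sm.set_framework("tf.keras")
BACKBONE = "resnet34"  # assumed backbone

unet = sm.Unet(BACKBONE, classes=1, activation="sigmoid", encoder_weights="imagenet")
linknet = sm.Linknet(BACKBONE, classes=1, activation="sigmoid", encoder_weights="imagenet")
pspnet = sm.PSPNet(BACKBONE, classes=1, activation="sigmoid", encoder_weights="imagenet")

unet.compile(optimizer="adam", loss=sm.losses.bce_jaccard_loss, metrics=[sm.metrics.iou_score])
```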
Figure 6. Segmentation examples on Dataset 1 for each segmentation model trained on Dataset 1 and on the merged datasets.
Figure 7. Segmentation examples on Dataset 2 for each segmentation model trained on Dataset 2 and on the merged datasets.
Figure 8. Example of an occluded chestnut burr (highlighted by the red box) that was not annotated in Dataset 2, and the corresponding segmentation results from the different models.
Table 1. Overall results for the trained models using different datasets. The best results by dataset are highlighted in bold. In cases of similar values, both values are also underlined. F1: F1-score; IoU: Intersection over Union.
Dataset | U-Net (F1 / IoU / Precision / Recall) | LinkNet (F1 / IoU / Precision / Recall) | PSPNet (F1 / IoU / Precision / Recall)
Dataset 1 | 0.56 / 0.39 / 0.51 / 0.72 | 0.53 / 0.36 / 0.50 / 0.69 | 0.51 / 0.34 / 0.50 / 0.54
Dataset 2 | 0.76 / 0.62 / 0.78 / 0.79 | 0.76 / 0.62 / 0.76 / 0.81 | 0.71 / 0.55 / 0.71 / 0.76
Merged | 0.67 / 0.52 / 0.76 / 0.49 | 0.67 / 0.52 / 0.73 / 0.54 | 0.65 / 0.48 / 0.57 / 0.66
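The pixel-wise metrics reported in Tables 1 and 2 follow their standard definitions; the sketch below computes them from binary ground-truth and predicted masks (the original evaluation code is not reproduced here).

```python
# Minimal sketch of pixel-wise F1, IoU, precision, and recall for binary masks (values 0/1).
import numpy as np

def pixel_metrics(y_true: np.ndarray, y_pred: np.ndarray, eps: float = 1e-9):
    tp = np.logical_and(y_true == 1, y_pred == 1).sum()
    fp = np.logical_and(y_true == 0, y_pred == 1).sum()
    fn = np.logical_and(y_true == 1, y_pred == 0).sum()
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    iou = tp / (tp + fp + fn + eps)
    return {"F1": f1, "IoU": iou, "Precision": precision, "Recall": recall}
```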
Table 2. Overall results for the trained models when switching the test sets. F1: F1-score; IoU: Intersection over Union.
Training | Test | U-Net (F1 / IoU / Precision / Recall) | LinkNet (F1 / IoU / Precision / Recall) | PSPNet (F1 / IoU / Precision / Recall)
Dataset 1 | Dataset 2 | 0.42 / 0.27 / 0.29 / 0.94 | 0.40 / 0.25 / 0.27 / 0.95 | 0.35 / 0.22 / 0.23 / 0.95
Dataset 2 | Dataset 1 | 0.07 / 0.04 / 0.94 / 0.04 | 0.08 / 0.04 / 0.88 / 0.04 | 0.02 / 0.01 / 0.85 / 0.01
Merged | Dataset 1 | 0.43 / 0.28 / 0.84 / 0.30 | 0.49 / 0.33 / 0.79 / 0.38 | 0.54 / 0.38 / 0.53 / 0.59
Merged | Dataset 2 | 0.75 / 0.60 / 0.72 / 0.83 | 0.73 / 0.57 / 0.68 / 0.81 | 0.67 / 0.51 / 0.63 / 0.80
Table 3. Mean, mode, and median values for the number of pixels composing the regions in each model, the chestnut burr count using these respective values, the total number of regions, and the standard deviation (std) of region size for each model trained on separate datasets. All values are expressed in pixels. The count was obtained by dividing the number of pixels in the datasets by each statistical measure. The manual chestnut burr count for Dataset 1 was 1322, and for Dataset 2 it was 4234. Best result highlighted in bold.
Dataset 1
Model | Mean (Value / Count) | Mode (Value / Count) | Median (Value / Count) | Regions | Region Size Std
U-Net | 174 / 426 | 5 / 14,824 | 122 / 608 | 6142 | 179
LinkNet | 171 / 369 | 5 / 12,695 | 112 / 563 | 5915 | 200
PSPNet | 159 / 312 | 5 / 9904 | 109 / 455 | 4454 | 192
Dataset 2
Model | Mean (Value / Count) | Mode (Value / Count) | Median (Value / Count) | Regions | Region Size Std
U-Net | 120 / 930 | 84 / 1328 | 103 / 1083 | 4331 | 84
LinkNet | 122 / 965 | 80 / 1471 | 101 / 1165 | 4399 | 92
PSPNet | 121 / 1021 | 59 / 2094 | 102 / 1210 | 3945 | 92
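The counting rule summarised in the captions of Tables 3 and 4 can be sketched as follows: label the connected regions of a binary mask, compute region-size statistics, and divide the total number of segmented pixels by each statistic. This is a plain reimplementation of the described idea, not the authors' code, and the choice of mask (predictions versus annotations) is an assumption.

```python
# Minimal sketch of region-statistics-based burr counting from a binary mask.
import numpy as np
from scipy import ndimage, stats

def count_from_mask(mask: np.ndarray):
    """`mask` is a binary array; returns region statistics and derived burr counts."""
    labels, num_regions = ndimage.label(mask)
    if num_regions == 0:
        return {"regions": 0}
    sizes = ndimage.sum(mask, labels, index=range(1, num_regions + 1))  # pixels per region
    total_pixels = int(mask.sum())
    mean_size = float(np.mean(sizes))
    mode_size = float(stats.mode(sizes, keepdims=False).mode)
    median_size = float(np.median(sizes))
    return {
        "regions": num_regions,
        "std": float(np.std(sizes)),
        "count_mean": total_pixels / mean_size,
        "count_mode": total_pixels / mode_size,
        "count_median": total_pixels / median_size,
    }
```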
Table 4. Mean, mode, and median values for the number of pixels composing the regions in each model, the chestnut burr count using these respective values, the total number of regions, and the standard deviation (std) of region size for each model trained on merged datasets. All values are expressed in pixels. The count was obtained by dividing the number of pixels in the datasets by each statistical measure. The manual chestnut burr count for Dataset 1 was 1322, and for Dataset 2 it was 4234. Best result highlighted in bold.
Dataset 1
Model | Mean (Value / Count) | Mode (Value / Count) | Median (Value / Count) | Regions | Region Size Std
U-Net | 52 / 218 | 12 / 945 | 46 / 245 | 5847 | 38
LinkNet | 56 / 314 | 5 / 3516 | 44 / 400 | 7839 | 52
PSPNet | 146 / 326 | 5 / 9497 | 98 / 485 | 4862 | 197
Dataset 2
Model | Mean (Value / Count) | Mode (Value / Count) | Median (Value / Count) | Regions | Region Size Std
U-Net | 132 / 930 | 82 / 1496 | 111 / 1106 | 4343 | 97
LinkNet | 129 / 1113 | 8 / 17,939 | 108 / 1328 | 4661 | 101
PSPNet | 139 / 1095 | 8 / 19,022 | 115 / 1324 | 4007 | 114
Table 5. Overall results from YOLOv8m. Counting was performed using two input image sizes: 320 × 320 and 608 × 608 pixels. F1: F1-score; IoU: Intersection over Union; mAP: Mean Average Precision.
Training | Testing | F1 | Precision | Recall | mAP at 50% IoU | Counting (320 × 320) | Counting (608 × 608)
Dataset 1 | Dataset 1 | 0.52 | 0.53 | 0.52 | 0.47 | 379 | 729
Dataset 1 | Dataset 2 | 0.34 | 0.49 | 0.26 | 0.24 | 729 | 943
Dataset 2 | Dataset 1 | 0.15 | 0.12 | 0.20 | 0.10 | 67 | 244
Dataset 2 | Dataset 2 | 0.78 | 0.81 | 0.76 | 0.81 | 2403 | 3802
Merged | Dataset 1 | 0.51 | 0.50 | 0.52 | 0.45 | 659 | 1080
Merged | Dataset 2 | 0.77 | 0.80 | 0.75 | 0.80 | 2456 | 4038
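Counting with the trained detector, as reported in Table 5, reduces to summing the predicted boxes at a given inference size. A minimal sketch using the Ultralytics API [62]; the weights path and confidence threshold are placeholders, not the settings used in the study.

```python
# Minimal sketch of burr counting with a trained YOLOv8m model at two inference sizes.
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")  # hypothetical path to trained weights

def count_burrs(image_paths, image_size):
    results = model.predict(source=image_paths, imgsz=image_size, conf=0.25, verbose=False)
    return sum(len(r.boxes) for r in results)

# counts_320 = count_burrs(test_images, 320)
# counts_608 = count_burrs(test_images, 608)
```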
Table 6. Differences between the datasets used in similar studies.
Characteristic | Dataset 1 | Arakawa et al. [53] | Comba et al. [54]
Samples | 144 (training), 21 (counting) | 500 (training), 53 (counting) | 44
Image size | 320/336 | 418 (training), 608 (counting) | 16
Phytosanitary problems | Yes | No | No
Flight height | 30 m | 12–15 m | 20 m
Weather conditions | Sunny | Cloudy | Sunny
Image type | RGB | RGB | Multispectral
Cultivar | Castanea sativa | Castanea crenata, Castanea mollissima | Bouche de Bétizac (hybrid between Castanea sativa and Castanea crenata)
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
