Article

Yield Estimation in Banana Orchards Based on DeepSORT and RGB-Depth Images

Lei Zhou, Zhou Yang, Lanhui Fu and Jieli Duan
1 Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen 518132, China
2 College of Engineering, South China Agricultural University, Guangzhou 510642, China
3 School of Mechanical Engineering, Guangdong Ocean University, Zhanjiang 524000, China
4 School of Electronics and Information Engineering, Wuyi University, Jiangmen 529020, China
* Authors to whom correspondence should be addressed.
Agronomy 2025, 15(5), 1119; https://doi.org/10.3390/agronomy15051119
Submission received: 27 March 2025 / Revised: 21 April 2025 / Accepted: 30 April 2025 / Published: 30 April 2025
(This article belongs to the Collection Advances of Agricultural Robotics in Sustainable Agriculture 4.0)

Abstract

Orchard yield estimation is one of the key indicators of precision agriculture. The traditional random sampling yield estimation method places strict requirements on laborer experience and orchard scale. Intelligent orchard management enables growers to use resources more effectively and make better-informed decisions to optimize orchard inputs. This study proposes a banana bunch counting and yield estimation method based on the DeepSORT tracking algorithm. The method builds on the acquisition of RGB-D images and the calculation of the weight of an individual banana bunch, which were proposed in our previous work. Building on this, DeepSORT was used to solve the problem of repeated counting, based on the Hungarian algorithm and Kalman filtering. Three constraints were set to improve the statistical accuracy, and a yield estimation system was designed for orchard management monitoring. This system provides managers with bunch weight predictions and statistical plant information to achieve real-time yield estimation for banana orchards. The experimental results showed that the accuracy of the yield estimation reached 97.25% and that banana bunch counting had a success rate of 96.82%. This demonstrates that the effective integration of RGB-D technology and the DeepSORT algorithm can be successfully applied to the intelligent management and harvesting of banana orchards.

1. Introduction

Yield is key information for crop harvesting and market planning [1]. However, traditional crop management remains subjective. Currently, orchard yield estimates are usually based on historical yield data and manual experience [2]. Before harvest, the growth status of sampled trees is visually assessed, and the fruit weight of those sample trees is obtained by manually counting the fruit and multiplying the count by an average fruit weight; the result is then extrapolated to the entire orchard to estimate the yield. This simple random sampling (SRS) method [3] is not only laborious and time-consuming but also depends heavily on the experience of the workforce involved. Moreover, it cannot promptly capture the variation among fruits, and such coarse yield estimation is generally practiced only in commercial orchards rather than in small and diversified ones. Therefore, automated yield mapping has become an important goal in precision agriculture.
The widespread application of visual sensing in agriculture provides a more accurate and efficient monitoring method for precision agriculture [4]. In intelligent orchard management, the system needs to estimate orchard yield in a timely and accurate manner to provide information for planting management, labor allocation, harvesting, storage, and sales [5]. It also gives researchers a way to study the correlation between orchard parameters and fruit yield.
In research related to yield estimation, yield is usually approximated by counting the number of fruits or calculating their weight. We first classify counting-based yield estimation into two categories, image-based and video-based, and then discuss weight-based yield estimation.
Most of the research on fruit and vegetable counting and yield estimation has been conducted using the image-based method without considering the problem of repeated counting between images. This method uses machine learning or deep learning algorithms to perform object detection on images taken by cameras on the ground or in the air and calculates the number of fruits, plants, or plant areas in each image. Classic machine learning algorithms mostly analyze fruits based on color [6] or shape [7] features and then use methods such as the k-means algorithm [8] to detect and count fruits for yield measurement. Taking aerial images from above the orchard is an effective way to estimate the yield by plant number [9] or canopy area [10]. In orchards where unmanned aerial vehicles can fly between plants, such as longan orchards, yield estimation has been achieved by counting fruit clusters and individual fruits [11]. Other research has involved setting up cameras on the ground in orchards and using deep learning algorithms to detect, locate, and count fruits to estimate the yield. Mathew et al. [12] used YOLOv7 and RGB-Depth (RGB-D) images to improve the counting accuracy of soybean pods. Wu et al. [13] detected tea shoots using an improved YOLOv7 with RGB-D data. Yang et al. [14] detected and located pumpkin fruits using a YOLO-series model and RGB-D images. Palacios et al. [15] used a SegNet architecture with an RGB camera to detect green grapes and support vector regression models to predict quantity and yield, with a quantity estimation R² of 0.79. For banana orchards, the leaves are very large, so collecting fruit images from the ground is more suitable for analysis. In our previous work, we analyzed RGB images [16,17] and RGB-D images [18] with deep learning models in banana orchards to achieve detection and localization tasks. Wu et al. [19] adopted DeepLab V3+ to count banana hands or fingers within a banana bunch based on RGB images; this method can estimate the weight of an individual bunch through empirical mapping. To date, there have been no studies on counting-based yield estimation using static images of banana orchards.
Video-based counting enables growers to better implement management and harvesting decisions. After the detection results of each image frame are obtained, problems with object tracking and repeated localization lead to over-counting due to random occlusion or uneven lighting. Therefore, video-based counting remains a major challenge for orchard yield estimation in unstructured environments. Video-based counting and yield estimation focus on solving the problem of tracking fruits across image frames. Many methods employ a blend of techniques such as optical flow, structure from motion, Kalman filters, and the Hungarian method to facilitate fruit tracking across different frames. In early work, the epipolar line corresponding to the fruit centroid was used to track mangoes for counting [20]. In recent years, structure-from-motion (SfM) inter-frame association techniques [21] and optical flow models of recurrent all-pair field transforms based on RGB-D images [22] have been used in fruit counting and yield estimation. The disadvantage of optical flow in tracking tasks is that it cannot adequately handle occlusion, so more and more research has focused on the Hungarian algorithm and Kalman filtering. Hu et al. [23] used the YOLOv7 network and the attention mechanism to count apples based on two tracking methods: simple online real-time tracking (SORT) and Cascade-SORT. Tu et al. [24] detected and tracked passion fruit based on a lightweight YOLOv5s and DeepSORT [25]; the average counting accuracy reached 95.1%. Escamilla et al. [26] proposed a detection and counting method for greenhouse sweet peppers based on the YOLOv5 and DeepSORT algorithms using RGB-D images. The combination of a YOLO-series detector and the DeepSORT tracker has also been applied to counting fruits such as dragon fruit [27], apples [28], citrus [29,30], green peppers [31], etc. Other tracker designs include ByteTrack [32], BotSort [33], a local feature-matching transformer-tracking algorithm [34], AgriSORT [35], Deep OC-Sort [36], LocalizeSORT [37], etc. Shui et al. [38] employed ByteTrack for video tracking and used depth data from RGB-D cameras to estimate the position of cabbages. Rong et al. [39] used ByteTrack and YOLOv5 to detect and count tomato clusters by analyzing RGB-D data. Abeyrathna et al. [40] adopted DeepSORT and a YOLO-series algorithm to analyze RGB-D images in apple orchards, achieving a counting accuracy of 86.6%. Villacrés et al. [41] compared five tracking strategies to detect and count apples and determined that DeepSORT is better suited for tracking fruits in orchard settings. Zhou et al. [42] combined YOLOv5 with the attention mechanism to detect and track strawberries; the DeepSORT and MultiMap combination was found to achieve the best counting accuracy for ripe fruits, with an error rate of 8.7%. Although research on video-based object detection and counting has been carried out in other types of orchards, such as apple orchards, there is still a gap in the research on banana plantations, especially in the area of video tracking.
Counting-based yield estimation focuses on the number of fruits rather than the fruit volume or weight, which are important parameters for planting management. Therefore, weight-based yield estimation (including size and volume estimation) requires further investigation. Jianping et al. [43] proposed a smartphone-based yield estimation system that relates four color features to weight in order to estimate the yield of apple orchards. Hondo et al. [44] used the Mask R-CNN network to record the growth characteristics of apples, capture the growth curve, and estimate fruit size; the mean absolute percentage error between the actual and measured values was below 0.079. Devanna et al. [45] described a MANet architecture to identify, enumerate, and assess the volume/weight of grape clusters using RGB-D images. Sarron et al. [46] used a KNN-based machine vision algorithm on aerial images of 144 mango trees to quantify structural characteristics, including crown area, tree height, and volume, and to estimate the weight of mangoes in the orchard; the R² of the model was greater than 0.77. Wittstruck et al. [47] proposed a pumpkin volume and weight estimation model based on aerial images: by counting 40 pumpkins, the relationship between image pixels and weight was calculated, with a statistical success rate of 95%. Mokria et al. [48] used digital calipers and scales to measure avocado fruit length, diameter, weight, stem diameter, crown diameter, and total tree height and proposed an allometric growth equation to estimate the weight of the fruit on the tree. In our previous work [18], we proposed a model for estimating the weight of banana bunches. As far as we are aware, this represents the first application of banana weight estimation based on RGB-D images.
Based on the above yield estimation methods, it is clear that researchers have developed targeted yield estimation algorithms for the number or weight of fruits in different types of orchards, and that studies increasingly tend to analyze dynamic images. In this research, we focused on the yield estimation system and the dynamic tracking problem of banana orchards. RGB-D data and deep learning algorithms were integrated in this work. Building on the weight estimation of a single banana tree, the DeepSORT target tracking algorithm was used to solve the problem of repeated counting. A hybrid yield estimation model that combines counting-based and weight-based methods was developed. Through video-based counting techniques and three constraints in the weight-based estimation process, the accuracy of banana orchard yield estimation was improved. To our knowledge, this is the first exploration of machine vision technology in banana orchard yield estimation. Our study provides banana growers with an efficient and accurate yield estimation tool, helping them to plan harvesting and transportation in advance, optimize irrigation and fertilization strategies, and improve economic benefits.

2. Materials and Methods

2.1. Sensor System and Data Collection

The machine vision system was mounted on a remote-controlled mobile vehicle. Figure 1 shows the banana orchard yield estimation system and machine assembly diagram. The hardware platform mainly consisted of a machine vision system, a crawler chassis carrier, and an electromechanical and hydraulic control system. On the carrier platform, there were a hydraulic drive system, an electrical control system, and a vision system. The hydraulic drive system workstation was installed at the center of the carrier platform, the electrical control system was arranged away from the chassis motor and reducer end, and the depth camera was mounted on the front left side of the carrier, with the lens direction perpendicular to the direction of travel. The machine vision system consisted of a laptop with an Intel Core i7-9750H CPU @ 2.60 GHz, 16.0 GB of RAM, and an NVIDIA GeForce RTX 2070 GPU, together with an Intel RealSense D435i depth camera. In the RealSense D435i, two infrared cameras simultaneously capture infrared images, from which the depth image is computed. The camera had an operational range of 0.28 to 3 m. The resolution of the RGB images was set to 1280 × 720 pixels, while the depth image resolution was configured at 848 × 480 pixels. Data collection and experimental validation were performed in the banana orchard of the Guangdong Academy of Agricultural Sciences and the banana demonstration base in Jiangmen City, Guangdong Province. The training data collection environments included overcast conditions, sunny days with backlighting, and sunny days with front lighting. The harvesting experiments were conducted under overcast conditions.
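For readers who wish to reproduce the acquisition step, the stream configuration above maps directly onto the pyrealsense2 SDK. The following is a minimal sketch, assuming a 30 fps frame rate and illustrative variable names (neither is stated in the paper); it aligns each depth frame to the colour frame so that detections in the RGB image index valid depth values.

```python
# Minimal RGB-D capture sketch for the Intel RealSense D435i (pyrealsense2).
# Stream resolutions follow the setup described above; the 30 fps rate and
# the variable names are illustrative assumptions.
import numpy as np
import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.color, 1280, 720, rs.format.bgr8, 30)  # RGB stream
config.enable_stream(rs.stream.depth, 848, 480, rs.format.z16, 30)    # depth stream
profile = pipeline.start(config)

# Align depth pixels to the colour image so RGB detections can read depth directly.
align = rs.align(rs.stream.color)
depth_scale = profile.get_device().first_depth_sensor().get_depth_scale()

try:
    frames = align.process(pipeline.wait_for_frames())
    color = np.asanyarray(frames.get_color_frame().get_data())                 # H x W x 3, uint8
    depth = np.asanyarray(frames.get_depth_frame().get_data()) * depth_scale   # metres
finally:
    pipeline.stop()
```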

2.2. Weight Estimation of a Bunch of Bananas

In our earlier studies, the YOLO-Banana detection model was introduced, and a weight estimation model for banana bunches (the h + Vp model) based on structured-light technology was proposed [18]. In [18], we detected and localized banana bunches with the YOLO-Banana model and RGB-D images and analyzed metrics such as the width (w), height (h), effective pixel ratio (p), cuboid volume (V), and estimated volume (Vp) of banana bunches to derive the weight estimation model. The h + Vp model had an R² of 0.8143, the highest among all the models. We do not repeat the derivation here and directly use the weight estimation model $w_{banana}$:

$$w_{banana} = 2.301 + 0.0233\,h + 5.331 \times 10^{-8}\,V_p \qquad (1)$$

where h is the banana bunch's height and Vp is the banana bunch's estimated volume,

$$V_p = 0.5\,w^2 \times h \times p \qquad (2)$$

where w is the width of the banana bunch, and p is the effective pixel ratio, i.e., the ratio of the number of banana bunch pixels to the area of the detection bounding box in the RGB image.
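As a worked illustration, Equations (1) and (2) translate directly into a short helper. The sketch below assumes that h and w are expressed in the same length units used when the coefficients were fitted in [18] (millimetre-scale measurements from the depth data); the function name is illustrative.

```python
def estimate_bunch_weight(h: float, w: float, p: float) -> float:
    """Estimate the weight (kg) of a banana bunch from RGB-D measurements.

    h : bunch height, w : bunch width (same length units as the fitted model [18]);
    p : effective pixel ratio (bunch pixels / detection-box area), 0 < p <= 1.
    """
    v_p = 0.5 * w ** 2 * h * p                   # estimated volume, Equation (2)
    return 2.301 + 0.0233 * h + 5.331e-8 * v_p   # h + Vp weight model, Equation (1)
```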

2.3. Banana Bunches Tracking and Counting

The occlusion during the robot’s advance in the banana orchard is random, and changes in the shape and size of banana bunches or unclear videos will affect the target tracking effect. When target detection fails in a certain frame, it is still necessary to track the target based on its position and appearance in the previous frame to avoid repeated counting caused by the same target appearing again in subsequent frames. Therefore, the target tracking algorithm is very important in banana orchard yield estimation. Banana orchard robots need to obtain real-time information on the number and weight of banana bunches, as well as the number and position of banana stalks. We used the DeepSORT algorithm to achieve the dynamic tracking of banana bunches and solve the problem of repeated counting.
In the DeepSORT algorithm, the Kalman filter [49] describes the state vector X at a given moment with eight components:

$$X = (u, v, \gamma, h, \dot{u}, \dot{v}, \dot{\gamma}, \dot{h})^{T} \qquad (3)$$

where u and v are the horizontal and vertical image coordinates of the center of the detection box, γ is the aspect ratio of the detection box, h is its height, and $\dot{u}, \dot{v}, \dot{\gamma}, \dot{h}$ are their rates of change in image coordinates.
The Hungarian algorithm [50] matches detections to the tracking states predicted by the Kalman filter based on the cost matrix $c_{i,j}$,

$$c_{i,j} = \lambda\, d^{(1)}(i,j) + (1 - \lambda)\, d^{(2)}(i,j) \qquad (4)$$

where $d^{(1)}(i,j)$ is the Mahalanobis distance, which accounts for the covariance between features and thus provides a more accurate distance measure; $d^{(2)}(i,j)$ is the minimum cosine distance between appearance features; and λ is a weighting factor.
In this work, the banana orchard tracking process is shown in Figure 2. First, the YOLO-Banana model was used to detect bunches and stalks. The feature parameters of the detections were used as initialization values for the Kalman filter in the DeepSORT tracker. The Kalman filter predicted the positions of banana bunches and stalks to obtain tracks. In the next frame, YOLO-Banana produced new detection boxes for banana bunches and stalks; the Mahalanobis and cosine distances between these detections and the predicted tracks were calculated to obtain motion and appearance features and to generate the cost matrix $c_{i,j}$; the Hungarian algorithm then performed cascade matching to make a preliminary association between detections and tracks. Three situations arise at this stage: (1) Matched tracking: targets are present in both the previous and the current frame, and tracks and detections are matched successfully. When a match succeeds, the banana bunch and stalk detections of the current frame update the Kalman filter's predicted tracks, and the Kalman gain, state update, and covariance update are computed to generate the tracks of this frame, which are divided into confirmed and unconfirmed tracks; the difference lies in whether the predicted track corresponds to banana bunches and stalks or to the background. The confirmed tracks are used to build the cost matrix for the subsequent frame. (2) Unmatched detection: when a new banana bunch appears or one has been occluded for a long time, the new detection has no track to match. (3) Unmatched tracking: when a detection is missed, the predicted track cannot find a corresponding detection. Unconfirmed tracks, unmatched detections, and unmatched tracks are re-matched by IOU matching. If this match also fails, they remain unmatched detections and unmatched tracks; if it succeeds, the Kalman filter is updated, and as many confirmed tracks as possible are acquired and participate in subsequent updates.
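The core of this association step is the weighted cost matrix of Equation (4) followed by an assignment. The sketch below illustrates that step only, using SciPy's Hungarian solver; gating thresholds, the matching cascade depth, and the appearance-feature extractor are omitted, and all function and variable names are illustrative rather than taken from the released DeepSORT code.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def associate(track_means, track_covs_inv, track_feats,
              det_boxes, det_feats, lam=0.5, max_cost=0.7):
    """One association round between predicted tracks and current detections.

    track_means: (T, 4) predicted (u, v, gamma, h); track_covs_inv: (T, 4, 4)
    inverse innovation covariances; track_feats: list of (K_i, F) appearance
    galleries per track; det_boxes: (D, 4); det_feats: (D, F). Illustrative only.
    """
    # d1: squared Mahalanobis distance between each predicted box and each detection.
    diff = det_boxes[None, :, :] - track_means[:, None, :]            # (T, D, 4)
    d1 = np.einsum("tdi,tij,tdj->td", diff, track_covs_inv, diff)
    # d2: smallest cosine distance between a detection and the track's feature gallery.
    d2 = np.stack([cdist(f, det_feats, metric="cosine").min(axis=0) for f in track_feats])
    cost = lam * d1 + (1.0 - lam) * d2                                # Equation (4)
    rows, cols = linear_sum_assignment(cost)                          # Hungarian assignment
    matches = [(t, d) for t, d in zip(rows, cols) if cost[t, d] <= max_cost]
    unmatched_tracks = set(range(len(track_means))) - {t for t, _ in matches}
    unmatched_dets = set(range(len(det_boxes))) - {d for _, d in matches}
    return matches, unmatched_tracks, unmatched_dets
```

Matches update the corresponding Kalman filters, while unmatched tracks and detections would go on to the IOU matching stage described above.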

2.4. Yield Estimation Process in Banana Orchards

The workflow of banana orchard yield estimation is shown in Figure 3. First, RGB images and depth images are obtained, and RGB-D images are generated after registration. Then, the YOLO-Banana detection algorithm is used to obtain the banana position information, and the banana bunch weight is calculated using Equation (1). The DeepSORT tracking algorithm is then used to achieve banana orchard target tracking. Finally, based on the target tracking, three restrictions are set to estimate the number and yield of banana bunches, as sketched after this paragraph: (1) Counting condition: only banana bunches that have been successfully tracked over t consecutive position changes are counted, reducing the counting error caused by temporary false switching of the bunch ID. (2) Distance between the banana bunch and the origin of the camera coordinate system: as the banana orchard machine moves forward, the banana bunches are counted one by one, avoiding interference from bunches at a distance. (3) Weight range of the banana bunch: when a new banana bunch appears in the field of view, the bunch area gradually increases as the field of view expands. If the banana bunch is counted and weighed immediately after it appears, the bunch is not yet fully visible, which can easily cause weight calculation errors. Therefore, only when the estimated weight falls within the threshold range is the entire banana bunch considered to be displayed in the field of view, and the number and weight are then counted. Finally, the cumulative number and total weight of the banana bunches are calculated to complete the banana orchard yield estimation.
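A condensed sketch of how the three restrictions can gate per-frame counting and weight accumulation is given below. The track structure, function names, and default threshold values are illustrative assumptions (the thresholds actually used in the experiments are listed in Section 3), not a verbatim excerpt of the system.

```python
from dataclasses import dataclass, field

@dataclass
class BunchTrack:
    track_id: int
    hits: int = 0                 # consecutive frames this track was confirmed
    counted: bool = False
    weights: list = field(default_factory=list)

def update_yield(tracks, frame_results, totals,
                 min_hits=6, dist_range=(1.5, 2.5), weight_range=(15.0, 35.0)):
    """Apply the three counting restrictions to the confirmed tracks of one frame.

    frame_results: {track_id: (distance_m, est_weight_kg)} for confirmed bunch tracks.
    totals: dict accumulating 'count' and 'weight'. Names and values are illustrative.
    """
    for tid, (dist, weight) in frame_results.items():
        trk = tracks.setdefault(tid, BunchTrack(tid))
        trk.hits += 1
        trk.weights.append(weight)
        if trk.counted:
            continue
        # (1) stable tracking, (2) within the counting distance, (3) plausible weight
        if (trk.hits >= min_hits
                and dist_range[0] <= dist <= dist_range[1]
                and weight_range[0] <= weight <= weight_range[1]):
            trk.counted = True
            totals["count"] += 1
            totals["weight"] += weight
    return totals
```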

3. Results and Discussion

The proposed banana orchard yield estimation method was tested and verified in the banana garden of the Guangdong Academy of Agricultural Sciences and the banana demonstration base in Jiangmen City, Guangdong Province. In the experiment, the maximum retention time Max_age was set to 60, the number of consecutive matching frames n was set to 3, the number of consecutive position trackings for a banana bunch t was set to 6, the banana bunch distance was limited to between 1.5 and 3.5 m, and the bunch weight range was set to between 15 and 35 kg. The frame numbers were determined from experimental experience, and the ranges for the banana bunch distance and bunch weight were taken from the ranges provided in the literature [51]. The experiments were carried out on a computer equipped with an NVIDIA GeForce RTX 2070 GPU running Windows 10 (OS build 19045.5608). We implemented the DeepSORT algorithm in Python 3.9 and used OpenCV and PyQt for data visualization. The detection performance of YOLO-Banana and the weight estimation performance for individual banana bunches based on RGB-D images were discussed in our previous work. Here, we present the tracking results and demonstrate the effectiveness of the three limiting conditions in improving tracking performance through counting experiments. Finally, we conducted a statistical analysis of the yield estimation task to verify the practical application capability of the yield estimation system.
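For reference, the parameter set above maps onto a tracker and counter configuration along the following lines; the key names (max_age, n_init, etc.) follow common DeepSORT implementations and are assumptions rather than the exact variable names used in this system.

```python
# Experimental configuration used in this study, expressed as a plain dict.
# Key names follow common DeepSORT conventions and are assumptions.
CONFIG = {
    "max_age": 60,                     # frames a lost track is kept before deletion
    "n_init": 3,                       # consecutive matched frames before a track is confirmed
    "count_after_hits": 6,             # restriction (1): stable tracking before counting
    "distance_range_m": (1.5, 3.5),    # restriction (2): counting distance window
    "weight_range_kg": (15.0, 35.0),   # restriction (3): plausible bunch weight
}
```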

3.1. Tracking Results

The results of banana orchard target tracking are shown in Figure 4. In the experiment, we still displayed the detection and tracking results for the banana stalks to provide location information for harvesting and other production operations. Since the banana stalk area was much smaller than that of the banana leaves and branches, the impact of branches and leaves on the banana stalks was very obvious during the machine’s forward movement in the banana orchard. Therefore, this work did not count the number of banana stalks. It is essential to mention that the figure shows the tracking ID number of the banana bunches and stalks, not the statistical number. During the target tracking process, the ID number was used to identify the target identity and for matching in consecutive frames. The tracking IDs that failed to match were deleted. Figure 4a shows the detection results at the beginning of the algorithm. Two banana bunches and one banana stalk were initialized for tracking. The banana stalk corresponding to the second banana bunch was not detected or tracked due to occlusion. During the forward movement, each tracking of the previous frame was continuously updated by the detection frame of the subsequent frame, as shown in Figure 4b. As the banana orchard machine moved forward, the banana stalk of the second bunch entered the field of view and was considered as a new banana stalk for tracking; it entered the subsequent update, as shown in Figure 4c. If there is no need for banana stalk information, the banana stalk detection and tracking results can be omitted from the display, allowing the focus to be solely on the banana bunch, as illustrated in Figure 4d.

3.2. Contribution of Restrictions in Counting

In the process of counting the banana bunches, we set three restrictions: (1) the banana bunch position should be tracked more than six times in a row, (2) the distance of the banana bunch should be between 1.5 and 2.5 m, and (3) the banana bunch weight should be between 15 and 35 kg. The following examples illustrate the necessity of these three conditions in banana orchard statistics.
First, although the tracking of most banana bunches was relatively stable, strong shaking of the machine during forward movement, such as shaking caused by uneven ground in the banana orchard, could cause tracking failure. In that case, the system mistakenly judges the detected object as a new ID, thereby counting the same bunch of bananas twice. If the disturbance is brief, the track quickly reverts to the original ID once the machine stabilizes. For example, in Figure 5a, the banana bunch on the right is detected as banana ID 102. As the machine moves forward, the ID is mistakenly changed to ID 110 (as shown in Figure 5b). Then, within less than two seconds, the ID reverts to ID 102 (as shown in Figure 5c). To minimize the counting error resulting from such scenarios, a minimum number of consecutive position trackings was required before a banana bunch was counted, and new IDs that appeared only briefly were not counted, thereby ensuring the accuracy of the banana bunch statistics. In Figure 5b, the tracking ID of the banana bunch on the left is banana-88. The ID changed due to incomplete visibility in the field of view. The new ID was not counted because it did not meet the weight range criterion.
Second, the distance setting is also an important part of counting. Banana bunches from different planting rows that appear simultaneously within the field of view will lead to repeated counting. Figure 6a,b show the scenes in the vertical direction and the forward direction, respectively. We set the counting distance to between 1.5 and 2.5 m according to the spacing of the banana plants and counted the current bunches cumulatively, one by one, as the machine moved forward. In Figure 6a, the banana bunch with ID 37 was closer, and the banana bunches with IDs 97 and 13 were in another row. We counted only the banana bunch with ID 37, and when the machine was working in the other row, we counted IDs 97 and 13. Similarly, in Figure 6b, the banana bunch with ID 98 was counted first, and then the banana bunch with ID 97 was counted.
Third, during the counting process, a new banana bunch appeared in the field of view (as shown in Figure 7a), or an occluded banana bunch became more completely visible as the machine moved forward (as shown in Figure 7b); these are two very common situations. When only part of a banana bunch was visible, the calculated size and weight were far too small. A newly appearing bunch was also prone to triggering new tracks and repeated counting. Therefore, the system counted a bunch only when its estimated weight was between 15 and 35 kg. In Figure 7a, the number and weight of banana bunch ID 44 were counted only after it had completely entered the field of view. In Figure 7b, ID 88 and ID 91 were both within the distance limit, but ID 91 was partially obscured; therefore, banana bunch ID 88 was counted first. As the machine moved forward, banana bunch ID 91 appeared more completely in the field of view, and its number and weight were then counted.

3.3. Banana Bunch Counting Results

Based on the tracking, the number of banana bunches in different rows was counted, as shown in Figure 8. Four conditions were tested: no restrictions; setting a restriction on the number of position changes (1); setting restrictions on the number of position changes and the distance (1) (2); and setting restrictions on the number of position changes, the distance, and the weight (1) (2) (3). The results were compared with the actual number. As can be seen from the figure, with an increase in the number of restrictions, the statistics steadily neared the actual value, and the average accuracy of each row gradually increased to 96.82%. In particular, when the distance restriction was set, the error of repeated counting was greatly reduced. The restriction on the weight of banana bunches further refined the counting value and made the statistical results more stable.

3.4. Yield Estimation Results

According to the weight estimation and tracking results of each banana bunch, a weight estimate was obtained for each row of banana trees and compared with the empirical predictions of the banana farmers. Three farmers with many years of planting experience were invited to estimate the weight of each banana bunch, and the average value was taken as the empirical prediction. The yield estimation of the banana orchards is shown in Table 1.
The experiment was conducted on six rows of banana trees, with a total of 1163.69 kg of banana bunches. The average accuracy of the empirical predictions for each row was 91.37%, and the accuracy of the yield estimations for the entire orchard was 96.36%. Compared with the empirical predictions, the yield estimation of the proposed model was closer to the actual weight, with an average accuracy of 93.01% for each row and an overall yield prediction accuracy of 97.25%. This saves costs and improves the accuracy of yield predictions. The RMSE for the empirical predictions was 17.91 kg, while the RMSE for the model prediction was 12.81 kg. The model prediction exhibited smaller errors and greater accuracy. We calculated the 95% confidence intervals for the differences between the actual and predicted weights. The 95% confidence interval for the empirical predictions was [−18.80, 4.02] kg, whereas for the model prediction, it was [−10.19, 6.15] kg. We observed that the confidence interval for the model prediction was narrower, which further supports the conclusion that the model prediction demonstrated a higher precision and stability compared to the empirical predictions. These results demonstrate that the model prediction method has potential application value in forecasting banana orchard yields.
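For transparency, the error metrics can be recomputed from the per-row data in Table 1 along the following lines. This is a generic sketch using the root-mean-square error and a t-based interval on the row-level differences; the paper does not state its exact confidence-interval procedure, so the results may differ slightly from the values reported above.

```python
import numpy as np
from scipy import stats

def rmse_and_ci(actual, predicted, confidence=0.95):
    """RMSE and a t-based confidence interval on the mean difference
    (actual - predicted). Generic illustration only; the authors' exact
    CI procedure is not specified, so values may differ from Table 1."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    diff = actual - predicted
    rmse = float(np.sqrt(np.mean(diff ** 2)))
    mean, sem = diff.mean(), stats.sem(diff)
    lo, hi = stats.t.interval(confidence, len(diff) - 1, loc=mean, scale=sem)
    return rmse, (float(lo), float(hi))
```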

3.5. Discussions

Table 2 presents a comparison of this study with other yield estimation studies. It can be easily seen from the table that compared to the classic machine learning algorithm SVR, which exhibits significant errors in yield estimation, deep learning algorithms demonstrate a superior performance. The video-based method for yield estimation offers robots greater mobility due to its tracking capabilities, making them more practical compared to image-based estimation. For small-volume fruits, a counting-based approach to yield estimation is a good choice, while weight-based estimation is suitable for yield estimation tasks involving large-volume fruits or fruit clusters. By comparing the results, it is evident that the different methods show slightly varying performances in yield estimation across various fruits. Our method is at a mid-to-high level, providing a research direction for the yield estimation of large-volume fruits.
Our system was designed with a high degree of modularity and scalability, allowing for customization to suit various agricultural environments. In the context of cooperatives, our system can serve as a shared resource utilized by multiple farms; for large plantations, it can be integrated into existing automation and monitoring systems; for small farms requiring high mobility, the system can be equipped with portable devices.
While this method competently realized the objective of yield estimation based on weight predictions, it has some shortcomings and limitations. The banana variety studied in our research was the Cavendish AAA banana and its cultivated varieties. We set the weight range and planting distance of bananas based on this variety, and other varieties would require corresponding parameters to update the tracking conditions. Both experimental orchards in this study were tested under overcast conditions. Although our detection algorithm is robust to different lighting conditions, we must acknowledge that the results of this experiment provide us with a performance benchmark for banana yield estimation, which can be used for a comparative analysis under other lighting conditions in the future. Meanwhile, complex weather conditions (such as rainfall, strong winds, fog, high humidity, etc.) have always been an unexplored area and one of the major challenges for field robotic systems. In addition, as the vehicle moves forward, banana bunches that are occluded by leaves will gradually come into view of the camera. The constraints set in the tracking process can effectively reduce the impact of partial occlusion. However, we note that complete occlusion may temporarily interrupt the tracking. Finally, we chose a crawler chassis carrier as the mobility platform to achieve the stable movement of the machine, but uneven terrain remains one of the important factors affecting tracking accuracy.

4. Conclusions

This research investigated the yield estimation problem of banana orchards and proposed a set of methods suitable for banana yield estimation and target tracking. This study achieved target tracking and yield estimation in banana orchards by integrating the YOLO-Banana detector and DeepSORT tracker and utilizing RGB-D data. By setting the constraints of the tracking process in practice, the accuracy of the visual statistics in banana orchards was significantly improved. The results show that the success rate of banana bunch counting reached 96.82%, and the accuracy of the banana orchard yield estimation reached 97.25%. Through a detailed statistical analysis, we provided a scientific quantitative assessment method for banana yield estimation by offering the 95% confidence interval for the model prediction, which was [−10.19, 6.15] kg. As the first exploration of RGB-D technology and the DeepSORT algorithm in banana orchards, this work provides banana growers with more precise decision-making support. We extended the common object detection tasks to the field of object tracking in banana orchards. The designed banana yield estimation system is suitable not only for small orchards but also for large-scale plantations. This system has a reference value for the yield estimation of large-volume fruits. By predicting banana yields, growers can more reasonably arrange labor and transportation resources and provide information for intelligent harvesting, efficient logistics, and optimized sales supply chains. Future research will be carried out in terms of improving the accuracy of the tracking algorithms under different lighting conditions and complex weather conditions and on the generalization ability of yield estimation in orchards growing different banana varieties.

Author Contributions

Conceptualization, Z.Y.; methodology, L.Z.; software, L.Z. and Z.Y.; validation, L.F. and J.D.; formal analysis, L.F.; investigation, L.F.; resources, J.D.; data curation, L.Z.; writing—original draft preparation, L.Z.; writing—review and editing, L.F. and J.D.; visualization, L.F.; supervision, Z.Y.; project administration, Z.Y.; funding acquisition, L.F. and J.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 32271996; the Ph.D. Research Start-Up Fund of Wuyi University, grant number BSQD2222; and the Open competition program of top ten critical priorities of Agricultural Science and Technology Innovation for the 14th Five-Year Plan of Guangdong Province, grant number 2024KJ27.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available because the laboratory is currently in the process of organizing the complete dataset.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. He, L.; Fang, W.; Zhao, G.; Wu, Z.; Fu, L.; Li, R.; Majeed, Y.; Dhupia, J. Fruit yield prediction and estimation in orchards: A state-of-the-art comprehensive review for both direct and indirect methods. Comput. Electron. Agric. 2022, 195, 106812. [Google Scholar] [CrossRef]
  2. Huang, Y.; Qian, Y.; Wei, H.; Lu, Y.; Ling, B.; Qin, Y. A survey of deep learning-based object detection methods in crop counting. Comput. Electron. Agric. 2023, 215, 108425. [Google Scholar] [CrossRef]
  3. Noor, S.; Tajik, O.; Golzar, J. Simple random sampling. Int. J. Educ. Lang. Stud. 2022, 1, 78–82. [Google Scholar]
  4. Tang, Y.; Qiu, J.; Zhang, Y.; Wu, D.; Cao, Y.; Zhao, K.; Zhu, L. Optimization strategies of fruit detection to overcome the challenge of unstructured background in field orchard environment: A review. Precis. Agric. 2023, 24, 1183–1219. [Google Scholar] [CrossRef]
  5. Chen, M.; Chen, Z.; Luo, L.; Tang, Y.; Cheng, J.; Wei, H.; Wang, J. Dynamic visual servo control methods for continuous operation of a fruit harvesting robot working throughout an orchard. Comput. Electron. Agric. 2024, 219, 108774. [Google Scholar] [CrossRef]
  6. Mekhalfi, M.L.; Nicolò, C.; Ianniello, I.; Calamita, F.; Goller, R.; Barazzuol, M.; Melgani, F. Vision system for automatic on-tree kiwifruit counting and yield estimation. Sensors 2020, 20, 4214. [Google Scholar] [CrossRef]
  7. Roy, P.; Kislay, A.; Plonski, P.A.; Luby, J.; Isler, V. Vision-based preharvest yield mapping for apple orchards. Comput. Electron. Agric. 2019, 164, 104897. [Google Scholar] [CrossRef]
  8. Méndez, V.; Pérez-Romero, A.; Sola-Guirado, R.; Miranda-Fuentes, A.; Manzano-Agugliaro, F.; Zapata-Sierra, A.; Rodríguez-Lizana, A. In-field estimation of orange number and size by 3d laser scanning. Agronomy 2019, 9, 885. [Google Scholar] [CrossRef]
  9. Uribeetxebarria, A.; Martínez-Casasnovas, J.A.; Tisseyre, B.; Guillaume, S.; Escolà, A.; Rosell-Polo, J.R.; Arnó, J. Assessing ranked set sampling and ancillary data to improve fruit load estimates in peach orchards. Comput. Electron. Agric. 2019, 164, 104931. [Google Scholar] [CrossRef]
  10. Apolo-Apolo, O.E.; Martínez-Guanter, J.; Egea, G.; Raja, P.; Pérez-Ruiz, M. Deep learning techniques for estimation of the yield and size of citrus fruits using a uav. Eur. J. Agron. 2020, 115, 126030. [Google Scholar] [CrossRef]
  11. Li, D.; Sun, X.; Jia, Y.; Yao, Z.; Lin, P.; Chen, Y.; Zhou, H.; Zhou, Z.; Wu, K.; Shi, L.; et al. A longan yield estimation approach based on uav images and deep learning. Front. Plant Sci. 2023, 14, 1132909. [Google Scholar] [CrossRef] [PubMed]
  12. Mathew, J.; Delavarpour, N.; Miranda, C.; Stenger, J.; Zhang, Z.; Aduteye, J.; Flores, P. A novel approach to pod count estimation using a depth camera in support of soybean breeding applications. Sensors 2023, 23, 6506. [Google Scholar] [CrossRef] [PubMed]
  13. Wu, Y.; Chen, J.; He, L.; Gui, J.; Jia, J. An rgb-d object detection model with high-generalization ability applied to tea harvesting robot for outdoor cross-variety tea shoots detection. J. Field Robot. 2024, 41, 1167–1186. [Google Scholar] [CrossRef]
  14. Yang, L.; Noguchi, T.; Hoshino, Y. Development of a pumpkin fruits pick-and-place robot using an rgb-d camera and a yolo based object detection ai model. Comput. Electron. Agric. 2024, 227, 109625. [Google Scholar] [CrossRef]
  15. Palacios, F.; Diago, M.P.; Melo-Pinto, P.; Tardaguila, J. Early yield prediction in different grapevine varieties using computer vision and machine learning. Precis. Agric. 2023, 24, 407–435. [Google Scholar] [CrossRef]
  16. Fu, L.; Wu, F.; Zou, X.; Jiang, Y.; Lin, J.; Yang, Z.; Duan, J. Fast detection of banana bunches and stalks in the natural environment based on deep learning. Comput. Electron. Agric. 2022, 194, 106800. [Google Scholar] [CrossRef]
  17. Fu, L.; Yang, Z.; Wu, F.; Zou, X.; Lin, J.; Cao, Y.; Duan, J. Yolo-banana: A lightweight neural network for rapid detection of banana bunches and stalks in the natural environment. Agronomy 2022, 12, 391. [Google Scholar] [CrossRef]
  18. Zhou, L.; Yang, Z.; Deng, F.; Zhang, J.; Xiao, Q.; Fu, L.; Duan, J. Banana bunch weight estimation and stalk central point localization in banana orchards based on rgb-d images. Agronomy 2024, 14, 1123. [Google Scholar] [CrossRef]
  19. Wu, F.; Yang, Z.; Mo, X.; Wu, Z.; Tang, W.; Duan, J.; Zou, X. Detection and counting of banana bunches by integrating deep learning and classic image-processing algorithms. Comput. Electron. Agric. 2023, 209, 107827. [Google Scholar] [CrossRef]
  20. Stein, M.; Bargoti, S.; Underwood, J. Image based mango fruit detection, localisation and yield estimation using multiple view geometry. Sensors 2016, 16, 1915. [Google Scholar] [CrossRef]
  21. Santos, T.T.; de Souza, K.X.; Neto, J.C.; Koenigkan, L.V.; Moreira, A.S.; Ternes, S. Multiple orange detection and tracking with 3-d fruit relocalization and neural-net based yield regression in commercial sweet orange orchards. Comput. Electron. Agric. 2024, 224, 109199. [Google Scholar] [CrossRef]
  22. Tan, C.; Sun, J.; Paterson, A.H.; Song, H.; Li, C. Three-view cotton flower counting through multi-object tracking and rgb-d imagery. Biosyst. Eng. 2024, 246, 233–247. [Google Scholar] [CrossRef]
  23. Hu, J.; Fan, C.; Wang, Z.; Ruan, J.; Wu, S. Fruit detection and counting in apple orchards based on improved yolov7 and multi-object tracking methods. Sensors 2023, 23, 5903. [Google Scholar] [CrossRef]
  24. Tu, S.; Huang, Y.; Liang, Y.; Liu, H.; Cai, Y.; Lei, H. A passion fruit counting method based on the lightweight yolov5s and improved deepsort. Precis. Agric. 2024, 25, 1731–1750. [Google Scholar] [CrossRef]
  25. Wojke, N.; Bewley, A.; Paulus, D. Simple online and realtime tracking with a deep association metric. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17 September 2017. [Google Scholar]
  26. Escamilla, L.D.V.; Gómez-Espinosa, A.; Cabello, J.A.E.; Cantoral-Ceballos, J.A. Maturity recognition and fruit counting for sweet peppers in greenhouses using deep learning neural networks. Agriculture 2024, 14, 331. [Google Scholar] [CrossRef]
  27. Wang, C.; Wang, C.; Wang, L.; Li, Y.; Lan, Y. Real-time tracking based on improved yolov5 detection in orchard environment for dragon fruit. J. Asabe 2023, 66, 1109–1124. [Google Scholar] [CrossRef]
  28. Cao, D.; Luo, W.; Tang, R.; Liu, Y.; Zhao, J.; Li, X.; Yuan, L. Research on apple detection and tracking count in complex scenes based on the improved yolov7-tiny-pde. Agriculture 2025, 15, 483. [Google Scholar] [CrossRef]
  29. Zheng, Z.; Xiong, J.; Wang, X.; Li, Z.; Huang, Q.; Chen, H.; Han, Y. An efficient online citrus counting system for large-scale unstructured orchards based on the unmanned aerial vehicle. J. Field Robot. 2023, 40, 552–573. [Google Scholar] [CrossRef]
  30. Feng, Y.; Ma, W.; Tan, Y.; Yan, H.; Qian, J.; Tian, Z.; Gao, A. Approach of dynamic tracking and counting for obscured citrus in smart orchard based on machine vision. Appl. Sci. 2024, 14, 1136. [Google Scholar] [CrossRef]
  31. Du, P.; Chen, S.; Li, X.; Hu, W.; Lan, N.; Lei, X.; Xiang, Y. Green pepper fruits counting based on improved deepsort and optimized yolov5s. Front. Plant Sci. 2024, 15, 1417682. [Google Scholar] [CrossRef]
  32. Wu, M.; Yuan, K.; Shui, Y.; Wang, Q.; Zhao, Z. A lightweight method for ripeness detection and counting of chinese flowering cabbage in the natural environment. Agronomy 2024, 14, 1835. [Google Scholar] [CrossRef]
  33. Qi, Z.; Zhang, T.; Yuan, T.; Zhou, W.; Zhang, W. Assessment of the tomato cluster yield estimation algorithms via tracking-by-detection approaches. Inf. Process. Agric. 2025, in press. [Google Scholar] [CrossRef]
  34. Hernandez, B.; Medeiros, H. Multi-object tracking in agricultural applications using a vision transformer for spatial association. Comput. Electron. Agric. 2024, 226, 109379. [Google Scholar] [CrossRef]
  35. Saraceni, L.; Motoi, I.M.; Nardi, D.; Ciarfuglia, T.A. Agrisort: A simple online real-time tracking-by-detection framework for robotics in precision agriculture. In Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), Yokohama, Japan, 13 May 2024. [Google Scholar]
  36. Tan, S.; Kuang, Z.; Jin, B. Appleyolo: Apple yield estimation method using improved yolov8 based on deep oc-sort. Expert Syst. Appl. 2025, 272, 126764. [Google Scholar] [CrossRef]
  37. Vemuru, S.; Ankit, K.; Venkatesan, S.P.; Ghose, D.; Kolathaya, S. Yield prediction and counting using world coordinates. Ssrn 2024, 4829514. [Google Scholar] [CrossRef]
  38. Shui, Y.; Yuan, K.; Wu, M.; Zhao, Z. Improved multi-size, multi-target and 3d position detection network for flowering chinese cabbage based on yolov8. Plants 2024, 13, 2808. [Google Scholar] [CrossRef]
  39. Rong, J.; Zhou, H.; Zhang, F.; Yuan, T.; Wang, P. Tomato cluster detection and counting using improved yolov5 based on rgb-d fusion. Comput. Electron. Agric. 2023, 207, 107741. [Google Scholar] [CrossRef]
  40. Abeyrathna, R.R.D.; Nakaguchi, V.M.; Minn, A.; Ahamed, T. Recognition and counting of apples in a dynamic state using a 3d camera and deep learning algorithms for robotic harvesting systems. Sensors 2023, 23, 3810. [Google Scholar] [CrossRef]
  41. Villacrés, J.; Viscaino, M.; Delpiano, J.; Vougioukas, S.; Cheein, F.A. Apple orchard production estimation using deep learning strategies: A comparison of tracking-by-detection algorithms. Comput. Electron. Agric. 2023, 204, 107513. [Google Scholar] [CrossRef]
  42. Zhou, X.; Zhang, Y.; Jiang, X.; Riaz, K.; Rosenbaum, P.; Lefsrud, M.; Sun, S. Advancing tracking-by-detection with multimap: Towards occlusion-resilient online multiclass strawberry counting. Expert Syst. Appl. 2024, 255, 124587. [Google Scholar] [CrossRef]
  43. Jianping, Q.; Bin, X.; Wu, X.; Chen, M.; Wang, Y. A smartphone-based apple yield estimation application using imaging features and the ann method in mature period. Sci. Agric. 2018, 75, 273–280. [Google Scholar]
  44. Hondo, T.; Kobayashi, K.; Aoyagi, Y. Real-time prediction of growth characteristics for individual fruits using deep learning. Sensors 2022, 22, 6473. [Google Scholar] [CrossRef] [PubMed]
  45. Devanna, R.P.; Romeo, L.; Reina, G.; Milella, A. Yield estimation in precision viticulture by combining deep segmentation and depth-based clustering. Comput. Electron. Agric. 2025, 232, 110025. [Google Scholar] [CrossRef]
  46. Sarron, J.; Malézieux, É.; Sané, C.; Faye, É. Mango yield mapping at the orchard scale based on tree structure and land cover assessed by uav. Remote Sens. 2018, 10, 1900. [Google Scholar] [CrossRef]
  47. Wittstruck, L.; Kühling, I.; Trautz, D.; Kohlbrecher, M.; Jarmer, T. Uav-based rgb imagery for hokkaido pumpkin (cucurbita max.) detection and yield estimation. Sensors 2021, 21, 118. [Google Scholar] [CrossRef]
  48. Mokria, M.; Gebrekirstos, A.; Said, H.; Hadgu, K.; Hagazi, N.; Dubale, W.; Bräuning, A. Fruit weight and yield estimation models for five avocado cultivars in ethiopia. Environ. Res. Commun. 2022, 4, 75013. [Google Scholar] [CrossRef]
  49. Bishop, G.; Welch, G. An introduction to the kalman filter. Proc Siggraph Course 2001, 8, 41. [Google Scholar]
  50. Kuhn, H.W. The hungarian method for the assignment problem. Nav. Res. Logist. Q. 1955, 2, 83–97. [Google Scholar] [CrossRef]
  51. Jing, T.; Xie, J.; Zhou, D. Banana Cultivation and Pest and Disease Control, 1st ed.; China Agriculture Press: Beijing, China, 2022; pp. 31–87. [Google Scholar]
Figure 1. Banana orchard yield estimation system and machine assembly diagram.
Figure 2. The tracking algorithm flow of banana bunches and stalks.
Figure 3. The workflow of banana orchard yield estimation.
Figure 4. Tracking results in the banana orchard: (a) start of tracking; (b) tracking results after 3 s; (c) tracking results after 10 s; (d) tracking results for banana bunches only.
Figure 5. Banana bunch ID mismatch during statistics: (a) tracking at a certain moment; (b) ID error; (c) ID recovery.
Figure 6. The appearance of banana bunches at different distances during the statistical process: (a) vertical direction; (b) forward direction.
Figure 7. Banana bunches appeared incomplete during the counting process: (a) a new banana bunch appeared; (b) a partially obscured banana bunch.
Figure 8. Banana bunch counting results.
Table 1. Yield estimation results in banana orchards.

| Row Number | Number of Banana Bunches | Actual Weight (kg) | Empirical Prediction: Predicted Weight (kg) | Empirical Prediction: Accuracy (%) | Model Prediction: Predicted Weight (kg) | Model Prediction: Accuracy (%) |
|---|---|---|---|---|---|---|
| 1 | 8 | 176.72 | 195 | 89.66 | 167.5 | 94.78 |
| 2 | 10 | 192.85 | 211 | 90.59 | 206.13 | 93.11 |
| 3 | 9 | 176.34 | 163 | 92.44 | 189.36 | 92.62 |
| 4 | 13 | 249.91 | 277 | 89.16 | 263.51 | 94.59 |
| 5 | 11 | 169.2 | 178 | 94.8 | 154.25 | 91.16 |
| 6 | 10 | 198.67 | 182 | 91.61 | 214.94 | 91.81 |
| Total | 61 | 1163.69 | 1206 | 96.36 | 1195.69 | 97.25 |
| Average Accuracy | – | – | – | 91.37 | – | 93.01 |
| RMSE | – | – | 17.91 kg | – | 12.81 kg | – |
| 95% confidence interval | – | – | [−18.80, 4.02] kg | – | [−10.19, 6.15] kg | – |
Table 2. Comparison with other studies on yield estimation.

| Fruit | Sensor | Detector | Tracker | Yield Estimation Type | Results |
|---|---|---|---|---|---|
| Grapevine [15] | RGB | SVR | – | Counting-based (image-based) | NRMSE of 29.77% |
| Longan [11] | RGB | SF-YD model | – | Counting-based (image-based) | Average error rate of 2.99% |
| Passion fruit [24] | RGB | YOLOv5s | DeepSORT | Counting-based (video-based) | Average accuracy of 95.1% |
| Strawberry [42] | RGB | YOLOv5s | DeepSORT | Counting-based (video-based) | Lowest error rate of 8.7% |
| Cotton flower [22] | RGB-D | YOLOv8 | RAFT | Counting-based (video-based) | Mean absolute percentage error of 6.22% |
| Tomato cluster [39] | RGB-D | YOLOv5 | ByteTrack | Counting-based (video-based) | Average accuracy of 95.1% |
| Grape cluster [45] | RGB-D | MANet | – | Weight-based (volume and image) | Average error of 12% |
| Banana (this work) | RGB-D | YOLO-Banana | DeepSORT | Weight-based | Average accuracy of 93.01% |