Article

Application of the Improved YOLOv8-DeepSORT Framework in Motion Tracking of Pepper Leaves Under Droplet Occlusion

1. Department of Computer Information, Suzhou Vocational and Technical College, Suzhou 234101, China
2. Key Laboratory of Modern Agriculture Equipment and Technology, Jiangsu University, Zhenjiang 212013, China
3. School of Mechanical Engineering, Zhejiang University of Water Resources and Electric Power, Hangzhou 310018, China
* Authors to whom correspondence should be addressed.
Agronomy 2026, 16(3), 384; https://doi.org/10.3390/agronomy16030384
Submission received: 26 December 2025 / Revised: 27 January 2026 / Accepted: 31 January 2026 / Published: 5 February 2026
(This article belongs to the Special Issue Research Progress in Agricultural Robots in Arable Farming)

Abstract

In agricultural plant protection spraying, dynamic occlusion by droplet swarms on leaf surfaces poses a major challenge to accurately acquiring leaf motion parameters, limiting the optimization of precision spraying and pesticide utilization. Traditional contact-based methods interfere with natural leaf dynamics, while non-contact optical approaches suffer from tracking failures under occlusion. This study proposes an improved framework combining YOLOv8 integrated with a Spatial Attention Module (SAM) and optimized DeepSORT for robust non-contact tracking of marked points on pepper leaves. High-speed binocular cameras were used to collect leaf motion data under controlled droplet occlusion conditions. Results demonstrate that, under 5% occlusion, the improved model achieves a 19.6% increase in detection mAP@0.5 and significantly enhances tracking MOTA, with trajectory breakage rate reduced to 3.2% and ID switches decreased by approximately 71.4% in long-sequence tracking. Quantitative analysis of leaf midrib motion reveals a clear spatial gradient: average speed increases from 0.012 m s−1 at the base to 0.153 m s−1 at the tip, with intensified fluctuations toward the tip and a consistent dominant vibration frequency of 0.403 Hz across all points. This method provides an efficient, reliable non-contact solution for measuring leaf motion parameters in complex spraying scenarios, offering valuable data support for targeted spray parameter optimization and improved deposition efficiency in precision agriculture.

1. Introduction

Accurate acquisition of plant leaf motion parameters under mechanical loads is a topic of common concern in fields such as agricultural and forestry plant protection spraying, mechanical structure optimization, and energy harvester design [1,2,3]. During agricultural plant protection spraying, parameters such as the motion speed and frequency of leaves under the impact of pesticide droplet swarms directly affect droplet retention efficiency on leaf surfaces. Thus, obtaining leaf motion parameters is of great significance for improving the pesticide utilization rate [4]. Furthermore, the leaf bending angle has been proven to be a key indicator of the rainfall interception and water storage capacity of trees, serving as an important foundation for enhancing the accuracy of hydrological models [5]. In research on mechanical structure optimization, leaf deformation parameters are a crucial basis for optimizing the dynamic parameters of mechanical structures [2]. Particularly in bionic structures, leaf motion parameters under airflow fields are utilized to improve bionic designs, thereby enhancing the energy conversion efficiency of leaf-like wind energy harvesters [6]. However, during agricultural plant protection operations, pesticide droplet swarms, whether in flight or deposited on leaf surfaces, visually occlude the leaf tracking targets. Furthermore, in the now widely adopted UAV-based plant protection operations, the airflow generated by the propellers further exacerbates leaf movement, adding interference to leaf recognition. Meanwhile, the acquisition of leaf motion parameters still relies heavily on manual monitoring and manual extraction, with limited efficiency and accuracy.
The core objective of this study is to provide an efficient solution for the acquisition of leaf motion parameters under conditions of droplet occlusion and airflow interference based on deep learning models [7,8]. The practical significance of obtaining leaf motion parameters (such as speed, frequency, and amplitude) lies in quantifying the dynamic leaf responses (e.g., vibration, flipping, and deformation) under droplet impact, which directly influence pesticide droplet deposition, retention, and uniform distribution on both adaxial and abaxial leaf surfaces, thereby enabling optimization of spraying parameters, improvement of pesticide utilization efficiency, and reduction in drift and environmental pollution [9,10,11].
Various contact measurement methods such as strain gauges, accelerometers, and inclinometers have been widely used for measuring plant motion parameters induced by mechanical forces [1,12]. For a long time, contact measurement technologies have been extensively applied in research on plant motion parameter acquisition. Strain gauges are attached to the surface of plant stems and use resistance changes caused by deformation to characterize plant stress and vibration responses. They are often used to monitor plant deformation under environmental wind to evaluate crop lodging resistance [13]. Similarly, accelerometers have been used to monitor the acceleration signals of trees to analyze plant vibration frequency and amplitude, thereby inferring changes in flowering and leaf stages [14,15]. However, plant leaves have structural characteristics such as small mass, high flexibility, and complex geometric shapes. When used as the measured system, the physical properties of contact sensors themselves, such as weight, can greatly change the dynamic characteristics of leaves. This results in load effects and wiring constraints, leading to inaccurate measurements and difficulty in adapting to the acquisition of motion parameters of flexible leaves. The development of non-contact measurement technology has made the acquisition of leaf motion parameters more accurate and non-intrusive.
Non-contact measurement has the advantage of measuring leaf motion parameters without touching the leaves. Lasers and visible light cameras have become common tools for obtaining leaf motion parameters, among which visible light cameras are more widely used [16,17]. For example, visible light cameras can record morphological changes in leaves in airflow in real time to evaluate leaf bending and vibration under the influence of spraying or wind (e.g., pear leaves in previous studies [18,19]). The underlying optical monitoring principles are general and applicable to various crops, including the pepper leaves investigated in the current study. Optical target monitoring measures leaf motion by marking optical targets on leaves and using cameras or sensors to track changes in target positions in real time. It is often used for evaluating plant growth dynamics and environmental responses, such as monitoring small displacements of leaves under light or wind to quantify biomechanical parameters [20]. However, during agricultural spraying, the marker points on plant leaves measured optically are easily occluded by flying droplets and those deposited on leaf surfaces. This causes the tracking points to be untraceable, resulting in noisy or even unavailable data. The wide application of deep learning models in the field of image processing and their good reasoning ability make it possible to solve the failure in acquiring leaf motion parameters caused by droplet occlusion [21,22].
In recent years, research on object detection algorithms has branched into several clearly distinguished families. Two-stage detectors (e.g., Faster R-CNN and Cascade R-CNN) first generate candidate boxes through a Region Proposal Network (RPN) and then perform classification and regression; they offer the highest accuracy but suffer from slow inference, making them unsuitable for high-frame-rate real-time applications [23,24]. One-stage detectors (e.g., the YOLO series, SSD, and RetinaNet) directly predict bounding boxes and categories on a grid to achieve end-to-end fast inference; they perform excellently in the speed-accuracy trade-off and are particularly suitable for real-time scenarios [25]. Anchor-free detectors (e.g., CenterNet and FCOS) avoid anchor hyperparameter tuning through key-point or center-point prediction, improving generalization, but their detection performance for extremely small targets still needs further optimization [26]. Transformer-based detectors (e.g., DETR and RT-DETR) utilize self-attention mechanisms for global modeling and lead in accuracy in complex scenarios, yet they incur high computational costs, slow inference, and demanding hardware requirements [27]. In addition, there are emerging directions such as lightweight models (e.g., NanoDet) and open-vocabulary detectors (e.g., YOLO-World and Grounding DINO) [28].
The SOTA (state-of-the-art) models in object detection are mainly concentrated in the latest YOLO variants (e.g., YOLOv10 and YOLOv11) and the RT-DETR series, which continually set new accuracy records on general benchmarks such as COCO [25,27]. However, in specific scenarios like precision agriculture, the one-stage YOLO series, especially YOLOv8, is still widely regarded as the most balanced and practical baseline, owing to its very high inference speed, mature open-source ecosystem, and numerous successful applications in agricultural small-target detection tasks (e.g., weed identification, fruit counting, and spray target localization) [29,30]. Compared with two-stage and Transformer-based detectors, YOLOv8 has clear advantages in real-time performance and customization flexibility; compared with other one-stage or lightweight models, it provides stronger support for occlusion robustness. Therefore, selecting YOLOv8 as the basic framework allows this study to achieve optimal detection performance and practical deployment in the highly dynamic, occluded environment of pesticide spraying.
To address the poor stability of motion parameter tracking caused by dynamic aggregation and occlusion of leaf target points by droplet swarms, as well as highlight noise caused by droplet reflection during agricultural plant protection spraying, this study makes the following main contributions: (1) A robust method for acquiring leaf motion parameters in droplet-swarm occlusion scenarios is proposed. Based on the YOLOv8 algorithm, occlusion-oriented optimization improves the robustness and accuracy of leaf motion parameter extraction under droplet occlusion, providing an effective solution for target parameter perception in complex spraying scenarios. (2) A dedicated optimization module for leaf tracking is designed and integrated: the multi-frame tracking strategy of DeepSORT is optimized, and the Kalman filter state equation is adjusted to the flexible motion characteristics of leaves, effectively mitigating the tracking drift and loss that tend to occur during occlusion switching.

2. Materials and Methods

2.1. Method Implementation Process

The technical route of this study is shown in Figure 1. Firstly, the improved YOLOv8 algorithm is used to detect the current frame of the input video and obtain the position information of leaf target points. Subsequently, the obtained target point position information is transmitted to the improved DeepSORT algorithm for data association, and the leaf target position information of adjacent frames is matched. IDs are assigned to the target points to obtain tracking results. Finally, the obtained target point ID information and bounding box information are counted, and the displacement and speed information of the leaves are output.
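The loop described above can be sketched in a few lines. This is a hedged, self-contained sketch, not the authors' implementation: `detect` and `track` stand in for the improved YOLOv8 and DeepSORT stages and are injected as callables, and the frame rate and mm-per-pixel scale are illustrative assumptions rather than paper values.

```python
# Hedged sketch of the detect -> associate -> measure loop described above.
# `detect` and `track` are placeholders for the improved YOLOv8 detector and
# DeepSORT tracker; fps and mm_per_px are illustrative assumptions.

def measure_motion(frames, detect, track, fps=3000.0, mm_per_px=0.05):
    """Return per-ID instantaneous speed series (m/s) from marker centroids."""
    last_pos = {}   # track_id -> (x_px, y_px) in the previous frame
    speeds = {}     # track_id -> list of instantaneous speeds (m/s)
    for frame in frames:
        detections = detect(frame)        # [(x, y, conf), ...]
        tracks = track(detections)        # [(track_id, x, y), ...]
        for tid, x, y in tracks:
            if tid in last_pos:
                px, py = last_pos[tid]
                disp_mm = ((x - px) ** 2 + (y - py) ** 2) ** 0.5 * mm_per_px
                speeds.setdefault(tid, []).append(disp_mm / 1000.0 * fps)
            last_pos[tid] = (x, y)
    return speeds
```

A marker that moves 2 px between consecutive 3000 FPS frames at 0.05 mm/px would, under these assumed values, register 0.3 m/s.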

2.2. Data Collection

2.2.1. Crop Cultivation and Target Production

This study took ‘Tianshuai101’, a representative pepper variety in China, as the research object for leaf motion parameter tracking under droplet occlusion. Pepper seedlings were cultivated in flowerpots with perlite as the culture medium, which has good water retention, root fixation, and water-air-fertilizer exchange properties, and were watered once a day with 200 mL of nutrient solution. During the experiment, pepper leaves with a length of 100 to 130 mm and no mechanical damage were selected.
Before the experiment, marker points for leaf motion tracking were produced, and the effect after production is shown in Figure 2. The white target markers pasted on the surface of pepper leaves have a diameter of 1.2 ± 0.1 mm and each target weighs approximately 0.05 mg. These target points are evenly distributed on the leaf surface with a spacing of 5 mm, and the number of target points is flexibly adjusted according to the leaf area (a typical leaf with a length of 100 to 130 mm has 13 to 17 target points distributed on the midrib). The layout adopts a grid-like array along the midrib to evenly cover all areas of the leaf. This arrangement ensures high contrast, facilitating pixel level detection of images under high-speed imaging, thereby better assisting the accurate tracking and motion parameter extraction of the YOLOv8 and DeepSORT algorithms under droplet occlusion conditions.

2.2.2. Spraying System

The spraying system is used to generate the droplet swarms required for the experiment; its structure and key components are shown in Figure 3. The nozzle spray pressure was set to 0.3 MPa, corresponding to a flow rate of 0.39 L min−1, which matches the typical operating parameters of the standard flat fan nozzle employed (Licheng VP110-01, Nb-licheng Inc., Ningbo, China, spray angle 110°), widely used in ground-based agricultural plant protection spraying in China. This pressure and flow rate range (0.2–0.4 MPa, 0.3–0.5 L min−1 per nozzle) is a common setup in field applications for row crops such as peppers, as it ensures sufficient coverage and penetration while controlling drift. A diaphragm pump provides stable working pressure for the nozzle, a pressure regulating valve handles pressure regulation over an adjustment range of 0.05 to 0.85 MPa, and the real-time pressure can be read directly from a pressure gauge. A relief valve automatically releases pressure when the system is overloaded to avoid damage to the device.
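The quoted pressure and flow ranges are consistent with the standard hydraulic-nozzle relation, under which flow rate scales with the square root of pressure. The following sketch applies that textbook relation (it is not from the paper) to the stated reference point of 0.39 L min−1 at 0.3 MPa:

```python
# Standard hydraulic-nozzle relation (textbook, not from the paper):
# Q2 = Q1 * sqrt(P2 / P1), using the text's reference point of
# 0.39 L/min at 0.3 MPa.
import math

def flow_at_pressure(q_ref, p_ref, p):
    """Estimate nozzle flow (L/min) at pressure p from a reference point."""
    return q_ref * math.sqrt(p / p_ref)

for p in (0.2, 0.3, 0.4):
    print(f"{p:.1f} MPa -> {flow_at_pressure(0.39, 0.3, p):.2f} L/min")
```

The estimates over 0.2–0.4 MPa fall at roughly 0.32–0.45 L min−1, within the 0.3–0.5 L min−1 range the text cites for field use.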

2.2.3. Leaf Motion Data Collection Under Spraying Operations

The leaf motion measurement system is used for leaf motion data collection; the on-site layout is shown in Figure 4. The system mainly consists of two high-speed cameras (i-SPEED TR, Olympus Co., Tokyo, Japan) equipped with macro lenses (AT-X Pro D 100 mm F2.8 Macro, Kenko Tokina Co., Ltd., Lübeck, Germany). With the camera frame rate set to 3000 FPS, an auxiliary light source (OSRAM HLX64602, OSRAM GmbH, Munich, Germany) is required to provide sufficient illumination. The system captures image sequences with a resolution of 1280 × 1024 pixels.
To achieve camera parameter calibration and the establishment of a three-dimensional (3D) spatial coordinate system, a binocular camera calibration experiment was conducted. Figure 5 presents a schematic diagram of the camera calibration process and leaf motion parameter extraction based on the principle of triangulation. To ensure that high-speed cameras attain measurement accuracy meeting experimental requirements within the small field of view (FOV) at the leaf scale, this study independently designed a dedicated calibration board suitable for small FOV scenarios; furthermore, to suppress reflective interference during the calibration process, the calibration board was fabricated from aluminum oxide material.
Binocular cameras are used to collect leaf motion data, which not only focuses on solving the core problem of tracking leaf surface target points under droplet occlusion but also reserves a data foundation for subsequent 3D tracking research. The experiment completed camera calibration through a stereo calibration board with a size of 188 mm × 228 mm and 128 calibration points with a diameter of 7.62 mm. After calibration, the image reprojection error is only 0.09 pixels, indicating excellent imaging accuracy of the system.
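The triangulation step underlying Figure 5 can be illustrated with the linear DLT method: given the two calibrated projection matrices and one matched marker observation per view, the 3D point is the null vector of a small linear system. This is a generic sketch (the camera matrices below are illustrative, not the calibrated values from this study):

```python
# Minimal linear (DLT) triangulation sketch for a calibrated binocular rig.
# P1, P2 are 3x4 projection matrices; uv1, uv2 are the matched pixel
# coordinates of one marker in the two views.
import numpy as np

def triangulate(P1, P2, uv1, uv2):
    """Recover the 3D point (X, Y, Z) from one stereo correspondence."""
    u1, v1 = uv1
    u2, v2 = uv2
    A = np.stack([
        u1 * P1[2] - P1[0],
        v1 * P1[2] - P1[1],
        u2 * P2[2] - P2[0],
        v2 * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)        # null vector = last right-singular row
    X = Vt[-1]
    return X[:3] / X[3]                # de-homogenize
```

In practice the reprojection error reported above (0.09 pixels) bounds how far re-projecting the triangulated point deviates from the observed marker positions.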

2.3. Detection and Tracking Methods

2.3.1. Improved YOLOv8

In agricultural plant protection spraying scenarios, the occlusion and scattering of droplet swarms can severely weaken the visual signals of leaf target points and cause confusion between targets and droplets deposited on leaves, leaving the standard YOLOv8 model insufficiently robust for target point detection. Specific manifestations include positioning deviation, an increased missed detection rate, and frequent false detections. To solve this core problem, this study proposes an improved YOLOv8 model integrated with a Spatial Attention Module (SAM); the optimized network structure is shown in Figure 6. The SAM enhances feature expression through adaptive weighting of the spatial dimension of the feature map, improving the representation of occluded target points [31]. In this study, the SAM is inserted after the deepest-level features extracted by the stacked conv and C2f modules in the backbone network and immediately before the SPPF module, for two main reasons: (1) The SPPF module performs multi-scale pooling on these deep features with high semantic value and low resolution.
Inserting the spatial attention mechanism prior to pooling can prioritize enhancing the edge features of leaf key points, suppress the redundant responses from droplet highlight noise, and prevent the pooling operation from diluting the spatial location information of target features; (2) Preliminary ablation experiments show that inserting the SAM at this position can improve the detection mAP@0.5 by approximately 4.2%, which outperforms the schemes of inserting it into the preceding C2f stacked layers or before the head. Inserting it too early tends to scatter attention on low-level details, while inserting it too late will lead to significant attenuation of the attention effect due to the pooling operation. This design ensures that the attention mechanism exerts its maximum effect before multi-scale feature fusion, thus improving the localization accuracy of leaf target points in spray occlusion scenarios. This improvement effectively enhances the positioning accuracy of target points, reduces background interference and false detection phenomena, and provides reliable basic data for subsequent dynamic spraying state tracking based on DeepSORT.
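A common formulation of spatial attention (channel-wise average and max maps, a 7x7 convolution, a sigmoid, and element-wise reweighting) can be sketched in numpy as follows. This is a simplified illustration under that standard formulation, not the authors' module; the convolution weights here are untrained placeholders.

```python
# Simplified numpy sketch of a Spatial Attention Module (SAM): channel-wise
# average and max maps -> 7x7 convolution -> sigmoid -> reweight the input.
# The kernel below is an untrained placeholder, not learned weights.
import numpy as np

def conv2d_same(x, w):
    """Naive 2D convolution (all input channels -> one output map),
    zero 'same' padding."""
    cin, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, wd = x.shape[1:]
    out = np.zeros((h, wd))
    for i in range(h):
        for j in range(wd):
            out[i, j] = np.sum(xp[:, i:i + k, j:j + k] * w)
    return out

def spatial_attention(feat, weights=None, k=7):
    """feat: (C, H, W) feature map -> spatially reweighted map, same shape."""
    if weights is None:                        # untrained placeholder kernel
        weights = np.full((2, k, k), 1.0 / (2 * k * k))
    avg = feat.mean(axis=0, keepdims=True)     # (1, H, W) average map
    mx = feat.max(axis=0, keepdims=True)       # (1, H, W) max map
    attn = conv2d_same(np.concatenate([avg, mx]), weights)
    attn = 1.0 / (1.0 + np.exp(-attn))         # sigmoid -> values in (0, 1)
    return feat * attn                         # broadcast over channels
```

Because the attention map lies in (0, 1) per spatial location, the module can only re-weight, not amplify, features, which is consistent with its role of suppressing droplet-highlight responses while preserving marker edges.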

2.3.2. DeepSORT

DeepSORT (Deep Simple Online and Realtime Tracking) is an efficient multi-target tracking framework. It integrates a deep learning-based appearance feature matching mechanism on the basis of the SORT algorithm, which can significantly enhance the tracking stability of targets in dynamic occlusion environments [32,33]. DeepSORT predicts motion trajectories through Kalman filtering and combines with Re-ID networks to extract robust features, achieving real-time tracking with a low ID switch rate [34]. The overall flow of the optimized DeepSORT in this study is shown in Figure 7.
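The Kalman predict/update cycle that drives the trajectory prediction can be illustrated with a constant-velocity filter. Note that DeepSORT's actual state is eight-dimensional (box centre, aspect ratio, height, and their velocities); this four-dimensional position/velocity sketch only shows the mechanics, and the noise parameters are illustrative.

```python
# Minimal constant-velocity Kalman filter sketch of the predict/update cycle
# DeepSORT relies on. DeepSORT's real state is 8-dimensional; this 4-dim
# position/velocity version with illustrative noise terms shows the mechanics.
import numpy as np

class CVKalman:
    def __init__(self, x0, dt=1.0, q=1e-2, r=1.0):
        self.x = np.array([x0[0], x0[1], 0.0, 0.0])   # [x, y, vx, vy]
        self.P = np.eye(4)                            # state covariance
        self.F = np.eye(4)                            # constant-velocity model
        self.F[0, 2] = self.F[1, 3] = dt
        self.H = np.eye(2, 4)                         # we measure position only
        self.Q = q * np.eye(4)                        # process noise
        self.R = r * np.eye(2)                        # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        y = np.asarray(z) - self.H @ self.x           # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)      # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]
```

During occlusion the tracker keeps calling `predict()` without `update()`, which is what lets a trajectory coast through a few droplet-occluded frames.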
Step 1: Initialization and Frame Detection Preprocessing
Upon algorithm startup, an empty trajectory set (tracks) is created, and key hyperparameters are configured for the leaf spraying scenario: the track confirmation threshold n_init = 3 (requiring 3 consecutive detections), because leaves are prone to 1–2 frames of detection fluctuation when impacted by droplets, and 3 consecutive valid detections avoid false trajectory initiation caused by transient noise; the maximum track age max_age = 30 frames, derived from field spraying statistics showing that leaf occlusion by droplet clusters lasts at most 25 frames, with a 5-frame redundancy reserved to prevent premature deletion of trajectories when occlusion persists; and the detection confidence threshold conf_thr = 0.5, set in view of the low signal-to-noise ratio of leaf targets (interfered with by droplet highlights), balancing the suppression of low-quality false detections against the risk of missing valid leaf targets.
For each input video frame, the improved YOLOv8 detector is invoked to generate a list of detection boxes (detects), including position, class, and confidence scores. If no valid detections are present in the frame, the algorithm increments the age counter for all existing trajectories, removes those exceeding max_age, visualizes confirmed trajectories (e.g., with bounding box annotations), and skips subsequent operations to conserve computational resources. This mechanism optimizes algorithm performance in empty-frame scenarios by eliminating unnecessary feature extraction processes. For frames with valid detections, high-confidence results (valid subset) are filtered, and deep appearance features (feats) are batch-extracted from cropped target regions using GPU parallel acceleration, laying a reliable foundation for downstream matching while reducing noise interference from low-quality detection boxes.
Step 2: Trajectory Prediction and Hierarchical Matching
Active trajectories (excluding those marked for deletion) are selected from the trajectory set. For each active trajectory, the Kalman filter’s predict() function is called to forecast its state, and the global track age counter is updated. The association phase adopts a hierarchical (cascaded) matching strategy: in the first stage, IOU-based matching (iou_match) calculates the overlap between detection boxes and predicted boxes, rapidly generating initial matched pairs (iou_m), unmatched trajectories (u_t), and unmatched detections (u_d). This step employs a gating mechanism to filter non-overlapping candidates, significantly reducing invalid matching computations.
In the second stage, appearance-enhanced matching (app_match) is performed on the remaining unmatched targets: the extracted features (feats) are fed into a re-identification (Re-ID) network (e.g., a CNN-based model) to calculate cosine similarity with historical trajectory features; a composite cost matrix is constructed by fusing appearance similarity and Mahalanobis distance (used to quantify motion uncertainty), and the Hungarian algorithm is applied to solve for the optimal assignment, yielding supplementary matched pairs (app_m); the supplementary matched pairs are merged with the initial matched pairs to form the complete trajectory-detection association results (matches). This two-stage matching design prioritizes high-confidence associations, reduces computational load, and leverages deep embedding features to enhance the discrimination of similar-appearing targets. In dense occlusion scenarios, compared with the SORT algorithm, this strategy reduces the target ID switch rate by approximately 50%, ensuring long-sequence tracking consistency and adapting to the scenario where leaves are frequently occluded by droplets.
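The two matching stages can be sketched as follows. This is a simplified, dependency-free illustration: a greedy best-score assignment stands in for the Hungarian solver, and plain cosine similarity stands in for the fused appearance/Mahalanobis cost; thresholds are illustrative.

```python
# Sketch of the two-stage association: IOU gating first, then appearance
# (cosine) matching on the leftovers. A greedy assignment stands in for the
# Hungarian solver; thresholds are illustrative.
def iou(a, b):
    """IOU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = sum(x * x for x in u) ** 0.5
    nv = sum(x * x for x in v) ** 0.5
    return dot / (nu * nv + 1e-9)

def associate(tracks, dets, feats, tfeats, iou_thr=0.3, app_thr=0.6):
    """tracks/dets: boxes; tfeats/feats: appearance vectors per track/det.
    Returns (matches, unmatched_tracks, unmatched_dets)."""
    matches = []
    u_t, u_d = set(range(len(tracks))), set(range(len(dets)))
    # Stage 1: greedy IOU matching with gating.
    pairs = sorted(((iou(tracks[t], dets[d]), t, d)
                    for t in u_t for d in u_d), reverse=True)
    for s, t, d in pairs:
        if s >= iou_thr and t in u_t and d in u_d:
            matches.append((t, d)); u_t.discard(t); u_d.discard(d)
    # Stage 2: appearance matching on the remainder.
    pairs = sorted(((cosine(tfeats[t], feats[d]), t, d)
                    for t in u_t for d in u_d), reverse=True)
    for s, t, d in pairs:
        if s >= app_thr and t in u_t and d in u_d:
            matches.append((t, d)); u_t.discard(t); u_d.discard(d)
    return matches, u_t, u_d
```

The second stage is what recovers identities after occlusion: a marker whose predicted box no longer overlaps its detection can still be claimed by its appearance embedding.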
Step 3: Trajectory Management and Result Output
For each matched pair (t, i) in matches, the corresponding trajectory is updated via update(valid[i], feats[i]), refreshing the Kalman filter state and appearance model, resetting the track age to 0, and incrementing the hit counter (hits). Tentative trajectories with hits ≥ n_init are promoted to confirmed status, activating full output functionality. This design further validates trajectory validity and avoids fragmented trajectories caused by transient false detections.
New tentative trajectories (new_track) are initialized with their initial features for unmatched detections in u_d and appended to the trajectory set. Unmatched trajectories in u_t that exceed max_age are marked for deletion (deleted = true), and the active trajectory list is refreshed. Finally, confirmed trajectories are filtered as output results, with simultaneous visual rendering (adding ID labels, trajectory lines, and bounding boxes). This stage enables refined management of the trajectory lifecycle, suppresses fragmented output results, and ensures the smoothness and continuity of leaf motion trajectory tracking.
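The lifecycle rules above (n_init consecutive hits confirm a track; exceeding max_age deletes it) reduce to a small state machine. A minimal sketch, with defaults mirroring the hyperparameters discussed earlier (values here are illustrative):

```python
# Minimal track-lifecycle sketch: n_init consecutive hits confirm a track,
# exceeding max_age missed frames deletes it. Defaults are illustrative,
# echoing the hyperparameters discussed in Step 1.
class Track:
    def __init__(self, tid, n_init=3, max_age=30):
        self.tid = tid
        self.hits = 0            # consecutive matched detections
        self.age = 0             # frames since last matched detection
        self.confirmed = False
        self.deleted = False
        self.n_init = n_init
        self.max_age = max_age

    def mark_matched(self):
        self.age = 0
        self.hits += 1
        if self.hits >= self.n_init:
            self.confirmed = True

    def mark_missed(self):
        self.age += 1
        if self.age > self.max_age:
            self.deleted = True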

3. Results

3.1. Target Occlusion Evolution

Figure 8 shows a dynamic sequence of images of droplets on the surface of pepper leaves from deposition to aggregation and then to occlusion. It includes 18 frames with a time span from 2.86 s to 5.15 s. Figure 8 intuitively presents the occlusion process of leaf target points by droplet aggregation behavior, providing real working conditions for the performance verification of subsequent detection and tracking methods. The image sequence starts at 2.86 s after the start of spraying. At this time, droplets adhere to the leaf surface in discrete hemispherical shapes with a diameter of 1.0 to 2.0 mm, mainly distributed in the leaf base and edge areas. All 34 white target points (diameter 1.2 ± 0.1 mm) on the leaf surface are unoccluded and have high contrast with the leaf background, providing high-quality images for algorithm initialization.
During the period from 3.94 s to 4.75 s, droplet aggregation occurs. Driven by capillary action and the hydrophobicity of pepper leaves, droplets fuse to form composite droplets of 3.0 to 8.0 mm. Among the 34 target points, 2 (at 3.94 s) to 7 (at 4.75 s) experience partial occlusion of 5.9% to 20.6%. The highlight characteristics of composite droplets lead to overlap between target points and noise pixels, posing challenges for detection algorithms.
During the period from 4.88 s to 5.15 s, droplets occlude the target points. Larger droplets form on the leaf surface, especially at the midrib. Eight target points have an occlusion ratio greater than 20%, among which three near the midrib are completely submerged in sequence. This directly causes trajectory breakage in the baseline model, while the improved model suffers only one short-term breakage, initially confirming the anti-occlusion advantage of the optimized method. At 5.15 s, some large droplets fall off and two target points are re-exposed, but their contrast is reduced, still interfering with detection. This highlights the demand for method robustness in complex spraying scenarios.

3.2. Target Detection Results of Improved YOLOv8

3.2.1. Comparison of Detection Performance Under Different Occlusion Ratios and Types

Figure 9 intuitively shows the comparison of core detection performance between the baseline YOLOv8 and the improved model integrated with the SAM under 0%, 2%, and 5% target pixel occlusion ratios (1000 test frames, 3 repetitions). The occlusion ratio refers to the percentage of marker pixel area covered by droplets, with markers being white circles of 1.2 ± 0.1 mm in diameter. For calculation, 100 frames were randomly sampled per occlusion condition; target boundaries were manually annotated via ImageJ 1.53, and the ratio of droplet-covered pixels to total marker pixels was computed. These three ratios correspond to typical stages in spraying scenarios: 0% denotes full visibility at initial spraying; 2% indicates mild edge occlusion (2–3% coverage) in early droplet deposition; 5% represents moderate occlusion (5–7% coverage with highlight noise) at peak droplet aggregation, covering baseline, accuracy-declining, and high-miss-detection scenarios to verify the SAM’s adaptability to droplet occlusion.
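Given the manually annotated marker boundaries and a droplet mask, the occlusion ratio reduces to a pixel-count fraction. A minimal sketch of that computation (the masks here are toy boolean arrays, not the study's annotations):

```python
# Sketch of the occlusion-ratio computation described above: the fraction
# of annotated marker pixels covered by the droplet mask, in percent.
import numpy as np

def occlusion_ratio(marker_mask, droplet_mask):
    """Both masks are boolean arrays of the image size."""
    marker_px = np.count_nonzero(marker_mask)
    if marker_px == 0:
        return 0.0
    occluded = np.count_nonzero(marker_mask & droplet_mask)
    return 100.0 * occluded / marker_px
```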
A total of 3000 test frames from 10 independent spray sequences were adopted for the detection experiments, with 300 frames randomly extracted from each sequence to cover the complete evolution process of droplet occlusion states. All detection experiments were replicated three times to ensure the statistical robustness of the results. For the tracking experiments, 5 complete spray sequences with a duration of 5 s were selected; based on an acquisition frame rate of 3000 FPS, each sequence contained approximately 15,000 frames, and these tracking experiments were also replicated three times. It should be noted that all the leaf samples involved in the experimental sequences were collected from mature individuals of different pepper plants, so as to reflect the natural biological differences in size, morphology, and dynamic response characteristics among different leaves.
Figure 9 shows the performance of the algorithms under different occlusion conditions using mAP@0.5, precision, and recall as evaluation indicators, focusing on verifying the adaptability of the SAM to droplet occlusion scenarios. The detailed quantitative comparison is summarized in Table 1.
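For reference, the precision and recall indicators follow their standard definitions over true-positive, false-positive, and missed-detection counts at a given IOU threshold (this is the textbook definition, not code from the paper):

```python
# Standard precision/recall definitions used by such evaluations
# (textbook definitions, not code from the paper).
def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

For example, with illustrative counts of 31 markers found out of 34 and 2 false positives, precision is 31/33 and recall is 31/34.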
As shown in Table 1 and Figure 9, under 0% occlusion, the mAP@0.5 of the improved model is 97.0 ± 1.2%, which is only 1.9% higher than that of the baseline model (95.1 ± 1.5%). The improvements in precision (96.4 ± 0.8% vs. 95.2 ± 1.1%) and recall (95.9 ± 0.9% vs. 94.8 ± 1.0%) are limited, indicating that the SAM has no redundant interference under non-occlusion conditions.
Under 2% mild occlusion, the mAP@0.5 of the improved model reaches 92.4 ± 1.3%, which is 4.2% higher than that of the baseline model (88.2 ± 1.6%). The precision (92.1 ± 1.0%) and recall (91.5 ± 1.2%) are increased by 3.4% and 4.2% compared with the baseline model, respectively. This confirms that the SAM can effectively suppress highlight noise from small droplets and reduce false negatives caused by edge occlusion of target points. Under 5% moderate to severe occlusion, the performance advantage of the improved model is significant. The mAP@0.5 increases from 61.8 ± 2.1% of the baseline model to 81.4 ± 1.8%, with an increase of 19.6%. The recall rate rises from 61.1 ± 1.9% to 81.7 ± 1.5%, and the precision rate increases from 62.4 ± 2.3% to 84.3 ± 1.6%, both exceeding 80%. This completely solves the problems of target point missed detection and misjudgment of the baseline model under moderate to severe occlusion, laying a foundation for subsequent accurate tracking. The final tracking effect is shown in Figure 10.

3.2.2. Ablation Experiment of SAM

Figure 11 shows the results of the ablation experiment of the SAM, presenting the variation trend of mAP@0.5 of the models with and without the SAM under 0%, 2%, and 5% occlusion ratios. After removing the SAM, the mAP@0.5 of the model under 0% occlusion is 95.3 ± 1.4%, which is basically consistent with the baseline model. Under 2% occlusion, it is 88.5 ± 1.7%, which is only 0.3% higher than the baseline model. Under 5% occlusion, it is 62.3 ± 1.7%, which is comparable to the performance of the baseline model (61.8 ± 2.1%). This confirms that the performance improvement of the improved model is completely derived from the SAM, and the higher the occlusion ratio, the more significant the module gain, which meets the actual needs of droplet occlusion scenarios.

3.2.3. Consistency Verification Between Detection Indicators and Practical Application Effects

Figure 12 focuses on target point positioning error and detection integrity to verify the consistency between detection indicators and practical application effects under different occlusion ratios (taking 34 evenly distributed target points on a single leaf as ground truth). Under 0% occlusion, both models detect all 34 target points with no false positives or negatives, and the positioning error is less than 0.5 pixels, consistent with the absence of significant performance differences in Figure 9. Under 2% occlusion, the baseline model misses 1 leaf-base target point (25% occluded) and produces 1 false positive (misjudging a 1.5 mm droplet), with a positioning error of 0.65 ± 0.15 pixels. The improved model has no missed or false detections, and its positioning error of 0.42 ± 0.11 pixels is 35.4% lower than that of the baseline model. This confirms that the improved model retains both detection integrity and positioning accuracy under mild occlusion.
Under 5% occlusion, the baseline model misses 2 midrib target points (40–50% occluded) and produces 3 false positives (3–4 mm composite droplets), with a positioning error of 1.23 ± 0.21 pixels. The improved model identifies all target points with confidence ≥0.8 and no false positives, and its positioning error falls to 0.45 ± 0.13 pixels, 63.4% lower than that of the baseline model. In the leaf-base area, where droplets deposit most densely, the recall of the improved model is 98.5%, 8.8% higher than that of the baseline model (89.7%). This directly demonstrates that the improvement in mAP@0.5 translates into better detection performance in practical scenarios, providing reliable target point positioning data for the accurate extraction of leaf motion parameters.
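The positioning-error figures above amount to matching each detection to its ground-truth marked point and averaging the Euclidean pixel distance. The following sketch illustrates this computation on synthetic coordinates; the image size and noise level are assumptions for illustration only.

```python
import numpy as np

def positioning_error(detected, ground_truth):
    """Euclidean pixel error between matched detections and ground-truth points.
    detected, ground_truth: (N, 2) arrays of (x, y) pixel coordinates, already
    matched by index (the paper uses 34 marked points per leaf as ground truth).
    Returns (mean, std) of the per-point error in pixels."""
    err = np.linalg.norm(detected - ground_truth, axis=1)
    return err.mean(), err.std()

# Toy example: 34 points in an assumed 1024-px frame with small detection noise
rng = np.random.default_rng(1)
gt = rng.uniform(0, 1024, size=(34, 2))
det = gt + rng.normal(0, 0.3, size=(34, 2))
mean_err, std_err = positioning_error(det, gt)
print(f"positioning error: {mean_err:.2f} +/- {std_err:.2f} px")
```

Reporting the error as mean ± standard deviation over the 34 points matches the form of the values quoted above (e.g., 0.45 ± 0.13 pixels).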

3.3. Leaf Tracking Results

3.3.1. Verification of Tracking Stability in Long Sequences and Multi-Scenes

Figure 13 shows the core performance and multi-scene stability of the improved YOLOv8 + DeepSORT model in 60 s long-sequence offline tracking scenarios, providing tracking-level reliability support for the accurate acquisition of leaf motion parameters under droplet occlusion. The 60 s long-sequence tracking results (Figure 13a) show that the improved model, integrating the SAM, optimized Kalman filtering, and ResNet-50 Re-ID feature matching, achieves a multiple object tracking accuracy (MOTA) of 95.3 ± 1.2%, which is 12.1% higher than the baseline model without the SAM (83.2 ± 1.5%). The number of ID switches decreases from 14 ± 2 times for the baseline model to 4 ± 1 times, a reduction of 71.4%. The trajectory breakage rate decreases from 15.7 ± 1.3% to 3.2 ± 0.5%, a reduction of 80.3%, and the average trajectory length increases by 82.5%. Even during the high-occlusion-density period of 30 to 45 s, the improved model experiences only one trajectory breakage, which is quickly recovered, while the baseline model experiences five. This indicates that the improved model maintains better trajectory continuity in long-sequence offline tracking and avoids the loss of leaf motion parameters caused by tracking interruption. The multi-scene stability test results (Figure 13b) further verify the model's robustness. Under the 2% droplet occlusion scenario, the MOTA of the improved model is 96.5 ± 1.0%, which is 7.3% higher than the baseline model (89.2 ± 1.5%), and the number of ID switches decreases from 6 ± 1 times to 2 ± 1 times. Under the 5% droplet occlusion scenario, the MOTA of the improved model remains 93.7 ± 1.2%, which is 15.3% higher than the baseline model (78.4 ± 2.0%), and stable tracking is achieved even when the target point occlusion degree reaches 50%. Under light fluctuation (2000–15,000 lx), the MOTA of the improved model decreases by only 2.7% (from 97.8 ± 0.8% to 95.1 ± 1.3%), while that of the baseline model decreases by 8.5%.
Under the scenario of mild leaf overlap (<10%), the trajectory breakage rate of the improved model is 4.1 ± 0.6%, which is much lower than 12.8 ± 1.5% of the baseline model. The Re-ID feature effectively distinguishes overlapping target points through texture differences, ensuring the stability of multi-target offline tracking. The above results indicate that the improved model can maintain high tracking accuracy and low breakage rate in complex scenarios such as droplet occlusion, light interference, and leaf overlap, laying a solid tracking foundation for the accurate extraction of subsequent leaf motion parameters. It also verifies the feasibility and reliability of the improved YOLOv8 + DeepSORT method integrated with the SAM for obtaining leaf motion parameters in complex scenarios.
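For reference, MOTA aggregates misses, false positives, and identity switches over all frames relative to the total number of ground-truth objects. A minimal sketch of the metric, with an assumed 30 fps frame rate for the 60 s sequence (the frame rate is not stated in this section) and illustrative error counts:

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Multiple Object Tracking Accuracy:
    MOTA = 1 - (FN + FP + IDSW) / total ground-truth objects over all frames."""
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt

# Toy example: 34 target points tracked over 60 s at an assumed 30 fps
num_gt = 34 * 1800          # 61,200 ground-truth object instances
score = mota(1500, 1200, 4, num_gt)
print(f"MOTA = {score:.3f}")
```

Note that a single identity switch counts once regardless of how many frames the wrong identity persists, which is why low ID-switch counts and high MOTA go hand in hand for long sequences.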

3.3.2. Occlusion Recovery Capability

Table 2 shows the occlusion recovery capabilities of the baseline model and the improved model under three target pixel occlusion ratios of 0%, 2%, and 5%. The results complement the quantitative tracking performance data. Under 0% non-occlusion conditions, both models achieve stable tracking of target points, and the improved model performs better. In the quantitative data, its MOTA reaches 97.5%, IDF1 is 98.0%, the number of ID switches is only 1 time, and the average trajectory length is 1678 frames. Compared with the baseline model (MOTA 96.8%, 2 ID switches, trajectory length 1653 frames), it has better stability without redundant interference. Under 2% mild occlusion, droplet aggregation causes edge occlusion of target points and highlight noise. The baseline model has obvious trajectory drift, the number of ID switches increases to 6 times, the average trajectory length shortens to 1578 frames, and the MOTA drops to 89.4%. Relying on the noise suppression capability of the SAM and the optimized Kalman filtering strategy, the improved model has no trajectory drift or breakage, the number of ID switches is only 2 times, the average trajectory length is 1621 frames, and the MOTA is maintained at 93.2%, showing significant robustness in occlusion recovery. Under 5% moderate to severe occlusion, target points near the leaf midrib are partially or even completely occluded. The baseline model has frequent trajectory breakages, the number of ID switches reaches 15 times, the average trajectory length is only 1465 frames, and the MOTA is only 68.2%, making effective tracking difficult. The improved model has a particularly prominent advantage in occlusion recovery. The Re-ID feature effectively suppresses identity drift, and tracking can be quickly recovered even when target points are partially occluded. 
The number of ID switches is reduced to 5 times, the average trajectory length is extended to 1557 frames, and the MOTA is increased to 87.5%, providing stable trajectory support for the subsequent extraction of leaf motion parameters.
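The occlusion-recovery behaviour described above rests on running the Kalman predict step without measurement updates while a point is hidden, then re-associating when it reappears. A constant-velocity sketch of this coasting step, with the 30 fps frame rate and noise magnitudes assumed for illustration:

```python
import numpy as np

def kalman_coast(x, P, F, Q, n_frames):
    """Run the Kalman predict step for n_frames with no measurement update,
    coasting a track through a short occlusion (the paper bridges up to 5
    frames). x: state [px, py, vx, vy]; P: covariance; F: transition; Q: noise."""
    for _ in range(n_frames):
        x = F @ x               # constant-velocity motion model
        P = F @ P @ F.T + Q     # uncertainty grows while unobserved
    return x, P

dt = 1.0 / 30.0                                  # assumed 30 fps
F = np.eye(4); F[0, 2] = F[1, 3] = dt            # position += velocity * dt
Q = 1e-4 * np.eye(4)                             # assumed process noise
x = np.array([100.0, 200.0, 30.0, -15.0])        # px, px, px/s, px/s
P = np.eye(4)
x5, P5 = kalman_coast(x, P, F, Q, 5)
print(x5[:2])   # predicted position after 5 occluded frames
```

The growing covariance widens the association gate on reappearance, which is what lets the tracker re-attach the old identity instead of spawning a new one.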

3.4. Analysis of Leaf Motion Results

3.4.1. Quantification of Motion Parameters at Each Point of the Leaf Midrib

Figure 14 shows that the algorithm successfully extracts the motion speed of each target point on the leaf midrib for a leaf area of 15.90 cm2 at a spray pressure of 0.3 MPa. Along the midrib, the motion speed increases progressively from the leaf base to the tip over time. Figure 14 uses time as the horizontal axis, speed as the vertical axis, and the third dimension corresponds to different measurement points on the midrib (numbered from 13 at the leaf base to 1 at the leaf tip). From the data distribution, the average motion speed of the measurement point at the leaf base (yellow curve) is 0.012 m s−1, with a small fluctuation range. The average speed of the measurement points in the middle of the midrib (orange curve) increases in a gradient with distance from the base, averaging 0.0543 m s−1. The measurement point at the leaf tip (red curve) has the highest motion speed, averaging 0.153 m s−1, with a relatively larger fluctuation range. These results indicate that the motion speed of the leaf midrib has distinct spatial gradient characteristics, increasing gradually from the base to the tip, consistent with the mechanical response of a flexible leaf structure. They also verify that the improved YOLOv8 + DeepSORT method integrated with the SAM accurately captures speed differences at different positions along the midrib, providing a refined quantitative basis for subsequently optimizing spray parameters according to the spatial speed distribution.
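The per-point speed series of the kind plotted in Figure 14 can be obtained by differentiating each tracked trajectory over time. A sketch of this computation, assuming positions already calibrated from pixels to metres and an assumed 30 fps frame rate:

```python
import numpy as np

def point_speeds(traj, fps):
    """Per-frame speed of one tracked point from its calibrated trajectory.
    traj: (T, 2) positions in metres; fps: camera frame rate.
    Returns the (T-1,) speed series and its mean."""
    d = np.diff(traj, axis=0)                 # per-frame displacement (m)
    speed = np.linalg.norm(d, axis=1) * fps   # m/frame -> m/s
    return speed, speed.mean()

# Toy trajectory: 5 mm sinusoidal oscillation at ~0.4 Hz, 30 fps assumed
fps, T = 30, 300
t = np.arange(T) / fps
traj = np.stack([0.005 * np.sin(2 * np.pi * 0.4 * t), np.zeros(T)], axis=1)
speed, mean_speed = point_speeds(traj, fps)
print(f"mean speed = {mean_speed:.4f} m/s")
```

Averaging the speed series per measurement point, then comparing the means across points 1–13, yields the spatial gradient reported above.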
Figure 15 intuitively shows the 3D motion trajectory and spatial distribution of the leaf’s maximum displacement region, the leaf tip. The figure’s 3D coordinate axes are defined as Longitudinal, Lateral, and Normal Directions, corresponding to the leaf tip’s three spatial trajectory components. The 3D trajectory uses a color-coded method, with curve colors representing the leaf tip’s spatial positions at different times. This color gradient design clearly traces the temporal evolution of leaf motion, including the pre-impact initial state, instantaneous impact response and subsequent vibration phase. Meanwhile, trajectory density and extension direction directly reflect the leaf’s motion range: dense clusters indicate repeated vibration regions, and trajectory length correlates with displacement in specific directions.
When impacted by the droplet swarm, the leaf tip shows a distinct dynamic response: it first moves rapidly downward along the Normal Direction under impact force, then continues oscillating downward with damped vibration, reaching a maximum normal displacement of 35 mm. In contrast, motion in the other two directions is relatively small (≈8 mm), indicating significant constraints on leaf sway.
Comprehensive analysis of Figure 15’s trajectory morphology and motion parameters shows that the leaf primarily undergoes pitch motion centered on the leaf base after droplet swarm impact. This aligns with the mechanical properties of the leaf’s flexible structure and the direction of droplet impact force. The motion pattern reflects the leaf’s inherent structural constraints and provides direct visual evidence for quantifying leaf motion parameters and optimizing spray parameters.
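The directional comparison above amounts to taking the peak-to-peak displacement of the leaf-tip trajectory along each axis. A sketch with a synthetic damped oscillation standing in for the measured trajectory (amplitudes and damping are assumed, loosely mimicking the 35 mm normal versus ≈8 mm lateral/longitudinal ranges):

```python
import numpy as np

def displacement_ranges(traj3d):
    """Peak-to-peak displacement along each axis of a 3D leaf-tip trajectory.
    traj3d: (T, 3) positions ordered (longitudinal, lateral, normal), in mm."""
    return traj3d.max(axis=0) - traj3d.min(axis=0)

# Toy trajectory: damped oscillation dominated by the normal direction
t = np.linspace(0, 10, 600)
normal = -35.0 * np.exp(-0.3 * t) * np.abs(np.sin(2 * np.pi * 0.4 * t))
lateral = 4.0 * np.sin(2 * np.pi * 0.4 * t) * np.exp(-0.3 * t)
longitudinal = 3.0 * np.cos(2 * np.pi * 0.4 * t) * np.exp(-0.3 * t)
traj = np.stack([longitudinal, lateral, normal], axis=1)
print(displacement_ranges(traj))   # normal range dominates
```

A dominant normal range with small lateral and longitudinal ranges is the quantitative signature of the pitch motion about the leaf base described above.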

3.4.2. Analysis of Dynamic Characteristics of Leaf Motion

Figure 16 shows the motion characteristics of 13 measurement points on the leaf midrib from the tip to the base, intuitively presenting the spatial differentiation law of midrib motion parameters and providing quantitative basis for the dynamic interaction mechanism between leaves and droplets.
From the perspective of the motion speed characteristics of each measurement point on the midrib, the average speed from the leaf tip to the base shows a significant linear decreasing trend. The average speed of measurement point 1 near the leaf tip is the highest, reaching 0.153 ± 0.108 m s−1. As the distance from the leaf base decreases (the sequence number of the measurement point increases), the speed gradually decreases. The average speed of measurement point 7 in the middle is 0.054 ± 0.040 m s−1. The average speed of measurement point 13 at the leaf base is the lowest, only 0.012 ± 0.009 m s−1, which is 92.1% lower than that of measurement point 1. The speed fluctuation range (standard deviation) also shows a decreasing law from measurement point 1 to 13. The speed standard deviation of measurement point 1 is 0.108 m s−1, which is 12 times that of measurement point 13 (0.009 m s−1). This characteristic reflects that the motion of the leaf tip area under droplet impact is more dynamic. The leaf tip is far from the stem constraint, with a thin structure and light mass, resulting in high motion freedom and a larger response amplitude to droplet impact. In contrast, the leaf base area is closely connected to the stem with high structural stiffness and strong motion constraints, so the motion amplitude and fluctuation caused by droplet impact are significantly weakened.
From the perspective of dominant vibration frequency characteristics (right axis of Figure 16), the dominant vibration frequencies of the 13 measurement points are consistent at 0.403 Hz. This indicates that the overall vibration of the leaf midrib follows a unified basic period (approximately 2.48 s), which is jointly determined by the structural stiffness and mass distribution of the leaf itself and has no direct correlation with the excitation period of external droplet impact. Combined with the speed characteristics, although the overall vibration period of the midrib is consistent, there are significant spatial differences in the motion amplitude of different regions, reflecting the dynamic law of leaf motion with unified overall period and heterogeneous local response.
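The dominant vibration frequency at each measurement point can be extracted from the magnitude spectrum of its motion signal. A sketch using a synthetic 0.4 Hz signal (the frame rate, duration, and noise level are assumptions for illustration):

```python
import numpy as np

def dominant_frequency(signal, fps):
    """Dominant vibration frequency of a 1D motion signal via the FFT
    magnitude spectrum, with the mean removed so the DC bin does not win.
    signal: speed or displacement series; fps: sampling rate in Hz."""
    spec = np.abs(np.fft.rfft(signal - signal.mean()))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    return freqs[np.argmax(spec)]

# Toy signal: 0.4 Hz oscillation plus noise, 60 s at an assumed 30 fps
fps, T = 30, 1800
t = np.arange(T) / fps
sig = np.sin(2 * np.pi * 0.4 * t) + 0.1 * np.random.default_rng(2).normal(size=T)
print(f"dominant frequency = {dominant_frequency(sig, fps):.3f} Hz")
```

With a 60 s window the frequency resolution is 1/60 ≈ 0.017 Hz, fine enough to resolve a peak near 0.4 Hz consistently across all 13 points.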
In summary, the results in Figure 16 confirm that the dynamic characteristics of leaf midrib motion show a spatial law: the closer to the leaf tip, the higher the average speed and the more intense the fluctuation; the closer to the leaf base, the lower the average speed and the more stable the motion. Moreover, the midrib as a whole follows a dominant vibration frequency of 0.403 Hz. These characteristics not only verify the dominant role of leaf structure constraints and stiffness distribution on motion response but also provide measured data support for subsequent optimization of spray parameters for different leaf regions, such as reducing the spray pressure in the leaf tip region to reduce motion amplitude.

4. Discussion

The difficulty in acquiring leaf motion parameters caused by dynamic droplet occlusion during agricultural spraying is a core bottleneck restricting the optimization of precision spraying [21]. Traditional contact measurement methods such as strain gauges and micro piezoelectric accelerometers not only easily interfere with the natural motion of flexible leaves but also struggle to cope with parameter distortion caused by droplet occlusion. Studies by Meder et al. [1] show that the error of laser ranging reaches 15% under 3% occlusion, and the load effect of contact strain gauges can lead to a deviation of more than 10% in leaf vibration frequency measurement. This study proposes a combined method of improved YOLOv8 integrated with a Spatial Attention Module (SAM) and optimized DeepSORT to address this problem. Under 5% occlusion, the detection mAP@0.5 is improved by 19.6% and the tracking MOTA is significantly enhanced, effectively reducing trajectory breakage and ID switches. At the same time, the spatial gradient characteristics of leaf midrib motion under droplet impact are quantified: the average motion speed gradually increases from the base (0.012 m s−1) to the tip (0.153 m s−1), accompanied by intensified fluctuations toward the tip, while the dominant vibration frequency remains consistent at 0.403 Hz across all measured points. This provides a reliable non-contact solution for measuring leaf motion parameters in complex spraying scenarios. A similar framework has been used for spray droplet pattern segmentation to improve deposition estimation accuracy [35].
The performance improvement of the improved YOLOv8 stems from the accurate enhancement of occluded target point features by the SAM. This module suppresses the highlight noise of droplet coalescence through spatial adaptive weighting and strengthens the edge features of partially occluded target points. This is highly consistent with the conclusion proposed by Ariza-Sentís et al. [21] in their review of agricultural target detection that local target features should be prioritized for enhancement in occlusion scenarios. Similar improvements have been applied in YOLOv8 for small target leaf/fruit detection to improve occlusion robustness [36,37]. Compared with existing technologies, the leaf tracking method based on traditional vision proposed by Gibbs et al. [22,38] has a trajectory breakage rate of 12.3% under 3% occlusion. In contrast, this study reduces the trajectory breakage rate under high occlusion to 3.2% in long-sequence tracking through the combination of the SAM and optimized DeepSORT, without relying on complex multi-camera calibration processes.
In terms of tracking stability, the improved DeepSORT reduces the number of ID switches by approximately 71.4% through the fusion of Kalman filter historical speed prediction and ResNet-50 Re-ID appearance matching. This optimization idea is consistent with the core idea of multi-modal matching to improve occlusion robustness proposed by Wojke et al. [32]. Recent work has further applied DeepSORT to agricultural spray droplet tracking to improve stability under dynamic occlusion [39]. Specifically, the deep features extracted by the Re-ID network effectively distinguish similar target points under droplet occlusion and reduce identity drift. This is more efficient than the appearance motion dual-feature constraint strategy adopted by Gibbs et al. [22], which has an IDF1 of only 68.2% under 5% occlusion, while this study reaches 89.1%. At the same time, the ability of Kalman filtering to bridge short-term occlusion of up to 5 frames leverages the short-term predictability and continuity inherent in dynamic leaf movements, as observed in time-lapse tracking for plant phenotyping, ensuring the physical rationality of the trajectory [40].
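The Re-ID appearance matching discussed here compares L2-normalised embeddings by cosine distance. The sketch below uses a greedy assignment for brevity, whereas DeepSORT combines the appearance metric with Mahalanobis gating and Hungarian matching; the embedding dimension and distance threshold are assumptions.

```python
import numpy as np

def reid_match(track_feats, det_feats, max_cosine_dist=0.3):
    """Greedy appearance matching on cosine distance between L2-normalised
    Re-ID embeddings, in the spirit of DeepSORT's appearance metric.
    Rows: stored track embeddings; columns: new-detection embeddings.
    Returns matched (track_idx, det_idx) pairs."""
    t = track_feats / np.linalg.norm(track_feats, axis=1, keepdims=True)
    d = det_feats / np.linalg.norm(det_feats, axis=1, keepdims=True)
    cost = 1.0 - t @ d.T                      # cosine distance matrix
    matches = []
    while cost.min() <= max_cosine_dist:
        i, j = np.unravel_index(np.argmin(cost), cost.shape)
        matches.append((int(i), int(j)))
        cost[i, :] = np.inf                   # each track used once
        cost[:, j] = np.inf                   # each detection used once
    return matches

rng = np.random.default_rng(3)
tracks = rng.normal(size=(3, 128))            # 3 stored track embeddings
dets = tracks[[2, 0, 1]] + 0.05 * rng.normal(size=(3, 128))  # shuffled + noise
print(reid_match(tracks, dets))
```

Because near-duplicate embeddings have cosine distance close to zero while unrelated ones sit near one, identities survive a shuffle of detection order, which is the mechanism that suppresses ID drift under occlusion.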
Despite this progress, the study has limitations. Chief among them, it relies on high-contrast white target points (reflectivity > 90%), which generalize poorly to leaves with complex natural textures, such as hairy or spotted leaves. This is a common problem in the field, as Ariza-Sentís et al. [21] point out: agricultural target detection generalizes poorly. Recent reviews have further discussed the challenges of agricultural occlusion and small-target detection, as well as attention-mechanism solutions [41]. Future research can address this in two ways: first, combining unsupervised domain adaptation to train target detection models for naturally textured leaves without manually placed high-contrast target points, an approach shown to improve generalization in weed detection [26]; second, replacing ResNet-50 with the lightweight MobileNetV3 network as the Re-ID backbone, combined with model quantization and compression, which, following the edge-device optimization scheme of Ariza-Sentís et al. [21], can reduce per-frame processing time.
Future research can further expand the applicability and robustness of this method to address more complex field environments. Specifically, first, the impact of different spray flow rates (e.g., in the range of 0.2–0.5 L min−1) on leaf motion parameters and droplet deposition can be explored. This is because flow rate variations significantly alter droplet velocity, size distribution, and impact force, thereby affecting occlusion dynamics and pesticide utilization efficiency [42,43]. Second, natural raindrop impact scenarios (rather than being limited to pesticide spray droplets) can be simulated to quantify raindrop-induced leaf vibration, bending, and droplet ejection behaviors, which is of great significance for understanding crop water interception and mechanical responses under rainfall conditions [44,45]. Third, the effects of different pesticide dilution concentrations on the wetting, retention, and deposition behaviors of droplets on leaf surfaces can be tested: high concentrations tend to increase the adhesion rate but may raise the risk of rebound, while low concentrations contribute to uniform distribution [46,47]. Finally, various wind direction and speed conditions (e.g., 2–8 m s−1) within the operational thresholds for spraying can be incorporated to evaluate the impact of wind disturbance on leaf motion trajectories, droplet drift, and canopy penetration, thereby optimizing wind-field-assisted spraying strategies [48]. These expansion directions will significantly enhance the generalization ability of the method, providing more comprehensive data support and theoretical guidance for spray parameter optimization and efficient pesticide utilization in precision agriculture under variable natural environments.
In actual field canopy environments, mutual occlusion between leaves is widespread, and the interference from stems and adjacent leaves will introduce additional challenges to target detection and tracking, potentially resulting in higher trajectory interruption rates and target ID switching frequencies. Future research will further expand the applicable scenarios of the algorithm framework, conduct experiments on whole plants or multi-leaf canopies, quantify the impact of mutual occlusion between leaves by combining binocular 3D reconstruction technology, and thus improve the practical application value of this method in complex field operation environments.

5. Conclusions

Aiming at the challenge of acquiring leaf motion parameters when leaf target points are occluded by droplets during agricultural spraying, this study realizes robust, continuous tracking of leaf target points based on the improved YOLOv8 target detection algorithm and the DeepSORT multi-target tracking algorithm, and accurately obtains leaf displacement and speed. The main conclusions are as follows: (1) The improved YOLOv8 model integrated with the SAM significantly improves the robustness of target detection in droplet occlusion scenarios. Through spatial-dimension feature weighting, this module effectively suppresses the highlight noise generated by droplet aggregation, strengthens the feature expression of partially occluded target points, and greatly alleviates the missed detection and misjudgment problems of the baseline model in moderate to severe occlusion scenarios, providing a reliable guarantee for leaf target point identification in dynamic spraying environments. (2) The optimized DeepSORT tracking algorithm significantly improves the stability of multi-target tracking in complex occlusion scenarios by improving the Kalman filter prediction strategy and the Re-ID appearance feature matching mechanism. The algorithm effectively reduces trajectory drift, identity switching, and breakage caused by droplet occlusion, and maintains long-term stable trajectory continuity, laying a solid foundation for the accurate extraction of leaf motion parameters. (3) The analysis of leaf motion characteristics reveals the key laws of leaf motion under droplet impact. The motion speed of the leaf midrib shows a clear spatial gradient, increasing gradually from the base to the tip with intensified fluctuation amplitude, while the midrib as a whole follows a unified dominant vibration frequency.
These quantitative laws provide an important measured basis for targeted optimization of spray parameters and improvement of pesticide deposition efficiency, and have practical significance for the development of precision agriculture technology.

Author Contributions

Conceptualisation, visualisation, software, data curation, writing—original draft preparation, F.G. and K.L.; methodology, writing—review and editing, J.M.; validation, investigation, formal analysis, project administration, B.Q.; resources, supervision, funding acquisition, J.M. All authors have read and agreed to the published version of the manuscript.

Funding

This work was made possible by the National Natural Science Foundation of China (No. 31971790), the Key Research and Development Program of Jiangsu Province (No. BE2020328), the Scientific Research Foundation of Zhejiang University of Water Resources and Electric Power (No. JBGS2025014), and A Project Funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD2023-87).

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Meder, F.; Naselli, G.A.; Mazzolai, B. Wind Dynamics and Leaf Motion: Approaching the Design of High-Tech Devices for Energy Harvesting for Operation on Plant Leaves. Front. Plant Sci. 2022, 13, 994429. [Google Scholar] [CrossRef] [PubMed]
  2. Sun, Z.; Cui, T.; Zhu, Y.; Zhang, W.; Shi, S.; Tang, S.; Du, Z.; Liu, C.; Cui, R.; Chen, H.; et al. The Mechanical Principles behind the Golden Ratio Distribution of Veins in Plant Leaves. Sci. Rep. 2018, 8, 13859. [Google Scholar] [CrossRef] [PubMed]
  3. Ma, J.; Liu, K.; Dong, X.; Huang, X.; Ahmad, F.; Qiu, B. Force and Motion Behaviour of Crop Leaves during Spraying. Biosyst. Eng. 2023, 235, 83–99. [Google Scholar] [CrossRef]
  4. Wang, Z.; Zheng, C.; Li, T.; He, X. Analysing the Preference for Pesticide Spray to Be Deposited at Leaf-Tips. Biosyst. Eng. 2021, 204, 247–256. [Google Scholar] [CrossRef]
  5. Holder, C.D.; Gibbes, C. Influence of Leaf and Canopy Characteristics on Rainfall Interception and Urban Hydrology. Hydrol. Sci. J. 2017, 62, 182–190. [Google Scholar] [CrossRef]
  6. Zhang, X.; Li, A.; Fei, Y.; Sun, M.; Zhu, L.; Huang, Z. Design of Biomimetic Leaf-like Flow Fields Using Three-Dimensional Numerical Simulation for Co-Electrolysis in Solid Oxide Electrolysis Cell. Int. J. Hydrogen Energy 2024, 72, 326–337. [Google Scholar] [CrossRef]
  7. Wang, Q.; Ren, Y.; Wang, H.; Wang, J.; Zhou, G.; Yang, Y.; Xie, Z.; Bai, X. Research on UAV Downwash Airflow and Wind-Induced Response Characteristics of Rapeseed Seedling Stage Based on Computational Fluid Dynamics Simulation. Agriculture 2024, 14, 1326. [Google Scholar] [CrossRef]
  8. Mai, Y.; Wen, S.; Zhang, J.; Lan, Y.; Huang, G. Analysis of the Two-Way Fluid-Structure Interaction between the Rice Canopy and the Downwash Airflow of a Quadcopter UAV. Biosyst. Eng. 2025, 250, 343–364. [Google Scholar] [CrossRef]
  9. Dorr, G.J.; Kempthorne, D.M.; Mayo, L.C.; Forster, W.A.; Zabkiewicz, J.A.; McCue, S.W.; Belward, J.A.; Turner, I.W.; Hanan, J. Towards a Model of Spray–Canopy Interactions: Interception, Shatter, Bounce and Retention of Droplets on Horizontal Leaves. Ecol. Modell. 2014, 290, 94–101. [Google Scholar] [CrossRef]
  10. Massinon, M.; De Cock, N.; Forster, W.A.; Nairn, J.J.; McCue, S.W.; Zabkiewicz, J.A.; Lebeau, F. Spray Droplet Impaction Outcomes for Different Plant Species and Spray Formulations. Crop Prot. 2017, 99, 65–75. [Google Scholar] [CrossRef]
  11. Dorr, G.J.; Forster, W.A.; Mayo, L.C.; McCue, S.W.; Kempthorne, D.M.; Hanan, J.; Turner, I.W.; Belward, J.A.; Young, J.; Zabkiewicz, J.A. Spray Retention on Whole Plants: Modelling, Simulations and Experiments. Crop Prot. 2016, 88, 118–130. [Google Scholar] [CrossRef]
  12. Tsugawa, S.; Asakawa, H.; Hirata, M.; Nonoyama, T.; Kang, Z.; Toyota, M.; Suda, H. Inference of Mechanical Forces through 3D Reconstruction of the Closing Motion in Venus Flytrap Leaves. Sci. Rep. 2025, 15, 24860. [Google Scholar] [CrossRef] [PubMed]
  13. Li, Z.; Liu, Y.; Hossain, O.; Paul, R.; Yao, S.; Wu, S.; Ristaino, J.B.; Zhu, Y.; Wei, Q. Real-Time Monitoring of Plant Stresses via Chemiresistive Profiling of Leaf Volatiles by a Wearable Sensor. Matter 2021, 4, 2553–2570. [Google Scholar] [CrossRef]
  14. Jaeger, D.M.; Looze, A.C.M.; Raleigh, M.S.; Miller, B.W.; Friedman, J.M.; Wessman, C.A. From Flowering to Foliage: Accelerometers Track Tree Sway to Provide High-Resolution Insights into Tree Phenology. Agric. For. Meteorol. 2022, 318, 108900. [Google Scholar] [CrossRef]
  15. Gray, R.E.J.; Ewers, R.M. Monitoring Forest Phenology in a Changing World. Forests 2021, 12, 297. [Google Scholar] [CrossRef]
  16. de Langre, E. Plant Vibrations at All Scales: A Review. J. Exp. Bot. 2019, 70, 3521–3531. [Google Scholar] [CrossRef]
  17. Sano, M.; Nakagawa, Y.; Sugimoto, T.; Shirakawa, T.; Yamagishi, K.; Sugihara, T.; Ohaba, M.; Shibusawa, S. Estimation of Water Stress of Plant by Vibration Measurement of Leaf Using Acoustic Radiation Force. Acoust. Sci. Technol. 2015, 36, 248–253. [Google Scholar] [CrossRef]
  18. Fan, J.; Guo, X.; Wang, C.; Lu, X.; Wu, S. The State of Motion Stereo about Plant Leaves Monitoring System Design and Simulation. In Proceedings of the IFIP Advances in Information and Communication Technology; Li, D., Zhao, C., Eds.; Springer International Publishing: Jilin, China, 2017; Volume AICT-546, pp. 419–431. [Google Scholar]
  19. Paturkar, A.; Sen Gupta, G.; Bailey, D. Plant Trait Measurement in 3D for Growth Monitoring. Plant Methods 2022, 18, 59. [Google Scholar] [CrossRef]
  20. Zhao, D.-J.; Chen, Y.; Wang, Z.-Y.; Xue, L.; Mao, T.-L.; Liu, Y.-M.; Wang, Z.-Y.; Huang, L. High-Resolution Non-Contact Measurement of the Electrical Activity of Plants in Situ Using Optical Recording. Sci. Rep. 2015, 5, 13425. [Google Scholar] [CrossRef]
  21. Ariza-Sentís, M.; Vélez, S.; Martínez-Peña, R.; Baja, H.; Valente, J. Object Detection and Tracking in Precision Farming: A Systematic Review. Comput. Electron. Agric. 2024, 219, 108757. [Google Scholar] [CrossRef]
  22. Gibbs, J.A.; Burgess, A.J.; Pound, M.P.; Pridmore, T.P.; Murchie, E.H. Recovering Wind-Induced Plant Motion in Dense Field Environments via Deep Learning and Multiple Object Tracking. Plant Physiol. 2019, 181, 28–42. [Google Scholar] [CrossRef]
  23. Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving into High Quality Object Detection. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6154–6162. [Google Scholar] [CrossRef]
  24. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; MIT Press: Cambridge, MA, USA, 2015; Volume 1, pp. 91–99. [Google Scholar]
  25. Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 7464–7475. [Google Scholar]
  26. Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully Convolutional One-Stage Object Detection. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9626–9635. [Google Scholar]
  27. Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. DETRs Beat YOLOs on Real-Time Object Detection. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 19–21 June 2024; pp. 16965–16974. [Google Scholar]
  28. Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-End Object Detection with Transformers. In Proceedings of the Computer Vision—ECCV 2020, Glasgow, UK, 23–28 August 2020; Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 213–229. [Google Scholar]
  29. Dalal, M.; Mittal, P. A systematic review of deep learning-based object detection in agriculture: Methods, challenges, and future directions. Comput. Mater. Contin. 2025, 84, 57–91. [Google Scholar] [CrossRef]
  30. Khan, Z.; Shen, Y.; Liu, H. Object Detection in Agriculture: A Comprehensive Review of Methods, Applications, Challenges, and Future Directions. Agriculture 2025, 15, 1351. [Google Scholar] [CrossRef]
  31. Lin, C.; Jiang, W.; Zhao, W.; et al. DPD-YOLO: Dense Pineapple Fruit Target Detection Algorithm in Complex Environments Based on YOLOv8 Combined with Attention Mechanism. Front. Plant Sci. 2025, 16, 1523552. [Google Scholar] [CrossRef] [PubMed]
  32. Wojke, N.; Bewley, A.; Paulus, D. Simple Online and Realtime Tracking with a Deep Association Metric. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; IEEE Press: Beijing, China, 2017; pp. 3645–3649. [Google Scholar]
  33. Duong-Trung, H.; Duong-Trung, N. Integrating YOLOv8-Agri and DeepSORT for Advanced Motion Detection in Agriculture and Fisheries. EAI Endorsed Trans. Ind. Netw. Intell. Syst. 2024, 11, e4. [Google Scholar] [CrossRef]
  34. Tu, S.; Zeng, Q.; Liang, Y.; et al. Automated Behavior Recognition and Tracking of Group-Housed Pigs with an Improved DeepSORT Method. Agriculture 2022, 12, 1907. [Google Scholar] [CrossRef]
  35. Acharya, P.; Burgers, T.; Nguyen, K.-D. A Deep-Learning Framework for Spray Pattern Segmentation and Estimation in Agricultural Spraying Systems. Sci. Rep. 2023, 13, 7545. [Google Scholar] [CrossRef] [PubMed]
  36. Yue, X.; Qi, K.; Yang, F.; Na, X.; Liu, Y.; Liu, C. RSR-YOLO: A Real-Time Method for Small Target Tomato Detection Based on Improved YOLOv8 Network. Discov. Appl. Sci. 2024, 6, 268. [Google Scholar] [CrossRef]
  37. Zhang, Z.; Yang, Y.; Xu, X.; Liu, L.; Yue, J.; Ding, R.; Lu, Y.; Liu, J.; Qiao, H. GVC-YOLO: A Lightweight Real-Time Detection Method for Cotton Aphid-Damaged Leaves Based on Edge Computing. Remote Sens. 2024, 16, 3046. [Google Scholar] [CrossRef]
  38. Luo, W.; Yuan, S. Enhanced YOLOv8 for Small-Object Detection in Multiscale UAV Imagery: Innovations in Detection Accuracy and Efficiency. Digit. Signal Process. 2025, 158, 104964. [Google Scholar] [CrossRef]
  39. Shengde, C.; Junyu, L.; Xiaojie, X.; Jianzhou, G.; Shiyun, H.; Zhiyan, Z.; Yubin, L. Detection and Tracking of Agricultural Spray Droplets Using GSConv-Enhanced YOLOv5s and DeepSORT. Comput. Electron. Agric. 2025, 235, 110353. [Google Scholar] [CrossRef]
  40. Rehman, T.U.; Zhang, L.; Wang, L.; Ma, D.; Maki, H.; Sánchez-Gallego, J.A.; Mickelbart, M.V.; Jin, J. Automated Leaf Movement Tracking in Time-Lapse Imaging for Plant Phenotyping. Comput. Electron. Agric. 2020, 175, 105623. [Google Scholar] [CrossRef]
  41. Badgujar, C.M.; Poulose, A.; Gan, H. Agricultural object detection with you only look once (YOLO) algorithm: A bibliometric and systematic literature review. Comput. Electron. Agric. 2024, 223, 109090. [Google Scholar] [CrossRef]
  42. Sapkota, M.; Virk, S.; Rains, G. Spray Deposition and Quality Assessment at Varying Ground Speeds for an Agricultural Sprayer with and without a Rate Controller. AgriEngineering 2023, 5, 506–519. [Google Scholar] [CrossRef]
  43. Li, L.; Hu, Z.; Liu, Q.; Yi, T.; Han, P.; Zhang, R.; Pan, L. Effect of Flight Velocity on Droplet Deposition and Drift of Combined Pesticides Sprayed Using an Unmanned Aerial Vehicle Sprayer in a Peach Orchard. Front. Plant Sci. 2022, 13, 981494. [Google Scholar] [CrossRef] [PubMed]
  44. Bhosale, Y.; Esmaili, E.; Bhar, K.; Jung, S. Bending, Twisting and Flapping Leaf upon Raindrop Impact. Bioinspiration Biomim. 2020, 15, 036007. [Google Scholar] [CrossRef] [PubMed]
  45. Gilet, T.; Tadrist, L. Leaf Oscillation and Upward Ejection of Droplets in Response to Drop Impact. Phys. Rev. Fluids 2025, 10, 053601. [Google Scholar] [CrossRef]
  46. Damak, M.; Hyder, M.N.; Varanasi, K.K. Enhancing Droplet Deposition through In-Situ Precipitation. Nat. Commun. 2016, 7, 12560. [Google Scholar] [CrossRef]
  47. Wang, C.; Cao, Y.; Chen, Y.; Xu, L.; Qiu, W. Understanding Dilution Effects on Particle-Containing Pesticide Droplets Deposition on Rice Leaf via Developing CFD-VOF-DPM Model. Pest Manag. Sci. 2024, 80, 4725–4735. [Google Scholar] [CrossRef]
  48. Xu, T.; Li, X.; Ding, L.; Qi, Y.; Lu, H.; Xiao, W.; Lv, X.; Li, J. Effects of Ambient Wind on Droplet Deposition Uniformity in Orchard Air-Assisted Sprayers. Sci. Rep. 2026, 16, 2250. [Google Scholar] [CrossRef]
Figure 1. Technical route of this study.
Figure 2. Pepper leaf with target points.
Figure 3. The components of the spray system.
Figure 4. On-site diagram of leaf motion data collection under spraying operations.
Figure 5. Schematic diagram of a binocular camera used to acquire leaf motion parameters.
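The binocular arrangement in Figure 5 recovers 3D coordinates of each marked point by stereo triangulation. A minimal sketch of the standard rectified-pair geometry follows; the focal length, baseline, principal point, and pixel coordinates below are illustrative values, not the calibration used in the study:

```python
# Triangulate a marked point from a rectified stereo pair.
# f (pixels), B (m), and the principal point (cx, cy) are illustrative
# values, not the paper's camera calibration.

def triangulate(xl, yl, xr, f=1200.0, B=0.12, cx=640.0, cy=360.0):
    """Return (X, Y, Z) in metres for matched pixels (xl, yl) and (xr, yl)."""
    d = xl - xr                  # disparity; matched points share a row after rectification
    if d <= 0:
        raise ValueError("non-positive disparity")
    Z = f * B / d                # depth from similar triangles
    X = (xl - cx) * Z / f        # lateral offset from the optical axis
    Y = (yl - cy) * Z / f
    return X, Y, Z

X, Y, Z = triangulate(700.0, 400.0, 652.0)   # disparity of 48 px -> Z = 3.0 m
```

Applying this per frame to every tracked marker yields the 3D trajectories analyzed in the Results section.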
Figure 6. Improved YOLOv8 network structure.
Figure 7. Improved DeepSORT algorithm.
Figure 8. Aggregation behavior of droplets on the leaf surface at different times.
Figure 9. Comparison of detection performance between baseline and improved YOLOv8 under different occlusion ratios (0%, 2%, 5%).
Figure 10. Recognition results under different occlusion conditions: (a) YOLOv8 result at 0% occlusion; (b) improved YOLOv8 result at 0% occlusion; (c) YOLOv8 result at 5% occlusion (red arrows mark target points that were not detected); (d) improved YOLOv8 result at 5% occlusion.
Figure 11. mAP@0.5 of models with and without SAM (ablation experiment).
Figure 12. Target point positioning error and detection integrity under different occlusion ratios.
Figure 13. Comparison of tracking performance between baseline and improved models: (a) Core indicators of 60 s long-sequence tracking (3000 FPS), (b) MOTA values under multi-scenes.
Figure 14. Motion speed of the leaf (leaf tip) under droplet impact, with different colors representing different target points.
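Speed traces like those in Figure 14 follow directly from the triangulated per-frame 3D positions by finite differences. A short sketch; the frame rate below is illustrative, not the camera setting reported in the study:

```python
import numpy as np

# Estimate point speed from a tracked 3D trajectory by finite differences.
# fps here is an illustrative value, not the study's high-speed camera rate.
def speeds(positions, fps):
    """positions: (N, 3) array in metres; returns (N-1,) speeds in m/s."""
    p = np.asarray(positions, dtype=float)
    step = np.linalg.norm(np.diff(p, axis=0), axis=1)  # metres moved per frame
    return step * fps

# straight-line motion of 0.05 m per frame at 100 fps -> 5 m/s per interval
traj = np.array([[0.00, 0.0, 0.0],
                 [0.05, 0.0, 0.0],
                 [0.10, 0.0, 0.0]])
v = speeds(traj, fps=100)
```

In practice a short moving-average filter is often applied to such traces to suppress pixel-level jitter before reporting average speeds.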
Figure 15. The 3D motion trajectory of the leaf tip and its projections on different planes, where the color of the 3D curve indicates the position of the leaf at different moments. The leaf moves continuously from the highest point in the normal direction (red) to the lowest point in the normal direction (orange).
Figure 16. Motion speed and frequency of each point on the leaf midrib.
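The dominant vibration frequency reported for the midrib points (0.403 Hz) is the kind of quantity obtained from the FFT magnitude peak of a detrended motion signal, as plotted in Figure 16. A minimal sketch on a synthetic signal; the sampling rate and waveform are assumptions for illustration:

```python
import numpy as np

# Extract the dominant vibration frequency of a motion signal from the
# FFT magnitude peak. The sampling rate and signal below are synthetic.
def dominant_frequency(signal, fs):
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                         # remove the DC component
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return freqs[np.argmax(spectrum)]

fs = 50.0                                    # Hz, illustrative
t = np.arange(0, 20, 1.0 / fs)               # 20 s analysis window
sig = np.sin(2 * np.pi * 0.4 * t)            # synthetic 0.4 Hz oscillation
f0 = dominant_frequency(sig, fs)             # -> 0.4 Hz
```

The frequency resolution is the reciprocal of the window length (0.05 Hz for a 20 s window), which bounds how precisely a peak such as 0.403 Hz can be localized.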
Table 1. Comparison of detection performance between baseline YOLOv8 and improved model (with SAM) under different occlusion ratios.

| Occlusion Ratio | Model           | mAP@0.5 (%) | Precision (%) | Recall (%) |
|-----------------|-----------------|-------------|---------------|------------|
| 0%              | Baseline YOLOv8 | 95.1 ± 1.5  | 95.2 ± 1.1    | 94.8 ± 1.0 |
| 0%              | Improved        | 97.0 ± 1.2  | 96.4 ± 0.8    | 95.9 ± 0.9 |
| 2%              | Baseline YOLOv8 | 88.2 ± 1.6  | --            | --         |
| 2%              | Improved        | 92.4 ± 1.3  | 92.1 ± 1.0    | 91.5 ± 1.2 |
| 5%              | Baseline YOLOv8 | 61.8 ± 2.1  | 62.4 ± 2.3    | 61.1 ± 1.9 |
| 5%              | Improved        | 81.4 ± 1.8  | 84.3 ± 1.6    | 81.7 ± 1.5 |
Table 2. Tracking performance of baseline and improved DeepSORT under different occlusion levels.

| Occlusion Ratio | Model            | MOTA (%) | IDF1 (%) | ID Switches | Average Trajectory Length |
|-----------------|------------------|----------|----------|-------------|---------------------------|
| 0%              | YOLOv8           | 96.8     | 97.2     | 2           | 1653                      |
| 0%              | Improved YOLOv8  | 97.5     | 98.0     | 1           | 1678                      |
| 2%              | YOLOv8           | 89.4     | 90.1     | 6           | 1578                      |
| 2%              | Improved YOLOv8  | 93.2     | 94.3     | 2           | 1621                      |
| 5%              | YOLOv8           | 68.2     | 66.7     | 15          | 1465                      |
| 5%              | Improved YOLOv8  | 87.5     | 89.1     | 5           | 1557                      |
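The MOTA values in Table 2 follow the standard CLEAR-MOT definition: one minus the sum of misses, false positives, and identity switches, normalized by the number of ground-truth objects. A minimal sketch; the error counts below are hypothetical, not the counts behind the table:

```python
# MOTA per the CLEAR-MOT definition, reported as a percentage:
# MOTA = 1 - (misses + false_positives + id_switches) / gt_objects
def mota(misses, false_positives, id_switches, gt_objects):
    errors = misses + false_positives + id_switches
    return 100.0 * (1.0 - errors / gt_objects)

# hypothetical per-sequence counts, chosen only to illustrate the formula
score = mota(misses=20, false_positives=10, id_switches=2, gt_objects=1000)
```

IDF1, by contrast, is the F1 score of identity-preserving matches, which is why it penalizes fragmented identities more heavily than MOTA does.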
Share and Cite

MDPI and ACS Style

Guo, F.; Liu, K.; Ma, J.; Qiu, B. Application of the Improved YOLOv8-DeepSORT Framework in Motion Tracking of Pepper Leaves Under Droplet Occlusion. Agronomy 2026, 16, 384. https://doi.org/10.3390/agronomy16030384
