Article

Detection of Floricane Raspberry Shrubs from Unmanned Aerial Vehicle Imagery Using YOLO Models

Institute of Horticulture Production, University of Life Sciences in Lublin, Głęboka 28, 20-612 Lublin, Poland
*
Author to whom correspondence should be addressed.
Agriculture 2026, 16(6), 664; https://doi.org/10.3390/agriculture16060664
Submission received: 19 February 2026 / Revised: 9 March 2026 / Accepted: 12 March 2026 / Published: 14 March 2026

Abstract

The present study investigated the detection performance of the YOLOv8s, YOLO11s, and YOLO12s models, implemented within convolutional neural network architectures, for identifying floricane raspberry (Rubus idaeus L.) shrubs using RGB imagery and multispectral data acquired in the near-infrared, red-edge, red, and green spectral bands with a DJI Mavic 3 Multispectral drone. Model training and validation were conducted to evaluate both within-modality detection performance and cross-modality transferability. Under all training scenarios, the YOLO-based detectors reached near-saturated accuracy levels. However, cross-domain assessments demonstrated substantial variability depending on the spectral configuration of the input imagery. Overall, the combination of UAV-based multispectral sensing with convolutional neural network detection frameworks establishes a technological basis for automated shrub monitoring and constitutes a meaningful advancement toward intelligent raspberry production systems. This integration further creates new prospects for the technological development of cultivation practices for this crop within the rapidly evolving landscape of artificial intelligence-driven agriculture.

1. Introduction

Modern technologies employed in agriculture offer significant potential for enhancing both the productivity and profitability of farming operations. A key prerequisite for the effective transformation of the agricultural sector is the selection of appropriate technological solutions that address the specific needs and requirements of contemporary agriculture [1]. Current precision agriculture techniques rely on the systematic acquisition of high-quality data, which can be effectively collected using unmanned aerial systems (UASs). Further development in this area requires an advanced level of autonomy and the integration of drones and other cyber-physical systems within a coherent farm management framework. Such an approach enables the efficient application of big data technologies and artificial intelligence (AI) in agricultural decision support processes [2].
Unmanned Aerial Vehicles (UAVs), commonly known as drones, have undergone rapid and dynamic development over the past several decades [3]. The transformative potential of drones is regarded as a key factor in achieving more sustainable, productive, and resilient agriculture in the face of the global challenges of the 21st century. At the same time, emphasis is placed on the importance of an integrated approach that combines technological innovation with appropriately tailored policies and comprehensive training for farmers [4].
Remote sensing is a technique used to obtain information about an object or phenomenon from a distance through sensors that can be mounted on satellites, unmanned aerial vehicles, or unmanned ground vehicles [5]. Remote sensing technologies are applied across numerous domains, providing diverse and easily accessible data sources through the use of various types of sensors, including RGB cameras, multispectral and hyperspectral imagers, as well as LiDAR systems [6]. The easy accessibility of information, combined with improved temporal, radiometric, and spatial resolution, has led to the accumulation of massive volumes of data. Meeting the analytical demands posed by these datasets requires the development of innovative solutions [7].
The integration of modern data acquisition platforms with advanced analytical methods requires a profound understanding of the complex and dynamic processes occurring in agriculture, necessitating a systems-oriented approach. In this context, particular promise is associated with the development of artificial intelligence, especially machine learning (ML) and computer vision (CV) methods. Their ability to efficiently process and interpret large datasets plays a crucial role in transforming raw information into actionable knowledge, representing one of the principal directions of contemporary research in agricultural science [8]. Deep learning enables the construction of computational models based on multiple layers of processing, capable of automatically learning data representations at various levels of abstraction. These techniques have profoundly revolutionized approaches to object recognition and detection in images. A particularly groundbreaking milestone was the development of convolutional neural networks (CNNs) in the early 2010s, which marked a shift away from manually engineered features toward the automatic extraction of representations directly from pixel-level image data, thereby redefining the foundations of image analysis [9].
A major breakthrough in computer image analysis was achieved with the development of AlexNet [10], which, during the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [11], demonstrated the substantial superiority of convolutional neural networks trained on graphics processing units (GPUs) over earlier classical methods, thereby redefining the field of image analysis. Convolutional Neural Networks consist of convolutional layers characterized by weight sharing and local receptive fields [12], nonlinear activation functions, most commonly the Rectified Linear Unit (ReLU) [13], and aggregation layers such as max or average pooling [14,15]. Weight sharing ensures that the number of parameters increases more slowly than in fully connected networks [12,16]. The first effective object detectors, R-CNN, Fast R-CNN, and Faster R-CNN, employed a two-stage procedure consisting of region proposal generation followed by the classification of each proposed region using a CNN. Although these models achieved high accuracy, they were relatively slow for real-time applications [17,18,19]. The introduction of the Regions with CNN features (R-CNN) concept nevertheless proved to be a breakthrough in the development of detection models, overcoming the stagnation that had previously characterized the field. This novel approach revolutionized the object detection process, initiating a period of rapid advancement and accelerating progress at an unprecedented pace [20].
Single-stage algorithms such as Single Shot Detector (SSD) and You Only Look Once (YOLO), based on CNN architectures, treat object detection as a regression problem performed in a single network pass. This approach enables high throughput while maintaining an acceptable reduction in mean Average Precision [21,22]. YOLO represents an exceptionally flexible and efficient solution for real-time object detection, proving effective across a wide range of domains. Ongoing research efforts, particularly those focused on developing lighter and more energy-efficient variants, have ensured that this technology consistently remains at the forefront of modern object detection methods [23]. YOLO divides the input image into an S × S grid, assigning each cell the responsibility for detecting objects whose centers fall within it. Each cell predicts bounding boxes along with their coordinates, dimensions, and confidence scores, as well as conditional class probabilities. By simultaneously predicting the positions and classes of all objects in the image, YOLO enables end-to-end training and real-time detection while maintaining a high mean Average Precision [22]. The integration of the YOLO algorithm with Unmanned Aerial Vehicles significantly enhances their functionality, enabling large-scale monitoring and the acquisition of detailed information that would be difficult and time-consuming to collect manually. Each successive YOLO variant introduces distinct innovations aimed at optimizing both detection speed and accuracy [23].
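As a minimal illustration of this grid-based assignment (a sketch, not code from any YOLO implementation), the following snippet maps an object's center to the grid cell responsible for predicting it; the grid size S = 7 and the example coordinates are arbitrary illustrative values.

```python
# Illustrative sketch of YOLO-style grid assignment: the grid cell containing
# an object's center is responsible for predicting that object.
# S = 7 is an arbitrary example value, not a parameter from this study.

def responsible_cell(x_center, y_center, img_w, img_h, s=7):
    """Return the (row, col) of the S x S grid cell owning this box center."""
    col = min(int(x_center / img_w * s), s - 1)
    row = min(int(y_center / img_h * s), s - 1)
    return row, col

# A box centered at (320, 240) in a 640 x 480 frame maps to cell (3, 3):
print(responsible_cell(320, 240, 640, 480))
```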
The growing emphasis on enhancing drone autonomy is increasingly driven by deep learning techniques applied in computer vision. The global use of Unmanned Aerial Vehicles has expanded rapidly, encompassing both military and civilian applications. Artificial intelligence significantly augments the capabilities of autonomous UAVs, contributing to advances across a wide range of disciplines [24]. The development of unmanned aerial vehicles is expected to progress toward the integration of increasingly advanced sensing technologies and data acquisition systems, including thermal imaging, spectral sensors, and LiDAR platforms, enabling the collection of high-resolution and multidimensional information on crop status. Simultaneously, UAV systems are becoming progressively more autonomous through the implementation of optimized flight trajectories and enhanced data acquisition protocols, thereby improving operational efficiency. Machine learning is anticipated to play a central role in this transformation, facilitating the analysis of large-scale datasets and supporting the development of site-specific crop management strategies, ultimately contributing to the advancement of precision agriculture [25]. Unmanned aerial vehicles enhanced by artificial intelligence are expected to operate with a high degree of autonomy, enabling real-time intervention while substantially reducing the need for direct human supervision [26].
Species of the genus Rubus L. are gaining importance as promising sources of food with potential health benefits, as well as valuable raw materials used in pharmaceuticals and medicine. Their increasing economic and functional value provides a strong rationale for the intensification of their cultivation [27]. The implementation of modern technologies to optimize the cultivation of Rubus species is of particular importance for the future of this genus. Only a limited number of studies have addressed the use of convolutional neural networks, multispectral imaging, and unmanned aerial vehicles in relation to these plants. In particular, the automated detection of floricane raspberry shrubs from UAV imagery remains scarcely investigated, despite its practical relevance for precision cultivation. Furthermore, existing UAV-based detection studies in agriculture predominantly focus on single-modality scenarios. A structured assessment of cross-modality transferability, especially using multispectral imagery and successive YOLO architectures, remains largely unexplored. Addressing both the crop-specific and cross-spectral evaluation gaps is essential for developing robust and transferable UAV-based detection systems for precision agriculture. Achieving drone autonomy in detecting raspberry shrubs within a predefined area would substantially reduce the time required for precise mission planning and manual marking of crop rows. Moreover, it would enhance the efficiency of shrub monitoring in terms of pest occurrence and plant health. In the case of drones performing tasks such as mineral fertilization, biostimulation, or crop protection, such autonomy could also translate into greater precision and overall effectiveness of agricultural treatments. The aim of this study was to evaluate the detection accuracy of YOLO models for raspberry shrub identification using RGB and multispectral (NIR, R, RE, G) drone imagery.

2. Materials and Methods

2.1. Data Collection

2.1.1. Drone

Data were collected using a DJI Mavic 3 Multispectral (SZ DJI Technology Co., Ltd., Shenzhen, China) drone equipped with an RGB camera (4/3″ CMOS, 20 MP, 5280 × 3956 px) and a multispectral camera (1/2.8″ CMOS, 5 MP, 2592 × 1944 px) capturing the following spectral bands: Near-Infrared (NIR) 860 ± 26 nm, Red Edge (RE) 730 ± 16 nm, Red (R) 650 ± 16 nm, and Green (G) 560 ± 16 nm. To ensure precise altitude control relative to raspberry rows, the drone’s built-in RTK module was utilized, providing centimeter-level positioning accuracy and microsecond-level synchronization between the flight control system and the cameras. This setup enables highly accurate georeferencing of captured imagery [28]. It should be noted that the UAV platform was used exclusively for image acquisition purposes and was not used to evaluate the trained YOLO models.
Data were collected during the 2025 growing season on raspberry crops, beginning from the stage of inflorescence development and continuing through to the end of harvest. For each plantation, data were collected ten times at the same developmental stages. The drone was flown along the crop rows, with the gimbal oriented vertically downward and centered on the middle of each row. The maximum flight altitude was adjusted relative to the visibility of adjacent rows, ensuring that it did not exceed 50% of their visible width, thus maintaining optimal spatial coverage and image clarity. In practice, the flight altitude relative to the raspberry shrubs ranged approximately from 2.5 m to 6 m above ground level (AGL). Based on the manufacturer’s camera specifications, this corresponds to an approximate ground sampling distance (GSD) of 0.07–0.16 cm/pixel for the RGB sensor and 0.11–0.27 cm/pixel for the multispectral sensors. These values are provided as indicative estimates derived from nominal sensor parameters and flight altitude. A total of 9750 multisensor image captures were acquired using the DJI Mavic 3 Multispectral drone, resulting in 48,825 individual images. Each multisensor capture consisted of one image in the RGB color space (Red, Green, Blue) and four single-band images (NIR, RE, R, and G) recorded by the multispectral camera (Figure 1). Consequently, five separate datasets were generated, each containing 9750 images corresponding to a specific imaging modality.
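For transparency, the sketch below reproduces this kind of nominal GSD estimate from flight altitude; the focal length and sensor width used here are assumed values typical of a 4/3″ 20 MP camera, not figures taken from this study or the manufacturer’s datasheet.

```python
# Hedged worked example of a nominal GSD estimate from flight altitude.
# focal_mm and sensor_width_mm are ASSUMED nominal values for a 4/3" 20 MP
# camera; substitute datasheet values for an exact figure.

def gsd_cm_per_px(altitude_m, focal_mm=12.3, sensor_width_mm=17.3,
                  image_width_px=5280):
    pixel_pitch_mm = sensor_width_mm / image_width_px  # pixel size on sensor
    gsd_mm = altitude_m * 1000.0 * pixel_pitch_mm / focal_mm
    return gsd_mm / 10.0  # mm -> cm

for h in (2.5, 6.0):  # the approximate altitude range flown in the study
    print(f"{h:.1f} m AGL -> ~{gsd_cm_per_px(h):.2f} cm/px")
# prints ~0.07 cm/px and ~0.16 cm/px, consistent with the reported RGB range
```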

2.1.2. Overview of the Research Area

Raspberry plantations were located in the eastern part of Poland, in the Lublin Voivodeship, in the Lublin Upland macroregion. Data were collected from four farms across six distinct raspberry plantations, encompassing five floricane cultivars of Rubus idaeus L.: Glen Ample, Glen Carron, Glen Mor, Przehyba, and Laszka. In total, data were collected from an area of three hectares. The plantations were located in two primary areas at the following geographic coordinates: 51.193986° N, 21.817580° E and 51.104057° N, 22.193973° E.
All plantations were cultivated in a single-row trellis system, though they exhibited agrotechnical variability in management practices. The supporting structures consisted of concrete or wooden poles. The inter-row spacing varied among plantations, with distances of 2.5 m, 3.0 m, 3.5 m, and 4.0 m. Some sites were equipped with anti-frost protection systems based on Flipper-type sprinklers, supplied with water through PE pipes suspended above the rows, while others lacked such systems. Additionally, the presence or absence of lateral wires and support strings used to hold branches during fruiting differed between plantations.
To ensure a high generalization capability of the trained object detection models, imagery was collected under diverse cultivation, lighting, and weather conditions. The photographs were taken during daylight hours, covering a broad range of times of day (from morning to evening), varying degrees of cloud cover and sunlight intensity, as well as different surface moisture levels of the plants, ranging from dry to wet, due to rainfall or morning dew condensation. This variability made it possible to account for natural fluctuations in contrast and light reflections, which constitute critical factors influencing the performance and robustness of computer vision models under real-world conditions.

2.2. Data Preprocessing

2.2.1. Image Scaling

All images from the RGB, NIR, RE, R, and G modalities were uniformly rescaled to 640 × 480 pixels (preserving the 4:3 aspect ratio) using a custom Python program designed to standardize image dimensions. During model training, all input images originally sized 640 × 480 px were resized to 640 × 640 px using letterbox padding implemented in the YOLO data loading pipeline. This approach preserved the original aspect ratio of the images, thereby preventing geometric distortion of the targets. The selected 640-pixel input resolution aligns with the standard YOLO training configuration and ensures uniformity across the entire dataset. All preprocessing, training, and evaluation scripts used in this study are available in the associated repository [29].
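A minimal letterbox sketch, assuming OpenCV and NumPy are available, mirrors what the YOLO data loader performs internally; the input array below is a stand-in for a dataset image, not study data.

```python
# Sketch of letterbox resizing: the image is scaled to fit a square canvas
# while preserving aspect ratio, and the remainder is filled with gray padding.
import cv2
import numpy as np

def letterbox(img, new_size=640, pad_value=114):
    """Resize while preserving aspect ratio, padding the rest with gray."""
    h, w = img.shape[:2]
    scale = min(new_size / h, new_size / w)
    nh, nw = round(h * scale), round(w * scale)
    resized = cv2.resize(img, (nw, nh), interpolation=cv2.INTER_LINEAR)
    canvas = np.full((new_size, new_size, 3), pad_value, dtype=np.uint8)
    top, left = (new_size - nh) // 2, (new_size - nw) // 2
    canvas[top:top + nh, left:left + nw] = resized
    return canvas

frame = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for a 640 x 480 image
print(letterbox(frame).shape)  # (640, 640, 3): 80 px of padding top and bottom
```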

2.2.2. Image Labeling

Using the LabelImg tool [30], rows of raspberry bushes were manually labeled for each image modality. Only the fully visible rows along which the images were captured were labeled; bushes in adjacent rows that were not fully visible were not labeled. Examples of such situations during the labeling process are presented in Figure 2.
When there was a gap in a row, labels were applied only to the areas where bushes occurred, dividing the row into sections. Figure 3 shows examples of labels for full rows without gaps as well as for situations where the row was not fully filled with plants. Such gaps are the exception, but they can become more common, especially in older plantations.
A total of 49,220 objects were manually annotated using the LabelImg tool, resulting in 9844 labeled instances in each of the five modality datasets (RGB, NIR, RE, R, and G).

2.2.3. Preparing Datasets

The collected data were divided into five independent datasets corresponding to the image modalities: RGB, NIR, RE, R, and G. Each modality dataset was further split into three subsets: a training set (7802 images, 79.90%) containing 7868 labeled instances, a validation set (966 images, 9.89%) containing 974 labeled instances, and a test set (997 images, 10.21%) containing 1002 labeled instances. The data split was performed using a program developed in Python. The program verified the completeness of each multisensor series, ensuring that every capture contained the full set of images in the RGB, NIR, RE, R, and G channels, and that the corresponding annotation files were valid before the final division. Each dataset consisted of images originating from the same multisensor captures, including the complete set of spectral channels. The division into training, validation, and test subsets was performed synchronously across all modalities, ensuring that images belonging to the same multisensor series were always assigned to the same subset. This approach preserved data consistency between spectral channels and eliminated potential data leakage between the training and testing sets.
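A simplified sketch of such a capture-level split is shown below; the capture naming scheme and split proportions are illustrative, and the actual program additionally verified series completeness and annotation validity before the final division, which is why its subset counts differ slightly.

```python
# Sketch of a capture-level split (file layout and names are hypothetical):
# every multisensor capture keeps all five modality images in the same subset,
# preventing leakage between training and testing across spectral channels.
import random

def split_captures(capture_ids, seed=0, train=0.80, val=0.10):
    ids = sorted(capture_ids)
    random.Random(seed).shuffle(ids)
    n = len(ids)
    n_train, n_val = int(n * train), int(n * val)
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]

# Each capture ID expands to five files, e.g. f"{cid}_{band}.tif" per modality.
train_ids, val_ids, test_ids = split_captures(
    [f"capture_{i:05d}" for i in range(9750)]
)
print(len(train_ids), len(val_ids), len(test_ids))  # 7800 975 975 (approximate)
```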

2.3. Models Training

2.3.1. YOLO Models

Three YOLO architectures were selected for training and evaluation: YOLOv8s, YOLO11s, and YOLO12s. The selection aimed to ensure methodological consistency across experiments and to represent the recent evolution of efficient one-stage detectors. YOLOv8s served as a compact and well-established baseline, while YOLO11s and YOLO12s incorporated successive architectural refinements. The small (‘s’) variants were chosen to maintain comparable model capacities and real-time feasibility during inference, which is particularly important for onboard UAV processing where computational resources, energy efficiency, and payload limitations constrain model complexity.
Introduced in 2023, YOLOv8 represents the eighth generation of the You Only Look Once family and marks a significant milestone in the ongoing evolution of real-time object detection architectures. Unlike its predecessors, this model is not a simple extension of YOLOv7 but a completely redesigned framework developed by the Ultralytics team, emphasizing enhanced modularity, flexibility, and training efficiency. The most notable innovation is the transition to an anchor-free approach, which removes the need for predefined anchor boxes, thereby simplifying the detection process and improving the model’s adaptability to objects with unconventional shapes and sizes. Additionally, YOLOv8 employs an enhanced CSPDarknet architecture as its backbone and an optimized PANet for multiscale feature fusion, resulting in improved accuracy, stability, and inference speed compared to earlier generations [31,32,33].
Introduced in 2024, YOLO11 represents a significant advancement in computer vision, combining high performance with remarkable versatility across diverse application domains. This version of the YOLO architecture exhibits notable improvements in both detection accuracy and processing speed while simultaneously reducing the number of required model parameters, potentially opening new opportunities for practical deployment in various sectors. YOLO11 is built upon three key architectural innovations: the Cross Stage Partial block with kernel size 2, which streamlines feature propagation across stages; the Spatial Pyramid Pooling–Fast (SPPF) module, which efficiently aggregates multiscale contextual information in a single pass; and the Convolutional Parallel Spatial Attention block (C2PSA), which selectively enhances spatially relevant features critical for object detection. All these components integrate seamlessly with a unified detection head capable of handling traditional bounding box detection, segmentation, skeleton keypoint localization, and oriented object detection. Furthermore, the dual anchor assignment mechanism stabilizes the training process without increasing computational complexity, ensuring robust and efficient model convergence [34,35,36].
Released in 2025, YOLO12 is the first iteration of the YOLO family designed around attention mechanisms. It incorporates a lightweight Area Attention (A2) module, an enhanced backbone based on the Residual Efficient Layer Aggregation Network (R-ELAN), and integrated FlashAttention, maintaining detection speeds comparable to earlier purely convolutional models. These advancements enable the architecture to combine a broader spatial context with efficient memory utilization, resulting in a more expressive yet computationally optimized framework. YOLO12 operates across five scales (N–X), all supported by a single unified multi-task head, which facilitates concurrent handling of diverse vision tasks such as detection, segmentation, and keypoint localization within a cohesive design [37,38].

2.3.2. Training Parameters

Training was conducted for 200 epochs with a batch size of 32 and an input resolution of 640 × 640 pixels. The optimization followed the stochastic gradient descent (SGD) algorithm with an initial learning rate (lr0 = 0.01), a final learning rate factor (lrf = 0.01), momentum of 0.937, and weight decay of 0.0005. Data augmentation was enabled to improve generalization, and training was performed using eight data loading workers. No early stopping was applied (patience = 0), ensuring full convergence across all epochs. The random seed was fixed (seed = 0) to guarantee experiment reproducibility. All training processes were executed on an NVIDIA GPU using CUDA acceleration (Table 1). All models were initialized using the official pretrained weights provided by the Ultralytics framework and subsequently fine-tuned on the prepared dataset.
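For reference, this configuration can be expressed with the Ultralytics Python API roughly as follows; the dataset YAML name is hypothetical, and in Ultralytics a patience value of 0 is treated as unlimited, which disables early stopping.

```python
# Hedged sketch of the training configuration via the Ultralytics API.
# "raspberry_rgb.yaml" is a hypothetical dataset config (one per modality).
from ultralytics import YOLO

model = YOLO("yolov8s.pt")  # official pretrained weights; likewise yolo11s.pt / yolo12s.pt
model.train(
    data="raspberry_rgb.yaml",
    epochs=200, batch=32, imgsz=640,
    optimizer="SGD", lr0=0.01, lrf=0.01,
    momentum=0.937, weight_decay=0.0005,
    workers=8, seed=0,
    patience=0,   # early stopping disabled; default augmentation left enabled
    device=0,     # NVIDIA GPU with CUDA acceleration
)
```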
The original input images (640 × 480 px) were automatically resized to 640 × 640 px through letterbox padding within the YOLO data loader. This ensured consistent tensor dimensions while maintaining the native aspect ratio and avoiding geometric deformation of the canopy structures. Since the YOLO framework natively processes three-channel (RGB) inputs, the single-channel spectral modalities (NIR, RE, R, and G) were converted to three-channel tensors by duplicating the original channel across all three input dimensions. This approach maintained consistency in the input tensor shape while retaining the original spectral characteristics of each modality.
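A minimal sketch of this channel-duplication step, assuming NumPy and a stand-in single-band frame, is shown below.

```python
# Sketch of the single-band to three-channel conversion: the grayscale
# NIR/RE/R/G image is replicated across three channels so the tensor shape
# matches what the RGB-native YOLO pipeline expects.
import numpy as np

def to_three_channels(single_band):  # shape (H, W) or (H, W, 1)
    band = np.squeeze(single_band)
    return np.repeat(band[..., np.newaxis], 3, axis=-1)  # shape (H, W, 3)

nir = np.random.randint(0, 255, (480, 640), dtype=np.uint8)  # stand-in NIR frame
print(to_three_channels(nir).shape)  # (480, 640, 3)
```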

2.4. Evaluation of Trained Models

The performance of each model was evaluated using standard object detection metrics: Precision, Recall, mean Average Precision at a 0.5 Intersection over Union (IoU) threshold (mAP50), the mean Average Precision averaged over IoU thresholds from 0.5 to 0.95 (mAP50:95), and the F1-score. These metrics follow the evaluation protocol implemented in the official Ultralytics YOLO framework and correspond to the standard performance indicators recommended for YOLO-based object detection models [39]. Metric calculations were automated using lightweight Python scripts built upon the official YOLO evaluation pipeline, ensuring reproducible and consistent metric computation across models and datasets.
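As a sketch of this evaluation step using the official Ultralytics pipeline (the weight path and dataset YAML are hypothetical):

```python
# Hedged sketch of model evaluation on a held-out test split; model.val()
# returns the Precision, Recall, mAP50, and mAP50:95 values discussed above.
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")  # trained weights (hypothetical path)
metrics = model.val(data="raspberry_rgb.yaml", split="test")
print(metrics.box.mp, metrics.box.mr)      # mean Precision, mean Recall
print(metrics.box.map50, metrics.box.map)  # mAP50, mAP50:95
```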
Precision and Recall quantify the accuracy and completeness of detections, respectively, as defined in Equations (1) and (2), where TP, FP, and FN denote the numbers of true positives, false positives, and false negatives, respectively.
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{1}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{2}$$
The mean Average Precision (mAP) quantifies the area under the Precision–Recall curve, summarizing the trade-off between detection confidence and accuracy across thresholds. mAP50 and mAP50:95 correspond to single- and multi-threshold variants, computed at an Intersection over Union threshold of 0.5, and as an average across IoU thresholds from 0.5 to 0.95 in 0.05 increments (Equations (3) and (4)).
$$\mathrm{mAP}_{50} = \frac{1}{N}\sum_{i=1}^{N} AP_i^{\,IoU=0.5} \tag{3}$$

$$\mathrm{mAP}_{50:95} = \frac{1}{N}\sum_{i=1}^{N} \frac{1}{10}\sum_{k=0}^{9} AP_i^{\,IoU=0.5+0.05k} \tag{4}$$
The F1-score (Equation (5)) expresses the harmonic balance between Precision and Recall. Together, these metrics provide a comprehensive evaluation of detection quality, jointly reflecting localization accuracy, confidence calibration, and robustness across all image modalities.
$$F1\text{-}Score = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{5}$$
Classical object detection metrics such as Precision, Recall, and mAP evaluate detection accuracy under known conditions, where the test data distribution closely matches that of the training set. However, in multispectral or multimodal imaging scenarios, models may encounter spectral domains that deviate significantly from their training distribution. In the absence of standardized evaluation criteria for cross-spectral transferability in object detection, we therefore define a set of complementary indicators derived from the mAP distribution. These indicators, formulated specifically for this study, quantify how well a detector preserves its performance when the input modality changes and collectively capture the elasticity, robustness, and stability of object detectors under spectral domain shifts. The proposed metrics were implemented in Python scripts dedicated to model-centric and data-centric analysis, ensuring a reproducible and framework-independent evaluation setup.
In-domain mean Average Precision (mAPᵢₙ) represents the reference performance obtained when each model is evaluated on the same modality on which it was trained. This term corresponds to the diagonal elements of the modality–modality performance matrix (mAPᵢᵢ) and reflects the model’s optimal detection accuracy under matching training and testing conditions (Equation (6)).

$$\mathrm{mAP}_{in} = \frac{1}{M}\sum_{i=1}^{M} \mathrm{mAP}_{ii} \tag{6}$$
Cross-domain mean Average Precision (mAPₒᵤₜ) measures the average detection performance when a model trained on modality i is evaluated on all other modalities j ≠ i (Equation (7)).

$$\mathrm{mAP}_{out} = \frac{1}{M(M-1)}\sum_{i=1}^{M}\sum_{j \neq i} \mathrm{mAP}_{ij} \tag{7}$$
Relative mAP Retention (ρᵣₑₜ) is defined as the ratio mAPₒᵤₜ/mAPᵢₙ, quantifying how much detection accuracy is retained when the model operates on unseen spectral domains. A value close to 1 indicates high cross-domain robustness, whereas lower values suggest strong modality dependence (Equation (8)).

$$\rho_{ret} = \frac{\mathrm{mAP}_{out}}{\mathrm{mAP}_{in}} \tag{8}$$
Worst-Case Retention (WCR) expresses the minimum normalized cross-domain performance, calculated as the ratio between the lowest cross-domain mAP and the mean in-domain mAP. It provides a conservative measure of worst-case degradation when transferring a model to another image modality (Equation (9)).
$$WCR = \frac{\min\left\{\mathrm{mAP}_{ij} : j \neq i\right\}}{\mathrm{mAP}_{in}} \tag{9}$$
Cross-Domain Variability (σₒᵤₜ) quantifies the stability of model performance across different target modalities. It is computed as the standard deviation of cross-domain mAP values around their mean (mAPₒᵤₜ). Lower variability indicates that the model maintains consistent detection accuracy across multiple domains, regardless of spectral differences (Equation (10)).

$$\sigma_{out} = \sqrt{\frac{1}{M(M-1)}\sum_{i=1}^{M}\sum_{j \neq i}\left(\mathrm{mAP}_{ij} - \mathrm{mAP}_{out}\right)^2} \tag{10}$$
Although all proposed cross-domain indicators originate from the same mAP distribution, they are designed to capture complementary facets of spectral generalization rather than redundant information. Specifically, mAPᵢₙ defines baseline reference performance under matched conditions, mAPₒᵤₜ quantifies average transfer accuracy across unseen modalities, Relative mAP Retention expresses relative robustness, Worst-Case Retention constrains worst-case degradation, and Cross-Domain Variability measures stability of cross-domain accuracy. Together, these indicators provide a complementary and coherent characterization of model transferability, spanning effectiveness, robustness, and consistency, thereby enabling a structured and comprehensive assessment of spectral generalization.
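To illustrate how all five indicators derive from a single modality-by-modality mAP matrix, the sketch below computes Equations (6)–(10); the input matrix is a random placeholder, not results from this study.

```python
# Minimal sketch computing the proposed indicators from an M x M matrix of
# mAP values (rows: training modality, columns: testing modality).
import numpy as np

def cross_domain_indicators(mAP):
    """mAP: (M, M) array; diagonal = in-domain, off-diagonal = cross-domain."""
    M = mAP.shape[0]
    off = ~np.eye(M, dtype=bool)          # mask of cross-domain cells (j != i)
    map_in = np.trace(mAP) / M            # Eq. (6)
    map_out = mAP[off].mean()             # Eq. (7)
    rho_ret = map_out / map_in            # Eq. (8)
    wcr = mAP[off].min() / map_in         # Eq. (9)
    sigma_out = mAP[off].std()            # Eq. (10): std over M(M-1) values
    return map_in, map_out, rho_ret, wcr, sigma_out

# Placeholder 5 x 5 matrix for the RGB/NIR/RE/R/G modalities:
print(cross_domain_indicators(np.random.uniform(0.5, 1.0, (5, 5))))
```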

2.5. Hardware Configuration

Table 2 summarizes the hardware and software configuration of the primary workstation used for model training and evaluation. All YOLO models were trained on a high-performance desktop system equipped with an AMD Ryzen 9 9950X CPU and an NVIDIA GeForce RTX 5080 GPU (16 GB VRAM). The setup was optimized for stable GPU utilization and consistent batch throughput across all training runs. The software environment was based on Microsoft Windows 11 Pro with Python 3.11, PyTorch 2.8, and CUDA 12.8, ensuring compatibility with the Ultralytics YOLO framework. This configuration provided reproducible training performance and served as the reference platform for all inference results reported in this study.

3. Results

3.1. Evaluation of YOLO Models Trained and Tested on Corresponding Image Modalities

Based on the results summarized in Table 3, all evaluated YOLO models achieved near-saturated detection performance across all analyzed image modalities. Precision, Recall, mAP50, and F1-score values remained consistently within the 0.994–0.999 range, indicating exceptionally stable predictions and minimal variance between network architectures. Among the input modalities, NIR and RE demonstrated the highest overall detection accuracy, reflected in their superior mAP50:95 scores. The RGB and G bands also yielded robust and uniform results, whereas the R channel exhibited slightly reduced performance.
Figure 4 presents detection results obtained by the trained YOLO model across different image modalities (RGB, NIR, RE, R, and G).

3.2. Cross-Domain Evaluation of YOLO Models Trained and Tested on Different Image Modalities

Table 4 presents the results of models trained on the RGB modality and evaluated across other spectral domains. All YOLO variants exhibit a distinct degradation in detection accuracy, confirming limited cross-modality generalization. Precision and Recall drop substantially compared to in-domain performance, with the decline most pronounced for YOLO11s, which displays unstable predictions across all non-RGB inputs. Among the evaluated architectures, YOLO12s exhibits the highest overall stability across spectral domains. Its mAP50:95 and F1-scores remain strong when tested on RE and G imagery, while only moderate degradation is observed in the NIR and R modalities. These findings indicate that YOLO12s retains partial robustness to spectral domain shifts, particularly within wavelengths close to the visible range. Overall, the results indicate that RGB-based training provides limited generalization to other modalities, while deeper architectures such as YOLO12s retain partial adaptability to spectrally correlated domains.
Table 5 reports cross-domain results for models trained on NIR. Transfer to RGB is heterogeneous: YOLOv8s collapses, while YOLO11s retains moderate accuracy and clearly outperforms the other variants. Transfer to R is also weaker than that to other bands. By contrast, evaluation on RE is nearly indistinguishable from in-domain NIR across all models, and G sustains high accuracy, particularly for YOLO12s. Overall, NIR training generalizes well to the RE and G bands, but only partially to visible RGB and R, with YOLO11s and YOLO12s generally more stable than YOLOv8s.
Table 6 presents the performance of YOLO models trained on the RE modality and tested across other spectral domains. The results indicate strong cross-modal transfer within the NIR and G bands for each YOLO model. In contrast, performance dropped for RGB and R imagery. Among the evaluated models, YOLO12s achieved the most balanced results across modalities, sustaining both precision and consistency under spectral variation.
Table 7 presents the cross-domain performance of YOLO models trained on the R modality. The results reveal strong generalization within the visible spectrum and moderate transfer toward the NIR and RE domains. Across all architectures, YOLOv8s, YOLO11s, and YOLO12s exhibited comparable performance patterns, indicating consistent generalization and minimal sensitivity to model depth. Overall, R-based training supports reliable transfer across the visible bands and red-edge but shows slightly lower adaptation to non-visible spectral regions such as NIR.
Table 8 summarizes the cross-domain performance of YOLO models trained on the G modality. The results demonstrate exceptionally stable detection accuracy across the visible, near-infrared, and red-edge domains. All evaluated architectures maintained nearly identical Precision, Recall, and mAP values, confirming robust and uniform generalization across spectral bands. Transfer to NIR and RE data preserved near-ideal performance at mAP50:95, while evaluation on RGB and R inputs yielded slightly lower accuracy. Overall, G-based training produced the broadest spectral adaptability among all tested modalities, maintaining consistent performance across both visible and near-infrared domains. All YOLO models showed comparable performance patterns, indicating consistent generalization and minimal sensitivity when trained on this image modality.
In cross-domain evaluation, mAP50:95 serves as a robust indicator of model generalization, reflecting the ability to maintain detection consistency under varying spectral and distributional shifts. Unlike single-threshold measures, it aggregates performance across multiple IoU levels, providing a more holistic view of spatial localization and classification stability. Consequently, it is employed as the principal metric for inter-modality comparison, enabling a compact yet comprehensive representation of spectral transferability.
Figure 5 presents the averaged mAP50:95 across all YOLO models for each training–testing modality pair. The highest transfer performance is observed among the NIR, RE, and G bands, which maintain strong bidirectional consistency.

The weakest results are observed for NIR- and RE-trained models when transferred to the R and RGB domains, and for RGB-trained models when evaluated on G, NIR, and R inputs.
Figure 6 compares the in-domain and cross-domain performance of YOLOv8s, YOLO11s, and YOLO12s using the proposed elasticity-related metrics. All models reached nearly identical in-domain accuracy, confirming consistent convergence across architectures.
In contrast, cross-domain generalization revealed clear differentiation. YOLO12s achieved the highest mAP(out) and Relative mAP Retention, demonstrating greater resilience to spectral distribution shifts. YOLOv8s and YOLO11s maintained stable yet lower retention levels, reflecting stronger dependence on the training image modality.
Overall, YOLO12s demonstrated the most balanced trade-off between in-domain precision and cross spectral adaptability, underscoring its robustness under multispectral imaging conditions.
Cross-domain results, however, revealed marked variability in spectral transferability. The G and R modalities achieved the highest mAP retention and elasticity, indicating that features learned from vegetation-sensitive wavelengths generalize most effectively across the multispectral space. Red-edge retained moderate adaptability, while the NIR and RGB modalities exhibited reduced elasticity, reflecting their stronger dependence on domain-specific spectral distributions. Overall, the findings highlight that vegetation-correlated wavelengths, particularly G and RE, provide the most balanced trade-off between in-domain accuracy and cross-spectral robustness (Figure 7).
Figure 8 compares the robustness and cross-domain stability of the YOLOv8s, YOLO11s, and YOLO12s architectures. YOLO12s achieved the highest resistance to performance degradation, exhibiting both the largest worst-case ratio and the lowest cross-domain variability. This indicates a superior ability to maintain stable accuracy under spectral distribution shifts.
YOLOv8s showed higher performance dispersion and reduced stability under challenging cross-modal conditions, while YOLO11s maintained balanced behavior but demonstrated weaker robustness against extreme domain discrepancies. Overall, YOLO12s provided the most consistent and reliable response across heterogeneous spectral inputs.
Figure 9 compares the robustness and consistency of YOLO models trained on individual image modalities. The worst-case ratio quantifies the degree of performance degradation under the most adverse cross-modal transfer, while cross-domain variability reflects the dispersion of detection accuracy across target modalities. Models trained on the G and R bands exhibit the highest robustness, combining large worst-case ratio values with minimal variability, indicating stable and predictable transfer across the multispectral space. In contrast, models trained on NIR, RE, and RGB imagery show lower robustness and greater variability, revealing weaker cross-domain consistency.

4. Discussion

Object detection in UAV-based agricultural imagery presents unique challenges related to scale variability, canopy structure, background heterogeneity, and changing illumination conditions. In multispectral settings, these challenges are further amplified by differences in spectral reflectance characteristics across wavelength bands, which may alter feature distributions learned during training. Consequently, evaluating not only in-domain detection accuracy but also cross-modality robustness becomes essential for assessing the practical deployability of detection models in real-world precision agriculture scenarios.
Against this background, the conducted experiments demonstrated that YOLOv8s, YOLO11s, and YOLO12s achieved near-saturated detection accuracy across all analyzed image modalities of floricane raspberry bushes under matched training and testing conditions. Performance remained highly consistent across architectures, with only minor variation between spectral inputs, and slightly stronger localization stability observed for NIR and RE modalities.
Comparable findings have been reported in recent UAV-based studies on fruit tree and shrub detection using CNN-based models. YOLOv4 has proven effective for fruit tree detection [40], as have YOLOv5 and YOLOv7 in citrus orchards [41,42], both maintaining reliable detection. YOLOv5 also enabled accurate detection of blueberry bushes [43].
In vineyards, YOLOv5 achieved high accuracy in detecting individual vines based on RGB + NIR images acquired with a drone equipped with a multispectral camera, confirming its suitability for multispectral data utilization [44].
High detection efficiency using drone data was also achieved in litchi, walnut, apricot, plum, pomegranate, cherry, and date palm plantations, where improved YOLO11 and YOLOv5 variants increased recognition efficiency under natural field conditions [45,46,47]. In the case of apple tree detection from UAV camera data, good results were obtained using the YOLOv5 model [48,49].
Using RGB and multispectral imagery, different accuracies were obtained in the detection and mapping of pistachio trees depending on the convolutional neural network-based algorithm employed (Mask R-CNN, YOLOv3, SAM) [50], confirming the importance of selecting not only architectures appropriate to the task but also the algorithms built upon them, each characterized by different features. A high level of detection accuracy using multispectral data acquired by drone was also achieved for cotton seedlings and coffee plants [51,52].
When combined with advanced data acquisition techniques, such as the use of unmanned aerial vehicles, convolutional neural networks significantly enhance data analysis capabilities in precision agriculture. Various imaging technologies enable the acquisition of data tailored to specific agricultural applications. Conventional RGB imagery captures color information within the visible spectrum, whereas multispectral and hyperspectral imaging extend beyond this range. Their ability to record data across multiple discrete spectral bands (in multispectral imagery) or within a continuous spectrum of hundreds of bands (in hyperspectral imagery) makes them highly valuable in applications where distinguishing features are not apparent in conventional optical images [53].
The results exposed significant variability under cross-domain evaluation. This pattern highlights the intrinsic dependence of convolutional detectors on the spectral distribution of their training data. Although all models reached consistent convergence in the same domain, their generalization across image modalities was uneven, confirming that spectral shifts remain a critical limitation for multispectral object detection. Among the evaluated architectures, YOLO12s demonstrated the most stable cross-domain behavior, combining strong retention with low variability. This suggests that deeper feature hierarchies contribute to robustness without significantly affecting inference efficiency. Conversely, YOLOv8s showed greater performance dispersion under spectral shifts, while YOLO11s maintained a balanced yet less resistant response. The differences in cross-domain stability among YOLOv8s, YOLO11s, and YOLO12s can be attributed to their architectural design. YOLOv8 and YOLO11 rely predominantly on convolution-based feature extraction, which emphasizes local spatial patterns and may be more sensitive to spectral intensity shifts. This likely contributes to the higher variability observed for YOLOv8s and the comparatively lower robustness of YOLO11s under extreme cross-modality conditions. In contrast, YOLO12 incorporates attention-based mechanisms (Area Attention and R-ELAN aggregation), enabling broader spatial context integration and adaptive feature weighting. Such mechanisms likely reduce dependence on localized spectral cues and contribute to the improved worst-case robustness and reduced cross-domain variability observed in the results.
The strong cross-modal generalization observed, particularly for the G band and partially for the RE band, can be interpreted in the context of vegetation spectral physiology. The green band corresponds to a region of relatively high reflectance in healthy vegetation, providing strong structural and geometric contrast between shrub canopy and soil background. Unlike red wavelengths, which are strongly absorbed by chlorophyll, or NIR wavelengths, which may saturate in dense foliage, the green band preserves stable spatial gradients and contour information that are critical for convolutional feature extraction.
The red-edge band represents the transition between chlorophyll absorption and near-infrared reflectance. This spectral transition is highly sensitive to leaf structure and chlorophyll concentration, offering a robust representation of vegetation morphology. Consequently, models trained on G imagery, and to a moderate extent on RE imagery, learn feature representations that remain partially compatible with both visible and near-infrared domains, explaining their higher mAP retention and reduced cross-domain variability relative to RGB and NIR training.
In contrast, RGB-based training exhibited reduced cross-modal robustness. Although RGB imagery provides rich visual information, it integrates multiple spectral components that are strongly influenced by illumination conditions and background variability. As a result, models trained on RGB data may rely more heavily on color-dependent cues and local intensity contrasts rather than intrinsic vegetation structure. When transferred to spectrally distinct domains such as NIR or RE, these learned representations become less compatible, leading to reduced retention and increased cross-domain variability.
Importantly, these transfer patterns would remain largely hidden if evaluation were restricted to conventional in-domain metrics. While all models achieved near-saturated accuracy under matched training and testing conditions, the proposed cross-domain indicators revealed substantial asymmetry in spectral transferability. Relative mAP retention quantified the proportional degradation under modality shifts, the worst-case ratio identified vulnerability to extreme spectral discrepancies, and cross-domain variability captured stability across heterogeneous inputs. Together, these measures provide insight into robustness characteristics that traditional single-domain indicators fail to expose. From a practical standpoint, these findings emphasize that spectral coherence is more influential for cross-domain robustness than architectural scale alone. Models trained on vegetation-relevant visible bands offer a more reliable foundation for multispectral transfer learning, particularly under real-world UAV deployment constraints. Furthermore, the efficiency plateau observed in computational performance confirms that architectural optimization should prioritize representational density over depth expansion. The study demonstrates that spectral generalization and computational efficiency are governed by complementary but distinct principles. Compact architectures paired with spectrally coherent training data, particularly in the green and red-edge regions, yield the most stable and transferable detection performance across heterogeneous multispectral domains. Within this rapidly expanding field, the automated detection of perennial fruit crops using UAV-based multispectral imagery remains a relatively underexplored research direction.
However, the use of drones in agriculture is gaining increasing popularity, which confirms the relevance of the conducted research. Technological progress in the development of drones significantly influences the current direction of development of smart agriculture [54,55,56]. Currently, drones are already used for plant disease protection [57,58,59,60], insect pest control [61,62,63,64,65], as well as weed management [66,67,68,69]. It should also be noted that in the context of crop protection, the integration of UAV technology with machine learning is increasingly emphasized and considered essential for addressing operational and technological challenges in modern agriculture [70]. Due to difficulties in achieving adequate pollination efficiency, unmanned aerial vehicles are also becoming increasingly important in supporting pollination processes [71,72,73,74]. Furthermore, UAVs are used in fertilization practices [75,76,77] and can support decision-making systems for irrigation management [78,79,80,81,82]. Increasing attention is also being devoted to yield prediction based on data collected by drones and analyzed using machine learning techniques [83,84,85,86]. In the context of food security, as well as environmental protection and sustainable land-use planning, UAV-based field mapping is also expected to play an increasingly important role [87]. The use of drones in precision agriculture therefore remains at a stage of intensive development, both in research and practical applications, while demonstrating considerable potential for future agricultural systems.
Limitations of this study include the lack of data from the leafless stage, the beginning of leaf development, and after removing fruiting canes, leaving only the current year’s canes. Primocane raspberry varieties were also not specifically targeted in this research. The study also did not include nighttime conditions or the use of other sensors, including thermal imaging and LiDAR. A primary limitation of this study is the absence of real-time onboard detection experiments during UAV flight operations. All model training and inference procedures were conducted offline to ensure controlled and reproducible evaluation conditions. Implementing real-time detection would require integration of the multispectral imaging system with an onboard edge-computing device capable of running YOLO models under hardware, energy, and payload constraints. Such a configuration would introduce additional challenges related to computational efficiency, power consumption, and system integration, which were beyond the scope of the present study. Future research may explore fully autonomous UAV-based detection pipelines incorporating embedded inference modules.
Future research should focus on training models for shrub detection at all developmental stages of both primocane and floricane cultivars, with the highest possible degree of generalization, while also taking into account various agronomic systems.
By utilizing multispectral imagery, it is possible to achieve more accurate identification, particularly along the boundaries between plants of differing health conditions, even at lower spatial resolutions compared to RGB imagery, provided that an appropriate data analysis approach is applied [88]. Interdisciplinary collaboration is expected to play a pivotal role in accelerating the implementation of multispectral unmanned aerial vehicle remote sensing. By integrating expertise from agricultural sciences, remote sensing technologies, data analytics, and intelligent automation, UAV-based precision agriculture is anticipated to foster the development of more sustainable and efficient production systems [89]. The optimization of agronomic management strategies in raspberry production remains a critical research domain with considerable applied importance [90,91,92,93,94].
Detecting or localizing objects in high-resolution aerial imagery presents a considerable challenge due to the presence of small, rotated, and multi-scale targets. The YOLO family of deep neural networks has emerged as one of the most widely adopted approaches for real-time object detection in remote sensing applications [95], while achieving an effective balance between computational cost and detection performance [96]. The use of convolutional neural networks for object detection may be crucial for developing advanced solutions in modern Rubus cultivation systems.

5. Conclusions

All evaluated YOLO architectures achieved near-saturated detection accuracy for floricane raspberry bushes under matched training and testing conditions, indicating that the detection task is well learned across RGB and multispectral modalities. However, cross-domain evaluation revealed substantial differences in spectral transferability. Among the tested architectures, YOLO12s demonstrated the highest robustness to spectral distribution shifts, achieving the strongest mAP retention, the most favorable worst-case performance, and the lowest cross-domain variability. In terms of spectral modalities, models trained on the green and red-edge bands exhibited the most consistent cross-modal generalization. These findings indicate that vegetation-correlated wavelengths, particularly the green band, provide the most transferable feature representations across heterogeneous multispectral domains. The proposed cross-domain evaluation framework proved essential for revealing these asymmetries, offering a structured approach for assessing spectral robustness beyond traditional single-domain accuracy measures.
The potential of emerging technologies such as unmanned aerial vehicles, multispectral imaging, and convolutional neural network-based computer vision algorithms in raspberry cultivation remains largely unexplored. The integration of these technologies represents a necessary step toward the advancement of agricultural innovation, offering new perspectives for optimizing raspberry production and management. Recent technological progress has enabled the acquisition of vast amounts of data. The challenge now is not only to see more, but to understand better, to design systems where vision and reasoning converge to shape the next era of agriculture.

Author Contributions

Conceptualization, M.K. and K.B.; methodology, M.K. and K.B.; software, M.K. and K.B.; validation, M.K., K.B. and Z.J.; formal analysis, M.K. and K.B.; investigation, M.K., K.B. and Z.J.; resources, M.K. and K.B.; data curation, M.K.; writing—original draft preparation, M.K. and K.B.; writing—review and editing, M.K., K.B. and Z.J.; visualization, M.K. and K.B.; supervision, M.K.; project administration, M.K.; funding acquisition, M.K. and Z.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

All preprocessing, training, and evaluation scripts used in this study are available in the associated repository: https://github.com/kamilczynski/Detection-of-Floricane-Raspberry-Shrubs-from-Unmanned-Aerial-Vehicles-Imagery-Using-YOLO-Models (accessed on 16 February 2026). The original UAV datasets and trained YOLO model weights (best.pt and best.engine) are not publicly available due to their large volume and storage constraints but can be obtained from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Avhale, V.R.; Senthil Kumar, G.; Kumaraperumal, R.; Prabukumar, G.; Bharathi, C.; Sathya Priya, R.; Yuvaraj, M.; Muthumanickam, D.; Parasuraman, P.; Pazhanivelan, S. AgriDrones: A Holistic Review on the Integration of Drones in Indian Agriculture. Agric. Res. 2025, 14, 34–46.
2. Merz, M.; Pedro, D.; Skliros, V.; Bergenhem, C.; Himanka, M.; Houge, T.; Matos-Carvalho, J.P.; Lundkvist, H.; Cürüklü, B.; Hamrén, R.; et al. Autonomous UAS-Based Agriculture Applications: General Overview and Relevant European Case Studies. Drones 2022, 6, 128.
3. Rejeb, A.; Abdollahi, A.; Rejeb, K.; Treiblmaier, H. Drones in Agriculture: A Review and Bibliometric Analysis. Comput. Electron. Agric. 2022, 198, 107017.
4. Guebsi, R.; Mami, S.; Chokmani, K. Drones in Precision Agriculture: A Comprehensive Review of Applications, Technologies, and Challenges. Drones 2024, 8, 686.
5. Weiss, M.; Jacob, F.; Duveiller, G. Remote Sensing for Agricultural Applications: A Meta-Review. Remote Sens. Environ. 2020, 236, 111402.
6. Zhang, X.; Bi, P.; Zhou, Q.; Liu, L.; Ren, L.; Luo, Y. Monitoring of Yellow Leaf Disease (YLD) Damage Based on Ground-Based LiDAR and UAV Multispectral Data. Comput. Electron. Agric. 2025, 236, 110461.
7. Teixeira, I.; Morais, R.; Sousa, J.J.; Cunha, A. Deep Learning Models for the Classification of Crops in Aerial Imagery: A Review. Agriculture 2023, 13, 965.
8. Niu, H.; Chen, Y. Smart Big Data in Digital Agriculture Applications: Acquisition, Advanced Analytics, and Plant Physiology-Informed Artificial Intelligence; Agriculture Automation and Control; Springer Nature: Cham, Switzerland, 2024.
9. LeCun, Y.; Bengio, Y.; Hinton, G. Deep Learning. Nature 2015, 521, 436–444.
10. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 2017, 60, 84–90.
11. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. 2015, 115, 211–252.
12. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-Based Learning Applied to Document Recognition. Proc. IEEE 1998, 86, 2278–2324.
13. Nair, V.; Hinton, G.E. Rectified Linear Units Improve Restricted Boltzmann Machines. In Proceedings of the 27th International Conference on Machine Learning (ICML 2010); Omnipress: Madison, WI, USA, 2010; pp. 807–814. Available online: https://dl.acm.org/doi/10.5555/3104322.3104425 (accessed on 19 October 2025).
14. Zeiler, M.D.; Fergus, R. Visualizing and Understanding Convolutional Networks. In Computer Vision—ECCV 2014; Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2014; Volume 8689, pp. 818–833.
15. Scherer, D.; Müller, A.; Behnke, S. Evaluation of Pooling Operations in Convolutional Architectures for Object Recognition. In Artificial Neural Networks—ICANN 2010; Diamantaras, K., Duch, W., Iliadis, L.S., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2010; Volume 6354, pp. 92–101.
16. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; Adaptive Computation and Machine Learning; The MIT Press: Cambridge, MA, USA, 2016.
17. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Columbus, OH, USA, 2014; pp. 580–587.
18. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. arXiv 2015, arXiv:1506.01497.
19. Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV); IEEE: Santiago, Chile, 2015; pp. 1440–1448.
20. Sapkota, R.; Flores-Calero, M.; Qureshi, R.; Badgujar, C.; Nepal, U.; Poulose, A.; Zeno, P.; Vaddevolu, U.B.P.; Khan, S.; Shoman, M.; et al. YOLO Advances to Its Genesis: A Decadal and Comprehensive Review of the You Only Look Once (YOLO) Series. Artif. Intell. Rev. 2025, 58, 274.
21. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Computer Vision—ECCV 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2016; Volume 9905, pp. 21–37.
22. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Las Vegas, NV, USA, 2016; pp. 779–788.
23. Ali, M.L.; Zhang, Z. The YOLO Framework: A Comprehensive Review of Evolution, Applications, and Benchmarks in Object Detection. Computers 2024, 13, 336.
  24. Katkuri, A.V.R.; Madan, H.; Khatri, N.; Abdul-Qawy, A.S.H.; Patnaik, K.S. Autonomous UAV Navigation Using Deep Learning-Based Computer Vision Frameworks: A Systematic Literature Review. Array 2024, 23, 100361. [Google Scholar] [CrossRef]
  25. Xing, Y.; Liu, X.; Wang, X. Integrating UAVs, Satellite Remote Sensing, and Machine Learning in Precision Agriculture: Pathways to Sustainable Food Production, Resource Efficiency, and Scalable Innovation. Front. Agron. 2026, 7, 1670380. [Google Scholar] [CrossRef]
  26. Agrawal, J.; Arafat, M.Y. Transforming Farming: A Review of AI-Powered UAV Technologies in Precision Agriculture. Drones 2024, 8, 664. [Google Scholar] [CrossRef]
  27. Buczyński, K.; Kapłan, M.; Jarosz, Z. Review of the Report on the Nutritional and Health-Promoting Values of Species of the Rubus L. Genus. Agriculture 2024, 14, 1324. [Google Scholar] [CrossRef]
  28. DJI Agriculture. Available online: https://ag.dji.com/Mavic-3-m (accessed on 20 October 2025).
  29. Kamilczynski. GitHub Repository. 2026. Available online: https://github.com/kamilczynski/detection-of-floricane-raspberry-shrubs-from-unmanned-aerial-vehicles-imagery-using-yolo-models (accessed on 16 February 2026).
  30. Tzutalin. LabelImg: Image Annotation Tool, Version 1.8.6; GitHub Repository. 2015. Available online: https://github.com/tzutalin/labelimg (accessed on 1 June 2025).
  31. Yaseen, M. What Is YOLOv8: An In-Depth Exploration of the Internal Features of the Next-Generation Object Detector. arXiv 2024, arXiv:2408.15857. [Google Scholar]
  32. Alif, M.A.R.; Hussain, M. YOLOv1 to YOLOv10: A Comprehensive Review of YOLO Variants and Their Application in the Agricultural Domain. arXiv 2024, arXiv:2406.10139. [Google Scholar] [CrossRef]
  33. Ultralytics. Available online: https://docs.ultralytics.com/models/yolov8/#supported-tasks-and-modes (accessed on 20 October 2025).
  34. Khanam, R.; Hussain, M. YOLOv11: An Overview of the Key Architectural Enhancements. arXiv 2024, arXiv:2410.17725. [Google Scholar] [CrossRef]
  35. Jegham, N.; Koh, C.Y.; Abdelatti, M.; Hendawi, A. YOLO Evolution: A Comprehensive Benchmark and Architectural Review of YOLOv12, YOLO11, and Their Previous Versions. arXiv 2025, arXiv:2411.00201. [Google Scholar]
  36. Ultralytics. Available online: https://docs.ultralytics.com/models/yolo11/ (accessed on 20 October 2025).
  37. Tian, Y.; Ye, Q.; Doermann, D. YOLOv12: Attention-Centric Real-Time Object Detectors. arXiv 2025, arXiv:2502.12524. [Google Scholar]
  38. Ultralytics. Available online: https://docs.ultralytics.com/models/yolo12 (accessed on 20 October 2025).
  39. Ultralytics. Available online: https://docs.ultralytics.com/guides/yolo-performance-metrics/#class-wise-metrics (accessed on 10 November 2025).
  40. Zhu, Y.; Zhou, J.; Yang, Y.; Liu, L.; Liu, F.; Kong, W. Rapid Target Detection of Fruit Trees Using UAV Imaging and Improved Light YOLOv4 Algorithm. Remote Sens. 2022, 14, 4324. [Google Scholar] [CrossRef]
  41. Tian, H.; Fang, X.; Lan, Y.; Ma, C.; Huang, H.; Lu, X.; Zhao, D.; Liu, H.; Zhang, Y. Extraction of Citrus Trees from UAV Remote Sensing Imagery Using YOLOv5s and Coordinate Transformation. Remote Sens. 2022, 14, 4208. [Google Scholar] [CrossRef]
  42. Zhang, Y.; Fang, X.; Guo, J.; Wang, L.; Tian, H.; Yan, K.; Lan, Y. CURI-YOLOv7: A Lightweight YOLOv7tiny Target Detector for Citrus Trees from UAV Remote Sensing Imagery Based on Embedded Device. Remote Sens. 2023, 15, 4647. [Google Scholar] [CrossRef]
  43. Nguyen, H.D.; McHenry, B.; Nguyen, T.; Zappone, H.; Thompson, A.; Tran, C.; Segrest, A.; Tonon, L. Accurate Crop Yield Estimation of Blueberries Using Deep Learning and Smart Drones. arXiv 2025, arXiv:2501.02344. [Google Scholar] [CrossRef]
  44. Gavrilović, M.; Jovanović, D.; Božović, P.; Benka, P.; Govedarica, M. Vineyard Zoning and Vine Detection Using Machine Learning in Unmanned Aerial Vehicle Imagery. Remote Sens. 2024, 16, 584. [Google Scholar] [CrossRef]
  45. Jintasuttisak, T.; Edirisinghe, E.; Elbattay, A. Deep Neural Network Based Date Palm Tree Detection in Drone Imagery. Comput. Electron. Agric. 2022, 192, 106560. [Google Scholar] [CrossRef]
  46. Peng, H.; Xie, H.; Lia, W.; Liuc, H.; Li, X. YOLOv11-Litchi: Efficient Litchi Fruit Detection Based on UAV-Captured Agricultural Imagery in Complex Orchard Environments. arXiv 2025, arXiv:2510.10141. [Google Scholar]
  47. Wang, Q.; Pu, Z.; Luo, L.; Wang, L.; Gao, J. A Study on Tree Species Recognition in UAV Remote Sensing Imagery Based on an Improved YOLOv11 Model. Appl. Sci. 2025, 15, 8779. [Google Scholar] [CrossRef]
  48. Jemaa, H.; Bouachir, W.; Leblon, B.; LaRocque, A.; Haddadi, A.; Bouguila, N. UAV-Based Computer Vision System for Orchard Apple Tree Detection and Health Assessment. Remote Sens. 2023, 15, 3558. [Google Scholar] [CrossRef]
  49. Jemaa, H.; Bouachir, W.; Leblon, B.; Bouguila, N. Computer Vision System for Detecting Orchard Trees from UAV Images. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2022, XLIII-B4-2022, 661–668. [Google Scholar] [CrossRef]
  50. Kelly, M.; Feirer, S.; Hogan, S.; Lyons, A.; Lin, F.; Jacygrad, E. Mapping Orchard Trees from UAV Imagery Through One Growing Season: A Comparison Between OBIA-Based and Three CNN-Based Object Detection Methods. Drones 2025, 9, 593. [Google Scholar] [CrossRef]
  51. Feng, Y.; Chen, W.; Ma, Y.; Zhang, Z.; Gao, P.; Lv, X. Cotton Seedling Detection and Counting Based on UAV Multispectral Images and Deep Learning Methods. Remote Sens. 2023, 15, 2680. [Google Scholar] [CrossRef]
  52. Wang, X.; Zhang, C.; Qiang, Z.; Liu, C.; Wei, X.; Cheng, F. A Coffee Plant Counting Method Based on Dual-Channel NMS and YOLOv9 Leveraging UAV Multispectral Imaging. Remote Sens. 2024, 16, 3810. [Google Scholar] [CrossRef]
  53. El Sakka, M.; Ivanovici, M.; Chaari, L.; Mothe, J. A Review of CNN Applications in Smart Agriculture Using Multimodal Data. Sensors 2025, 25, 472. [Google Scholar] [CrossRef]
  54. El Melki, M.N.; Yahyaoui, A.; Faqeih, K.Y.; Al-Khayri, J.M. Agricultural Drones: An Eye in the Sky for Smart Agriculture. In Handbook of Agricultural Technologies; Al-Khayri, J.M., Yatoo, A.M., Jain, S.M., Penna, S., Eds.; Springer Nature: Singapore, 2025; pp. 1–23. [Google Scholar]
  55. Gamboa-Cruzado, J.; Estrada-Gutierrez, J.; Bustos-Romero, C.; Alzamora Rivero, C.; Valenzuela, J.N.; Tavera Romero, C.A.; Gamarra-Moreno, J.; Amayo-Gamboa, F. A Review of Drones in Smart Agriculture: Issues, Models, Trends, and Challenges. Sustainability 2026, 18, 507. [Google Scholar] [CrossRef]
  56. Makam, S.; Komatineni, B.K.; Meena, S.S.; Meena, U. Unmanned Aerial Vehicles (UAVs): An Adoptable Technology for Precise and Smart Farming. Discov. Internet Things 2024, 4, 12. [Google Scholar] [CrossRef]
  57. Asif, M.; Rayamajhi, A.; Mahmud, M.S. Technological Progress Toward Peanut Disease Management: A Review. Sensors 2025, 25, 1255. [Google Scholar] [CrossRef] [PubMed]
  58. Șerban, M.I.; Grad-Rusu, E.; Florian, T.; Grad, M.; Florian, V.C. Efficacy of Drone-Applied Fungicide Treatments in Control of Sunflower Diseases. Drones 2026, 10, 33. [Google Scholar] [CrossRef]
  59. Furiosi, M.; Triachini, S.; Beone, G.M.; Fontanella, M.C.; Gaaied, S.; Arbi, G.; Lomadze, A.; Grella, M.; Mozzanini, E.; Dicembrini, E.; et al. Aerial Spray Application of Plant Protection Products for Grapevine Downy Mildew Control: Efficacy and Canopy Deposit Evaluation in Semi-Field Trials. Agronomy 2025, 15, 2703. [Google Scholar] [CrossRef]
  60. Pansy, D.L.; Murali, M. UAV Hyperspectral Remote Sensor Images for Mango Plant Disease and Pest Identification Using MD-FCM and XCS-RBFNN. Environ. Monit. Assess. 2023, 195, 1120. [Google Scholar] [CrossRef]
  61. Ochieng’, V.; Rwomushana, I.; Ong’amo, G.; Ndegwa, P.; Kamau, S.; Makale, F.; Chacha, D.; Gadhia, K.; Akiri, M. Optimum Flight Height for the Control of Desert Locusts Using Unmanned Aerial Vehicles (UAV). Drones 2023, 7, 233. [Google Scholar] [CrossRef]
  62. Alsadik, B.; Ellsäßer, F.J.; Awawdeh, M.; Al-Rawabdeh, A.; Almahasneh, L.; Oude Elberink, S.; Abuhamoor, D.; Al Asmar, Y. Remote Sensing Technologies Using UAVs for Pest and Disease Monitoring: A Review Centered on Date Palm Trees. Remote Sens. 2024, 16, 4371. [Google Scholar] [CrossRef]
  63. Juan, Y.; Ke, Z.; Chen, Z.; Zhong, D.; Chen, W.; Yin, L. Rapid Density Estimation of Tiny Pests from Sticky Traps Using Qpest RCNN in Conjunction with UWB-UAV-Based IoT Framework. Neural Comput. Appl. 2024, 36, 9779–9803. [Google Scholar] [CrossRef]
  64. Albattah, W.; Masood, M.; Javed, A.; Nawaz, M.; Albahli, S. Custom CornerNet: A Drone-Based Improved Deep Learning Technique for Large-Scale Multiclass Pest Localization and Classification. Complex Intell. Syst. 2023, 9, 1299–1316. [Google Scholar] [CrossRef]
  65. Amarasingam, N.; Powell, K.; Sandino, J.; Bratanov, D.; Ashan Salgadoe, A.S.; Gonzalez, F. Mapping of Insect Pest Infestation for Precision Agriculture: A UAV-Based Multispectral Imaging and Deep Learning Techniques. Int. J. Appl. Earth Obs. Geoinf. 2025, 137, 104413. [Google Scholar] [CrossRef]
  66. Bautista, A.S.; Tarrazó-Serrano, D.; Uris, A.; Blesa, M.; Estruch-Guitart, V.; Castiñeira-Ibáñez, S.; Rubio, C. Remote Sensing Evaluation Drone Herbicide Application Effectiveness for Controlling Echinochloa Spp. in Rice Crop in Valencia (Spain). Sensors 2024, 24, 804. [Google Scholar] [CrossRef]
  67. Fathimathul Rajeena, P.P.; Ismail, W.N.; Ali, M.A.S. A Metaheuristic Harris Hawks Optimization Algorithm for Weed Detection Using Drone Images. Appl. Sci. 2023, 13, 7083. [Google Scholar] [CrossRef]
  68. Kebede, A.S.; Muluneh, T.W.; Adege, A.B. Detection of Weeds in Teff Crops Using Deep Learning and UAV Imagery for Precision Herbicide Application. Sci. Rep. 2025, 15, 30708. [Google Scholar] [CrossRef]
  69. Takekawa, J.Y.; Hagani, J.S.; Edmunds, T.J.; Collins, J.M.; Chappell, S.C.; Reynolds, W.H. The Sky Is Not the Limit: Use of a Spray Drone for the Precise Application of Herbicide and Control of an Invasive Plant in Managed Wetlands. Remote Sens. 2023, 15, 3845. [Google Scholar] [CrossRef]
  70. Toscano, F.; Fiorentino, C.; Santana, L.S.; Magalhães, R.R.; Albiero, D.; Tomáš, Ř.; Klocová, M.; D’Antonio, P. Recent Developments and Future Prospects in the Integration of Machine Learning in Mechanised Systems for Autonomous Spraying: A Brief Review. AgriEngineering 2025, 7, 142. [Google Scholar] [CrossRef]
  71. Miyoshi, K.; Hiraguri, T.; Shimizu, H.; Hattori, K.; Kimura, T.; Okubo, S.; Endo, K.; Shimada, T.; Shibasaki, A.; Takemura, Y. Development of Pear Pollination System Using Autonomous Drones. AgriEngineering 2025, 7, 68. [Google Scholar] [CrossRef]
  72. Manthos, I.; Sotiropoulos, T.; Vagelas, I. Is the Artificial Pollination of Walnut Trees with Drones Able to Minimize the Presence of Xanthomonas Arboricola Pv. Juglandis? A Review. Appl. Sci. 2024, 14, 2732. [Google Scholar] [CrossRef]
  73. Rice, C.R.; McDonald, S.T.; Shi, Y.; Gan, H.; Lee, W.S.; Chen, Y.; Wang, Z. Perception, Path Planning, and Flight Control for a Drone-Enabled Autonomous Pollination System. Robotics 2022, 11, 144. [Google Scholar] [CrossRef]
  74. Hulens, D.; Van Ranst, W.; Cao, Y.; Goedemé, T. Autonomous Visual Navigation for a Flower Pollination Drone. Machines 2022, 10, 364. [Google Scholar] [CrossRef]
  75. Chen, B.; Su, Q.; Li, Y.; Chen, R.; Yang, W.; Huang, C. Field Rice Growth Monitoring and Fertilization Management Based on UAV Spectral and Deep Image Feature Fusion. Agronomy 2025, 15, 886. [Google Scholar] [CrossRef]
  76. Xia, X.; Zhang, R.; Ma, L.; Su, J.; Yi, T.; Zhang, L.; Chen, X. Optimization of Unmanned Aerial Vehicle Operational Parameters to Maximize Fertilizer Application Efficiency in Rice Cultivation. J. Clean. Prod. 2025, 514, 145762. [Google Scholar] [CrossRef]
  77. Zhou, H.; Yao, W.; Su, D.; Guo, S.; Zheng, Z.; Yu, Z.; Gao, D.; Li, H.; Chen, C. Application of a Centrifugal Disc Fertilizer Spreading System for UAVs in Rice Fields. Heliyon 2024, 10, e29837. [Google Scholar] [CrossRef] [PubMed]
  78. Al-Najadi, R.; Al-Mulla, Y.; Al-Abri, I.; Al-Sadi, A.M. Effectiveness of Drone-Based Thermal Sensors in Optimizing Controlled Environment Agriculture Performance under Arid Conditions. Sci. Rep. 2025, 15, 9042. [Google Scholar] [CrossRef] [PubMed]
  79. Goswami, A.; Singh, R. A Review of Drone Based Irrigation System for Large Farms. JGEU 2025, 13, 323–338. [Google Scholar] [CrossRef]
  80. Yadav, M.; Vashisht, B.B.; Vullaganti, N.; Kumar, P.; Jalota, S.K.; Kumar, A.; Kaushik, P. UAV-Enabled Approaches for Irrigation Scheduling and Water Body Characterization. Agric. Water Manag. 2024, 304, 109091. [Google Scholar] [CrossRef]
  81. Sharma, H.; Sidhu, H.; Bhowmik, A. Remote Sensing Using Unmanned Aerial Vehicles for Water Stress Detection: A Review Focusing on Specialty Crops. Drones 2025, 9, 241. [Google Scholar] [CrossRef]
  82. Ortega-Farias, S.; Ramírez-Cuesta, J.M.; Nieto, H. Recent Advances on Water Management Using UAV-Based Technologies. Irrig. Sci. 2025, 43, 1–3. [Google Scholar] [CrossRef]
  83. Liang, Z.; Fu, Z.; Kiplagat, D.; Wang, W.; Yang, J.; Li, Z.; Cao, Q.; Tian, Y.; Zhu, Y.; Cao, W.; et al. Rice Yield Prediction Base on UAV Multispectral Imagery Using Machine Learning Methods. Smart Agric. Technol. 2025, 12, 101549. [Google Scholar] [CrossRef]
  84. Tripathi, R.; Gouda, A.K.; Jena, S.S.; Mohapatra, R.R.; Lal, M.K.; Dash, S.K.; Sahoo, R.N.; Nayak, A.K. Rice Yield Prediction Using UAV-Mounted RGB Sensors and Machine Learning Algorithms. Proc. Indian Natl. Sci. Acad. 2025. [Google Scholar] [CrossRef]
  85. Kešelj, K.; Stamenković, Z.; Kostić, M.; Aćin, V.; Tekić, D.; Novaković, T.; Ivanišević, M.; Ivezić, A.; Magazin, N. Machine Learning (AutoML)-Driven Wheat Yield Prediction for European Varieties: Enhanced Accuracy Using Multispectral UAV Data. Agriculture 2025, 15, 1534. [Google Scholar] [CrossRef]
  86. Dos Santos Felipetto, H.; Mercante, E.; Viana, O.; Elias, A.R.; Benin, G.; Scolari, L.; Armadori, A.; Donato, D.G. Combining Machine Learning with UAV Derived Multispectral Aerial Images for Wheat Yield Prediction, in Southern Brazil. Eur. J. Remote Sens. 2025, 58, 2464663. [Google Scholar] [CrossRef]
  87. Ivošević, B.; Pajević, N.; Brdar, S.; Waqar, R.; Khan, M.; Valente, J. Comprehensive Dataset from High Resolution UAV Land Cover Mapping of Diverse Natural Environments in Serbia. Sci. Data 2025, 12, 66. [Google Scholar] [CrossRef] [PubMed]
  88. Pinkas, J.; Toman, P.; Svoboda, J. Comparison of RGB and Multispectral Cameras for Targeted Applications in Agriculture. APP 2024, 51, 69–74. [Google Scholar] [CrossRef]
  89. Zhang, S.; Wang, X.; Lin, H.; Dong, Y.; Qiang, Z. A Review of the Application of UAV Multispectral Remote Sensing Technology in Precision Agriculture. Smart Agric. Technol. 2025, 12, 101406. [Google Scholar] [CrossRef]
  90. Wróblewska, W.; Pawlak, J.; Paszko, D. The Influence of Factors on the Yields of Two Raspberry Varieties (Rubus idaeus L.) and the Economic Results. ASPHC 2020, 19, 63–70. [Google Scholar] [CrossRef]
  91. Wróblewska, W.; Pawlak, J.; Paszko, D. Economic Aspects in the Raspberry Production on the Example of Farms from Poland, Serbia and Ukraine. J. Hortic. Res. 2019, 27, 71–80. [Google Scholar] [CrossRef]
  92. Paszko, D.; Krawiec, P.; Pawlak, J.; Wróblewska, W. Assess the Cost and Profitability of Raspberry Production under Cover in the Context of Building Competitive Advantage on Example of Selected Farm. Ann. PAAAE 2017, XIX, 218–223. [Google Scholar] [CrossRef]
  93. Baranowska, A.; Skowera, B.; Węgrzyn, A. Wpływ Warunków Meteorologicznych i Zabiegów Agrotechnicznych Na Wynik Produkcyjny i Ekonomiczny Uprawy Maliny Jesiennej—Studium Przypadku. Agron. Sci. 2025, 79, 169–182. [Google Scholar] [CrossRef]
  94. Buczyński, K.; Kapłan, M.; Borkowska, A.; Kilmek, K. Wpływ Węgla Brunatnego Na Wielkość i Jakość Plonu Maliny Odmiany Polana. Ann. Hortic. 2024, 32, 5–20. [Google Scholar] [CrossRef]
  95. Kasetty, S.B.; K, R. Advancing Object Detection in Remote Sensing: A Rigorous Evaluation of YOLO Models. In Proceedings of the 2025 11th International Conference on Communication and Signal Processing (ICCSP); IEEE: Melmaruvathur, India, 2025; pp. 999–1004. [Google Scholar]
  96. Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-Time End-to-End Object Detection. arXiv 2024, arXiv:2405.14458. [Google Scholar]
Figure 1. Example images of each modality: RGB, NIR, RE, R, and G.
Figure 2. (a–f) Examples of raspberry row labeling in situations with partially visible adjacent shrubs.
Figure 3. (a–d) Examples of labeling complete raspberry rows and rows with gaps caused by missing plants.
Figure 4. Detection examples produced by the trained YOLO model on the RGB, NIR, RE, R, and G image modalities.
Figure 5. Cross-modality performance across training and testing modalities.
Figure 6. Comparison of in-domain mAP, cross-domain mAP, and relative mAP retention metrics across YOLO models.
Figure 7. Comparison of in-domain mAP, cross-domain mAP, and relative mAP retention metrics across image modalities.
Figure 8. Comparison of worst-case ratio and cross-domain variability metrics across YOLO models.
Figure 9. Comparison of worst-case ratio and cross-domain variability metrics across training image modalities.
Table 1. Hyperparameters of YOLO training models.

Hyperparameter | Value
epochs | 200
batch | 32
imgsz | 640
optimizer | 'SGD'
momentum | 0.937
weight_decay | 0.0005
lr0 | 0.01
lrf | 0.01
seed | 0
augment | True
workers | 8
patience | 0
device | 'cuda'
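The hyperparameters in Table 1 map directly onto arguments of the Ultralytics training API used for the YOLO models in this study. The following is a minimal training sketch under that assumption; the dataset definition file name (raspberry_rgb.yaml) is a hypothetical placeholder, and the scripts actually used are available in the project repository.

```python
from ultralytics import YOLO

# Minimal training sketch using the hyperparameters listed in Table 1.
# "raspberry_rgb.yaml" is a hypothetical dataset definition file; see the
# project repository for the scripts actually used in the study.
model = YOLO("yolov8s.pt")  # analogous calls apply to "yolo11s.pt" and "yolo12s.pt"

model.train(
    data="raspberry_rgb.yaml",
    epochs=200,
    batch=32,
    imgsz=640,
    optimizer="SGD",
    momentum=0.937,
    weight_decay=0.0005,
    lr0=0.01,           # initial learning rate
    lrf=0.01,           # final learning-rate fraction
    seed=0,
    augment=True,
    workers=8,
    patience=0,         # patience=0 disables early stopping in Ultralytics
    device="cuda",
)
```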
Table 2. Hardware and software configuration of the primary workstation used for model training and evaluation.

Component | Specification
Processor (CPU) | AMD Ryzen 9 9950X (Advanced Micro Devices, Inc., Sunnyvale, CA, USA)
Graphics Processing Unit (GPU) | ASUS GeForce RTX 5080 PRIME OC, 16 GB (ASUSTek Computer Inc., Taipei, Taiwan)
Motherboard | Gigabyte B850 AORUS ELITE WIFI7 AM5 (GIGA-BYTE Technology Co., Ltd., New Taipei City, Taiwan)
Memory (RAM) | Corsair Vengeance, DDR5 2 × 32 GB (64 GB) (Corsair Memory, Inc., Milpitas, CA, USA)
Storage | SSD Lexar NM790 2 TB (Lexar Co., Limited, Shatin, Hong Kong)

Software | Version
Microsoft Windows | 11 Pro (build 26200, 64-bit)
Python | 3.11.13
PyTorch | 2.8.0
CUDA | 12.8
cuDNN | 9.1.0.2 (NVIDIA build 91002)
Ultralytics | 8.3.13
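As a quick environment check, the software versions listed in Table 2 can be verified at runtime. This is a generic verification sketch, not part of the published pipeline:

```python
import torch
import ultralytics

# For the configuration in Table 2, these should report
# PyTorch 2.8.0, CUDA 12.8, and Ultralytics 8.3.13.
print("PyTorch:     ", torch.__version__)
print("CUDA:        ", torch.version.cuda)
print("cuDNN build: ", torch.backends.cudnn.version())
print("GPU detected:", torch.cuda.is_available())
print("Ultralytics: ", ultralytics.__version__)
```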
Table 3. Comparative evaluation of YOLO models trained on different image modalities.

Image Modality | Model | Precision | Recall | mAP50 | mAP50:95 | F1-Score
RGB | YOLOv8s | 0.997 | 0.996 | 0.995 | 0.975 | 0.997
RGB | YOLO11s | 0.996 | 0.998 | 0.995 | 0.975 | 0.997
RGB | YOLO12s | 0.996 | 0.996 | 0.995 | 0.973 | 0.996
NIR | YOLOv8s | 0.997 | 0.997 | 0.995 | 0.984 | 0.997
NIR | YOLO11s | 0.997 | 0.996 | 0.995 | 0.986 | 0.997
NIR | YOLO12s | 0.998 | 0.998 | 0.995 | 0.986 | 0.998
RE | YOLOv8s | 0.996 | 0.999 | 0.995 | 0.985 | 0.998
RE | YOLO11s | 0.997 | 0.997 | 0.995 | 0.985 | 0.997
RE | YOLO12s | 0.998 | 0.998 | 0.995 | 0.983 | 0.998
R | YOLOv8s | 0.996 | 0.996 | 0.995 | 0.941 | 0.996
R | YOLO11s | 0.995 | 0.994 | 0.995 | 0.949 | 0.995
R | YOLO12s | 0.996 | 0.996 | 0.995 | 0.952 | 0.996
G | YOLOv8s | 0.988 | 0.999 | 0.995 | 0.973 | 0.999
G | YOLO11s | 0.999 | 0.998 | 0.995 | 0.974 | 0.999
G | YOLO12s | 0.998 | 0.999 | 0.995 | 0.972 | 0.998
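The F1-scores in Tables 3–8 are the harmonic mean of precision and recall, F1 = 2PR/(P + R); occasional last-digit deviations arise because the tabulated precision and recall values are themselves rounded. A worked check against the NIR/YOLOv8s row of Table 3 (an illustrative sketch, not study code):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# NIR-trained YOLOv8s in Table 3: precision = recall = 0.997.
print(round(f1_score(0.997, 0.997), 3))  # -> 0.997, matching the table
```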
Table 4. Cross-modality performance of YOLO models trained on the RGB image modality.

Training Image Modality | Testing Image Modality | Model | Precision | Recall | mAP50 | mAP50:95 | F1-Score
RGB | NIR | YOLOv8s | 0.922 | 0.803 | 0.887 | 0.737 | 0.859
RGB | NIR | YOLO11s | 0.236 | 0.272 | 0.285 | 0.226 | 0.253
RGB | NIR | YOLO12s | 0.849 | 0.382 | 0.573 | 0.509 | 0.527
RGB | RE | YOLOv8s | 0.925 | 0.822 | 0.904 | 0.773 | 0.871
RGB | RE | YOLO11s | 0.875 | 0.287 | 0.472 | 0.370 | 0.432
RGB | RE | YOLO12s | 0.985 | 0.951 | 0.989 | 0.905 | 0.968
RGB | R | YOLOv8s | 0.757 | 0.395 | 0.483 | 0.279 | 0.519
RGB | R | YOLO11s | 0.774 | 0.264 | 0.408 | 0.253 | 0.393
RGB | R | YOLO12s | 0.867 | 0.677 | 0.813 | 0.512 | 0.760
RGB | G | YOLOv8s | 0.897 | 0.619 | 0.743 | 0.553 | 0.733
RGB | G | YOLO11s | 0.757 | 0.218 | 0.344 | 0.232 | 0.339
RGB | G | YOLO12s | 0.960 | 0.909 | 0.979 | 0.784 | 0.934
Table 5. Cross-modality performance of YOLO models trained on the NIR image modality.

Training Image Modality | Testing Image Modality | Model | Precision | Recall | mAP50 | mAP50:95 | F1-Score
NIR | RGB | YOLOv8s | 0.172 | 0.104 | 0.069 | 0.062 | 0.130
NIR | RGB | YOLO11s | 0.698 | 0.482 | 0.547 | 0.422 | 0.570
NIR | RGB | YOLO12s | 0.497 | 0.424 | 0.450 | 0.337 | 0.458
NIR | RE | YOLOv8s | 0.994 | 0.996 | 0.995 | 0.972 | 0.995
NIR | RE | YOLO11s | 0.997 | 0.997 | 0.995 | 0.978 | 0.997
NIR | RE | YOLO12s | 0.996 | 0.996 | 0.995 | 0.978 | 0.996
NIR | R | YOLOv8s | 0.573 | 0.355 | 0.403 | 0.304 | 0.438
NIR | R | YOLO11s | 0.854 | 0.426 | 0.560 | 0.384 | 0.568
NIR | R | YOLO12s | 0.691 | 0.549 | 0.613 | 0.392 | 0.612
NIR | G | YOLOv8s | 0.828 | 0.605 | 0.718 | 0.624 | 0.699
NIR | G | YOLO11s | 0.937 | 0.790 | 0.907 | 0.756 | 0.857
NIR | G | YOLO12s | 0.930 | 0.918 | 0.971 | 0.822 | 0.924
Table 6. Cross-modality performance of YOLO models trained on the RE image modality.

Training Image Modality | Testing Image Modality | Model | Precision | Recall | mAP50 | mAP50:95 | F1-Score
RE | RGB | YOLOv8s | 0.801 | 0.358 | 0.426 | 0.380 | 0.495
RE | RGB | YOLO11s | 0.757 | 0.529 | 0.599 | 0.487 | 0.623
RE | RGB | YOLO12s | 0.732 | 0.644 | 0.725 | 0.606 | 0.685
RE | NIR | YOLOv8s | 0.994 | 0.998 | 0.995 | 0.982 | 0.996
RE | NIR | YOLO11s | 0.995 | 0.995 | 0.995 | 0.981 | 0.995
RE | NIR | YOLO12s | 0.998 | 0.997 | 0.995 | 0.981 | 0.997
RE | R | YOLOv8s | 0.874 | 0.492 | 0.640 | 0.514 | 0.630
RE | R | YOLO11s | 0.883 | 0.464 | 0.589 | 0.444 | 0.608
RE | R | YOLO12s | 0.857 | 0.727 | 0.842 | 0.632 | 0.787
RE | G | YOLOv8s | 0.963 | 0.874 | 0.949 | 0.875 | 0.917
RE | G | YOLO11s | 0.949 | 0.907 | 0.959 | 0.877 | 0.927
RE | G | YOLO12s | 0.994 | 0.984 | 0.995 | 0.927 | 0.989
Table 7. Cross-modality performance of YOLO models trained on the R image modality.

Training Image Modality | Testing Image Modality | Model | Precision | Recall | mAP50 | mAP50:95 | F1-Score
R | RGB | YOLOv8s | 0.848 | 0.932 | 0.941 | 0.857 | 0.888
R | RGB | YOLO11s | 0.951 | 0.949 | 0.985 | 0.902 | 0.950
R | RGB | YOLO12s | 0.864 | 0.926 | 0.948 | 0.860 | 0.894
R | NIR | YOLOv8s | 0.908 | 0.727 | 0.857 | 0.727 | 0.808
R | NIR | YOLO11s | 0.932 | 0.752 | 0.856 | 0.718 | 0.833
R | NIR | YOLO12s | 0.942 | 0.856 | 0.922 | 0.712 | 0.897
R | RE | YOLOv8s | 0.913 | 0.885 | 0.949 | 0.841 | 0.899
R | RE | YOLO11s | 0.950 | 0.911 | 0.964 | 0.880 | 0.930
R | RE | YOLO12s | 0.952 | 0.860 | 0.927 | 0.808 | 0.904
R | G | YOLOv8s | 0.993 | 0.994 | 0.995 | 0.951 | 0.994
R | G | YOLO11s | 0.998 | 0.994 | 0.994 | 0.950 | 0.996
R | G | YOLO12s | 0.997 | 0.996 | 0.995 | 0.949 | 0.996
Table 8. Cross-modality performance of YOLO models trained on the G image modality.

Training Image Modality | Testing Image Modality | Model | Precision | Recall | mAP50 | mAP50:95 | F1-Score
G | RGB | YOLOv8s | 0.928 | 0.876 | 0.928 | 0.851 | 0.901
G | RGB | YOLO11s | 0.922 | 0.937 | 0.968 | 0.879 | 0.930
G | RGB | YOLO12s | 0.888 | 0.956 | 0.973 | 0.905 | 0.921
G | NIR | YOLOv8s | 0.993 | 0.991 | 0.995 | 0.958 | 0.992
G | NIR | YOLO11s | 0.990 | 0.990 | 0.994 | 0.955 | 0.990
G | NIR | YOLO12s | 0.994 | 0.993 | 0.995 | 0.960 | 0.994
G | RE | YOLOv8s | 0.994 | 0.993 | 0.995 | 0.976 | 0.994
G | RE | YOLO11s | 0.998 | 0.995 | 0.995 | 0.974 | 0.996
G | RE | YOLO12s | 0.996 | 0.995 | 0.995 | 0.971 | 0.995
G | R | YOLOv8s | 0.985 | 0.991 | 0.995 | 0.861 | 0.988
G | R | YOLO11s | 0.984 | 0.977 | 0.993 | 0.848 | 0.981
G | R | YOLO12s | 0.996 | 0.987 | 0.995 | 0.862 | 0.991
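The cross-modality results in Tables 4–8 correspond to validating a detector trained on one modality against the test split of another. A minimal sketch of such an evaluation with the Ultralytics API is given below; the weight file and dataset definition names are hypothetical placeholders (the trained best.pt weights are available from the corresponding author, as noted in the Data Availability Statement).

```python
from ultralytics import YOLO

# Evaluate an RGB-trained detector on the NIR test split (cf. Table 4).
# "rgb_best.pt" and "raspberry_nir.yaml" are hypothetical file names.
model = YOLO("rgb_best.pt")
metrics = model.val(data="raspberry_nir.yaml", split="test", imgsz=640)

print("Precision:", metrics.box.mp)     # mean precision over classes
print("Recall:   ", metrics.box.mr)     # mean recall over classes
print("mAP50:    ", metrics.box.map50)
print("mAP50:95: ", metrics.box.map)
```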
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
