A Taxonomy of Machine Learning for UAV-Enabled Precision Agriculture: A Structured Survey

Bae, Wan D.; Alkobaisi, Shayma; Safdar, Muhammad Farhan; Chouhan, Prachitee

doi:10.3390/agriengineering8060249

¹ Department of Computer Science, Seattle University, Seattle, WA 98122, USA

² College of Information Technology, United Arab Emirates University, Sheik Khalifa Bin Zayed Street, Al Ain P.O. Box 15551, Abu Dhabi, United Arab Emirates

³ Faculty of Electronics and Information Technology, Warsaw University of Technology, Pl. Politechniki 1, 00-661 Warsaw, Poland

* Author to whom correspondence should be addressed.

AgriEngineering 2026, 8(6), 249; https://doi.org/10.3390/agriengineering8060249

Submission received: 21 February 2026 / Revised: 4 June 2026 / Accepted: 5 June 2026 / Published: 18 June 2026

(This article belongs to the Section Computer Applications and Artificial Intelligence in Agriculture)

Abstract

Precision agriculture increasingly relies on machine learning applied to high-resolution data acquired by unmanned aerial vehicles (UAVs) to support crop monitoring, stress detection, and yield forecasting. This survey presents a structured review of machine learning methods for UAV-enabled precision agriculture and organizes over 100 peer-reviewed studies within a unified four-dimensional taxonomy defined by sensing modality, data type, model family, and analytical task. The taxonomy enables systematic comparison across RGB, multispectral, hyperspectral, LiDAR, and IoT data sources and across classical machine learning, deep learning, hybrid sequential models, and emerging transformer-based architectures. We analyze how modeling choices interact with data characteristics to influence robustness, cross-environment generalization, computational efficiency, and deployment feasibility on UAV and edge platforms. Recurring challenges include limited labeled data, domain shift across seasons and fields, multimodal heterogeneity, occlusion, and real-time processing constraints. We identify emerging research directions, including data-efficient learning, representation-level multimodal fusion, domain adaptation, lightweight architectures for embedded deployment, and uncertainty-aware decision support. By formalizing the landscape through a unified taxonomy, this survey provides a foundation for designing scalable, robust, and deployable machine learning systems for next-generation precision agriculture.

Keywords: precision agriculture; unmanned aerial vehicles (UAVs); agricultural sensing systems; machine learning; crop monitoring; yield prediction; multimodal data fusion

1. Introduction

Precision agriculture (PA) increasingly depends on unmanned aerial vehicles (UAVs) and data-driven modeling systems to enable high-resolution crop monitoring, stress detection, and yield forecasting. Advances in sensing platforms, including RGB, multispectral, hyperspectral, thermal, and LiDAR (Light Detection and Ranging) sensors, combined with machine learning and deep learning techniques, have transformed how agricultural data are collected, processed, and interpreted. These technologies support scalable data-driven decision making leading to improved productivity, resource efficiency, and environmental sustainability.

Beyond serving as an application domain, PA presents a set of characteristics that actively shape the development of artificial intelligence (AI) methods. Agricultural environments expose AI systems to strong non-stationarity across seasons, cultivars, and management practices; extreme intra-class variability caused by growth stages and phenology; and severe data imbalance between healthy and stressed conditions. These properties challenge assumptions commonly made in benchmark-driven computer vision and time-series modeling, where data distributions are often static and labels are abundant.

Existing AI approaches in PA are often limited in scope, focusing on specific sensing modalities, individual crops, or isolated analytical tasks such as segmentation, pest and disease detection, fruit and bloom counting, and yield prediction. Many surveys emphasize either model architectures or sensing technologies but do not systematically connect data characteristics, model design, and downstream agricultural applications. As a result, it remains difficult to compare approaches across studies or to understand how methodological choices interact with real-world deployment constraints in UAV-enabled precision agriculture. In view of these limitations, the following research questions were formulated:

How do different AI model families (e.g., convolutional neural networks (CNNs), transformers, recurrent neural networks (RNNs), and classical models) perform across UAV-based sensing modalities and agricultural tasks?
What factors limit the reliability of AI-based predictions under varying real-world agricultural conditions?
How can UAV-based AI systems be translated into actionable insights for crop monitoring and interventional management?

Unlike previous surveys that primarily organize the literature around individual sensing platforms, crop types, or model architectures, this review introduces a unified four-dimensional taxonomy that jointly connects sensing modality, data type, model family, and analytical task within a single analytical framework. This structure enables systematic comparison across studies that would otherwise remain disconnected, including segmentation, detection, counting, and yield prediction pipelines operating under different sensing and deployment conditions. The proposed taxonomy also provides practical guidance for designing scalable and operationally feasible UAV-enabled agricultural analytics systems.

The key contributions of this survey are as follows:

We propose a unified taxonomy for AI-driven precision agriculture that integrates four complementary dimensions: sensing modality, data type, model family, and analytical task, enabling systematic cross-study comparison.
We identify structural challenges including data scarcity, annotation cost, domain shift, limited cross-modal integration, and deployment constraints that limit robustness and scalability.
We outline emerging research directions, including data-efficient learning, domain adaptation, representation-level multimodal fusion, synthetic data generation, and lightweight architectures for real-time deployment.

The literature reviewed in this survey was collected primarily from IEEE Xplore, ACM Digital Library, Scopus, Web of Science, ScienceDirect, and MDPI databases using combinations of keywords related to UAVs, precision agriculture, machine learning, deep learning, crop monitoring, disease detection, segmentation, fruit counting, and yield prediction. The survey focuses primarily on peer-reviewed journal and conference papers published between 2015 and 2025, with emphasis on studies involving UAV-enabled sensing and AI-driven analytical pipelines. Papers were selected based on their relevance to at least one of the four taxonomy dimensions introduced in this survey: sensing modality, data type, model family, or analytical task. More than 100 studies were analyzed across segmentation, pest and disease detection, bloom and fruit counting, and yield prediction tasks, spanning RGB, multispectral, hyperspectral, thermal, LiDAR, satellite, and IoT-based sensing modalities.

The remainder of this paper is organized as follows. Section 2 provides background on precision agriculture, discusses key challenges, and introduces our taxonomy. Section 3 reviews sensing modalities, data-acquisition strategies, and data-preparation techniques. Section 4 summarizes segmentation methods. Section 5 reviews pest and disease detection. Section 6 covers bloom detection, fruit counting, and yield prediction. Section 8 synthesizes cross-cutting challenges and emerging research directions, and Section 9 concludes the survey.

2. Background and Taxonomy

2.1. Background and Research Gap

The evolution of information and communication technologies, remote sensing, and the Internet of Things (IoT) has dramatically expanded the range of data available for precision agriculture [1,2]. UAVs and ground-based sensors provide high-resolution measurements of crop and soil conditions, while satellite platforms offer broad spatial and temporal coverage. Although these technologies enable data-driven decision making, they also introduce several challenges that directly impact the design and deployment of deep learning models.

Data acquisition is central to both deep learning (DL) models and IoT systems in agriculture. Sensors, UAVs, and other devices collect diverse data on pests, soil properties, canopy structure, crop yields, and environmental variables such as temperature, light, and humidity [2,3]. Studies on smart agriculture systems [4,5,6] emphasize the importance of reliable sensing infrastructures and communication networks for real-time monitoring and control.

High-quality annotation is equally critical. Accurate labels enable supervised DL models to learn robust representations for tasks such as pest detection, disease diagnosis, and crop health monitoring [7,8]. For UAV imagery, the annotation process is often labor intensive and requires domain expertise, particularly when dealing with subtle symptoms or occluded plant parts.

Several works highlight the difficulties of vegetation stress detection using UAVs, where models must capture both intra-field variability and temporal dynamics [9,10]. Large-scale, well-annotated datasets are rarely available, and models trained on small datasets tend to overfit or fail to generalize to new fields. To mitigate these issues, researchers employ data augmentation and regularization strategies such as geometric transformations and dropout layers [11]. The use of high-resolution imagery and appropriate preprocessing steps is particularly important for building reliable prediction models [12].

Data quality issues remain a recurring challenge. Agricultural data vary widely across crop types, regions, and seasons, leading to covariate shifts that degrade model performance. Incremental or phased adoption of advanced sensing and annotation technologies has been proposed as a practical strategy to improve data quality over time and support the sustainable deployment of PA systems [13].

In summary, the following research gaps continue to shape the development of AI-based solutions in UAV-enabled precision agriculture:

Data variability and generalization: Although UAV sensing enables high-resolution monitoring, significant variability across environments, crop varieties, and management practices limits the generalizability of machine learning (ML) models.
Data quality and limited annotated datasets: Many deep learning approaches rely on large, consistently annotated datasets, which remain scarce and costly to obtain, particularly for fine-grained agricultural tasks.
Operational deployment and scalability: Practical deployment requires the robust integration of sensing platforms, data infrastructures, and analytical models, which is still evolving in many precision agriculture systems.

These research gaps motivate the methodological choices discussed in subsequent sections and underscore the need for a structured taxonomy that connects sensing modalities, data types, model families, and analytical tasks.

2.2. Taxonomy of UAV-Based Sensing and Machine Learning in Precision Agriculture

The diversity of sensing platforms, data types, model architectures, and agricultural applications makes it challenging to compare existing approaches in a systematic manner. To provide a unifying perspective, we introduce a taxonomy that organizes UAV-based sensing and machine learning methods along four complementary dimensions: (1) sensing modality, (2) data type, (3) model family, and (4) analytical task. This taxonomy links upstream sensing and data preparation steps in Section 3 with downstream segmentation, detection, counting, and prediction methods discussed in later sections. It also provides the structural foundation for the survey, with each subsequent section mapped explicitly to one or more of these four dimensions.

Several recent surveys review deep learning applications in agriculture or remote sensing independently; however, they typically organize the literature along a single dimension, such as model architecture, crop type, or sensing platform. In contrast, the taxonomy proposed in this survey integrates four complementary dimensions (sensing modality, data type, model family, and analytical task) into a unified framework. This multidimensional structure enables systematic comparison across studies that would otherwise appear disconnected, such as UAV-based object detection and satellite-driven yield forecasting.

Unlike model-centric surveys that emphasize architectural trends, or remote sensing reviews that focus primarily on sensor characteristics, our taxonomy explicitly links upstream sensing infrastructure and data preparation choices to downstream learning models and decision support objectives. This integrated perspective reveals underexplored combinations of sensing modalities and model families and provides a structured foundation for analyzing and designing scalable, end-to-end data-driven systems in precision agriculture.

Compared to previous reviews that focus primarily on algorithms or crop-specific case studies, this survey integrates sensing infrastructure, data structure, modeling approaches, and analytical objectives within a unified engineering framework. By bridging sensing technologies and machine learning methodologies, the proposed taxonomy supports the design and evaluation of deployable, robust, and scalable agricultural systems.

1. Sensing modality: AI-enabled precision agriculture relies on heterogeneous sensing platforms that vary widely in spatial, spectral, and temporal resolution. UAV-mounted RGB, multispectral, hyperspectral, thermal, and LiDAR sensors provide high spatial detail for canopy structure, plant health assessment, and fine-scale monitoring [3,14]. Satellite platforms extend temporal and regional coverage, offering multi-temporal vegetation indices and spectral diagnostics used extensively for crop health and yield modeling [2]. Ground-based IoT sensors, including soil moisture probes, weather stations, and in-field cameras, provide micro-environmental and crop-level observations that complement airborne and satellite sensing [15]. Together, these modalities form the multi-resolution sensing backbone of modern agricultural analytics.
2. Data type: The raw data produced by these sensing modalities differ in structure, dimensionality, and temporal properties. We categorize data types into five groups: (i) single-frame 2D imagery, (ii) multi-temporal image sequences used for phenological monitoring and yield prediction [16], (iii) 3D point clouds derived from LiDAR or structure-from-motion photogrammetry [14], (iv) tabular sensor and environmental streams from IoT systems, and (v) fused multimodal datasets combining imagery, spectral indices, and ground measurements [17]. Each data type imposes different requirements on preprocessing pipelines, feature extraction, and model selection.
3. Model family: A wide range of machine learning and deep learning models have been applied to agricultural imagery and sensor data. Classical methods such as support vector machines (SVMs), random forests, and fuzzy rule-based systems remain competitive when datasets are small or interpretability is required. Deep learning models dominate imagery-driven tasks, including convolutional neural networks (CNNs) [18], residual networks (ResNet) [19], encoder–decoder architectures such as U-Net [20], region-based detectors including Faster region-CNN (Faster R-CNN) and Mask Region-CNN (Mask R-CNN) [21,22], and one-stage detectors such as you only look once (YOLO) variants [23,24]. Sequential models (long short-term memory (LSTM) [25], Convolutional LSTM (ConvLSTM) [26]) and hybrid architectures (CNN–SVM, CNN–LSTM, and stacking ensembles) further extend these capabilities. Recent advances in attention-based and transformer architectures expand this model family dimension, offering alternatives to convolutional and recurrent networks for modeling long-range spatial and temporal dependencies in agricultural data.
4. Analytical task: The final dimension of the taxonomy organizes methods by their analytical objective: segmentation (e.g., canopy, leaf, or disease region delineation), object detection (e.g., pests, fruits, and flowers), classification (e.g., disease categories, maturity stages), counting (e.g., bloom or fruit load), and regression-based yield prediction. These task categories correspond directly to the main sections of the paper, where each analytical task is reviewed within the context of its typical sensing modalities, data types, and model families.
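
As an illustration of how a study could be placed on these four dimensions and compared with others, the following minimal Python sketch stores each study as a record and cross-tabulates two dimensions. The class name `StudyRecord`, the helper `cross_tabulate`, and the four entries are hypothetical placeholders introduced here for illustration; they are not taken from the survey's actual study list.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class StudyRecord:
    """One reviewed study placed on the four taxonomy dimensions."""
    label: str             # hypothetical identifier, not a real citation
    sensing_modality: str  # e.g. "RGB", "multispectral", "LiDAR"
    data_type: str         # e.g. "2D image", "image sequence", "point cloud"
    model_family: str      # e.g. "CNN", "YOLO", "LSTM", "transformer"
    task: str              # e.g. "segmentation", "detection", "yield prediction"


def cross_tabulate(records, dim_a, dim_b):
    """Count studies for every observed pair of values on two dimensions."""
    return Counter((getattr(r, dim_a), getattr(r, dim_b)) for r in records)


# Placeholder entries for illustration only.
records = [
    StudyRecord("study-A", "RGB", "2D image", "YOLO", "detection"),
    StudyRecord("study-B", "RGB", "2D image", "CNN", "segmentation"),
    StudyRecord("study-C", "hyperspectral", "2D image", "transformer", "classification"),
    StudyRecord("study-D", "RGB", "2D image", "YOLO", "detection"),
]

table = cross_tabulate(records, "sensing_modality", "model_family")
for (modality, family), count in sorted(table.items()):
    print(f"{modality:>14} x {family:<12} {count}")
```

Pairs that never appear in the resulting table correspond to combinations of dimensions that no recorded study covers, which is one simple way to surface underexplored combinations.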

The four-dimensional taxonomy of UAV-enabled machine learning in precision agriculture is illustrated in Figure 1. Each reviewed study can be systematically mapped onto this framework, enabling a structured comparison across sensing modalities, data types, model families, and analytical tasks. A clear pattern emerges from the surveyed literature: CNN- and YOLO-based architectures are predominantly applied to UAV-based RGB imagery for real-time detection and segmentation tasks, as reported in studies on pest and disease monitoring [7,27,28]. In contrast, transformer-based and hybrid architectures are more frequently explored in multispectral and hyperspectral settings, where global contextual modeling is beneficial but still limited by data availability and computational cost [29].

Across sensing modalities, RGB UAV imagery remains the most widely adopted due to its high spatial resolution and ease of acquisition, particularly for detection and counting tasks, whereas multispectral and hyperspectral data are more strongly associated with stress analysis and yield-related prediction tasks due to their spectral richness [30,31]. Similarly, lightweight CNN and YOLO variants are consistently preferred for edge and UAV deployment scenarios, while computationally intensive transformer models are typically evaluated in offline or cloud-based settings.

This comparative mapping also highlights a recurring limitation in the literature: most studies rely on single-modality pipelines, and only limited work explores multimodal integration across UAV, satellite, and IoT data streams. Existing multimodal approaches are often restricted to late fusion or feature concatenation strategies, indicating a clear gap in representation-level fusion frameworks capable of cross-scale learning [32,33]. These evidence-based patterns justify the taxonomy structure and highlight key research gaps that merit future investigation.

Table 1 translates the four-dimensional taxonomy into practical model selection guidance. It shows that analytical task, sensing modality, and deployment constraints jointly determine appropriate model families. Lightweight models and classical approaches remain attractive when onboard computation, power, and bandwidth are limited, whereas transformer-based, multimodal, and temporal models are more suitable when richer data and greater computational resources are available. This matrix also reveals underexplored combinations, including lightweight transformer models for hyperspectral UAV imagery and multimodal fusion models that jointly integrate UAV, satellite, and IoT observations.

2.3. End-to-End Framework of UAV-Based Machine Learning in Precision Agriculture

While the taxonomy introduced in Section 2.2 organizes existing research along four complementary dimensions, practical precision agriculture systems operate as integrated, end-to-end pipelines. To complement the taxonomy, we present a conceptual framework that illustrates how the sensing infrastructure, data-processing stages, and analytical models interact within a unified operational workflow.

Figure 2 depicts the overall analytical pipeline from sensing to task-specific modeling of UAV-based machine learning systems in precision agriculture. The framework begins with heterogeneous sensing modalities, including UAV-mounted RGB, multispectral, hyperspectral, thermal, and LiDAR sensors, as well as satellite imagery and ground-based IoT measurements. These sensing systems generate heterogeneous raw data streams that require preprocessing, normalization, feature extraction, dimensionality reduction, and multimodal integration.

Segmentation often serves as an intermediate representation stage, partitioning imagery into canopy, leaf, fruit, or lesion regions that facilitate downstream reasoning. Task-specific models are subsequently applied, including pest and disease detection, bloom identification, fruit counting, and yield prediction. Depending on the task, models range from classical machine learning approaches to deep convolutional, recurrent, and attention-based architectures.

Finally, model outputs feed into decision support systems that enable real-time UAV deployment, edge-based inference, autonomous spraying, robotic harvesting, or farm-level yield forecasting. Across all stages, system-level constraints such as limited labeled data, domain shift across seasons and regions, computational limits on UAV platforms, and multimodal data heterogeneity influence model design and deployment feasibility.

This end-to-end perspective highlights the interdependencies among sensing infrastructure, data characteristics, modeling choices, and operational objectives, reinforcing the need for integrated system design rather than isolated algorithmic improvements.

3. Data Acquisition and Preprocessing

Robust AI pipelines in precision agriculture depend heavily on the quality, diversity, and structure of the input data. As outlined in Section 2.2, sensing modalities and data types directly influence downstream model design, feature extraction strategies, and analytical capabilities. This section reviews the major sensing platforms used in AI-driven agriculture and synthesizes common preprocessing and data preparation techniques that support segmentation, detection, counting, and prediction tasks. The sensing and data preparation components of the taxonomy and the overall pipeline are illustrated in Figure 1 and in Figure 2, respectively.

3.1. Data Acquisition

3.1.1. UAV-Based Sensing

UAVs have become central to precision agriculture due to their ability to capture high-resolution, flexible, and timely imagery. Modern platforms support RGB, multispectral, hyperspectral, thermal, and LiDAR payloads, enabling detailed characterization of canopy structure, crop vigor, and micro-environmental variability [1,3]. High-resolution RGB imagery is widely used for individual tree crown detection and orchard mapping, often combined with semi-supervised or weakly supervised models to leverage limited ground truth [34].

LiDAR and structure from motion (SfM) photogrammetry provide 3D reconstructions of tree height, canopy volume, and stand density. For example, Ref. [14] fused LiDAR point clouds with multispectral imagery using a PointNet++ architecture for tree species and health classification. UAVs have also been deployed as mobile data collectors within wireless sensor networks (WSNs), extending the spatial reach of ground sensors and supporting integrated monitoring frameworks [35].

3.1.2. Satellite-Based Sensing

Satellite imagery complements UAV data by providing broader spatial coverage and longer temporal continuity. Multispectral and hyperspectral platforms support computation of vegetation indices such as NDVI and EVI, which are used for crop health monitoring, stress detection, and yield estimation. Multi-temporal imagery enables the modeling of phenological trends. For instance, Ref. [36] applied deep learning to WorldView-3 and PlanetScope data for field-scale yield prediction, while Ref. [16] used a hybrid LSTM-1D CNN model to estimate rice yields from satellite-derived time series.
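
For reference, the two vegetation indices named above can be computed per pixel from band reflectances. The sketch below uses the standard definitions of NDVI and EVI with the commonly used EVI coefficients (gain 2.5, C1 = 6, C2 = 7.5, L = 1). It assumes the inputs are surface reflectance arrays in the range 0 to 1; the function names and the `eps` guard are illustrative choices, not part of any cited study.

```python
import numpy as np


def ndvi(nir, red, eps=1e-8):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    nir = np.asarray(nir, dtype=np.float64)
    red = np.asarray(red, dtype=np.float64)
    # eps avoids division by zero on pixels where both bands are zero.
    return (nir - red) / (nir + red + eps)


def evi(nir, red, blue, gain=2.5, c1=6.0, c2=7.5, l=1.0):
    """Enhanced Vegetation Index with the commonly used coefficients."""
    nir = np.asarray(nir, dtype=np.float64)
    red = np.asarray(red, dtype=np.float64)
    blue = np.asarray(blue, dtype=np.float64)
    return gain * (nir - red) / (nir + c1 * red - c2 * blue + l)


# Illustrative reflectances for a single vegetated pixel (assumed values).
print(float(ndvi(0.5, 0.1)), float(evi(0.5, 0.1, 0.05)))
```

Both functions accept scalars or whole band arrays, so the same code applies to a single pixel or to an image tile.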

3.1.3. Ground-Based Sensors and IoT

Ground-based IoT sensors provide high-frequency, field-level measurements of soil moisture, temperature, humidity, and radiation variables that are often not captured directly by UAV or satellite platforms. These measurements offer localized context for interpreting imagery and support applications such as stress detection, irrigation scheduling, and microclimate assessment. IoT deployments range from low-cost infrared and near-infrared probes [37] to multilayer systems featuring wireless sensor networks, edge devices, and cloud-connected infrastructures for real-time monitoring and actuation [15]. When integrated with aerial and satellite imagery, IoT measurements improve the completeness and robustness of datasets used in downstream ML and DL models [38,39].

Together, UAV, satellite, and IoT sensing systems provide complementary spatial, temporal, and spectral information, enabling multiscale, multimodal datasets that support the full range of analytical tasks reviewed in this survey. These raw data streams require substantial preprocessing, normalization, and feature engineering before they can be effectively used by ML and DL models as described in the next section.

Different sensing modalities provide complementary information and are suited to specific agricultural tasks. RGB imagery is widely used for detection and counting tasks due to its high spatial resolution and availability, while multispectral and hyperspectral data are more effective for stress detection and disease analysis by capturing spectral signatures beyond the visible range. LiDAR data, in contrast, provide structural information that is particularly useful for canopy modeling. In practice, combining these modalities can improve performance but requires careful alignment and fusion strategies to ensure consistency across spatial and temporal scales.

3.2. Data Preparation and Preprocessing

Data preparation is a critical stage (shown as the third layer of the framework in Figure 2) that transforms heterogeneous raw inputs into formats suitable for learning algorithms. Preprocessing workflows typically include normalization, feature extraction, noise reduction, augmentation, dimensionality reduction, and multimodal integration. Table 2 summarizes the representative techniques and associated studies.

3.2.1. Normalization and Feature Extraction

Normalization reduces variability introduced by illumination changes, sensor characteristics, and flight configurations. UAV spectral data normalization has been used for land-cover and crop discrimination [40], while pixel-level normalization improves crop–weed classification in row crops [41]. Applications in orchards and vineyards benefit from color-based normalization techniques that correct shadows and phenological variation [42,43]. In deep neural networks, batch normalization [50] is routinely applied to stabilize training and accelerate convergence.

Feature extraction transforms raw imagery or sensor data into representations that emphasize relevant biological or structural cues. Vegetation indices (e.g., NDVI and ExG) are widely used to distinguish vegetation from soil and to characterize canopy vigor [40,41]. Geometric and texture features capture canopy shape, fruit morphology, and spatial patterns in orchards and vineyards [42,44]. Deep models, including CNNs, ResNets, and region proposal networks, automatically learn hierarchical feature representations and have demonstrated strong performance in fruit detection, weed mapping, and disease diagnosis [10,19,21].
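
A minimal sketch of two of the steps above is given here: per-band min-max normalization and the Excess Green index (ExG = 2g - r - b on chromatic coordinates). It assumes an image stored as an (H, W, C) array. The function names are hypothetical, and min-max scaling is only one of several normalization schemes that the cited studies may use.

```python
import numpy as np


def minmax_per_band(image, eps=1e-8):
    """Rescale each band of an (H, W, C) image to [0, 1] independently."""
    image = np.asarray(image, dtype=np.float64)
    lo = image.min(axis=(0, 1), keepdims=True)
    hi = image.max(axis=(0, 1), keepdims=True)
    return (image - lo) / (hi - lo + eps)


def excess_green(rgb, eps=1e-8):
    """Excess Green index computed on chromatic coordinates r, g, b."""
    rgb = np.asarray(rgb, dtype=np.float64)
    total = rgb.sum(axis=-1, keepdims=True) + eps
    # Dividing by the channel sum removes overall brightness.
    r, g, b = np.moveaxis(rgb / total, -1, 0)
    return 2.0 * g - r - b


# One pure-green pixel and one gray (soil-like) pixel.
pixels = np.array([[[0, 255, 0], [100, 100, 100]]])
exg = excess_green(pixels)  # approximately 2.0 for green, 0.0 for gray
normalized = minmax_per_band(pixels)
```

Thresholding the ExG map is a common way to separate vegetation from soil before further feature extraction.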

3.2.2. Data Cleaning, Augmentation, and Dimensionality Reduction

Data cleaning mitigates sensor noise, shadows, occlusion, and background clutter. Common approaches include segmentation-based masking, Gaussian and median filtering, and thresholding [40,42,45,46]. Augmentation strategies such as geometric transformations, photometric adjustments, and synthetic sample generation are essential for compensating for small or imbalanced datasets and improving generalization. Regularization techniques such as dropout [11] further reduce overfitting in deep architectures.
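
The cleaning and augmentation steps described above can be sketched with NumPy alone. The example below shows a 3x3 median filter for a single band and a simple random augmentation (flips, 90-degree rotations, brightness scaling). The function names, the flip probabilities, and the brightness range of 0.8 to 1.2 are assumptions made for illustration; `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np


def median_filter3(band):
    """3x3 median filter on a 2D band, with edge replication at the borders."""
    padded = np.pad(band, 1, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (3, 3))
    return np.median(windows, axis=(-2, -1))


def augment(image, rng):
    """Randomly flip, rotate, and brightness-scale an (H, W, C) image in [0, 1]."""
    out = image
    if rng.random() < 0.5:
        out = np.flip(out, axis=1)  # horizontal flip
    if rng.random() < 0.5:
        out = np.flip(out, axis=0)  # vertical flip
    # Rotation by a multiple of 90 degrees; swaps H and W for non-square images.
    out = np.rot90(out, k=int(rng.integers(0, 4)), axes=(0, 1))
    gain = rng.uniform(0.8, 1.2)    # photometric (brightness) adjustment
    return np.clip(out * gain, 0.0, 1.0)


rng = np.random.default_rng(0)
tile = rng.random((8, 8, 3))
augmented = augment(tile, rng)
denoised = median_filter3(tile[..., 0])
```

For segmentation or detection data, the same geometric transform would also have to be applied to the masks or bounding boxes; the photometric step applies to the image only.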

High-dimensional data sources, especially multispectral and hyperspectral imagery, often require dimensionality reduction. Principal component analysis (PCA) and related methods reduce computational cost and highlight the most informative spectral features [47,48], improving both efficiency and downstream model accuracy.
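
As a sketch of how PCA reduces the spectral dimension, the following example flattens an (H, W, B) cube into pixel spectra, centers them, and projects onto the leading principal axes obtained by singular value decomposition. The function name `pca_reduce` and the synthetic cube are illustrative assumptions; real pipelines often add band selection or whitening, which are omitted here.

```python
import numpy as np


def pca_reduce(cube, n_components):
    """Project an (H, W, B) cube onto its first n_components principal axes."""
    h, w, b = cube.shape
    pixels = cube.reshape(-1, b).astype(np.float64)  # one spectrum per row
    centered = pixels - pixels.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    explained = s**2 / np.sum(s**2)                  # variance ratio per axis
    return scores.reshape(h, w, n_components), explained[:n_components]


# Synthetic 30-band cube whose spectra are mixtures of two sources,
# so two components capture essentially all of the variance.
rng = np.random.default_rng(0)
abundances = rng.random((10, 10, 2))
endmembers = rng.random((2, 30))
cube = abundances @ endmembers
reduced, ratio = pca_reduce(cube, n_components=2)
```

The returned variance ratios give a quick check on how many components are worth keeping for a given scene.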

3.2.3. Data Integration

Multimodal integration combines UAV imagery, satellite observations, and ground-based sensor streams to provide a holistic view of crop and environmental conditions. Fusion frameworks support applications such as stress detection, irrigation management, and yield estimation. For instance, Refs. [17,49] demonstrate the benefits of combining imagery with IoT or field-level sensor data for robust decision making in smart agriculture systems. Integrated datasets are particularly valuable for temporal modeling tasks and for capturing interactions between environmental conditions and crop responses.

Recent approaches focus on semi-supervised and data-efficient learning, particularly for hyperspectral UAV data, where labeled samples are limited but large volumes of unlabeled data are available. Ref. [51] proposed a transformer-based approach for low-data regimes, called Low-rank adaptation Local Attention Spectral Vision Transformer, which combines a three-dimensional convolutional spectral front end with a local window-based self-attention mechanism. The study reported 99% accuracy, indicating that the approach remains effective in low-label regimes while using substantially fewer parameters.

The effectiveness of machine learning models is strongly influenced by the characteristics of the sensing data. Hyperspectral imagery, for example, contains high-dimensional spectral information, which often requires dimensionality reduction or band selection to mitigate redundancy and improve computational efficiency, followed by models capable of learning joint spectral–spatial features [51,52]. In contrast, LiDAR data provide structural information in the form of point clouds [14], which must be aligned and fused with optical imagery to enable meaningful interpretation of crop geometry and canopy structure.

Multimodal data integration further introduces challenges related to spatial and temporal alignment, where simple feature-level fusion is often insufficient. In such cases, attention-based and transformer-driven models offer advantages by learning relationships across heterogeneous data sources. Fusion can occur at the following three levels:

Early (input-level) fusion combines data sources after preprocessing (e.g., stacking UAV imagery with IoT measurements). It is sensitive to spatial resolution mismatch and temporal misalignment between sensing modalities.
Representation-level fusion learns joint feature spaces across modalities. Recent approaches use attention-based mechanisms, such as cross-attention, to align UAV imagery with satellite time series or IoT signals, improving robustness to scale and temporal differences.
Late (decision-level) fusion combines outputs from independent models trained on different modalities. This approach is less affected by registration errors but does not capture deep cross-modal interactions.
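
The difference between early and late fusion can be made concrete with a small sketch. Here, least-squares linear models stand in for whatever learner a real pipeline would use; the helper names, the feature dimensions, and the equal weighting `alpha=0.5` are assumptions for illustration only, and the inputs are taken to be already aligned per sample.

```python
import numpy as np


def fit_linear(x, y):
    """Least-squares linear model with a bias term; returns the weights."""
    design = np.hstack([x, np.ones((x.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(design, y, rcond=None)
    return weights


def predict_linear(weights, x):
    """Apply a model produced by fit_linear to new feature rows."""
    design = np.hstack([x, np.ones((x.shape[0], 1))])
    return design @ weights


def early_fusion_fit(x_uav, x_iot, y):
    """Early fusion: concatenate modality features, then fit one model."""
    return fit_linear(np.hstack([x_uav, x_iot]), y)


def late_fusion_predict(w_uav, w_iot, x_uav, x_iot, alpha=0.5):
    """Late fusion: weighted average of independent per-modality predictions."""
    return (alpha * predict_linear(w_uav, x_uav)
            + (1.0 - alpha) * predict_linear(w_iot, x_iot))
```

In this sketch the early-fusion model can exploit both feature sets jointly, whereas the late-fusion output only blends two separately trained predictors, which mirrors the trade-off described in the list above.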

Fusion pipelines highlight how these strategies are applied in practice. For example, cross-attention-based architectures align UAV imagery with satellite data at the feature level to address resolution mismatch. Hierarchical spatiotemporal pipelines integrate UAV observations with IoT time series by first aligning temporal signals and then refining spatial features. Graph-based fusion models represent fields, sensors, and observations as nodes, enabling flexible integration across heterogeneous data sources while handling missing or misaligned inputs.
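
To illustrate the cross-attention mechanism mentioned above, the following sketch implements single-head scaled dot-product cross-attention in NumPy, with UAV feature tokens as queries and satellite feature tokens as keys and values. The projection matrices are random placeholders rather than trained weights, and the token counts and feature sizes are arbitrary assumptions; this is not the architecture of any specific cited study.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head cross-attention: queries attend over context tokens."""
    q = queries @ w_q                        # (n_q, d)
    k = context @ w_k                        # (n_c, d)
    v = context @ w_v                        # (n_c, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (n_q, n_c)
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
uav_tokens = rng.normal(size=(16, 8))  # e.g. 16 UAV patches, 8 features each
sat_tokens = rng.normal(size=(5, 6))   # e.g. 5 satellite dates, 6 features each
fused, attn = cross_attention(
    uav_tokens, sat_tokens,
    rng.normal(size=(8, 4)), rng.normal(size=(6, 4)), rng.normal(size=(6, 4)),
)
```

Because the two modalities are projected into a shared space before the dot product, they do not need matching resolutions or feature sizes, which is what makes this form of fusion tolerant of resolution mismatch.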

Overall, the data-acquisition (Level 2 of Figure 2) and preprocessing (Level 3 of Figure 2) steps, reviewed here, form the foundation for the segmentation, detection, counting, and yield prediction methods examined in the following sections. They also directly support the four dimensions of the taxonomy introduced in Section 2.2, linking sensing modalities, data types, and model selection to analytical objectives.

4. Segmentation Methods and Architectures

Segmentation is a foundational analytical task in our taxonomy and plays a critical role in many downstream applications, including disease detection, fruit and bloom counting, canopy characterization, and yield modeling. By partitioning imagery into meaningful regions, segmentation produces structured spatial representations that enable object-level and pixel-level reasoning.

In UAV-based precision agriculture, segmentation approaches can be examined at two complementary levels: (i) methodological paradigms that define how image regions are delineated based on spectral, spatial, or learned features, and (ii) architectural implementations that operationalize these paradigms through specific neural or classical model designs.

Within the taxonomy in Figure 1, segmentation methods represent a key component linking sensing data to downstream analytical tasks. We first review representative methodological categories, followed by representative architectural instantiations widely adopted in agricultural applications.

4.1. Segmentation Methodological Paradigms

Segmentation methodologies in agricultural imagery span a spectrum from classical rule-based approaches to modern deep neural frameworks. For clarity, we organize the literature into five representative categories: threshold-based, color-based, texture- and shape-based, deep learning-based semantic and instance segmentation, and transformer-based segmentation models. This categorization highlights the methodological evolution from handcrafted feature extraction to learned hierarchical and attention-based representations.

4.1.1. Threshold-Based Segmentation

Threshold-based segmentation represents one of the earliest and most computationally efficient approaches in agricultural image analysis. These methods separate foreground objects from background regions using global or adaptive thresholds derived from pixel-intensity histograms or spectral index distributions. Otsu’s method [53] and related histogram-driven threshold selection strategies have been applied to fruit detection, canopy extraction, and soil–vegetation separation tasks [8,48], with Ref. [48] reporting 92.5% accuracy for deep learning-oriented techniques.
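To make the histogram-driven selection concrete, the sketch below implements Otsu’s criterion, which picks the threshold that maximizes the between-class variance of the two resulting pixel groups. The toy intensity list is an assumption for illustration; production pipelines would typically rely on an image-processing library.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximizes between-class variance.

    pixels: iterable of integer intensities in [0, levels).
    Pixels <= t form the lower class; pixels > t form the upper class.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0     # pixel count of the lower class
    sum0 = 0   # intensity sum of the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break             # upper class empty; no further split possible
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Proportional to the between-class variance (constant factor dropped).
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

# Toy bimodal data: dark soil pixels and bright canopy pixels (assumed values).
toy_pixels = [10] * 50 + [12] * 50 + [200] * 40 + [210] * 60
threshold = otsu_threshold(toy_pixels)
foreground = [p > threshold for p in toy_pixels]
print(threshold, sum(foreground))
```

Because the threshold is derived from a single global histogram, the same code also shows why the method is sensitive to shadows and illumination gradients: any factor that merges the two histogram modes degrades the split.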

Vegetation index thresholding using the Normalized Difference Vegetation Index (NDVI), Excess Green (ExG), or related spectral metrics is particularly common in UAV-based crop–background segmentation and early canopy mapping [2,3]. In orchard environments, adaptive thresholding with automatic parameter tuning has been proposed to improve fruit detection robustness under varying illumination and background conditions [54], with the authors reporting final F₁ scores of 93.1% and 99.3% for apple and pepper detection, respectively.
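The two indices named above have simple closed forms: NDVI = (NIR − Red) / (NIR + Red), and ExG = 2g − r − b on chromatic (sum-normalized) coordinates. The sketch below computes both and applies a fixed cutoff; the reflectance values, pixel colors, and the 0.3 and 0.1 thresholds are assumptions chosen only for illustration.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    denom = nir + red
    return 0.0 if denom == 0 else (nir - red) / denom

def excess_green(r, g, b):
    """Excess Green on chromatic coordinates: ExG = 2g - r - b."""
    total = r + g + b
    if total == 0:
        return 0.0
    rn, gn, bn = r / total, g / total, b / total
    return 2 * gn - rn - bn

def vegetation_mask(values, threshold):
    """Label each index value as vegetation (True) or background (False)."""
    return [v > threshold for v in values]

# Assumed (NIR, Red) reflectance pairs: a canopy pixel and a bare-soil pixel.
ndvi_values = [ndvi(0.5, 0.1), ndvi(0.3, 0.25)]
ndvi_mask = vegetation_mask(ndvi_values, threshold=0.3)   # 0.3 is an assumed cutoff

# Assumed 8-bit RGB triplets: a green leaf pixel and a brownish soil pixel.
exg_values = [excess_green(60, 120, 40), excess_green(120, 100, 80)]
exg_mask = vegetation_mask(exg_values, threshold=0.1)     # 0.1 is an assumed cutoff

print(ndvi_values, ndvi_mask)
print(exg_values, exg_mask)
```

NDVI requires a near-infrared band and therefore a multispectral sensor, whereas ExG can be computed from ordinary RGB imagery, which is one reason it appears frequently in low-cost UAV workflows.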

Although threshold-based approaches are attractive due to low computational cost and ease of deployment on embedded systems, their performance is highly sensitive to illumination variability, shadowing, soil reflectance, and canopy heterogeneity. Adaptive and locally optimized binarization techniques partially mitigate these issues [54,55], yet robustness across seasons and sensing modalities remains limited. Consequently, threshold-based segmentation is increasingly used as a preprocessing step rather than as a standalone solution in modern UAV-driven pipelines.

4.1.2. Color-Based Segmentation

Color-based segmentation leverages differences in Red–Green–Blue (RGB), Hue–Saturation–Intensity (HSI), and Hue–Saturation–Value (HSV) color spaces to distinguish vegetation, fruiting bodies, flowers, or water surfaces from surrounding backgrounds. Unlike intensity-only thresholding, chromatic transformations isolate vegetation-specific spectral responses and reduce sensitivity to grayscale illumination changes.

Hue-histogram-based threshold detection has been applied to UAV-captured imagery of cropped fields to improve vegetation–soil separation under varying lighting conditions [56], with a reported mean accuracy of 87.29% and a standard deviation of 12.5%. Similarly, color index-based thresholding methods using Excess Green (ExG), Excess Red (ExR), and normalized RGB ratios have demonstrated effectiveness for the background–foreground segmentation of plant imagery [57], with a reported segmentation error of 6.62 ± 5.85% and a classification ratio of 1.93 ± 0.05. These approaches are particularly useful in crop–weed discrimination and early-stage canopy extraction.
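A hue-based vegetation test can be sketched with Python’s standard `colorsys` module. The hue window of 60–180 degrees and the minimum saturation of 0.2 are assumptions chosen for illustration, not values taken from [56]; in a hue-histogram approach they would be derived from the image data rather than fixed in advance.

```python
import colorsys

def is_vegetation(r, g, b, hue_range=(60.0, 180.0), min_saturation=0.2):
    """Classify an 8-bit RGB pixel as vegetation using hue and saturation.

    hue_range (degrees) and min_saturation are assumed, illustrative limits.
    """
    h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    hue_deg = h * 360.0
    in_window = hue_range[0] <= hue_deg <= hue_range[1]
    # Low-saturation pixels (gray soil, haze, glare) have unreliable hue.
    return in_window and s >= min_saturation

leaf_pixel = (60, 140, 50)     # assumed green canopy color
soil_pixel = (150, 110, 80)    # assumed brown soil color
gray_pixel = (128, 128, 128)   # achromatic pixel with undefined hue

labels = [is_vegetation(*p) for p in (leaf_pixel, soil_pixel, gray_pixel)]
print(labels)
```

Separating hue from value is what reduces sensitivity to brightness changes, but the hue of vegetation itself still drifts with sun angle and sensor white balance, which motivates the calibration steps discussed next.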

Compared to global intensity thresholding, color space transformations improve discrimination under moderate lighting variation. However, chromatic distributions shift substantially with time of day, cloud cover, sensor calibration, and soil background variability. Consequently, color-based segmentation often requires normalization, radiometric calibration, or adaptive histogram equalization to maintain consistency across flights and growing conditions. While computationally efficient, purely color-driven approaches remain sensitive to environmental variability and are increasingly complemented by learned feature representations that capture structural and contextual cues.

4.1.3. Texture- and Shape-Based Segmentation

Texture- and shape-based segmentation methods extend beyond simple spectral cues by incorporating spatial patterns and geometric priors. Classical texture descriptors such as Local Binary Patterns (LBPs), Gabor filters, Gray-Level Co-occurrence Matrices (GLCMs), and Haralick features have been widely applied to plant extraction and vegetation segmentation in field imagery [58]. These descriptors capture micro-patterns and repetitive structures that distinguish foliage from soil, weeds, or diseased regions, particularly when color contrast alone is insufficient.
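As a concrete instance of a classical texture descriptor, the sketch below computes the basic 8-neighbor Local Binary Pattern code and a 256-bin histogram over a grayscale patch. The clockwise neighbor ordering and the `>=` comparison are one common convention and are assumptions here; the small patch is invented for illustration.

```python
def lbp_code(image, row, col):
    """Basic 8-neighbor LBP code for an interior pixel of a 2D grayscale image."""
    center = image[row][col]
    # Neighbors visited clockwise, starting at the top-left pixel (bit 0).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code

def lbp_histogram(image):
    """Histogram of LBP codes over all interior pixels (the texture descriptor)."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist

patch = [
    [5, 9, 1],
    [4, 6, 7],
    [2, 3, 8],
]
code = lbp_code(patch, 1, 1)
hist = lbp_histogram(patch)
print(code, sum(hist))
```

Because each code depends only on the sign of intensity differences, the descriptor is unaffected by monotonic brightness changes, which helps when color contrast between foliage and soil is weak.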

More recent surveys on aerial vegetation and microplot segmentation highlight the continued relevance of texture-driven representations in structured agricultural layouts [59]. In UAV imagery, texture cues can help delineate crop rows, canopy gaps, and stress patterns that exhibit consistent spatial repetition across plots.

Shape-based approaches, including Circular Hough Transform, contour-based filtering, watershed segmentation, and morphological operations, incorporate geometric constraints to improve the detection of approximately circular fruits such as apples, citrus, and tomatoes. By leveraging structural priors, these methods reduce false positives in moderately cluttered scenes and improve boundary delineation.

However, handcrafted texture and geometric descriptors are sensitive to scale variation, occlusion, and irregular canopy geometry. Performance degrades in highly heterogeneous field environments, motivating the transition toward deep neural architectures capable of learning hierarchical spatial features directly from data.

4.1.4. Deep Learning-Based Semantic and Instance Segmentation

Deep learning architectures have become the dominant paradigm for segmentation in precision agriculture due to their ability to learn multiscale spatial and contextual representations directly from raw imagery. Semantic segmentation models assign pixel-level class labels, whereas instance segmentation frameworks additionally distinguish individual plant objects within a scene. Encoder–decoder networks such as U-Net and its variants are widely adopted for leaf delineation, weed mapping, disease region segmentation, and crop row detection [7,33], with reported accuracies typically in the 90–97% range. Through hierarchical feature extraction and skip connections, these architectures preserve fine boundary detail while capturing broader contextual information.

Lightweight and task-specific adaptations improve suitability for UAV and edge deployment, emphasizing computational efficiency while maintaining competitive segmentation performance [60,61]. These studies reported a 4.6% improvement in mean average precision and an average precision of 31.5. Multi-task and feature fusion frameworks, including UniSteamNet, jointly optimize segmentation and recognition objectives to enhance structural coherence and reduce redundant computation [62]. Transfer learning pipelines and region proposal mechanisms further improve robustness under limited labeled data by leveraging pretrained visual backbones and hierarchical feature reuse [63], achieving a mean average precision of 0.94 and an F₁ score of 0.89.

Compared with handcrafted texture- and shape-based approaches, deep models demonstrate stronger resilience to heterogeneous backgrounds, illumination variability, and canopy complexity. Nevertheless, performance remains contingent on dataset scale, annotation fidelity, and domain alignment across seasons, cultivars, and sensing configurations. Computational cost and deployment constraints also pose practical challenges in real-time UAV and edge scenarios.

4.1.5. Transformer-Based Architecture and Segmentation

Recent advances in transformer-based architectures introduce new opportunities for segmentation in agricultural imagery by modeling long-range spatial dependencies and global contextual relationships. Unlike convolutional neural networks, which rely primarily on local receptive fields, transformers employ self-attention mechanisms that allow each image region to interact with all others [64]. This capability is particularly relevant for agricultural imagery, where canopy structures, disease patterns, and crop rows often exhibit spatial relationships that extend beyond local neighborhoods and vary substantially across scales.
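The all-to-all interaction described above can be illustrated with scaled dot-product self-attention. The sketch below is deliberately simplified: it uses identity query, key, and value projections (real transformers learn separate projection matrices and use multiple heads), and the three two-dimensional "patch" vectors are invented for illustration.

```python
from math import exp, sqrt

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    tokens: list of equal-length feature vectors, one per image patch.
    Returns (outputs, weights), where weights[i][j] is how strongly
    patch i attends to patch j.
    """
    d = len(tokens[0])
    outputs, weights = [], []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / sqrt(d) for k in tokens]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wi * v[j] for wi, v in zip(w, tokens))
                        for j in range(d)])
    return outputs, weights

# Three assumed patch embeddings.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(patches)
for row in weights:
    print([round(w, 3) for w in row])
```

Every patch produces a weight for every other patch, so the cost grows quadratically with the number of patches. This is the motivation for the window-based attention used by hierarchical variants when processing high-resolution imagery.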

Vision Transformers (ViT) [65], with a reported accuracy of 97%, and hierarchical variants such as the Swin Transformer [66], with 87.3% accuracy and 53.5 mean Intersection over Union (mIoU), have been explored as backbones for semantic and instance segmentation in remote sensing and agricultural contexts. Window-based self-attention and multiscale feature hierarchies enable these models to process high-resolution UAV imagery more efficiently than naïve global attention mechanisms. Transformer-based segmentation frameworks, including hybrid CNN–Transformer architectures such as SegFormer [29], which achieved 50.3% mIoU, have demonstrated strong performance in complex outdoor scenes characterized by heterogeneous backgrounds, occlusion, and variable illumination, conditions commonly encountered in orchards and field environments [67]. Recent work, such as the Convolutional Meets Transformer Network (CMTNet) [52], demonstrates the effectiveness of hybrid CNN–Transformer architectures for UAV-based hyperspectral crop classification, enabling improved spectral–spatial feature representation across three datasets: WHU-Hi-LongKou, WHU-Hi-HanChuan, and WHU-Hi-HongHu. The proposed CMTNet model achieved an accuracy of 99.58%, surpassing state-of-the-art methods such as CMTixer. Likewise, the work in [68] integrated UAV-based semantic segmentation and super-resolution reconstruction for tobacco fields, aiming to evaluate recent architectures, including Mamba-based models and transformers. The ensemble approach combining the transformer and Mamba architectures achieved the highest mean IoU of 90.7%.

Despite their promise, transformer-based segmentation models remain comparatively underexplored in precision agriculture. High data requirements, computational cost, and the limited availability of large-scale labeled agricultural datasets pose practical challenges for widespread adoption. Hybrid architectures that combine convolutional feature extraction with transformer-based attention mechanisms represent a promising compromise, particularly in scenarios where global context is beneficial but computational resources are limited. Future research is likely to focus on data-efficient training strategies, multimodal fusion, and lightweight transformer variants suitable for deployment on UAVs and edge devices, aligning with the operational constraints of real-world agricultural systems.

While the preceding subsections categorize segmentation approaches according to methodological principles, practical performance in UAV-based precision agriculture ultimately depends on architectural design choices. Different network backbones, efficiency-oriented variants, feature fusion mechanisms, and transfer learning strategies operationalize these methodological paradigms in distinct ways. We therefore next examine representative segmentation architectures that instantiate these paradigms across diverse agricultural applications.

4.2. Representative Segmentation Architectures in UAV-Based Precision Agriculture

While methodological paradigms define segmentation principles, architectural implementations determine how these principles are operationalized in practice. Segmentation performance in UAV-based precision agriculture is closely tied to architectural design choices, particularly under conditions of occlusion, variable illumination, heterogeneous backgrounds, and limited annotated data. Table 3 summarizes representative deep learning architectures and their agricultural applications across crop monitoring, row recognition, fruit segmentation, and disease severity estimation.

4.2.1. Encoder–Decoder Architectures

Encoder–decoder networks remain the dominant backbone for agricultural segmentation tasks. U-Net and its variants have demonstrated strong boundary delineation capabilities across diverse applications, including cotton inter-row navigation [70] with reported mIoU of 96.85%, wheat farmland segmentation [71] with reported mIoU of 89.80%, fruit segmentation in orchard environments [69] with mIoU of 90.03%, and leaf disease delineation [72] with reported mIoU between 94 and 96%.
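Since mIoU is the metric used to compare these architectures, a short worked example may help. The sketch below computes per-class Intersection over Union from a predicted and a reference label map and averages over the classes that occur; the tiny 2 × 3 label maps are invented, and individual studies may differ in how they treat absent classes.

```python
def mean_iou(pred, truth, num_classes):
    """Mean Intersection over Union across classes present in pred or truth.

    pred, truth: 2D lists of integer class labels with identical shape.
    Classes absent from both maps are skipped (an assumed convention).
    """
    ious = []
    for c in range(num_classes):
        inter = 0
        union = 0
        for p_row, t_row in zip(pred, truth):
            for p, t in zip(p_row, t_row):
                in_pred, in_truth = (p == c), (t == c)
                if in_pred and in_truth:
                    inter += 1
                if in_pred or in_truth:
                    union += 1
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

# Assumed labels: 0 = soil, 1 = crop.
truth = [[0, 0, 1],
         [0, 1, 1]]
pred = [[0, 1, 1],
        [0, 1, 1]]
score = mean_iou(pred, truth, num_classes=2)
print(round(score, 4))
```

Here one soil pixel is mislabeled as crop, giving IoU values of 2/3 for soil and 3/4 for crop. Pixel accuracy for the same prediction would be 5/6, which illustrates why accuracy and mIoU figures reported by different studies are not directly interchangeable.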

Multi-task extensions, such as the unified crop recognition and stem localization framework in [62], illustrate how segmentation can be jointly optimized with recognition objectives to enhance structural coherence and downstream localization accuracy. Specialized architectural modifications further improve robustness in disease monitoring scenarios. DF-U-Net integrates dynamic feature fusion and multispectral inputs to enhance wheat yellow rust severity segmentation in UAV imagery [73]. Such adaptations illustrate how architectural innovations are increasingly tailored to crop-specific spectral and structural characteristics.

4.2.2. Transformer and CNN Architecture

In comparison with CNN-based segmentation models, transformer architectures offer advantages in capturing global context and long-range dependencies, which are beneficial in complex field conditions with irregular crop patterns and background variability [29,82]. However, these improvements generally depend on the availability of large training datasets and significant computational resources. In many agricultural applications, where labeled data are limited, CNN-based models remain more practical due to their lower training cost and stable performance in data-constrained settings. CNNs are nevertheless inherently limited by local receptive fields and may struggle to capture long-range semantic relationships.

The attention mechanisms in transformers, including self-attention and multi-head attention, enable the model to learn relationships between distant regions in an image more effectively. As highlighted in [83], transformer-based encoder–decoder structures can better model global feature interactions compared to purely convolutional designs. In practice, transformer-based models are most beneficial in scenarios involving large-scale datasets, high-resolution imagery, or complex spatial patterns, while hybrid CNN–Transformer architectures provide a more feasible solution for typical agricultural settings with limited data and computational resources.

4.2.3. Lightweight and Efficiency-Oriented Architectures

To address UAV and edge deployment constraints, lightweight convolutional variants have been introduced to balance segmentation accuracy with computational efficiency. Efficient Dense modules of Asymmetric Convolution (EDANet) and related dynamic alignment strategies have been applied to crop lodging recognition and small-target detection scenarios [60,74], where authors reported 85–90% mIoU and 4.6% mean average precision improvements, respectively.

Similarly, ERFNet (Efficient Residual Factorized ConvNet) models have been adapted for crop row and instance-level field segmentation tasks, supporting real-time phenotyping and structural analysis in UAV imagery [61,77], with a reported mIoU of around 90%. These approaches emphasize architectural efficiency while preserving spatial fidelity. Recent studies have explored lightweight and hybrid architectures for UAV deployment, focusing on reducing computational overhead while maintaining accuracy, particularly for real-time hyperspectral and disease detection tasks. Ciem C. et al. [84] proposed the Online Hyperspectral Simple Linear Iterative Clustering (OHSLIC) framework, a lightweight architecture that achieves a Dice score of 0.72 and processes 82 frames per second. Likewise, Zhang T. et al. [85] proposed a novel architecture based on Multiscale CNN State with feature fusion and Visual State Space that extracts and integrates features hierarchically and at multiple levels, achieving a pixel-level accuracy of 94.21% and a mean IoU of 91.52%.

4.2.4. Region Proposal and Transfer Learning Frameworks

Region Proposal Network (RPN) mechanisms and transfer learning strategies further enhance segmentation performance under limited agricultural datasets. Three-stage RPN-based frameworks and Mask R-CNN adaptations have been applied to crop and fruit segmentation, leveraging pretrained visual backbones to improve feature robustness and localization accuracy [63,78,79], with a reported F₁ score of 89% and a mean average precision of 96–97.5%. Feature reuse and fine-tuning enable improved generalization across varying canopy structures and orchard layouts.

4.2.5. Hybrid and Alternative Architectures

Beyond canonical convolutional models, alternative architectures such as multilayer perceptron (MLP)-based segmentation and multi-sensor fusion frameworks have been explored for structured field environments [80,81], where the authors reported 86.2% mIoU and 92% accuracy, respectively. These models demonstrate that carefully engineered feature representations can remain competitive in constrained or semi-structured agricultural contexts.

Collectively, these representative architectures illustrate how model family selection interacts with sensing modality, data characteristics, and analytical objectives, reinforcing the multidimensional taxonomy introduced in Section 2.2. While convolutional encoder–decoder models remain dominant, emerging transformer-based segmentation frameworks (Section 4.1.5) introduce global attention mechanisms that may further enhance cross-scale contextual modeling in high-resolution UAV imagery.

Taken together, the methodological categories reviewed in Section 4.1 reveal a clear progression in segmentation strategies for UAV-based precision agriculture. Threshold- and color-based approaches emphasize computational simplicity and remain suitable for controlled or high-contrast environments. Texture- and shape-based methods introduce structural priors that improve robustness under moderate variability but remain constrained by handcrafted feature design. Deep learning paradigms substantially enhance resilience to heterogeneous backgrounds and complex canopy geometries, while transformer-based models extend contextual reasoning across broader spatial scales through attention mechanisms.

At the architectural level (Section 4.2), these paradigms are instantiated through encoder–decoder networks, lightweight efficiency-oriented variants, region proposal frameworks, and hybrid fusion models. Architectural selection determines how effectively methodological principles translate into operational performance under real-world UAV constraints, including limited onboard computation, variable illumination, and sparse annotations. The selection of segmentation strategy therefore reflects an integrated trade-off among computational efficiency, data availability, sensing modality, model complexity, and deployment constraints in UAV-enabled agricultural systems.

Overall, segmentation serves as a crucial intermediate representation within the broader analytics pipeline. It provides a structured representation of agricultural scenes by isolating crops, leaves, and regions of interest from complex backgrounds. These segmented outputs reduce noise and enable more precise localization of relevant features. Building on this representation, detection models can more effectively identify pests and disease symptoms within the extracted regions. This transition reflects the progression from pixel-level understanding to object-level analysis in UAV-based agricultural workflows. By transforming raw UAV imagery into structured spatial units, segmentation enables downstream tasks such as pest and disease detection, bloom and fruit counting, canopy characterization, and yield prediction, which are examined in the following sections.

5. Pest and Disease Detection Models

Pest and disease detection represents a core analytical task within our taxonomy, relying primarily on single frame RGB imagery from UAVs, ground mounted cameras, or IoT-enabled imaging systems. These tasks are typically formulated as object detection, pixel-level lesion segmentation, or image-level classification problems and are dominated by deep learning model families such as region-based detectors, one-stage detectors, encoder–decoder architectures, and fine-tuned convolutional networks. The detection tasks correspond to the analytical task dimension in the taxonomy shown in Figure 1. The following subsections review representative approaches for pest detection and disease detection separately, illustrating how model families align with specific sensing modalities, data types, and application objectives.

5.1. Pest Detection

Deep learning-based object detectors have become the standard for automated pest monitoring, driven by their ability to localize small objects under challenging outdoor conditions. UAV imagery, in-field cameras, and low power embedded systems supply the visual data, while model families such as Faster R-CNN, Mask R-CNN, and YOLO variants form the dominant detection backbone. These approaches illustrate the interplay between sensing modality (high-resolution UAV images), data type (object centric RGB frames), and analytical task (object detection) in the taxonomy. Ching-Ju et al. [27] reported an accuracy of 90% using Faster/Mask R-CNN models. Similarly, Ref. [28] achieved a mean average precision of 0.93 with YOLO variants. An F₁ score of 0.92 was reported by [86], while Ref. [30] obtained an F₁ score of 0.81. In addition, Saranya T. et al. [7] reported an accuracy of 96.58% using fine-tuned models. Table 4 summarizes representative pest detection models and their corresponding agricultural use cases.
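Because the studies above report a mix of accuracy, F₁, and mean average precision, it helps to recall how detection-level precision, recall, and F₁ are derived. In the sketch below, a detection would count as a true positive when its box overlaps a reference box with an Intersection over Union of at least 0.5; that cutoff is a widely used convention but an assumption here, and the boxes and counts are invented.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def detection_scores(tp, fp, fn):
    """Precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# A predicted box shifted halfway off the reference box (assumed coordinates).
overlap = box_iou((0, 0, 10, 10), (5, 0, 15, 10))
is_match = overlap >= 0.5   # assumed matching cutoff

# Assumed counts: 90 correct detections, 10 spurious, 20 missed pests.
precision, recall, f1 = detection_scores(tp=90, fp=10, fn=20)
print(round(overlap, 3), is_match)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

F₁ penalizes both missed pests and false alarms, whereas a plain accuracy figure depends on how negatives are defined, so values reported under different metrics should be compared with care.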

5.1.1. Faster R-CNN and Mask R-CNN

Region-based detectors remain strong performers for small object detection due to their explicit region proposal mechanism. Faster R-CNN integrates a Region Proposal Network (RPN) with classification and regression heads, enabling the precise localization of small pests in complex orchard environments. Refs. [33,87] demonstrated its effectiveness on the Pest24 dataset, achieving an AP of 98.6%. Mask R-CNN extends this architecture with pixel-level segmentation branches, supporting tasks where both detection and lesion delineation are needed. For example, Ref. [27] used Mask R-CNN within an Artificial Intelligence of Things (AIoT) pipeline to detect and segment lesions on coffee leaves, enabling fine-grained health monitoring.

5.1.2. YOLO-Based Detectors

YOLO-based one-stage detectors prioritize speed and are well suited for UAVs and edge devices. YOLOv3 has been deployed for real-time pest identification in integrated AIoT systems [27] with 90% accuracy, while Tiny YOLOv3 was demonstrated on embedded drone platforms for fruit tree pest monitoring [28], with a reported mean average precision of 0.93. Lightweight variants such as Ag-YOLO combine ShuffleNet-v2 backbones with YOLO heads to achieve a high F₁ score (92.05%) for precision spraying in field conditions [86]. Additional enhancements, such as DenseNet backbones [30] and YOLOv5 architectures [88] with a mean average precision of 0.92, further improve robustness under occlusion and variable lighting.

5.1.3. VGG, ResNet, and Fine-Tuned CNNs

Convolutional backbones also remain widely used for pest classification when bounding boxes are not required. Fine-tuned VGG16 architectures have achieved strong performance for multiclass pest categorization, reaching 96.58% accuracy in [90]. ResNet-based classifiers have been incorporated as the final stage of multi-step detection pipelines [31], and hyperparameter optimized VGG variants have shown strong generalization across multiple pest classes [7] with the highest reported accuracy of 96.58%. These methods highlight how classical CNN families continue to complement object detection pipelines, particularly under limited training data. Table 5 summarizes representative disease detection models, linking model families to common agricultural use cases.

Model selection in UAV-based precision agriculture is closely tied to the nature of the analytical task and deployment constraints. Encoder–decoder architectures such as U-Net are particularly effective for crop and canopy segmentation due to their ability to preserve spatial resolution and capture fine-grained pixel-level details, which are essential for delineating plant structures; U-Net-based architectures achieved performance of 90–95% in [71,72]. In contrast, one-stage detectors such as the YOLO series (including YOLOv3 and Tiny YOLO) are better suited for real-time pest and disease detection, as they provide a favorable trade-off between detection accuracy (90–93%) and inference speed (35–40 frames per second, FPS) [28,86], making them practical for onboard UAV deployment. Two-stage detectors such as Faster R-CNN generally achieve higher localization accuracy but require greater computational resources (e.g., an inference speed of 5 FPS for the method presented in [21]), which limits their use in real-time applications and makes them more suitable for offline analysis. Model selection is therefore driven not solely by accuracy but by the balance among precision, speed, and operational constraints.

5.2. Disease Detection

Disease detection encompasses both pixel-level lesion segmentation and image-level disease classification. The choice of model family often depends on the sensing modality: leaf-level imagery from handheld or in-field cameras favors encoder–decoder architectures, whereas canopy-scale UAV imagery motivates hybrid CNNs or transformer-based models. Table 5 summarizes representative disease detection models across segmentation and classification tasks, linking model families to common agricultural use cases. As in pest detection, these methods map directly onto the “model family” and “analytical task” dimensions of the taxonomy, with data types ranging from high-resolution RGB leaf images to multispectral UAV frames.

5.2.1. Inception ResNet-v2 and Hybrid Architectures

Hybrid deep networks combining inception modules and residual connections capture multiscale and hierarchical lesion patterns. In [89], an Inception ResNet-v2 architecture achieved 86.1% accuracy for coconut tree disease detection, demonstrating robustness to complex backgrounds and heterogeneous lighting. Such hybrid architectures are especially suitable for canopy-level monitoring where lesions appear at varying spatial scales.

5.2.2. U-Net and Encoder–Decoder Models

Encoder–decoder networks remain the dominant approach for precise lesion segmentation. U-Net and its variants have been widely applied to leaf-level disease mapping, spike or panicle segmentation, and early anomaly detection [7,33], with 96.58% accuracy and 0.5% loss reported. These models isolate diseased regions for downstream classification and quantification. For example, Ref. [91] employed U-Net for sorghum panicle segmentation, while U-Net variants achieved precision above 94% across diverse disease datasets [92,93]. The strong performance of encoder–decoder architectures reinforces their alignment with the segmentation-focused analytical tasks identified in our taxonomy.

5.2.3. 2D CNNs and VGG-Based Feature Extractors

2D CNNs have been used for disease classification and lesion localization in crops such as coconut and soybean [94], reaching an accuracy of 93.82%. VGG-19-based models, often combined with ensemble classifiers or partial least squares (PLS) regression, provide strong baselines for small datasets or variable imaging conditions [95,96]. Mobile-ready implementations, such as those in [97,98], further demonstrate the practicality of CNN-based disease detection for real-time field diagnostics, with accuracies of 99.5% and 91.5%, respectively.

5.2.4. Classical Machine Learning Models

Although deep learning dominates contemporary work, classical ML models remain useful where data scarcity or interpretability is a priority. Support vector machines (SVMs) trained on histogram of oriented gradients (HOG) features have shown competitive performance for tomato and papaya leaf disease classification [99], with a reported F₁ score of 92.15%. These approaches highlight that model families beyond deep learning still play a role, particularly in low-resource agricultural environments.

Pest and disease detection illustrate how different model families, sensing modalities, and data types align with the analytical tasks defined in our taxonomy. Identifying affected regions and plant conditions supports subsequent analysis such as bloom detection, fruit counting, and yield estimation. These tasks rely not only on accurate detection but also on consistent spatial and temporal interpretation of field conditions. This progression highlights the shift from detection to quantitative assessment in precision agriculture. The resulting detection outputs therefore serve as critical inputs to downstream processes such as agricultural monitoring and yield prediction, discussed in the next section.

6. Bloom Detection, Fruit Counting, and Yield Prediction

Bloom detection, fruit counting, and yield prediction form a sequential analytical pipeline in precision agriculture, with flowering intensity and fruit load serving as intermediate indicators of eventual yield [32,100]. These tasks map directly onto the taxonomy introduced in Section 2.2: they rely primarily on image-based and multi-temporal data types (similar to Figure 2) and draw on model families ranging from classical machine learning to deep convolutional and hybrid sequential networks that currently dominate operational implementations. The following applications represent downstream analytical tasks given in the typical framework illustrated in Figure 2.

6.1. Bloom Detection

Bloom detection supports phenological monitoring and early season yield forecasting. Deep learning models have significantly improved robustness under heterogeneous orchard conditions involving occlusion, clutter, and variable illumination. For instance, Ref. [101] applied DeepLab-ResNet with atrous convolutions and spatial pyramid pooling for multispecies bloom segmentation, followed by a region growing refinement (RGR) step to improve boundary localization.

Hybrid pipelines that pair CNN-based feature extraction with classical ML remain effective when annotated data are limited. Ref. [102] demonstrated that a fine-tuned CNN combined with an SVM classifier achieved an F1 score of 93.4% for apple bloom detection, outperforming HSV+SVM baselines. Consistent patterns appear across crops: CNN-based bloom stage classifiers achieved over 95% accuracy in lettuce fields [103], while SVMs trained on handcrafted features remain competitive in low-data scenarios [104].

6.2. Fruit Counting

Fruit counting supports in-season yield estimation, inventory planning, and thinning decisions. Earlier approaches relied on handcrafted color and shape cues. Ref. [105] used RGB/HSI segmentation with connected component analysis for apple counting, achieving $R^{2} = 0.85$ and a root mean squared error (RMSE) of 20 fruits per tree. Ref. [106] integrated SVMs, the Hough Transform, and spatial enhancement to detect green oranges with 97% accuracy.
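The two regression metrics quoted throughout this section can be computed as in the minimal sketch below. The per-tree counts are hypothetical values chosen for illustration and are not taken from any cited study.

```python
import math

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean squared error, in the same unit as the target (e.g., fruits per tree)."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical manual and image-based fruit counts for five trees.
manual_counts = [120, 95, 140, 110, 80]
predicted_counts = [112, 101, 131, 118, 86]

print(f"R^2  = {r_squared(manual_counts, predicted_counts):.3f}")
print(f"RMSE = {rmse(manual_counts, predicted_counts):.1f} fruits per tree")
```

Because RMSE carries the unit of the target, a value such as 20 fruits per tree is only interpretable relative to the typical fruit load, whereas $R^{2}$ is unitless.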

Modern deep detectors now dominate orchard scale counting. YOLOv5 paired with Deep SORT achieved 99% accuracy for green tomatoes and 85% for red tomatoes in UAV imagery [107]. In mango orchards, MangoYOLO augmented with Kalman filtering and Hungarian matching addressed occlusions by tracking fruits across frames, yielding 62% agreement with harvest counts and outperforming dual view baselines [108]. CNN-based counting pipelines for lettuce and tomato routinely exceed 98% accuracy [103,109].
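The idea of counting by tracking, where each fruit is counted once even though it appears in several frames, can be sketched as follows. This is a simplified illustration and not the method of [107] or [108]: it replaces Kalman filtering and Hungarian matching with a greedy intersection-over-union (IoU) association, and the function names, boxes, and threshold are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def count_unique_fruits(frames, iou_threshold=0.3):
    """Count distinct fruits by linking overlapping detections in consecutive frames."""
    active = []   # boxes of the tracks seen in the previous frame
    total = 0
    for detections in frames:
        # Score every track/detection pair and visit the best overlaps first.
        pairs = sorted(
            ((iou(t, d), ti, di)
             for ti, t in enumerate(active)
             for di, d in enumerate(detections)),
            reverse=True,
        )
        used_tracks, used_dets = set(), set()
        for score, ti, di in pairs:
            if score < iou_threshold:
                break
            if ti in used_tracks or di in used_dets:
                continue
            used_tracks.add(ti)
            used_dets.add(di)
        # Detections without a matching track are treated as new fruits.
        total += len(detections) - len(used_dets)
        # Tracks that were not re-detected are dropped (no occlusion handling).
        active = list(detections)
    return total

# Hypothetical detections from three consecutive frames.
frames = [
    [(10, 10, 30, 30), (50, 10, 70, 30)],
    [(12, 10, 32, 30), (52, 10, 72, 30), (90, 10, 110, 30)],
    [(14, 10, 34, 30), (92, 10, 112, 30)],
]
print(count_unique_fruits(frames))
```

In this sketch a fruit that is occluded for a single frame would be counted twice on reappearance; motion prediction of the kind provided by a Kalman filter is what allows a track to survive such gaps.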

Recent work integrates detection with geometric modeling. Ref. [110] introduced a UAV-based workflow using HSV filtering, Density-Based Spatial Clustering of Applications with Noise (DBSCAN) clustering, and sphere fitting to infer counts from 3D structure; large clusters were refined via a secondary K-means step. Although this approach highlights the potential of geometric cues, performance remains highly sensitive to hyperparameters, illumination, and canopy density, factors that are less limiting for modern deep detectors.
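To make the hyperparameter sensitivity concrete, the sketch below implements a minimal DBSCAN over 3D points and counts the resulting clusters as fruit candidates. It is an illustrative sketch only: the HSV filtering, sphere fitting, and K-means refinement of [110] are omitted, and the points, `eps`, and `min_pts` values are assumptions.

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN. Returns one label per point: -1 for noise, else a cluster id."""
    n = len(points)
    labels = [None] * n   # None means the point has not been visited yet

    def neighbors(i):
        # Brute-force neighborhood query (includes the point itself).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1   # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster   # noise reachable from a core point becomes border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)   # j is a core point, so expand through it
    return labels

# Hypothetical fruit-colored 3D points (meters): two compact blobs and one stray point.
points = [
    (0.00, 0.00, 0.00), (0.01, 0.00, 0.00), (0.00, 0.01, 0.00), (0.01, 0.01, 0.01),
    (1.00, 1.00, 1.00), (1.01, 1.00, 1.00), (1.00, 1.01, 1.00), (1.01, 1.01, 1.01),
    (5.00, 5.00, 5.00),
]
labels = dbscan(points, eps=0.05, min_pts=3)
fruit_count = len({label for label in labels if label >= 0})
print(labels, fruit_count)
```

Setting `eps` larger than the spacing between neighboring fruits merges them into one cluster, while a value that is too small fragments a single fruit or marks it as noise, which is one way the sensitivity noted above arises.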

Unsupervised clustering methods such as K-means offer lightweight solutions when no labels are available, as shown for maize in [111], but they struggle with occlusion and overlapping fruit, restricting their scalability to orchard environments.

6.3. Fruit Yield Prediction

Yield prediction integrates spatial, temporal, and environmental information and remains one of the most challenging tasks in agricultural analytics. Hybrid temporal models such as LSTM-1D CNN architectures have demonstrated strong performance; for example, Ref. [16] achieved $R^{2} = 0.859$ for rice yield prediction using multi-temporal satellite indices and temperature data. Classical ML approaches remain competitive when labeled data are scarce: fuzzy rule-based classification systems (FRBCSs) reached 94.29% accuracy for tomato ripeness estimation [112], and SVMs using canopy and fruit features produced $R^{2} = 0.7242$ for apple yield prediction [113].

Tree-based models are broadly used for their ability to capture nonlinear interactions and spatial heterogeneity. Ref. [114] reported that Random Forest achieved $R^{2} = 0.71$ for apple yield prediction, outperforming the mechanistic Carnegie–Ames–Stanford Approach (CASA) model, while XGBoost reached $R^{2} = 0.938$ for wild blueberry yield prediction [115]. Linear models remain effective for small datasets, as shown by Ref. [116], which achieved $R^{2} = 0.6$ for olive yield using UAV-derived NDVI, slope, and canopy features.

More recent studies explore multimodal and ensemble learning. A stacking ensemble combining ConvLSTM and SVR achieved $R^{2} = 0.8491$ for strawberry yield and price forecasting [117], while BPNNs integrating vegetation indices, texture metrics, and 3D canopy morphology achieved $R^{2} = 0.83$–$0.88$ [118]. Deep architectures also scale effectively to large datasets: for corn, ERT, RF, and deep networks reached RMSE values of 0.75–0.85 t/ha [87].

Table 6 summarizes representative modeling approaches for yield-related tasks, linking methodological categories to typical agricultural applications. Table 7 synthesizes the alignment between major yield prediction challenges and effective model families reported in the literature. These tables underscore the importance of aligning model choice with data characteristics and task requirements.

Table 8 and Table 9 synthesize representative approaches across a broad set of crops and analytical tasks, linking model families to reported performance outcomes in bloom detection, fruit counting, and yield prediction. This cross-crop perspective illustrates how the taxonomy’s four dimensions, sensing modality, data type, model family, and analytical task, interact in practical deployments. Across crops, deep detection architectures consistently dominate fruit counting tasks due to their robustness to occlusion and complex canopy structure, whereas yield prediction exhibits greater methodological diversity, reflecting its stronger dependence on temporal dynamics, environmental variability, and multimodal integration. These patterns suggest that task complexity and data structure, not merely model innovation, drive methodological selection in precision agriculture.

Taken together, bloom detection, fruit counting, and yield prediction demonstrate how sensing choices, data structures, and model architectures interact across the agricultural analytics pipeline. They also highlight recurring challenges including occlusion, spectral variability, limited labeled data, and domain shift across orchards and seasons that motivate the research directions discussed in Section 8.

Although convolutional and recurrent architectures currently dominate these tasks in operational settings, emerging attention-based and transformer models may offer new opportunities for modeling long-range spatial and temporal dependencies as larger multimodal agricultural datasets become available.

Performance improvements across agricultural tasks are governed by a combination of data quality, task complexity, and model–data compatibility. In segmentation tasks (Section 4), accuracy depends largely on the availability of high-quality annotations and the spatial resolution of UAV imagery, because precise boundary delineation is required. Detection tasks (Section 5.1 and Section 5.2) are more sensitive to object scale, inference speed, occlusion, and background variability; models must balance localization accuracy with computational efficiency, particularly for real-time applications. In contrast, yield prediction (Section 6.3) relies on temporal consistency and the integration of multiple data sources, including environmental and phenological information, making it more dependent on sequential modeling (e.g., LSTM) and multimodal fusion [16]. These differences indicate that no single model is universally optimal; performance gains come from matching model capabilities to task-specific requirements and data characteristics.

7. System-Level Evaluation and Deployment Considerations

While Section 4, Section 5 and Section 6 reviewed machine learning approaches across individual analytical tasks including segmentation, pest and disease detection, and yield-related prediction, these methods are often evaluated in isolation. However, practical precision agriculture systems operate as integrated pipelines, where model performance must be considered alongside sensing conditions, data characteristics, and deployment feasibility.

To bridge this gap, this section provides a system-level synthesis of model families across tasks and examines their readiness for real-world deployment. Table 9 summarizes how major model families discussed throughout Section 4, Section 5 and Section 6 align with analytical tasks, sensing modalities, and representative architectures. Rather than focusing on individual studies, this synthesis highlights recurring design patterns and trade-offs across model classes, providing a unified view of the methodological landscape. Building on this synthesis, we then assess the extent to which current studies report deployment relevant metrics and identify limitations in translating algorithmic performance into operational systems.

Table 10 provides a model-centric synthesis of machine learning approaches across segmentation, detection, and prediction tasks. Several key observations emerge.

First, convolutional neural networks and their variants remain the dominant model family across most tasks, particularly for image-driven applications such as segmentation and disease detection. Lightweight CNN variants are commonly used when real-time or edge deployment is required, while transformer-based and hybrid architectures are increasingly explored for capturing complex spatial and spectral dependencies.

Second, model selection is closely tied to sensing modality; RGB-based UAV imagery dominates detection and segmentation tasks, whereas multispectral, hyperspectral, and multimodal data are more frequently associated with prediction and stress analysis. This reflects the trade-off between spatial resolution and spectral richness in agricultural sensing systems.

Third, each model family exhibits distinct strengths and limitations in terms of accuracy, computational complexity, data requirements, and deployment suitability. These trade-offs highlight the need to evaluate models beyond predictive performance, motivating the deployment focused analysis presented in the following subsection.

Deployment Assessment and Reporting Gaps

To assess the practical readiness of existing approaches, we reviewed about 15 representative studies discussed in this survey that show promising research directions with respect to performance and methodology, covering tasks such as crop disease detection, UAV-based monitoring, and segmentation. The aim was to examine the reporting of deployment-related metrics, including inference speed, model complexity, memory footprint, power consumption, and bandwidth, along with system-level performance. The analysis shows that only a small fraction of these studies provides an indication of inference speed (typically frames per second, FPS), while key indicators such as model size, floating-point operations (FLOPs), memory usage, and energy consumption are almost entirely absent. System-level metrics, including throughput and end-to-end latency, are reported in even fewer studies, primarily in IoT-based implementations. This highlights a clear gap between algorithmic development and real-world deployment considerations. Based on this assessment, we propose a three-tier reporting standard for AI-based agricultural systems:

1. Tier 1 (Minimum)

Model footprint: parameter count, model file size, input resolution used during inference, and FLOPs or multiply–accumulate operations (MACs) computed at that resolution.
Inference speed: latency per image (reported as mean ± standard deviation), FPS derived from latency, and the batch size and framework used for evaluation.

2. Tier 2 (Recommended)

Memory and deployment: peak GPU memory (VRAM) usage during inference, CPU RAM usage for edge- or CPU-based deployment, and any quantization or optimization applied.
Real-world throughput: processing rate (images per hour), field coverage rate (e.g., hectares per hour for UAV/robot systems), platform specifications such as UAV speed and altitude, and end-to-end pipeline latency (from data capture to final action).

3. Tier 3 (Aspirational)

Energy and power: GPU or device power consumption during inference, energy usage per frame, and estimated battery life for mobile or field platforms.
Reproducibility: availability of public code and trained model weights, profiling tools used (e.g., ptflops), and fair comparison with baselines under identical hardware conditions.
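Several of the quantities in the tiers above follow directly from a latency measurement. The sketch below shows one way to derive them; it is not tied to any framework, and the function names, the stand-in `dummy_infer` workload, the power draw, and the flight parameters are all assumptions for illustration.

```python
import statistics
import time

def benchmark_latency(infer, sample, warmup=5, runs=50):
    """Time single-image inference and return the per-run latencies in seconds."""
    for _ in range(warmup):           # warm-up runs are excluded from the statistics
        infer(sample)
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        infer(sample)
        latencies.append(time.perf_counter() - start)
    return latencies

def summarize(latencies, power_watts=None):
    """Derive Tier 1 to Tier 3 style metrics from measured latencies."""
    mean_s = statistics.mean(latencies)
    report = {
        "latency_ms_mean": mean_s * 1e3,
        "latency_ms_std": statistics.stdev(latencies) * 1e3,
        "fps": 1.0 / mean_s,                    # FPS derived from mean latency
        "images_per_hour": 3600.0 / mean_s,
    }
    if power_watts is not None:
        # Energy per frame (J) = average power (W) x time per frame (s).
        report["energy_j_per_frame"] = power_watts * mean_s
    return report

def coverage_ha_per_hour(speed_m_s, swath_width_m):
    """Field coverage rate for straight flight lines, ignoring turns and image overlap."""
    return speed_m_s * swath_width_m * 3600.0 / 10_000.0

def dummy_infer(image):
    # Hypothetical stand-in for a model forward pass.
    return sum(image)

latencies = benchmark_latency(dummy_infer, list(range(10_000)))
print(summarize(latencies, power_watts=10.0))
print(coverage_ha_per_hour(speed_m_s=5.0, swath_width_m=20.0), "ha/h")
```

For GPU inference, asynchronous execution means the device usually has to be synchronized before each timestamp is taken; otherwise the measured latency can be misleadingly low.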

8. Discussion, Challenges and Future Directions

The surveyed literature demonstrates substantial progress in AI and UAV-enabled precision agriculture across the full analytics pipeline, from sensing and preprocessing to segmentation, detection, counting, and yield prediction. However, beyond incremental performance gains, the field faces broader challenges related to robustness, scalability, and system integration. This section synthesizes methodological patterns across tasks, identifies systemic bottlenecks that limit generalization and deployment, and outlines research directions toward resilient operational systems.

8.1. Synthesis Across the Analytics Pipeline

A clear methodological stratification has emerged across analytical tasks, largely driven by differences in data structure and task requirements; convolutional architectures and region-based detectors dominate image-intensive operations such as segmentation, pest detection, bloom identification, and fruit counting. CNN backbones (e.g., VGG and ResNet), encoder–decoder models (e.g., U-Net variants), and one-stage detectors (e.g., YOLO families) provide a practical balance between accuracy and computational efficiency, particularly for UAV-based deployment. In contrast, yield prediction exhibits greater architectural diversity, frequently combining tree-based models, boosting methods, hybrid LSTM-CNN architectures, and ensemble strategies to capture nonlinear interactions and temporal dynamics [16,114,115].

In response to Research Question 1, CNN-based models remain effective for spatial tasks such as segmentation and detection, whereas transformers show advantages in modeling complex patterns in high-resolution and multimodal data. RNN-based architectures are primarily suited for temporal prediction tasks. Consequently, architectural selection is increasingly driven by task-specific data characteristics and deployment constraints rather than by generic model superiority.

Segmentation functions as a structural bridge within the pipeline. Reliable delineation of canopies, leaves, fruits, and lesions improves downstream detection, counting, and yield estimation [32,33]. However, segmentation models are frequently trained under limited environmental variability, raising questions about their robustness under domain shift across seasons, lighting conditions, and crop phenology.

An emerging shift in modeling philosophy involves the gradual adoption of attention-based and transformer architectures. Transformers enable broader spatial and temporal relational modeling, which is particularly relevant for agricultural imagery exhibiting long-range dependencies such as canopy structure and disease spread. Although these models remain comparatively underexplored due to data and computational requirements, hybrid CNN–Transformer architectures represent a promising direction for integrating local feature extraction with contextual reasoning.

8.2. Structural Challenges to Robust Deployment

Despite strong reported performance, several structural limitations constrain generalization and operational scalability. These challenges emerge across multiple layers of the pipeline shown in Figure 1.

Data limitations, annotations and domain generalization remain major barriers to reliable deployment. Pixel-level annotation for segmentation and lesion mapping is especially resource intensive and dependent on expert knowledge [7,9]. In addition, small and imbalanced datasets increase the risk of overfitting, particularly for rare crop conditions and early stress detection tasks. Consequently, models trained on specific orchards, cultivars, sensor configurations, or seasonal conditions often fail to generalize across new environments. Environmental variability across climate, soil conditions, and management practices further amplifies these limitations.

Although domain adaptation techniques have been explored [32,33], systematic cross-region or cross-season evaluation remains limited, creating a persistent gap between experimental validation and field-level reliability. These findings indicate that model reliability is strongly influenced by environmental variability, limited and imbalanced datasets, and differences in sensing configurations across regions and seasons.

Fragmented multimodal fusion also limits system-level coherence. Although UAV imagery is often combined with vegetation indices or environmental variables, unified architectures integrating UAV, satellite, IoT, and management data remain uncommon. Existing fusion strategies are typically feature level or post hoc rather than representation level, limiting the ability to learn shared cross-modal abstractions across spatial and temporal scales.

Deployment constraints represent a critical barrier to practical adoption. Many models are evaluated offline on high-performance hardware, whereas agricultural operations require real-time inference on UAVs, robots, or edge devices with limited power and memory. Efficient architecture design, model compression, and hardware aware optimization therefore remain essential yet comparatively underexplored relative to accuracy improvements [27,28,86].

Several studies have demonstrated field-level implementations using UAVs for crop monitoring [1,35,120], pest detection [28,82,92,94], and yield estimation [87,113,121,122], often relying on onboard or edge-based inference. However, balancing predictive accuracy with computational efficiency remains challenging due to battery, memory, and bandwidth limitations [1,35,120]. Consequently, many systems rely on lightweight models [29,99,113,123], partial onboard processing, and hybrid edge–cloud frameworks to maintain operational feasibility.

Evaluation and reproducibility limitations hinder comparative progress. Inconsistent metrics, heterogeneous experimental protocols, and limited public benchmarks restrict cross-study synthesis. Reported results are largely based on average accuracy or mAP, which may not reflect performance under class imbalance or the rare events common in agricultural data. The lack of standardized, large-scale, and publicly accessible benchmarks further limits reproducibility and direct comparison across studies. Many existing datasets remain geographically localized, crop specific, or privately collected, with inconsistent annotation protocols and limited support for cross-season or cross-field evaluation. Moreover, most studies do not report uncertainty estimates (e.g., confidence intervals) needed for risk-aware decision making, which limits understanding of out-of-distribution generalization and operational reliability.
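One inexpensive way to attach an interval to a reported accuracy is a percentile bootstrap over the test set, sketched below. This is a generic illustration and not a procedure used in the surveyed studies; the outcome vector, number of resamples, and seed are assumptions.

```python
import random

def bootstrap_ci(correct, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for accuracy from per-sample 0/1 outcomes."""
    rng = random.Random(seed)
    n = len(correct)
    # Resample the test set with replacement and recompute accuracy each time.
    stats = sorted(sum(rng.choices(correct, k=n)) / n for _ in range(n_boot))
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical test set: 90 correct and 10 incorrect predictions.
outcomes = [1] * 90 + [0] * 10
accuracy = sum(outcomes) / len(outcomes)
low, high = bootstrap_ci(outcomes)
print(f"accuracy = {accuracy:.2f}, 95% interval = [{low:.2f}, {high:.2f}]")
```

The interval only reflects sampling variability within the test set; it does not capture domain shift across fields or seasons, which requires separate cross-site evaluation.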

8.3. Emerging Research Directions

Addressing these structural limitations requires methodological advances aligned with practical deployment constraints.

Data-efficient learning represents a central priority. Self-supervised, semi-supervised, and contrastive pretraining strategies can leverage abundant unlabeled UAV and satellite imagery, reducing dependence on expensive annotations while improving robustness across crops and environments.

Domain adaptation and continual learning offer mechanisms for mitigating covariate shift. Approaches such as adversarial feature alignment, meta learning, and sensor-aware normalization may improve cross-region generalization. Continual learning frameworks are particularly relevant for agriculture, where environmental conditions evolve seasonally and interannually.

Representation-level multimodal fusion constitutes another critical frontier. Integrating UAV imagery, satellite time series, IoT measurements, and management metadata within unified architectures potentially integrating convolutional, recurrent, and transformer-based modules may enable hierarchical modeling across spatial, temporal, and spectral scales.

Transformer-based architectures are likely to play an expanding role as larger and more diverse datasets become available. Their capacity for modeling long-range spatial dependencies and cross-modal attention may be particularly beneficial for hyperspectral imagery, multi-temporal forecasting, and sensor integration. However, advances in efficient training and lightweight attention mechanisms will be necessary for practical deployment.

Edge-aware modeling and compression must also become first-class design considerations. Techniques such as pruning, quantization, knowledge distillation, and neural architecture search tailored to agricultural workloads can facilitate real-time inference under power and bandwidth limitations. Recent studies (2024–2025) have demonstrated the practical value of model compression for agricultural deployment while balancing accuracy and computational efficiency. For example, pruning and quantization have been applied to UAV-based weed detection systems [124], reducing model size by approximately 70% while maintaining a detection accuracy of about 90%. Similarly, knowledge distillation has been used for lightweight pest and disease identification [125], enabling deployment on resource constrained devices with reported accuracies between 94 and 96%. More recently, Yu Haiefang et al. [126] combined pruning and knowledge distillation for rapeseed pest detection, reducing the model size from 11.2 MB to 4.4 MB and floating-point operations from 28.3 G to 10.01 G on a Jetson Nano edge device, while achieving 93.2% accuracy and 92.7% recall. These findings suggest that compression strategies can substantially improve deployment efficiency without a proportional loss in predictive performance, making them promising for future UAV and edge-based agricultural systems.
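The relative savings implied by the figures reported for [126] can be checked with a few lines of arithmetic. The helper name below is illustrative; the input values are those quoted in the paragraph above.

```python
def reduction_pct(before, after):
    """Relative reduction in percent."""
    return 100.0 * (before - after) / before

size_reduction = reduction_pct(11.2, 4.4)       # model size in MB
flops_reduction = reduction_pct(28.3, 10.01)    # floating-point operations in G

print(f"model size: {size_reduction:.1f}% smaller ({11.2 / 4.4:.2f}x compression)")
print(f"FLOPs:      {flops_reduction:.1f}% fewer ({28.3 / 10.01:.2f}x reduction)")
```

Both quantities fall by a little over 60%, which is broadly in line with the approximately 70% size reduction reported for [124].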

Lastly, uncertainty-aware and decision-centric evaluation is essential for practical adoption. Evaluations should extend beyond predictive accuracy to include operational outcomes, such as reduction in input usage, yield improvement, and cost of false alarms. These measures are critical for translating model performance into practical agricultural value. On the predictive side, metrics such as macro-F₁, recall for minority classes, and PR-AUC provide more informative assessment in long-tailed scenarios.
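A small worked example shows why accuracy alone can mislead on long-tailed data. The labels below are hypothetical: a field with 95 healthy and 5 diseased plants, and a degenerate classifier that predicts "healthy" for every plant.

```python
def per_class_prf(y_true, y_pred, label):
    """Precision, recall, and F1 for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    labels = sorted(set(y_true))
    return sum(per_class_prf(y_true, y_pred, c)[2] for c in labels) / len(labels)

y_true = ["healthy"] * 95 + ["diseased"] * 5
y_pred = ["healthy"] * 100   # the classifier never flags disease

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
minority_recall = per_class_prf(y_true, y_pred, "diseased")[1]

print(f"accuracy        = {accuracy:.2f}")
print(f"macro-F1        = {macro_f1(y_true, y_pred):.3f}")
print(f"diseased recall = {minority_recall:.2f}")
```

Here accuracy is 0.95 although no diseased plant is detected, whereas macro-F₁ drops to about 0.49 and minority recall is 0, which exposes the failure.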

8.4. Implications for Practice and Data Infrastructure

Advancing AI-driven precision agriculture requires coordinated progress in algorithms, sensing infrastructure, data governance, and interdisciplinary collaboration. Standardized data schemas, shared benchmarks, and open datasets would enable reproducible comparison and cross-regional studies [13,33]. The incremental adoption of sensing technologies, combined with robust data management pipelines, can facilitate sustainable integration into agricultural workflows.

A practical direction for future research is the integration of complementary learning strategies into a unified framework. For instance, self-supervised pretraining on large-scale unlabeled UAV data can be combined with semi-supervised fine-tuning to address limited annotations. Domain adaptation techniques, such as feature alignment across seasons and sensing conditions, can further improve model robustness. Likewise, for deployment, lightweight optimization strategies including pruning, quantization, and knowledge distillation can be incorporated to enable efficient inference on UAV platforms. Such a hybrid framework provides a feasible pathway toward robust and scalable AI systems in precision agriculture.

Overall, the field is transitioning from isolated proof-of-concept studies toward integrated, operational systems. Sustained progress will depend not only on architectural innovation but also on principled system design that accounts for multimodal data integration, domain variability, and deployment feasibility.

9. Conclusions

This survey reviewed more than one hundred studies on AI and UAV-enabled precision agriculture, synthesizing advances in sensing modalities, data types, model families, and analytical tasks through a unified taxonomy. While deep learning has achieved strong performance in segmentation, pest and disease detection, fruit counting, and yield prediction, many approaches remain constrained by small, localized datasets and limited cross-season or cross-region generalization. Fragmented multimodal fusion strategies and deployment constraints on UAV and edge platforms further hinder large-scale operational adoption.

Emerging methodological directions including self-supervised learning, domain adaptation, representation-level multimodal fusion, and lightweight architecture design offer promising pathways toward more robust and deployable systems. Real-world impact will depend not only on predictive accuracy but also on uncertainty awareness, interpretability, computational efficiency, and integration with decision support workflows. Progress therefore requires coordinated advances in sensing infrastructure, modeling frameworks, and interdisciplinary collaboration among AI researchers, agronomists, and practitioners.

From a broader AI perspective, precision agriculture exposes the structural limitations of current learning paradigms when confronted with non-stationary environments, sparse supervision, multimodal heterogeneity, and stringent deployment constraints. Addressing these challenges demands advances in continual adaptation, data-efficient training, multimodal representation learning, and model compression. Agriculture should thus be regarded not merely as an application domain for AI but as a catalyst for methodological innovation relevant to other dynamic, resource-constrained, and safety-critical real-world systems.

Future research should prioritize data-efficient and generalizable models, particularly by combining CNN-based and transformer-based architectures to capture complex spatial dependencies. RNN-based hybrid approaches remain suitable for temporal tasks such as yield prediction, hybrid CNN–Transformer models offer a practical balance for multimodal agricultural data, and YOLOv5 paired with Deep SORT performed strongly on fruit counting. Progress depends on the development of large-scale, diverse benchmark datasets and standardized evaluation protocols, especially across regions and growing conditions. Emerging directions such as multimodal transformers, semi-supervised learning, and edge AI architectures are expected to play a key role in enabling scalable and deployable precision agriculture systems.

Author Contributions

Conceptualization, W.D.B. and P.C.; methodology, W.D.B., S.A. and M.F.S.; software, P.C.; validation, W.D.B., S.A. and M.F.S.; formal analysis, W.D.B., S.A. and M.F.S.; investigation, W.D.B. and P.C.; resources, W.D.B.; data curation, P.C.; writing—original draft preparation, W.D.B. and P.C.; writing—review and editing, W.D.B., S.A. and M.F.S.; visualization, W.D.B. and M.F.S.; supervision, W.D.B.; project administration, W.D.B.; funding acquisition, W.D.B. and S.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

The authors would like to thank the research team members for their contributions to this work.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Radoglou-Grammatikis, P.; Sarigiannidis, P.; Lagkas, T.; Moscholios, I. A compilation of UAV applications for precision agriculture. Comput. Netw. 2020, 172, 107148. [Google Scholar] [CrossRef]
Hunt, E.R., Jr.; Daughtry, C.S.T. What good are unmanned aircraft systems for agricultural remote sensing and precision agriculture? Int. J. Remote Sens. 2018, 39, 5345–5376. [Google Scholar]
Tsouros, D.C.; Bibi, S.; Sarigiannidis, P.G. A Review on UAV-Based Applications for Precision Agriculture. Information 2019, 10, 349. [Google Scholar] [CrossRef]
Khan, M.A.; Alqahtani, A.; Khan, A.; Alsubai, S.; Binbusayyis, A.; Ch, M.M.I.; Yong, H.S.; Cha, J. Cucumber leaf diseases recognition using multi level deep entropy-ELM feature selection. Appl. Sci. 2022, 12, 593. [Google Scholar] [CrossRef]
Aishwarya, R.; Yogitha, R.; Lakshmanan, L.; Maheshwari, M.; Suji Helen, L.; Nagarajan, G. Smart agriculture framework implemented using the internet of things and deep learning. In Biologically Inspired Techniques in Many Criteria Decision Making: Proceedings of BITMDM 2021; Springer: Berlin/Heidelberg, Germany, 2022; pp. 639–648. [Google Scholar]
Alrowais, F.; Asiri, M.M.; Alabdan, R.; Marzouk, R.; Hilal, A.M.; Gupta, D. Hybrid leader based optimization with deep learning driven weed detection on internet of things enabled smart agriculture environment. Comput. Electr. Eng. 2022, 104, 108411. [Google Scholar] [CrossRef]
Saranya, T.; Deisy, C.; Sridevi, S.; Anbananthen, K.S.M. A comparative study of deep learning and Internet of Things for precision agriculture. Eng. Appl. Artif. Intell. 2023, 122, 106034. [Google Scholar] [CrossRef]
Meshram, V.; Patil, K.; Meshram, V.; Hanchate, D.; Ramkteke, S. Machine learning in agriculture domain: A state-of-art survey. Artif. Intell. Life Sci. 2021, 1, 100010. [Google Scholar] [CrossRef]
Shin, J.; Mahmud, M.S.; Rehman, T.U.; Ravichandran, P.; Heung, B.; Chang, Y.K. Trends and prospect of machine vision technology for stresses and diseases detection in precision agriculture. AgriEngineering 2022, 5, 20–39. [Google Scholar] [CrossRef]
Sa, I.; Ge, Z.; Dayoub, F.; Upcroft, B.; Perez, T.; McCool, C. Deepfruits: A fruit detection system using deep neural networks. Sensors 2016, 16, 1222. [Google Scholar] [CrossRef] [PubMed]
Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958. [Google Scholar]
Aggelopoulou, A.; Bochtis, D.; Fountas, S.; Swain, K.C.; Gemtos, T.; Nanos, G. Yield prediction in apple orchards based on image processing. Precis. Agric. 2011, 12, 448–456. [Google Scholar]
Kent Shannon, D.; Clay, D.E.; Sudduth, K.A. An introduction to precision agriculture. In Precision Agriculture Basics; American Society of Agronomy: Madison, WI, USA, 2018; pp. 1–12. [Google Scholar]
Briechle, S.; Krzystek, P.; Vosselman, G. Classification of tree species and standing dead trees by fusing UAV-based lidar data and multispectral imagery in the 3D deep neural network PointNet++. In Proceedings of the ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences; ISPRS: Hannover, Germany, 2020; Volume V-2-2020, pp. 203–210. [Google Scholar]
Jabraeil Jamali, M.A.; Bahrami, B.; Heidari, A.; Allahverdizadeh, P.; Norouzi, F. Some cases of smart use of the IoT. In Towards the Internet of Things: Architectures, Security, and Applications; Springer: Cham, Switzerland, 2020; pp. 85–129. [Google Scholar]
Jeong, S.; Ko, J.; Yeom, J.M. Predicting rice yield at pixel scale through synthetic use of crop and deep learning models with satellite data in South and North Korea. Sci. Total Environ. 2022, 802, 149726. [Google Scholar] [CrossRef] [PubMed]
Babu, S. A software model for precision agriculture for small and marginal farmers. In Proceedings of the 2013 IEEE Global Humanitarian Technology Conference: South Asia Satellite (GHTC-SAS); IEEE: New York, NY, USA, 2013; pp. 352–355. [Google Scholar]
LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 2002, 86, 2278–2324. [Google Scholar]
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2015; Volume 9351, pp. 234–241. [Google Scholar]
Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149. [Google Scholar] [PubMed]
He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar] [CrossRef]
Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
Shi, X.; Chen, Z.; Wang, H.; Yeung, D.Y.; Wong, W.K.; Woo, W.C. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. In Proceedings of the Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2015; pp. 802–810. [Google Scholar]
Chen, C.J.; Huang, Y.Y.; Li, Y.S.; Chang, C.Y.; Huang, Y.M. An AIoT based smart agricultural system for pests detection. IEEE Access 2020, 8, 180750–180761. [Google Scholar] [CrossRef]
Chen, C.J.; Huang, Y.Y.; Li, Y.S.; Chen, Y.C.; Chang, C.Y.; Huang, Y.M. Identification of fruit tree pests with deep learning on embedded drone to achieve accurate pesticide spraying. IEEE Access 2021, 9, 21986–21997. [Google Scholar] [CrossRef]
Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and efficient design for semantic segmentation with transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 12077–12090. [Google Scholar]
Tian, Y.; Yang, G.; Wang, Z.; Li, E.; Liang, Z. Detection of apple lesions in orchards based on deep learning methods of CycleGAN and YOLOv3-dense. J. Sens. 2019, 2019, 7630926. [Google Scholar] [CrossRef]
Li, L.; Zhang, S.; Wang, B. Plant disease detection and classification by deep learning—A review. IEEE Access 2021, 9, 56683–56698. [Google Scholar] [CrossRef]
He, L.; Fang, W.; Zhao, G.; Wu, Z.; Fu, L.; Li, R.; Majeed, Y.; Dhupia, J. Fruit yield prediction and estimation in orchards: A state-of-the-art comprehensive review for both direct and indirect methods. Comput. Electron. Agric. 2022, 195, 106812. [Google Scholar] [CrossRef]
Zualkernan, I.; Abuhani, D.A.; Hussain, M.H.; Khan, J.; ElMohandes, M. Machine learning for precision agriculture using imagery from unmanned aerial vehicles (UAVs): A survey. Drones 2023, 7, 382. [Google Scholar] [CrossRef]
Weinstein, B.G.; Marconi, S.; Bohlman, S.; Zare, A.; White, E. Individual tree-crown detection in RGB imagery using semi-supervised deep learning neural networks. Remote Sens. 2019, 11, 1309. [Google Scholar] [CrossRef]
Popescu, D.; Stoican, F.; Stamatescu, G.; Ichim, L.; Dragana, C. Advanced UAV–WSN system for intelligent monitoring in precision agriculture. Sensors 2020, 20, 817. [Google Scholar] [CrossRef] [PubMed]
Sagan, V.; Maimaitijiang, M.; Bhadra, S.; Maimaitiyiming, M.; Brown, D.R.; Sidike, P.; Fritschi, F.B. Field-scale crop yield prediction using multi-temporal WorldView-3 and PlanetScope satellite data and deep learning. ISPRS J. Photogramm. Remote Sens. 2021, 174, 265–281. [Google Scholar] [CrossRef]
Aden, S.T.; Bialas, J.P.; Champion, Z.; Levin, E.; McCarty, J.L. Low cost infrared and near infrared sensors for UAVs. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2014, 40, 1–7. [Google Scholar] [CrossRef] [Green Version]
Raghunandan, G.; Namratha, S.; Nanditha, S.; Swathi, G. Comparative analysis of different precision agriculture techniques using wireless sensor networks. In Proceedings of the 2017 4th International Conference on Electronics and Communication Systems (ICECS); IEEE: New York, NY, USA, 2017; pp. 129–133. [Google Scholar]
Quebrajo, L.; Perez-Ruiz, M.; Pérez-Urrestarazu, L.; Martínez, G.; Egea, G. Linking thermal imaging and soil remote sensing to enhance irrigation management of sugar beet. Biosyst. Eng. 2018, 165, 77–87. [Google Scholar] [CrossRef]
Rodríguez-Garlito, E.C.; Paz-Gallardo, A. Efficiently mapping large areas of olive trees using drones in Extremadura, Spain. IEEE J. Miniaturization Air Space Syst. 2021, 2, 148–156. [Google Scholar] [CrossRef]
Pérez-Ortiz, M.; Peña, J.; Gutiérrez, P.A.; Torres-Sánchez, J.; Hervás-Martínez, C.; López-Granados, F. A semi-supervised system for weed mapping in sunflower crops using unmanned aerial vehicles and a crop row detection method. Appl. Soft Comput. 2015, 37, 533–544. [Google Scholar] [CrossRef]
Natividade, J.; Prado, J.; Marques, L. Low-cost multi-spectral vegetation classification using an Unmanned Aerial Vehicle. In Proceedings of the 2017 IEEE International Conference on Autonomous Robot Systems and Competitions (ICARSC); IEEE: New York, NY, USA, 2017; pp. 336–342. [Google Scholar]
Wan, P.; Toudeshki, A.; Tan, H.; Ehsani, R. A methodology for fresh tomato maturity detection using computer vision. Comput. Electron. Agric. 2018, 146, 43–50. [Google Scholar] [CrossRef]
Payne, A.B.; Walsh, K.B.; Subedi, P.; Jarvis, D. Estimation of mango crop yield using image analysis–segmentation method. Comput. Electron. Agric. 2013, 91, 57–64. [Google Scholar] [CrossRef]
Bulanon, D.M.; Kataoka, T.; Ota, Y.; Hiroma, T. AE—Automation and emerging technologies: A segmentation algorithm for the automatic recognition of Fuji apples at harvest. Biosyst. Eng. 2002, 83, 405–412. [Google Scholar] [CrossRef]
Hernandez, A.; Murcia, H.; Copot, C.; De Keyser, R. Towards the development of a smart flying sensor: Illustration in the field of precision agriculture. Sensors 2015, 15, 16688–16709. [Google Scholar] [CrossRef] [PubMed]
Alves, G.M.; Cruvinel, P.E. Big data environment for agricultural soil analysis from CT digital images. In Proceedings of the 2016 IEEE Tenth International Conference on Semantic Computing (ICSC); IEEE: New York, NY, USA, 2016; pp. 429–431. [Google Scholar]
Darwin, B.; Dharmaraj, P.; Prince, S.; Popescu, D.E.; Hemanth, D.J. Recognition of bloom/yield in crop images using deep learning models for smart agriculture: A review. Agronomy 2021, 11, 646. [Google Scholar] [CrossRef]
Singh, D.; Jain, N.; Jain, P.; Kayal, P.; Kumawat, S.; Batra, N. PlantDoc: A Dataset for Visual Plant Disease Detection. In Proceedings of the 7th ACM IKDD CoDS and 25th COMAD, Hyderabad, India, 5–7 January 2020; pp. 249–253. [Google Scholar]
Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 448–456. [Google Scholar]
Zidi, F.A.; Boukhari, D.E.; Sellam, A.Z.; Ouafi, A.; Distante, C.; Bekhouche, S.E.; Taleb-Ahmed, A. LoLA-SpecViT: Local attention SwiGLU vision transformer with LoRA for hyperspectral imaging. Int. J. Appl. Earth Obs. Geoinf. 2025, 144, 104924. [Google Scholar] [CrossRef]
Guo, X.; Feng, Q.; Guo, F. CMTNet: A hybrid CNN-transformer network for UAV-based hyperspectral crop classification in precision agriculture. Sci. Rep. 2025, 15, 12383. [Google Scholar] [CrossRef] [PubMed]
Otsu, N. A Threshold Selection Method from Gray-Level Histograms. IEEE Trans. Syst. Man Cybern. 1979, 9, 62–66. [Google Scholar] [CrossRef]
Zemmour, E.; Kurtser, P.; Edan, Y. Automatic Parameter Tuning for Adaptive Thresholding in Fruit Detection. Sensors 2019, 19, 2130. [Google Scholar] [CrossRef] [PubMed]
Cisternas, I.; Velásquez, I.; Caro, A.; Rodríguez, A. Systematic literature review of implementations of precision agriculture. Comput. Electron. Agric. 2020, 176, 105626. [Google Scholar] [CrossRef]
Hassanein, M.; Lari, Z.; El-Sheimy, N. A New Vegetation Segmentation Approach for Cropped Fields Based on Threshold Detection from Hue Histograms. Sensors 2018, 18, 1253. [Google Scholar] [CrossRef] [PubMed]
Castillo-Martínez, M.Á.; Gallegos-Funes, F.J.; Carvajal-Gámez, B.E.; Urriolagoitia-Sosa, G.; Rosales-Silva, A.J. Color index based thresholding method for background and foreground segmentation of plant images. Comput. Electron. Agric. 2020, 178, 105783. [Google Scholar] [CrossRef]
Hamuda, E.; Glavin, M.; Jones, E. A survey of image processing techniques for plant extraction and segmentation in the field. Comput. Electron. Agric. 2016, 125, 184–199. [Google Scholar] [CrossRef]
Mardanisamani, S.; Eramian, M. Segmentation of vegetation and microplots in aerial agriculture images: A survey. Plant Phenome J. 2022, 5, e20042. [Google Scholar] [CrossRef]
Zhu, G.; Zhu, F.; Wang, Z.; Yang, S.; Li, Z. EDANet: Efficient Dynamic Alignment of Small Target Detection Algorithm. Electronics 2025, 14, 242. [Google Scholar] [CrossRef]
Weyler, J.; Magistri, F.; Seitz, P.; Behley, J.; Stachniss, C. In-Field Phenotyping Based on Crop Leaf and Plant Instance Segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2022; pp. 2725–2734. [Google Scholar] [CrossRef]
Zhang, X.; Li, N.; Ge, L.; Xia, X.; Ding, N. A Unified Model for Real-Time Crop Recognition and Stem Localization Exploiting Cross-Task Feature Fusion. In Proceedings of the IEEE International Conference on Real-Time Computing and Robotics (RCAR); IEEE: New York, NY, USA, 2020; pp. 327–332. [Google Scholar] [CrossRef]
Zhao, W.; Yamada, W.; Li, T.; Digman, M.; Runge, T. Augmenting Crop Detection for Precision Agriculture with Deep Visual Transfer Learning—A Case Study of Bale Detection. Remote Sens. 2021, 13, 23. [Google Scholar] [CrossRef]
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale. In Proceedings of the International Conference on Learning Representations (ICLR), Vienna, Austria, 3–7 May 2021. [Google Scholar]
Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 10012–10022. [Google Scholar]
Aleissaee, A.A.; Kumar, A.; Anwer, R.M.; Khan, S.; Cholakkal, H.; Xia, G.S.; Khan, F.S. Transformers in remote sensing: A survey. Remote Sens. 2023, 15, 1860. [Google Scholar] [CrossRef]
Tao, J.; Qiao, Q.; Song, J.; Sun, S.; Chen, Y.; Wu, Q.; Liu, Y.; Xue, F.; Wu, H.; Zhao, F. Deep Learning-Driven Automatic Segmentation of Weeds and Crops in UAV Imagery. Sensors 2025, 25, 6576. [Google Scholar] [CrossRef] [PubMed]
Yu, C.; Lin, D.; He, C. ASE-UNet: An orange fruit segmentation model in an agricultural environment based on deep learning. Opt. Mem. Neural Netw. 2023, 32, 247–257. [Google Scholar] [CrossRef]
Hao, J.; Tohti, G.; Geni, M. Cotton Crop Inter-row Navigation Path Recognition Method Based on an Improved U-Net Model. In Proceedings of the 2024 3rd International Conference on Cloud Computing, Big Data Application and Software Engineering (CBASE); IEEE: New York, NY, USA, 2024. [Google Scholar] [CrossRef]
Liu, G.; Bai, L.; Zhao, M.; Zang, H.; Zheng, G. Segmentation of wheat farmland with improved U-Net on drone images. J. Appl. Remote Sens. 2022, 16, 034511. [Google Scholar] [CrossRef]
Singh, G.; Al-Huqail, A.A.; Almogren, A.; Kaur, S.; Joshi, K.; Singh, A.; Bharany, S.; Hussen, S.; Rehman, A.U. Enhanced Leaf Disease Segmentation Using U-Net Architecture for Precision Agriculture: A Deep Learning Approach. Food Sci. Nutr. 2025, 13, e70594. [Google Scholar] [CrossRef] [PubMed]
Zhang, T.; Yang, Z.; Xu, Z.; Li, J. Wheat Yellow Rust Severity Detection by Efficient DF-UNet and UAV Multispectral Imagery. IEEE Sens. J. 2022, 22, 9057–9068. [Google Scholar] [CrossRef]
Su, Z.; Wang, Y.; Xu, Q.; Gao, R.; Kong, Q. LodgeNet: Improved rice lodging recognition using semantic segmentation of UAV high-resolution remote sensing images. Comput. Electron. Agric. 2022, 196, 106873. [Google Scholar] [CrossRef]
Jia, W.; Liu, J.; Lu, Y.; Liu, Q.; Zhang, T.; Dong, X. Polar-Net: Green fruit instance segmentation in complex orchard environment. Front. Plant Sci. 2022, 13, 1054007. [Google Scholar] [CrossRef] [PubMed]
Zhang, Y.; Zhang, F.; Wang, J.; Yang, H.; Zhang, W.; Li, J. Orchard Chestnut Visual Harvest Maturity Detection and Segmentation Using an Improved YOLO-Based Method. Agriculture 2026, 16, 456. [Google Scholar] [CrossRef]
Liu, X.; Qi, J.; Zhang, W.; Bao, Z.; Wang, K.; Li, N. Recognition method of maize crop rows at the seedling stage based on MS-ERFNet model. Comput. Electron. Agric. 2023, 211, 107964. [Google Scholar] [CrossRef]
Sun, G.; Wang, S.; Dong, L.; Du, Y. Crop image segmentation method based on improved mask RCNN. In Proceedings of the 4th International Conference on Management Science and Industrial Engineering, Chiang Mai, Thailand, 28–30 April 2022; pp. 436–442. [Google Scholar]
Wang, S.; Wei, L.; Zhang, D.; Chen, L.; Huang, W.; Du, D.; Lin, K.; Zheng, Z.; Duan, J. Real-time and resource-efficient banana bunch detection and localization with YOLO-BRFB on edge devices. Front. Plant Sci. 2025, 16, 1650012. [Google Scholar] [CrossRef] [PubMed]
Kang, H.; Wang, X. Semantic segmentation of fruits on multi-sensor fused data in natural orchards. Comput. Electron. Agric. 2023, 204, 107569. [Google Scholar] [CrossRef]
Mumtaz, S.; Alshehri, M.; Alqahtani, Y.; Alshahrani, A.; Alabdullah, B.; Alhasson, H.F.; Liu, H. Advanced Leaf Classification Using Multi-Layer Perceptron for Smart Crop Management. IEEE Access 2025, 13, 105579–105589. [Google Scholar] [CrossRef]
Xu, R.; Yu, J.; Ai, L.; Yu, H.; Wei, Z. Farmland pest recognition based on Cascade RCNN Combined with Swin-Transformer. PLoS ONE 2024, 19, e0304284. [Google Scholar] [CrossRef] [PubMed]
Yang, Z.; Li, J.; Shi, X.; Xu, Z. Dual flow transformer network for multispectral image segmentation of wheat yellow rust. In Proceedings of the International Conference on Computer, Artificial Intelligence, and Control Engineering (CAICE 2022), Zhuhai, China, 25–27 February 2022; SPIE: Bellingham, WA, USA, 2022; Volume 12288, pp. 119–125. [Google Scholar]
Cornelissen, C.; Leroux, S.; Simoens, P. Adaptive clustering for efficient phenotype segmentation of UAV hyperspectral data. In Proceedings of the Winter Conference on Applications of Computer Vision, Tucson, AZ, USA, 28 February–4 March 2025; pp. 459–468. [Google Scholar]
Zhang, T.; Wang, D.; Chen, W. Multiscale CNN-state space model with feature fusion for crop disease detection from UAV imagery. Front. Plant Sci. 2025, 16, 1733727. [Google Scholar] [CrossRef] [PubMed]
Qin, Z.; Wang, W.; Dammer, K.H.; Guo, L.; Cao, Z. Ag-YOLO: A Real-Time Low-Cost Detector for Precise Spraying with Case Study of Palms. Front. Plant Sci. 2021, 12, 753603. [Google Scholar] [CrossRef] [PubMed]
Kim, N.; Lee, Y.W. Machine Learning Approaches to Corn Yield Estimation Using Satellite Images and Climate Data: A Case of Iowa State. J. Korean Soc. Surv. Geod. Photogramm. Cartogr. 2016, 34, 383–390. [Google Scholar] [CrossRef]
Jintasuttisak, T.; Edirisinghe, E.; Elbattay, A. Deep neural network based date palm tree detection in drone imagery. Comput. Electron. Agric. 2022, 192, 106560. [Google Scholar] [CrossRef]
Ai, Y.; Sun, C.; Tie, J.; Cai, X. Research on recognition model of crop diseases and insect pests based on deep learning in harsh environments. IEEE Access 2020, 8, 171686–171693. [Google Scholar] [CrossRef]
Tian, D.; Han, Y.; Wang, B.; Guan, T.; Gu, H.; Wei, W. Review of object instance segmentation based on deep learning. J. Electron. Imaging 2022, 31, 041205. [Google Scholar]
Lin, Z.; Guo, W. Sorghum panicle detection and counting using unmanned aerial system images and deep learning. Front. Plant Sci. 2020, 11, 534853. [Google Scholar] [CrossRef] [PubMed]
Nanni, L.; Manfè, A.; Maguolo, G.; Lumini, A.; Brahnam, S. High performing ensemble of convolutional neural networks for insect pest image detection. Ecol. Inform. 2022, 67, 101515. [Google Scholar] [CrossRef]
Su, W.H.; Zhang, J.; Yang, C.; Page, R.; Szinyei, T.; Hirsch, C.D.; Steffenson, B.J. Automatic evaluation of wheat resistance to fusarium head blight using dual mask-RCNN deep learning frameworks in computer vision. Remote Sens. 2020, 13, 26. [Google Scholar] [CrossRef]
Tetila, E.C.; Machado, B.B.; Astolfi, G.; de Souza Belete, N.A.; Amorim, W.P.; Roel, A.R.; Pistori, H. Detection and classification of soybean pests using deep learning with UAV images. Comput. Electron. Agric. 2020, 179, 105836. [Google Scholar] [CrossRef]
Saeed, F.; Khan, M.A.; Sharif, M.; Mittal, M.; Goyal, L.M.; Roy, S. Deep neural network features fusion and selection based on PLS regression with an application for crops diseases classification. Appl. Soft Comput. 2021, 103, 107164. [Google Scholar] [CrossRef]
Majid, A.; Khan, M.A.; Alhaisoni, M.; Tariq, U.; Hussain, N.; Nam, Y.; Kadry, S. An Integrated Deep Learning Framework for Fruits Diseases Classification. Comput. Mater. Contin. 2022, 71, 1387. [Google Scholar] [CrossRef]
Ferentinos, K.P. Deep learning models for plant disease detection and diagnosis. Comput. Electron. Agric. 2018, 145, 311–318. [Google Scholar] [CrossRef]
Yang, K.; Zhong, W.; Li, F. Leaf segmentation and classification with a complicated background using deep learning. Agronomy 2020, 10, 1721. [Google Scholar] [CrossRef]
Liu, G.; Mao, S.; Kim, J.H. A mature-tomato detection algorithm using machine learning and color analysis. Sensors 2019, 19, 2023. [Google Scholar] [CrossRef] [PubMed]
Anderson, N.T.; Walsh, K.B.; Wulfsohn, D. Technologies for forecasting tree fruit load and harvest timing—From ground, sky and time. Agronomy 2021, 11, 1409. [Google Scholar] [CrossRef]
Dias, P.A.; Tabb, A.; Medeiros, H. Multispecies fruit flower detection using a refined semantic segmentation network. IEEE Robot. Autom. Lett. 2018, 3, 3003–3010. [Google Scholar] [CrossRef]
Dias, P.A.; Tabb, A.; Medeiros, H. Apple flower detection using deep convolutional networks. Comput. Ind. 2018, 99, 17–28. [Google Scholar] [CrossRef]
Bauer, A.; Bostrom, A.G.; Ball, J.; Applegate, C.; Cheng, T.; Laycock, S.; Rojas, S.M.; Kirwan, J.; Zhou, J. Combining computer vision and deep learning to enable ultra-scale aerial phenotyping and precision agriculture: A case study of lettuce production. Hortic. Res. 2019, 6, 70. [Google Scholar] [CrossRef] [PubMed]
Shaikh, T.A.; Rasool, T.; Lone, F.R. Towards leveraging the role of machine learning and artificial intelligence in precision agriculture and smart farming. Comput. Electron. Agric. 2022, 198, 107119. [Google Scholar] [CrossRef]
Zhou, R.; Damerow, L.; Sun, Y.; Blanke, M.M. Using colour features of cv. ‘Gala’ apple fruits in an orchard in image processing to predict yield. Precis. Agric. 2012, 13, 568–580. [Google Scholar] [CrossRef]
Maldonado, W., Jr.; Barbosa, J.C. Automatic green fruit counting in orange trees using digital images. Comput. Electron. Agric. 2016, 127, 572–581. [Google Scholar] [CrossRef]
Egi, Y.; Hajyzadeh, M.; Eyceyurt, E. Drone-computer communication based tomato generative organ counting model using YOLO V5 and deep-sort. Agriculture 2022, 12, 1290. [Google Scholar] [CrossRef]
Wang, Z.; Walsh, K.; Koirala, A. Mango fruit load estimation using a video based MangoYOLO—Kalman filter—Hungarian algorithm method. Sensors 2019, 19, 2742. [Google Scholar] [CrossRef] [PubMed]
Mavridou, E.; Vrochidou, E.; Papakostas, G.A.; Pachidis, T.; Kaburlasos, V.G. Machine vision systems in precision agriculture for crop farming. J. Imaging 2019, 5, 89. [Google Scholar] [CrossRef] [PubMed]
Hobart, M.; Pflanz, M.; Tsoulias, N.; Weltzien, C.; Kopetzky, M.; Schirrmann, M. Fruit Detection and Yield Mass Estimation from a UAV Based RGB Dense Cloud for an Apple Orchard. Drones 2025, 9, 60. [Google Scholar] [CrossRef]
Bonfante, A.; Monaco, E.; Manna, P.; De Mascellis, R.; Basile, A.; Buonanno, M.; Cantilena, G.; Esposito, A.; Tedeschi, A.; De Michele, C.; et al. LCIS DSS—An irrigation supporting system for water use efficiency improvement in precision agriculture: A maize case study. Agric. Syst. 2019, 176, 102646. [Google Scholar] [CrossRef]
Goel, N.; Sehgal, P. Fuzzy classification of pre-harvest tomatoes for ripeness estimation–An approach based on automatic rule learning using decision tree. Appl. Soft Comput. 2015, 36, 45–56. [Google Scholar] [CrossRef]
Hong, C.; Damerow, L.; Blanke, M.M.; Sun, Y. Early Yield Estimation of ‘Gala’ Apple Trees Using Image Processing Combined with Support Vector Machine. Trans. Chin. Soc. Agric. Mach. 2015, 46, 270–277. [Google Scholar]
Bai, X.; Li, Z.; Li, W.; Zhao, Y.; Li, M.; Chen, H.; Wei, S.; Jiang, Y.; Yang, G.; Zhu, X. Comparison of machine-learning and CASA models for predicting apple fruit yields from time-series Planet imageries. Remote Sens. 2021, 13, 3073. [Google Scholar] [CrossRef]
Obsie, E.Y.; Qu, H.; Drummond, F. Wild blueberry yield prediction using a combination of computer simulation and machine learning algorithms. Comput. Electron. Agric. 2020, 178, 105778. [Google Scholar] [CrossRef]
Stateras, D.; Kalivas, D. Assessment of olive tree canopy characteristics and yield forecast model using high resolution UAV imagery. Agriculture 2020, 10, 385. [Google Scholar] [CrossRef]
Okwuchi, I. Machine Learning Based Models for Fresh Produce Yield and Price Forecasting for Strawberry Fruit. Master’s Thesis, University of Waterloo, Waterloo, ON, Canada, 2020. [Google Scholar]
Sun, G.; Wang, X.; Yang, H.; Zhang, X. A canopy information measurement method for modern standardized apple orchards based on UAV multimodal information. Sensors 2020, 20, 2985. [Google Scholar] [CrossRef] [PubMed]
Vijayakumar, V.; Ampatzidis, Y.; Costa, L. Tree-level citrus yield prediction utilizing ground and aerial machine vision and machine learning. Smart Agric. Technol. 2023, 3, 100077. [Google Scholar] [CrossRef]
Tendolkar, A.; Choraria, A.; Pai, M.M.; Girisha, S.; Dsouza, G.; Adithya, K. Modified crop health monitoring and pesticide spraying system using NDVI and Semantic Segmentation: An AGROCOPTER based approach. In Proceedings of the 2021 IEEE International Conference on Autonomous Systems (ICAS); IEEE: New York, NY, USA, 2021; pp. 1–5. [Google Scholar]
Ferencz, C.; Bognár, P.; Lichtenberger, J.; Hamar, D.; Tarcsai, G.; Timár, G.; Molnár, G.; Pásztor, S.; Steinbach, P.; Székely, B.; et al. Crop yield estimation by satellite remote sensing. Int. J. Remote Sens. 2004, 25, 4113–4149. [Google Scholar] [CrossRef]
Singh, R.; Goyal, R.C.; Saha, S.; Chhikara, R. Use of satellite spectral data in crop yield estimation surveys. Int. J. Remote Sens. 1992, 13, 2583–2592. [Google Scholar] [CrossRef]
Ramesh, S.; Hebbar, R.; Niveditha, M.; Pooja, R.; Shashank, N.; Vinod, P. Plant disease detection using machine learning. In Proceedings of the 2018 International Conference on Design Innovations for 3Cs Compute Communicate Control (ICDI3C); IEEE: New York, NY, USA, 2018; pp. 41–45. [Google Scholar]
Abdalla, A.; Mohammed, M.M.; Adedeji, O.; Dotray, P.; Guo, W. Toward resource-efficient UAV systems: Deep learning model compression for onboard-ready weed detection in UAV imagery. Smart Agric. Technol. 2025, 12, 101086. [Google Scholar] [CrossRef]
Zhang, X.; Liang, K.; Zhang, Y. Plant pest and disease lightweight identification model by fusing tensor features and knowledge distillation. Front. Plant Sci. 2024, 15, 1443815. [Google Scholar] [CrossRef] [PubMed]
Yu, H.; Luo, Q.; Peng, W.; Zheng, L.; Ju, J.; Zhuo, H. PKD-YOLOv8: A Collaborative Pruning and Knowledge Distillation Framework for Lightweight Rapeseed Pest Detection. Sensors 2025, 25, 5004. [Google Scholar] [CrossRef] [PubMed]

Figure 1. Taxonomy of machine learning for UAV-enabled precision agriculture, organized by sensing modality, data type, model family, and analytical task.

Figure 2. End-to-end framework of UAV-based sensing and machine learning in precision agriculture, illustrating the flow from heterogeneous sensing sources through preprocessing and segmentation to task-specific modeling and deployment.

Table 1. Practical decision matrix linking analytical tasks, sensing modalities, model families, and deployment constraints in UAV-enabled precision agriculture.

Analytical Task	Common Modality	Suitable Model Family	Deployment Consideration
Canopy/crop segmentation	RGB, multispectral, hyperspectral	Thresholding, U-Net, lightweight CNNs, hybrid CNN–Transformer models	Thresholding and lightweight CNNs are suitable for edge deployment; hyperspectral and transformer-based models require higher computation.
Pest and disease detection	RGB, multispectral, IoT-enabled imagery	Faster R-CNN, Mask R-CNN, YOLO variants, CNN classifiers	YOLO variants are suitable for real-time UAV or embedded deployment; region-based models provide higher localization precision but often require more computation.
Bloom and fruit counting	RGB, multispectral	Object detectors, instance segmentation, CNN-based counting models	High-resolution imagery improves counting accuracy but increases inference cost and bandwidth requirements.
Yield prediction	UAV, satellite, IoT, weather and environmental data	Random Forest, XGBoost, SVM, LSTM, CNN–LSTM, multimodal fusion models	Classical ML models remain practical for small datasets; temporal and multimodal models are useful when multi-season or multi-source data are available.
Stress monitoring/crop health assessment	Multispectral, hyperspectral, thermal, IoT	SVM, Random Forest, CNNs, transformer-based spectral–spatial models	Spectral models improve sensitivity to crop stress but require calibration, dimensionality reduction, and careful cross-field validation.

Table 2. Data-preparation techniques in UAV-based precision agriculture.

Technique	Approach	Application	Refs.
Normalization	Spectral normalization	UAV classification	[40]
	Pixel-level normalization	Crop weed detection	[41]
	Color normalization	Fruit maturity detection	[42,43]
Feature extraction	Vegetation indices (NDVI, ExG)	Vegetation mapping	[40,41]
	Geometric/texture features	Orchard mapping	[42,44]
	CNN hierarchical features	Fruit detection	[10,19,21]
Data cleaning	Filtering/masking	Noise reduction	[42,45,46]
Dimensionality reduction	Principal component analysis (PCA)	Hyperspectral processing	[40,47,48]
Integration	Multimodal fusion	Smart agriculture systems	[17,49]
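
The vegetation indices listed under feature extraction in Table 2 can be illustrated with a minimal per-pixel sketch. It uses the commonly cited definitions, NDVI = (NIR - Red) / (NIR + Red) and ExG = 2g - r - b on sum-normalized color coordinates; the function names, the zero-denominator handling, and the 0.1 threshold are assumptions made for illustration and are not taken from the cited studies.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel.

    nir and red are reflectance values for the near-infrared and red bands.
    """
    denom = nir + red
    if denom == 0:
        # Assumption: map empty pixels to 0.0 instead of raising an error.
        return 0.0
    return (nir - red) / denom


def excess_green(r, g, b):
    """Excess Green index (2g - r - b) on sum-normalized RGB coordinates."""
    total = r + g + b
    if total == 0:
        # Assumption: a black pixel carries no vegetation signal.
        return 0.0
    rn, gn, bn = r / total, g / total, b / total
    return 2 * gn - rn - bn


def vegetation_mask(rgb_pixels, threshold=0.1):
    """Label each (r, g, b) pixel as vegetation when ExG exceeds a threshold.

    The default threshold is a hypothetical value; in practice it is tuned
    per dataset or selected automatically (for example with Otsu's method).
    """
    return [excess_green(r, g, b) > threshold for (r, g, b) in rgb_pixels]


if __name__ == "__main__":
    print(ndvi(0.5, 0.1))
    print(vegetation_mask([(50, 150, 50), (120, 110, 100)]))
```

Indices of this kind are typically computed after radiometric normalization, so that values remain comparable across flights and illumination conditions.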

Table 3. Representative segmentation architectures and their reported applications in UAV-based precision agriculture.

Model/Architecture	Application	Refs.
U-Net	Crop/fruit segmentation	[62,69]
	Field segmentation	[70,71]
	Leaf disease segmentation	[72,73]
EDANet	Crop lodging identification	[60,74]
	Fruit segmentation	[75,76]
ERFNet	Field object segmentation	[61,77]
RPN	Crop/fruit segmentation	[63,78,79]
MLP	Field segmentation	[80,81]

Table 4. Representative pest detection models and applications in UAV-based agriculture.

Model Family	Use Case	Sensing Modality	Refs.
Faster/Mask R-CNN	Small pest detection; lesion segmentation	Satellite RGB + climate data [87]	[27,33,87]
YOLO variants	Real-time pest detection; embedded UAV deployment; autonomous spraying	UAV RGB (camera) + IoT sensors [27,28]	[27,28,30,86,88]
Inception ResNet-v2	Crop disease and pest recognition	RGB (Camera)	[89]
ResNet-based classifiers	Final-stage pest classification	Multispectral (main focus)	[31]
Fine-tuned VGG16 (HPO)	Tomato pest detection (8 classes)	RGB, Multispectral [7]	[7]

Table 5. Summary of disease detection models and representative applications.

Model	Application	Refs.
Inception ResNet-v2	Disease detection in coconut trees and other crops	[89]
U-Net and variants	Segmentation of diseased regions on leaves	[7,33]
	Crop segmentation supporting disease assessment	[91]
	High-precision disease segmentation	[92,93]
2D CNN	Disease segmentation in coconut leaves and other crops	[94]
VGG-19 and variants	Feature extraction and disease classification; optimized DL frameworks	[95,96,97,98]
Random Forest (HOG)	Detection of tomato and papaya leaf diseases	[99]

Table 6. Representative techniques and models for yield-related tasks in precision agriculture.

Approach	Model	Task	Refs.
Sequential modeling	Hybrid LSTM + 1D CNN	Yield prediction	[16]
Fuzzy logic	FRBCS + decision trees	Ripeness/early yield	[112]
Nonlinear features	SVM	Fruit count/yield	[113]
Tree-based models	Random Forest	Yield prediction	[114,115]
Mechanistic modeling	CASA (NPP based)	Yield prediction	[114]
Boosting	XGBoost	Yield prediction	[115]
Linear regression	MLR	Yield prediction	[116]
Deep learning	CNN-LSTM; ConvLSTM; ensembles	Yield prediction	[87,117]
Morphological features	BPNN	Canopy-based estimation	[118]
Randomized trees	ERT	Yield prediction	[87]

Table 7. Model family strengths for major yield prediction challenges.

Challenge	Effective Model Families	Refs.
Nonlinear relationships	RF, XGBoost—nonlinear dependency modeling	[87,114,115]
Temporal dependencies	LSTM, CNN-LSTM—sequential pattern modeling	[16,87,117]
Spatial variability	MLR, RF, PLSR—spatial heterogeneity capture	[114,116,119]
Small datasets	FRBCS, MLR, SVM—data-efficient learning	[112,113,116]
Complex datasets	Deep learning, XGBoost—large-scale modeling	[87,115,117]

Table 8. Representative approaches for bloom detection and fruit counting across crops.

Crop	Task	Model	Metric (Example)	Ref.
Lettuce	Bloom detection	CNN	Accuracy > 95%	[103]
	Bloom detection	SVM	High accuracy	[104]
	Fruit counting	CNN	Accuracy > 98%	[103]
Maize	Fruit counting	K-means	Qualitative segmentation	[111]
Mango	Fruit counting	MangoYOLO + tracking	RMSE = 5.0/2.1	[108]
Apple	Fruit counting	Color based	RMSE = 20–24 fruits/tree	[105]
	Fruit counting	SVM	Accuracy = 92%	[109]
Citrus	Fruit counting	Random Forest	Accuracy ≈ 94%	[109]
Tomato	Fruit counting	CNN	RMSE ≈ 5 fruits/tree	[109]

Table 9. Representative yield prediction models across crops.

Crop	Model	Metric	Value	Ref.
Wild blueberry	XGBoost	RMSE	343.0 kg/ha	[115]
	Random Forest	RMSE	430.9 kg/ha
	Boosted trees	RMSE	489.7 kg/ha
	Linear regression	RMSE	661.9 kg/ha
Rice	LSTM + 1D CNN	RMSE	0.605 Mg/ha	[16]
	1D CNN	RMSE	0.734 Mg/ha
	LSTM	RMSE	0.89 t/ha	[109]
Corn	SVM	RMSE	0.852 t/ha	[87]
	Random Forest	RMSE	0.767 t/ha
	ERT	RMSE	0.756 t/ha
	Deep learning	RMSE	0.787 t/ha
Grapes	ANN	RMSE	1.23 t/ha	[109]
Wheat	XGBoost	RMSE	2.45 t/ha	[109]
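
For readers comparing the entries in Table 9, the sketch below shows how an RMSE value is usually computed, assuming the cited studies follow the standard definition (the square root of the mean squared difference between observed and predicted yield). The function name and the sample numbers are hypothetical. Note that Mg/ha and t/ha are the same unit (1 Mg = 1 t), so the rice entries are on a common scale, whereas RMSE values for different crops are not directly comparable because typical yield levels differ.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Hypothetical plot-level yields in t/ha, for illustration only.
    observed = [6.1, 5.8, 7.0, 6.4]
    predicted = [5.9, 6.3, 6.6, 6.4]
    print(round(rmse(observed, predicted), 3))
```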

Table 10. Cross-task synthesis of major model families in UAV-enabled precision agriculture. The table summarizes how model families align with analytical tasks, sensing modalities, and representative architectures, and highlights their key strengths and limitations in real-world agricultural settings.

Model Family	Primary Tasks	Typical Architectures	Sensing Modality	Strengths/Limitations
CNN-based models	Segmentation, detection, classification	U-Net, ResNet, VGG	UAV RGB, multispectral	High accuracy and strong spatial feature learning; sensitive to domain shift and requires moderate labeled data
Lightweight CNNs	Real-time detection, edge deployment	YOLO, MobileNet, EDANet	UAV RGB	Efficient for UAV/edge deployment; reduced accuracy and limited capacity for complex scenes
Region-based detectors	Pest/disease detection, localization	Faster R-CNN, Mask R-CNN	UAV RGB, RGB + IoT	High localization accuracy; computationally expensive and slower inference
Transformers	Segmentation, classification	ViT, Swin Transformer	UAV RGB, hyperspectral	Capture global context; high data and computational requirements
CNN–Transformer hybrids	Hyperspectral analysis, complex scenes	CNN + attention fusion	UAV hyperspectral	Combine local and global features; increased complexity and limited deployment studies
Temporal models	Yield prediction, forecasting	LSTM, CNN–LSTM, ConvLSTM	Satellite, UAV time-series	Effective for temporal dynamics; sensitive to data quality and sequence length
Multimodal fusion models	Monitoring, prediction, decision support	Early/late fusion, attention fusion	UAV + satellite + IoT	Integrate heterogeneous data; challenges in alignment and scalability
Classical ML models	Classification, regression	SVM, Random Forest	Multispectral, IoT	Effective for small datasets and interpretable; limited scalability and performance
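
The early/late fusion distinction listed for multimodal fusion models in Table 10 can be illustrated with a toy example. The sketch below is a simplified illustration under stated assumptions, not the method of any surveyed study: early fusion concatenates per-modality feature vectors before a single model, whereas late fusion combines the outputs of separate per-modality models. The linear scoring function, all weights, and the feature values are hypothetical.

```python
def linear_score(features, weights, bias=0.0):
    """Toy stand-in for a trained model: a weighted sum of features."""
    return sum(f * w for f, w in zip(features, weights)) + bias

def early_fusion(uav_features, iot_features):
    """Early fusion: concatenate modality features into one input vector."""
    return list(uav_features) + list(iot_features)

def late_fusion(scores, fusion_weights):
    """Late fusion: weighted average of per-modality model outputs."""
    total = sum(fusion_weights)
    return sum(s * w for s, w in zip(scores, fusion_weights)) / total

# Hypothetical features: two UAV-derived values and one IoT sensor reading.
uav = [0.25, 0.5]
iot = [1.0]

# Early fusion: one model scores the concatenated feature vector.
fused_input = early_fusion(uav, iot)
early_score = linear_score(fused_input, [2.0, 1.0, 0.5])

# Late fusion: each modality has its own model; their outputs are then combined.
uav_score = linear_score(uav, [2.0, 1.0])
iot_score = linear_score(iot, [0.5])
late_score = late_fusion([uav_score, iot_score], [3.0, 1.0])
```

Early fusion requires the modalities to be aligned per sample before modeling, which relates to the alignment challenge noted in the table; late fusion avoids this at the feature level but cannot learn cross-modal feature interactions.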
