1. Introduction
Meeting the increasing food demand of the world population in the coming decades is one of the central challenges facing agriculture. Recent synthesis studies indicate that total global food demand is expected to rise substantially by 2050, although the magnitude of this increase depends on assumptions regarding population growth, income, diets, and climate. A meta-analysis by van Dijk et al. [1] projected that the total global food demand would increase by 35% to 56% between 2010 and 2050 across five representative socio-economic scenarios, with the range shifting to 30% to 62% when climate change is incorporated. In a later reassessment, Falcon et al. [2] projected a 50% to 60% increase in total global food demand between 2019 and 2050, with a preferred estimate of about 57% under their assumptions on population and income growth; they also linked rising demand to a global population approaching about 9.7 to 9.8 billion by 2050.
Meeting this growing demand cannot rely on production expansion alone, because agricultural intensification must also be reconciled with environmental sustainability, resource limitations, and climate variability [3,4]. In this context, precision agriculture has emerged as a core Agriculture 4.0 approach based on observing, measuring, and responding to temporal and spatial variability rather than managing entire fields using average conditions [5]. As emphasized by Reina [5], improving crop production under future food demand will require the adoption of new technologies and artificial intelligence to support more sustainable, site-specific, and timely agronomic decisions. Such an approach is intended to improve production efficiency while reducing unnecessary use of labour, water, fertilizers, and agrochemicals.
Precision agriculture refers to the integrated use of hardware and software technologies that enable farmers to make informed and differentiated decisions for agricultural operations, including planting, fertilization, pest control, and harvesting [6]. It relies on timely, spatially referenced measurements and targeted interventions, typically implemented under practical constraints such as labour availability, input costs, and environmental externalities. Robotics is increasingly positioned as an enabling technology for precision agriculture because many agricultural operations must be executed with spatial and temporal selectivity under these same constraints [5]. At the same time, agricultural robotic systems operate in environments that are only partially structured, highly variable, and often degraded by occlusion, illumination changes, weak or absent global navigation satellite system (GNSS) signals, and terrain irregularity. Recent reviews therefore show that agricultural robotics cannot be understood only through algorithmic performance but must also be interpreted through platform constraints, sensing conditions, and operational context [7].
Field robotics is studied in this context because robots can couple repeated sensing to localised intervention at scales ranging from plants to rows, including weed control, variable-rate application, and harvesting operations. Platform context matters here because the same autonomy module may behave very differently when deployed on unmanned aerial vehicles (UAVs), unmanned ground vehicles (UGVs), gantry and fixed-structure systems, or greenhouse robots. Recent field-robot surveys and systematic reviews report that agricultural robots are being investigated across monitoring, scouting, weed control, plant protection, and harvesting, reflecting both practical demand and the maturity of sensing and mechatronics needed for these tasks [8,9]. Moreover, studies on precision agriculture information gathering in open-field settings using UGVs and UAVs show that autonomous multi-robot systems are promising for fine spatiotemporal data collection, but practical deployment must balance information gain with energy consumption and data security constraints [6].
Field conditions differ from industrial environments in ways that change both autonomy design and the meaning of reported performance. In open fields and orchards, the task-relevant objects—plants and produce—are deformable, seasonally changing, and frequently self-occluding; illumination varies with time of day and weather; the ground surface introduces slip, sinkage, and vibration that affect both control and sensor stability. Phenotyping-system reviews note that crop growth stages and canopy closure can constrain robot traversal and sensing geometry and that lodging can force intervention strategies or manual operation even when nominal autonomy succeeds [
10]. Zapotezny-Anderson et al. [
11] identified occlusion handling (e.g., leaves or branches concealing targets) as a common and non-trivial challenge in agricultural robotics, largely because farming environments are unstructured and cluttered compared with typical industrial robot settings. Similarly, reviews such as [
12,
13,
14] identified perception, sensing, and control limitations as major constraints on agricultural robot performance, including fruit localisation errors caused by detection failures, path planning inaccuracies, and unresolved manipulator kinematics, as well as low deep learning inference speeds that limit real-time harvesting. They also reported perception failure modes characterised by the absence of detectable features (e.g., sensor failure or field departure) and steering failures arising from errors in steering command computation, steering actuation, or both while highlighting the need for improved materials and methods for universal modular sensors, along with dynamic recognition and rapid decision-making in complex natural environments.
This review focuses on four tasks (monitoring, weeding, spraying, and harvesting) because they jointly cover the core “sense–decide–act” pathways of precision agriculture. Monitoring encompasses scouting and phenotyping, where the output is a spatially referenced estimate of crop state or anomalies. Weeding and spraying represent selective intervention modalities, differing primarily in actuation (i.e., physical removal or suppression versus controlled deposition of plant protection products). Harvesting represents contact-rich, quality-sensitive manipulation in which perception and planning must culminate in a physical interaction that preserves produce integrity. This task set is consistent with task taxonomies in recent agricultural robotics reviews and enables cross-task synthesis of shared autonomy-stack requirements [7,15].
Platform-based classification is used because platform constraints shape feasible sensing, actuation, autonomy, and evaluation. UGVs can carry tools and payloads to support physical intervention but must navigate row corridors and manage soil–terrain interaction. UAVs provide rapid coverage and a canopy-level viewpoint that supports broad monitoring, but they face endurance limits and regulatory constraints that bound operational envelopes and influence mapping resolution. Gantry and fixed-structure systems trade mobility for repeatability and controlled sensor pose, which can simplify calibration and enable dense temporal sampling, while greenhouse robots exploit partial structure but often operate under GNSS denial and confined-space safety constraints. These platform differences influence both autonomy-stack design and the external validity of evaluations across crops and settings [
16,
17]. Unmanned aerial systems (UASs) are included explicitly because they are not peripheral to precision agriculture but central to field-scale robotic monitoring and increasingly relevant to selective intervention workflows. Beyond direct sensing roles, UASs materially influence the design of heterogeneous robotic systems by providing georeferenced observations, prescriptions, or priority maps that guide subsequent UGV operations. Their inclusion is therefore necessary not only to represent a major agricultural robotic platform but also to capture the monitor-then-act logic that increasingly underpins UAV–UGV coordination in agricultural environments.
Across tasks and platforms, recurring bottlenecks appear throughout the autonomy stack. Perception robustness is frequently limited by domain shift across season, cultivar, illumination, and acquisition protocol; robustness-focused crop imaging reviews argue that credible claims require evaluation designs that hold out sites and seasons rather than relying on random splits [18,19]. Localisation can degrade under canopy, dust, and repeated textures, motivating sensor fusion approaches that explicitly handle failure modes [20]. Planning and control are complicated by occlusion and deformability: robots act under incomplete state information and plant motion induced by wind or by tool interaction. Safety and human–robot interaction impose additional constraints, and safety standards for autonomous agricultural machinery emphasise formal risk assessment and verification/validation beyond task success alone [21,22].
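The site- and season-held-out evaluation design mentioned above can be illustrated with a minimal sketch. The record fields (`site`, `season`), the example file names, and the function name `group_holdout_split` are hypothetical and chosen only for illustration; they are not taken from the cited reviews.

```python
def group_holdout_split(records, held_out_sites=(), held_out_seasons=()):
    """Split records so that held-out sites and seasons never appear in training.

    Unlike a random split, every image from a held-out site or season goes to
    the test set, so the test score reflects performance under domain shift.
    """
    train, test = [], []
    for rec in records:
        if rec["site"] in held_out_sites or rec["season"] in held_out_seasons:
            test.append(rec)
        else:
            train.append(rec)
    return train, test


# Hypothetical image metadata (illustrative values only).
records = [
    {"image": "a.png", "site": "farm_A", "season": 2021},
    {"image": "b.png", "site": "farm_A", "season": 2022},
    {"image": "c.png", "site": "farm_B", "season": 2021},
    {"image": "d.png", "site": "farm_B", "season": 2022},
]

train, test = group_holdout_split(
    records, held_out_sites={"farm_B"}, held_out_seasons={2022}
)
```

In this toy case only the farm_A, 2021 image remains for training, which shows how quickly strict hold-out designs reduce the training pool and why multi-site, multi-season data collection is needed to support them.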
Prior surveys provide valuable catalogues of systems and enabling methods, but they often remain organised by a single dominant lens, which can limit interpretability across platforms and deployment settings. Reviews structured primarily by sensing modality or by application lists may understate how platform constraints shape feasible autonomy and how evaluation setting conditions generalisability, while subdomain-focused reviews, such as those centred on UAV applications or harvesting perception, provide limited cross-task comparison of design tradeoffs and evaluation practices [9,15,17,23]. Building on this literature, the present review makes four specific contributions. First, it organises the field jointly through task, platform, autonomy-stack, and evaluation-setting lenses. Second, it maps each task onto a sense–decide–act pipeline to expose bottlenecks and failure modes across autonomy modules. Third, it synthesises evaluation metrics, protocol design, benchmark progression, and reporting guidance to improve comparability and reproducibility. Fourth, it strengthens the engineering perspective by explicitly incorporating implement/tool specification and mechanical reliability into the interpretation of actuation outcomes in weeding, spraying, and harvesting. Unless otherwise stated, the conceptual figures and synthesis tables in this review are author-generated summaries based on the reviewed literature, with direct source citations provided in the main text and literature-summary tables where appropriate.
This review is closely aligned with the United Nations Sustainable Development Goals (SDGs), particularly SDG 2 (Zero Hunger), which calls for improved agricultural productivity and sustainable food production systems, and SDG 12 (Responsible Consumption and Production), which emphasises efficient resource use and the reduction of food losses and chemical waste [24]. By examining robotics in precision agriculture through both task-based and platform-based perspectives, this paper highlights how monitoring, weeding, spraying, and harvesting robots can contribute to these goals through more targeted interventions, reduced input overuse, and improved operational efficiency [25]. In addition, this review is relevant to SDG 13 (Climate Action), because agricultural robotics has been identified as a pathway toward net-zero agriculture through improved nitrogen-use efficiency, reduced waste, electrified robotic vehicles, and lower environmental impact per unit of production [26].
This paper first defines the review methodology, taxonomy, and coding scheme. It then synthesises platform archetypes and task-centric autonomy pipelines, followed by cross-cutting enabling technologies. A dedicated section analyses evaluation and benchmarking practices and proposes a reporting checklist. This paper concludes with a synthesis of design tradeoffs, a decision framework for selecting platform/toolchains, and a staged research agenda.
2. Review Methodology and Taxonomy
2.1. Search Strategy
The literature search was designed to capture the interdisciplinary nature of agricultural robotics, where relevant work is disseminated across robotics, agricultural engineering, sensing, and remote-sensing venues. To balance coverage and precision, we queried multidisciplinary indexing services (e.g., Web of Science and Scopus) alongside robotics-focused digital libraries (e.g., IEEE Xplore, MDPI, and the ACM Digital Library). In addition, scholarly search engines were used as a complementary step to improve recall in cases where terminology varies across communities or where relevant studies are inconsistently indexed. The search covered the literature indexed up to 5 March 2026, and no explicit lower bound on publication year was imposed.
Search queries were formulated by combining three term groups. The first group targeted agricultural tasks: monitoring (including scouting and phenotyping), weeding (including weed control), spraying (including plant protection and variable-rate application), and harvesting (including picking). The second group targeted robotic platforms: unmanned ground vehicles (UGVs) and field robots, unmanned aerial vehicles (UAVs/drones), gantry/rail/cable-driven systems, and greenhouse robots. The third group targeted autonomy and enabling technologies: simultaneous localisation and mapping (SLAM), perception for detection/segmentation, motion planning, and compliant control/manipulation. Boolean operators and field restrictions (title/abstract/keywords) were applied where supported, and query variants were used to account for common synonyms and spelling differences. This review was scoped to crop-production precision-agriculture robotics in arable and horticultural systems; livestock robotics (e.g., milking, feeding, barn inspection, and animal monitoring) was excluded because its task structure, sensing environment, welfare and biosecurity requirements, and evaluation protocols differ substantially from those of crop-field robotics and therefore warrant a separate dedicated synthesis.
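The combination of the three term groups can be sketched as a simple Boolean query builder. This is an illustrative assumption about how such a query might be assembled, not the exact search string used in this review: the term lists are abbreviated from the groups described above, the function name `build_query` is hypothetical, and real databases differ in their field-restriction and wildcard syntax.

```python
def build_query(*term_groups):
    """Join terms with OR inside each group and AND between groups.

    Multi-word terms are wrapped in double quotes so they are matched as phrases.
    """
    def fmt(term):
        return f'"{term}"' if " " in term else term

    return " AND ".join(
        "(" + " OR ".join(fmt(t) for t in group) + ")" for group in term_groups
    )


# Abbreviated, illustrative term groups (not the complete lists used here).
tasks = ["monitoring", "scouting", "phenotyping", "weeding",
         "weed control", "spraying", "harvesting"]
platforms = ["unmanned ground vehicle", "UGV", "unmanned aerial vehicle",
             "UAV", "drone", "gantry", "greenhouse robot"]
autonomy = ["SLAM", "perception", "motion planning", "compliant control"]

query = build_query(tasks, platforms, autonomy)
print(query)
```

Keeping the groups as separate lists makes it straightforward to generate query variants (for example, adding spelling alternatives to one group) without rewriting the whole string.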
Study selection followed a transparent, staged screening process. After consolidating retrieved records and removing duplicates, titles and abstracts were screened for relevance to agricultural robotic systems, addressing at least one target task and including substantive technical content (e.g., platform description, sensing/perception, navigation, planning/control, actuation/tooling, or field evaluation). Full-text screening was then conducted to confirm eligibility and extract data for synthesis.
Figure 1 summarises the literature selection workflow using a PRISMA-style flow diagram reporting records identified, duplicates removed, records screened, full texts assessed, and exclusions with reasons, consistent with PRISMA 2020 guidance [27].
2.2. Taxonomy Definition
To ensure consistent categorization across heterogeneous studies, this review adopts an explicit taxonomy along four axes: task, platform, autonomy stack, and evaluation setting. The task categories are defined as follows. Monitoring covers robotic sensing, mapping, and estimation of crop and field state to support agronomic decision-making. Weeding encompasses selective weed suppression or removal using mechanical, thermal, or chemical micro-dosing interventions. Spraying includes controlled deposition of plant protection products or other agrochemicals, with emphasis on variable-rate application, targeting accuracy, and drift mitigation. Harvesting refers to robotic picking and handling operations that require integrated perception-to-grasp pipelines and strategies to preserve produce quality.
Robotic platforms are categorized as unmanned ground vehicles (UGVs), unmanned aerial vehicles (UAVs), gantry/fixed-structure systems, greenhouse robots, and hybrid or multi-robot systems combining two or more platform types and/or coordinated teams. The autonomy stack is decomposed into the functional components most frequently reported in the literature: perception, localisation, planning, control, actuation/tooling, safety, and human–robot interaction (HRI). Evaluation context is coded independently of platform and task to avoid conflating capability with the test environment; settings are classified as lab/bench, greenhouse, open-field (single season), and open-field (multi-season or multi-site).
Table 1 summarises the scope and taxonomy of this review and defines the categories used to code the selected studies, including task type, platform class, autonomy-stack component, and evaluation setting. This taxonomy is aligned with conventions used in prior agricultural robotics and sensing-in-agriculture reviews that separate application classes, mobility/platform types, and sensing modalities while additionally enabling explicit mapping of each study to autonomy components for cross-task synthesis [
15,
20].
A structured coding scheme was applied to support consistent comparison across studies, given that claims of “autonomy” vary widely in operator involvement, sensing redundancy, and safety constraints. For each included paper, we extracted and coded the crop and operating environment (crop type; greenhouse versus open-field; terrain and canopy characteristics where reported), the sensor suite (e.g., RGB/multispectral/thermal imaging, LiDAR, GNSS/RTK, IMU, force/torque), the localisation method (e.g., GNSS/RTK, LiDAR- or vision-based SLAM, fiducial-based, map-assisted), the perception approach (e.g., classical vision, CNN-based detection/segmentation, transformer-based models, multimodal fusion), the implemented tool or end-effector (e.g., mechanical/thermal/chemical weeding implement, sprayer/nozzle configuration, gripper/cutter type and compliance strategy), the evidenced autonomy level (teleoperation, supervised autonomy, mission-level autonomy, or fully autonomous as supported by the described protocol), and the evaluation protocol and metrics (including dataset split design where applicable, field-trial procedures, success criteria, throughput, accuracy, crop damage, chemical reduction, energy use, and safety-related outcomes). Openness was coded separately to record the availability of data, code, models, and implementation details sufficient for replication.
To avoid success-only summarisation, each record included a dedicated limitations field capturing explicitly stated constraints and failure modes (e.g., illumination sensitivity, occlusion, seasonal domain shift, localisation loss, actuator clogging, bruising, safety stop triggers). This enables mechanistic synthesis by linking observed performance to operating conditions and system design choices. In line with robustness-focused crop imaging reviews that highlight the influence of acquisition protocol, metadata, and evaluation split design on reported performance, these elements were explicitly captured to support later assessment of generalisability [19].
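The coding scheme described above can be represented as a structured record. The sketch below is one possible encoding under stated assumptions: the class and field names (`StudyRecord`, `AutonomyLevel`, `EvaluationSetting`, and so on) are hypothetical, and the example entry is invented purely to show the shape of a record; it does not describe any reviewed study.

```python
from dataclasses import dataclass, field
from enum import Enum


class AutonomyLevel(Enum):
    TELEOPERATION = "teleoperation"
    SUPERVISED = "supervised autonomy"
    MISSION_LEVEL = "mission-level autonomy"
    FULLY_AUTONOMOUS = "fully autonomous"


class EvaluationSetting(Enum):
    LAB_BENCH = "lab/bench"
    GREENHOUSE = "greenhouse"
    OPEN_FIELD_SINGLE = "open-field (single season)"
    OPEN_FIELD_MULTI = "open-field (multi-season or multi-site)"


@dataclass
class StudyRecord:
    """One coded study; field names are illustrative assumptions."""
    citation_key: str
    task: str                 # monitoring, weeding, spraying, or harvesting
    platform: str             # UGV, UAV, gantry/fixed, greenhouse, hybrid
    crop: str
    sensors: list
    localisation: str
    perception: str
    tool: str
    autonomy_level: AutonomyLevel
    evaluation_setting: EvaluationSetting
    metrics: dict = field(default_factory=dict)
    limitations: list = field(default_factory=list)  # stated failure modes
    open_data: bool = False
    open_code: bool = False


# Invented example entry, for illustration only.
example = StudyRecord(
    citation_key="hypothetical2024",
    task="weeding",
    platform="UGV",
    crop="sugar beet",
    sensors=["RGB", "GNSS/RTK", "IMU"],
    localisation="GNSS/RTK",
    perception="CNN-based segmentation",
    tool="mechanical in-row hoe",
    autonomy_level=AutonomyLevel.SUPERVISED,
    evaluation_setting=EvaluationSetting.OPEN_FIELD_SINGLE,
    limitations=["illumination sensitivity", "occlusion"],
)
```

Coding the evaluation setting and autonomy level as enumerations, rather than free text, keeps the categories mutually exclusive and makes cross-study counts straightforward.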
2.3. Autonomy Levels
In this review, autonomy is defined operationally in terms of operator involvement across the perception–planning–safety loop rather than by informal labels (e.g., fully autonomous). Specifically, autonomy level is characterised by which functions require human initiation, supervision, or intervention and under what conditions responsibility is transferred back to the operator (e.g., perception failure, localisation degradation, safety triggers). This framing is important because safety guidance for automated agricultural machinery typically emphasises not only design principles but also verification and validation activities spanning partially automated through autonomous operation; therefore, any autonomy claim should be coupled to the presence and testing of safety functions (e.g., hazard detection, safe-stop behaviour, fault handling) and to evidence demonstrating their performance in representative conditions. In parallel, evaluation rules used in field-robot competitions provide a useful operational lens on autonomy by enforcing constraints that mirror deployment realities, such as limits on external localisation aids, mission time bounds, and penalties for crop damage, thereby clarifying what “autonomous” entails in practice when navigation, task execution, and safety must be achieved under explicit rules and measurable outcomes [
9,
22]. The autonomy levels used in this review are summarised in Table 2, ranging from fully teleoperated systems to unattended autonomy in restricted domains supported by verification and validation. Intermediate levels capture increasing perception robustness, planning adaptability, exception handling, and safety assurance, enabling consistent comparison across systems and deployment contexts.
3. Platforms in Agricultural Robotics
Platform choice shapes the entire autonomy stack because it constrains sensing geometry, computational and energy budgets, tool integration, and safety envelopes. Surveys of field robots commonly distinguish aerial platforms for broad monitoring from ground platforms for close-range sensing and physical intervention, while phenotyping-system reviews further highlight the prevalence of rail, gantry, and other fixed-structure alternatives where repeatability is prioritised [
8,
10].
These platform archetypes, summarised in Figure 2, are defined in this review by mobility constraints, sensing/actuation viewpoint, and infrastructure dependence. In this sense, aerial platforms support broad overhead sensing, ground platforms support close-range sensing and physical intervention, and fixed-structure systems prioritise repeatable sensor pose, consistent with distinctions commonly used in prior agricultural robotics and phenotyping-system reviews [15,20]. Greenhouse and hybrid systems are included as distinct classes because they introduce GNSS-denied constrained operation and coordinated aerial–ground workflows, respectively, which carry different autonomy and evaluation implications [28].
Accordingly, this section examines UGVs, UAVs, gantry and fixed-structure systems, greenhouse robots, and hybrid configurations, with emphasis on how each platform class shapes sensing and actuation capabilities, localisation and planning requirements, and safety constraints. This section concludes by discussing hybrid systems and multi-robot coordination, where complementary aerial–ground workflows can improve coverage and operational efficiency but introduce additional challenges in task allocation, communication, and system-level reliability. To standardize comparisons across studies, the evaluation metrics and reporting items are summarized in Table 3.
Table 3 serves as a reference for the quantitative and qualitative criteria used throughout this review. The presented quantitative values are literature-reported representative figures, not results from a single standardised benchmark. Therefore, cost and endurance metrics should be interpreted in the context of the original study conditions and assumptions. In addition to synthesis reviews, Table 3 is explicitly anchored in representative primary platform studies, including UGV mapping and phenotyping systems [32,33], UAV field surveying and variable-rate spraying systems [34,35], and a hybrid UAV–UGV coordination workflow validated in orchard conditions [45]. For the fixed-structure and greenhouse classes, the table is also consistent with representative primary systems discussed in this section, including automated field phenotyping and greenhouse navigation platforms.
To improve cross-study comparability, quantitative platform indicators should be expressed, where possible, using both absolute and normalized units. Operational cost should be reported as currency per hour and, for area-based operations, additionally as currency per hectare together with effective field capacity (ha h⁻¹) and the included cost components. Energy endurance should be reported as operating time per charge or fuel tank (min or h) under stated payloads and duty cycles and complemented by an area- or task-normalized intensity metric, such as kWh ha⁻¹ for electric platforms or L ha⁻¹ for fuel-based systems. Relative claims (e.g., lower cost or longer endurance) should also identify the comparator, task, and operating conditions to avoid overinterpretation of the literature values reported under different assumptions [9,22,29].
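The normalised indicators recommended above follow from simple unit arithmetic: cost per hectare is the hourly cost divided by the effective field capacity, and energy intensity is the energy consumed divided by the area treated. A minimal sketch is given below; the function names and all numerical values are illustrative assumptions, not figures from Table 3 or from any reviewed study.

```python
def cost_per_hectare(cost_per_hour, field_capacity_ha_per_h):
    """Convert an hourly operating cost into cost per hectare.

    cost [currency/h] divided by effective field capacity [ha/h]
    gives [currency/ha].
    """
    if field_capacity_ha_per_h <= 0:
        raise ValueError("effective field capacity must be positive")
    return cost_per_hour / field_capacity_ha_per_h


def energy_intensity(energy_used, area_ha):
    """Area-normalised energy use, e.g. kWh/ha (electric) or L/ha (fuel)."""
    if area_ha <= 0:
        raise ValueError("treated area must be positive")
    return energy_used / area_ha


# Illustrative numbers only: 30 currency units per hour at 1.5 ha/h,
# and 6 kWh consumed while covering 4 ha.
cost_ha = cost_per_hectare(30.0, 1.5)    # 20.0 currency units per hectare
kwh_ha = energy_intensity(6.0, 4.0)      # 1.5 kWh per hectare
print(cost_ha, kwh_ha)
```

Reporting both the hourly and the per-hectare figure matters because two platforms with the same hourly cost can differ several-fold per hectare if their effective field capacities differ.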
Environmental conditions strongly shape feasible platform designs and limit the direct comparability of platform-level performance across studies. Open-field arable systems typically favour larger platforms with greater endurance and working width to cover large areas efficiently, although these features may increase soil compaction and reduce manoeuvrability. In contrast, horticultural and orchard environments impose tighter geometric constraints due to row spacing, canopy structure, and crop proximity, thereby increasing the importance of compact designs, suitable sensor viewpoints, and safer, more compliant tools. Greenhouse systems add further constraints, including GNSS-denied operation, narrow aisles, worker proximity, and exposure to humidity and condensation, which often favour small platforms and infrastructure-assisted navigation. These environment-dependent constraints should therefore be considered part of the operational design domain when comparing platform classes across studies.
3.1. Unmanned Ground Vehicles
UGVs commonly use wheeled or tracked locomotion to trade off stability on uneven soil against manoeuvrability within narrow row corridors. Their operation is strongly constrained by crop geometry, as row spacing and canopy closure evolve across growth stages and thereby limit the admissible vehicle envelope while shaping sensor viewpoint and coverage. Performance is further influenced by terrain–soil interaction, where wheel slip and sinkage degrade localisation accuracy and increase energy demand, and platform vibration reduces perception stability and calibration fidelity. Consistent with these constraints, navigation-oriented evaluation settings explicitly penalise crop damage and emphasise robust row-following and recovery behaviours under conditions such as missing plants and irregular row structure [
9].
3.2. Unmanned Aerial Vehicles
UAVs provide rapid field coverage and a top–down sensing geometry that is well suited for monitoring and mapping, most commonly using red-green-blue (RGB) and multispectral imaging. UAV-focused reviews report that practical deployments are often limited by flight endurance, payload capacity, and calibration procedures required to maintain consistent radiometric and geometric outputs across repeated missions [
17].
Operational feasibility is additionally shaped by aviation regulations. In the European Union (EU) open category, the European Union Aviation Safety Agency (EASA) specifies constraints such as operation within visual line of sight (VLOS) and a nominal 120 m height limit, while in the United States, the Federal Aviation Administration (FAA) outlines Part 107 requirements, including keeping the UAV within sight and other operational limitations [
36,
37]. Collectively, these constraints restrict routine beyond visual line of sight (BVLOS) operation and high altitude mapping, and they therefore influence whether UAV outputs are used for near real-time decisions or as offline decision support products.
3.3. Gantry and Fixed-Structure Systems
Gantry and rail systems typically sacrifice mobility to achieve high positional accuracy and measurement repeatability. The Field Scanalyzer rail platform is a representative example, as it uses fixed infrastructure to translate a multisensor payload over experimental plots with controlled pose, enabling high-throughput repeated measurements and dense temporal sampling across the crop lifecycle [
16]. Related fixed-structure and cable-driven configurations are also reported in greenhouse and urban farming contexts, where suspended architectures increase workspace coverage while maintaining end effector pose control and reducing interaction with the ground [
38]. The main limitation is scalability and cost, since the required infrastructure restricts deployment to instrumented plots or dedicated facilities, and outcomes may not readily generalise to broader farm environments without comparable structure [
16].
3.4. Greenhouse Robots
Greenhouse robots operate in GNSS-denied, spatially constrained, worker-proximate environments with partial structural regularities such as aisles, benches, and, in some cases, rails. This distinguishes them from open-field and orchard robots, which typically operate over larger and less confined spaces and more often rely on GNSS-supported navigation. Consequently, greenhouse navigation studies commonly adopt light detection and ranging (LiDAR)- and vision-based mapping and localisation pipelines while placing greater emphasis on confined-space manoeuvring, obstacle avoidance, and safe operation in close proximity to workers, sometimes exploiting environmental features such as rails for guidance [
42,
43]. Greenhouse deployment also introduces sensing artefacts, including condensation and specular reflections, and requires hardware resilience to elevated humidity and potential chemical exposure [
42].
3.5. Hybrid Systems and Multi-Robot Coordination
Hybrid agricultural systems frequently pair UAVs for field-scale mapping with UGVs for localised intervention, enabling a monitor-then-act workflow in which aerial products guide targeted ground operations. Reviews of integrated workflows identify calibration and cross-sensor harmonisation as persistent bottlenecks and note that perception-to-execution pipelines often rely on interoperability with farm information systems to translate maps into actionable prescriptions [46]. This coordination layer deserves more explicit treatment, because agricultural UAV–UGV systems differ not only in sensing and actuation roles but also in how aerial and ground decisions are coupled.
Multi-robot coordination further imposes requirements on communication reliability and consistent shared localisation frames; in practice, agricultural deployments more often adopt division of labour and loosely coupled task allocation rather than tightly coupled formation control, reflecting limitations in connectivity, as well as the need for conservative safety assurance in shared workspaces [
46]. UAV–UGV coordination architectures can be organised as centralised, decentralised, or hybrid/hierarchical. In centralised coordination, a supervisory layer such as a farm management information system aggregates aerial and ground observations to compute shared task allocations or routes, improving global consistency at the cost of infrastructure dependence and possible communication latency [
47,
48]. In decentralised coordination, each platform retains greater local decision authority and exchanges only mission-critical state information, which can improve responsiveness and scalability but may reduce global optimality and shared situational awareness [
47,
48]. Hybrid schemes combine these principles using shared map-level or mission-level planning together with onboard local autonomy, and they are therefore often the most realistic pattern for agricultural deployments [
28,
49].
Agriculture-specific studies are also moving beyond simple map handoff toward proactive coordination, in which UAV observations are converted into georeferenced exclusion zones or route constraints before UGV execution; recent orchard experiments show that such centralised UAV–UGV planning can reduce detours and support safer navigation in dynamic environments [
50]. The same monitor–plan–act logic is also relevant to targeted harvesting and other selective interventions, where aerial scouting can prioritise blocks, rows, or candidate targets for subsequent ground manipulation, although fully integrated UAV–UGV harvesting workflows remain an emerging direction rather than a mature benchmarked application [
28].
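As a minimal sketch of the proactive coordination idea described above, the following Python snippet checks a planned UGV route against exclusion zones derived from UAV observations. The function names (`point_in_polygon`, `blocked_waypoints`), the polygon representation, and the shared local metric frame are illustrative assumptions and are not taken from the cited orchard study.

```python
from typing import List, Sequence, Tuple

# (x, y) in a shared local metric frame; the frame itself is an assumption.
Point = Tuple[float, float]


def point_in_polygon(p: Point, polygon: Sequence[Point]) -> bool:
    """Ray-casting test. Points exactly on the boundary are not handled specially."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through p?
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def blocked_waypoints(route: Sequence[Point],
                      zones: Sequence[Sequence[Point]]) -> List[int]:
    """Indices of route waypoints that fall inside any exclusion zone.

    Only waypoints are tested; segments between waypoints are not checked.
    """
    return [i for i, wp in enumerate(route)
            if any(point_in_polygon(wp, zone) for zone in zones)]


# Hypothetical example: one square exclusion zone and a straight route.
zone = [(2.0, 0.0), (4.0, 0.0), (4.0, 2.0), (2.0, 2.0)]
route = [(0.0, 1.0), (1.0, 1.0), (3.0, 1.0), (5.0, 1.0)]
print(blocked_waypoints(route, [zone]))
```

In a centralised scheme, a check of this kind would run in the supervisory layer before the route is sent to the UGV, so that flagged waypoints can be replanned rather than discovered during execution.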
4. Task-Centric Review
In this task-centric review, the literature is organised around four core tasks in agricultural robotics: monitoring and phenotyping, weeding, precision spraying, and harvesting. These task categories follow the taxonomy defined in
Section 2.2 and summarised in
Table 1, which is used consistently throughout the remainder of this manuscript. For each task, an end-to-end autonomy pipeline is considered, from sensing and calibration to perception, localisation, safety-constrained planning, actuation, and outcome logging. The four task pipelines are summarised in
Figure 3 to clarify recurring bottlenecks and to support consistent interpretation of reported evaluation evidence.
Figure 3 is a conceptual synthesis by the authors, derived from representative task-pipeline patterns reported in prior studies on robotic monitoring and phenotyping, robotic weeding, precision spraying, and robotic harvesting [
32,
51,
52,
53]. Throughout
Section 4, the discussion follows the four-task taxonomy adopted in this review, namely monitoring, weeding, spraying, and harvesting, as defined in
Table 1. For clarity, intervention tasks are additionally classified by actuation modality and targeting granularity, as summarized in
Table 4 and
Table 5. Specifically,
Table 4 groups weeding systems into inter-row and in-row mechanical cultivation, thermal or laser spot treatment, and chemical micro-dosing, while
Table 5 groups spraying systems into map-based variable-rate application, real-time canopy-sensed variable-rate application, and plant-level micro-spraying.
4.1. Monitoring and Phenotyping
In this review, monitoring refers to robotic sensing, mapping, and estimation of crop or field state to support agronomic decision-making, including scouting and phenotyping tasks as listed in
Table 1. Monitoring and phenotyping robots aim to estimate crop and field state with sufficient spatial registration to inform management decisions, with deployment contexts ranging from breeding trials that prioritize repeatability and dense temporal sampling to production fields where coverage and timeliness dominate. Unmanned aerial vehicles (UAVs) are widely used for monitoring through red-green-blue (RGB) and multispectral imagery, producing trait and anomaly maps that are often evaluated as spatial segmentation problems rather than through downstream actuation outcomes. Large annotated datasets such as Agriculture-Vision exemplify this paradigm, enabling assessment using overlap-based metrics that quantify the spatial localisation of field anomalies [
76].
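The overlap-based metrics mentioned above are typically variants of intersection over union (IoU). The following sketch computes IoU for one class from two binary masks given as nested lists. The function name `iou` and the mask layout are assumptions for illustration, and the exact metric definition used by a particular benchmark may differ.

```python
from typing import Sequence


def iou(pred: Sequence[Sequence[int]], target: Sequence[Sequence[int]]) -> float:
    """Intersection over union of two binary masks with the same shape.

    Returns NaN when both masks are empty, because the ratio is undefined.
    """
    intersection = 0
    union = 0
    for row_pred, row_target in zip(pred, target):
        for p, t in zip(row_pred, row_target):
            intersection += int(bool(p) and bool(t))
            union += int(bool(p) or bool(t))
    return intersection / union if union else float("nan")


# Hypothetical 2 x 3 anomaly masks: one pixel overlaps, three are in the union.
predicted = [[1, 1, 0],
             [0, 0, 0]]
reference = [[0, 1, 1],
             [0, 0, 0]]
print(iou(predicted, reference))
```

Because IoU only scores spatial agreement with the annotation, a high value says nothing about whether the resulting map improves a downstream intervention.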
In contrast, unmanned ground vehicles (UGVs) extend monitoring under canopy and support additional sensing modalities, including RGB-depth (RGB-D) cameras and light detection and ranging (LiDAR), to capture plant structure traits at close range. Field phenotyping reviews further indicate that UGV operation is strongly constrained by crop geometry and growth stage, which jointly affect traversability and sensing viewpoint [
10]. Multi-site deployments of small under-canopy phenotyping robots have also been reported in breeding settings, highlighting the potential of repeated measurements while exposing practical challenges in reliability and user workflow during sustained field operation [
76].
Sensors for monitoring are typically selected to maximise information content while meeting field constraints such as cost, power, and ease of deployment. Multispectral cameras support vegetation indices and stress indicators, LiDAR supports three-dimensional reconstruction and canopy structure measurement, and RGB-D sensing can provide local three-dimensional perception under canopy. In plant phenotyping, multispectral imagery is commonly reduced to vegetation indices that represent greenness, chlorophyll content, canopy structure, or stress. Widely used examples include NDVI for greenness and biomass, NDRE and GNDVI for chlorophyll-related traits, EVI for dense canopies, SAVI and MSAVI when the soil background is visible, and PRI for physiological stress assessment. Because these indices remain sensitive to illumination conditions and sensor calibration, reliable phenotyping depends on radiometric correction and consistent image acquisition settings.
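The indices named above can be written as short band-ratio formulas. The sketch below uses their commonly published forms and assumes that the inputs are radiometrically corrected reflectances between 0 and 1. The function and argument names are illustrative, the SAVI soil factor of 0.5 and the EVI coefficients are the usual defaults rather than values taken from the reviewed studies, and the sign convention for PRI varies between sources.

```python
def ndvi(nir, red):
    """Normalised difference vegetation index."""
    return (nir - red) / (nir + red)


def ndre(nir, red_edge):
    """Normalised difference red edge index."""
    return (nir - red_edge) / (nir + red_edge)


def gndvi(nir, green):
    """Green normalised difference vegetation index."""
    return (nir - green) / (nir + green)


def savi(nir, red, soil_factor=0.5):
    """Soil-adjusted vegetation index; soil_factor=0.5 is a common default."""
    return (1 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def msavi(nir, red):
    """Modified SAVI with a self-adjusting soil term."""
    a = 2 * nir + 1
    return (a - (a ** 2 - 8 * (nir - red)) ** 0.5) / 2


def evi(nir, red, blue):
    """Enhanced vegetation index with the commonly used coefficients."""
    return 2.5 * (nir - red) / (nir + 6 * red - 7.5 * blue + 1)


def pri(r531, r570):
    """Photochemical reflectance index from reflectance near 531 and 570 nm."""
    return (r531 - r570) / (r531 + r570)


# Hypothetical reflectances for one vegetated pixel.
print(ndvi(0.5, 0.1), savi(0.5, 0.1), msavi(0.5, 0.1), evi(0.5, 0.1, 0.05))
```

Only arithmetic operators are used, so the same functions should also accept NumPy arrays of per-pixel reflectance. None of them guards against a zero denominator, which would need handling for masked or saturated pixels.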
For georeferencing, monitoring platforms often rely on the global navigation satellite system (GNSS) with real-time kinematic (RTK) corrections; however, in under-canopy or protected environments, navigation commonly shifts toward simultaneous localisation and mapping (SLAM) combined with sensor fusion. A key challenge is the interaction between domain shift and calibration: changes in illumination, season, and acquisition settings alter radiometric and geometric properties, which can cause learned trait estimators to drift without clear failure signs. Robustness-focused reviews therefore recommend evaluation protocols that explicitly test spatial and temporal generalisation, for example by holding out entire farms or sites and seasons, to support credible performance claims [
18].
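A holdout protocol of this kind can be sketched as a split generator that withholds one site-season combination at a time. The record layout (dictionaries with `site` and `season` keys), the function name, and the optional `strict` mode are assumptions made for illustration. They are not a protocol prescribed by the cited reviews.

```python
from typing import Dict, Iterator, List, Sequence, Tuple


def leave_one_site_season_out(
    records: Sequence[Dict], strict: bool = False
) -> Iterator[Tuple[Tuple, List[int], List[int]]]:
    """Yield (held_out_group, train_indices, test_indices) for each site-season.

    With strict=True, training also drops every record that shares either the
    site or the season with the held-out group, which tests spatial and
    temporal generalisation at the same time.
    """
    groups = sorted({(r["site"], r["season"]) for r in records})
    for site, season in groups:
        test = [i for i, r in enumerate(records)
                if (r["site"], r["season"]) == (site, season)]
        if strict:
            train = [i for i, r in enumerate(records)
                     if r["site"] != site and r["season"] != season]
        else:
            train = [i for i, r in enumerate(records)
                     if (r["site"], r["season"]) != (site, season)]
        yield (site, season), train, test


# Hypothetical metadata: two sites observed in two seasons.
records = [
    {"site": "A", "season": 2021},
    {"site": "A", "season": 2022},
    {"site": "B", "season": 2021},
    {"site": "B", "season": 2022},
]
for group, train_idx, test_idx in leave_one_site_season_out(records, strict=True):
    print(group, train_idx, test_idx)
```

The strict variant leaves less training data per fold, but it avoids the leakage that occurs when images from the same field or the same season appear on both sides of the split.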
Evaluation metrics commonly include trait estimation errors relative to manual ground truth, consistency across runs, and coverage/time proxies, but reported protocols vary in how they describe acquisition timing, environmental conditions, and calibration. A recurring omission is validation of decision impact: monitoring studies often demonstrate sensing accuracy but do not test whether the produced maps or trait estimates lead to improved intervention planning under farm constraints [
16].
Table 6 summarises representative agricultural robotic sensing and phenotyping approaches, together with their sensing modalities, technical characteristics, and evaluation contexts. To make the phenotyping and modelling content more explicit,
Table 6 also identifies representative outputs and traits reported by these systems, including vegetation indices such as NDVI and NDRE, physiological proxies such as leaf area index (LAI) and chlorophyll-related measures, and structural traits such as plant counts, canopy height, and canopy width/volume [
32,
77,
78].
4.2. Weeding
Weeding robots aim to suppress weeds while preventing crop injury, particularly in row crops and high-value systems where labour shortages and reduced herbicide use are important drivers. The literature commonly distinguishes three main approaches: mechanical weeding using intra-row or inter-row cultivation tools; thermal weeding with an emphasis on laser-based targeting; and chemical micro-dosing or spot spraying. These modalities impose different autonomy requirements. Mechanical tools operating close to crops require high-precision localisation and robust crop–weed discrimination, whereas inter-row tools can often rely more on row detection. Laser systems demand accurate pointing together with stringent safety mechanisms, while micro-dosing requires tightly synchronised perception-to-nozzle timing [
84].
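The timing requirement for micro-dosing can be illustrated with a simple kinematic sketch. It assumes a camera mounted a fixed distance ahead of the nozzle, constant forward speed, and a single lumped latency covering image processing, valve opening, and droplet flight. The function names and the numerical values are hypothetical.

```python
from typing import Optional


def nozzle_trigger_delay(offset_m: float, speed_m_s: float,
                         latency_s: float) -> Optional[float]:
    """Time to wait after image capture before commanding the valve.

    offset_m  : distance from the camera footprint to the nozzle (assumed fixed)
    speed_m_s : forward speed of the platform (assumed constant)
    latency_s : lumped processing, valve, and droplet-flight latency (assumed)

    Returns None when the target passes the nozzle before the system can react.
    """
    if speed_m_s <= 0:
        raise ValueError("speed must be positive")
    travel_time_s = offset_m / speed_m_s
    delay_s = travel_time_s - latency_s
    return delay_s if delay_s >= 0 else None


def max_feasible_speed(offset_m: float, latency_s: float) -> float:
    """Highest speed at which the target can still be hit for a given latency."""
    return offset_m / latency_s


# Hypothetical setup: 0.5 m camera-to-nozzle offset, 0.12 s total latency.
print(nozzle_trigger_delay(0.5, 1.0, 0.12))
print(max_feasible_speed(0.5, 0.12))
```

The second function makes the coupling explicit: for a fixed geometry, any increase in perception or actuation latency directly lowers the usable driving speed.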
Sensors for weeding are frequently RGB cameras due to plant-scale resolution needs; RGB-D and LiDAR are added when depth or structure improves separation. Public datasets recorded from field robots over growth stages, including sugar beet datasets spanning plant emergence through later stages, provide benchmarks for crop/weed segmentation and mapping and illustrate how occlusion and growth stage complicate discrimination [
85]. Laser-weeding surveys discuss the dependence of effectiveness on crop spacing, weed density, and safety design, indicating that performance claims must be contextualised by operational conditions and actuation constraints [
85].
From a mechanical-design perspective, the engineering relevance of laser-weeding modules depends on reporting actuator-level parameters rather than only naming the modality. In particular, laser type and wavelength, output power, spot diameter at the target plane, standoff distance, exposure time, and delivered energy or energy density should be specified, since these quantities govern treatment selectivity and weed-destruction effectiveness. Representative recent systems range from a 5 W, 450 nm diode laser with an approximately 4 mm beam width at the stem and 2 s exposure [
51] to a 50 W, 1940 nm thulium fiber laser with an approximately 1.5 mm beam diameter [
60], while review synthesis indicates spot diameters of roughly 0.6–5 mm and power levels of 4–50 W across different laser types [
51,
60,
86]. Reporting these parameters would improve the engineering interpretability and reproducibility of laser-weeding studies.
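To show how the listed parameters combine, the sketch below derives delivered energy and energy density from power, exposure time, and spot diameter. It assumes a uniform circular spot and that all output power reaches the target, which real beams only approximate. The function names are illustrative. The worked values use the 5 W, 2 s, 4 mm figures quoted above. The 50 W system is not evaluated because its exposure time is not given in the text.

```python
import math


def delivered_energy_j(power_w: float, exposure_s: float) -> float:
    """Energy delivered to the target, assuming constant power and no losses."""
    return power_w * exposure_s


def spot_area_mm2(spot_diameter_mm: float) -> float:
    """Area of a circular spot (uniform irradiance is assumed)."""
    return math.pi * (spot_diameter_mm / 2.0) ** 2


def energy_density_j_per_mm2(power_w: float, exposure_s: float,
                             spot_diameter_mm: float) -> float:
    """Delivered energy divided by spot area."""
    return delivered_energy_j(power_w, exposure_s) / spot_area_mm2(spot_diameter_mm)


# 5 W diode laser, 2 s exposure, approximately 4 mm beam width at the stem.
energy = delivered_energy_j(5.0, 2.0)
density = energy_density_j_per_mm2(5.0, 2.0, 4.0)
print(f"{energy:.1f} J, {density:.3f} J/mm^2")
```

Under these assumptions the example corresponds to 10 J delivered at roughly 0.8 J/mm², which is the kind of derived quantity that cannot be recovered when a study reports only the laser type.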
Localisation and navigation in weeding robots typically exploit row structure, but performance can degrade with missing plants, non-parallel rows, and shadow-induced perception loss. Reported evaluation metrics therefore vary widely, including weed removal or kill rate, proxies for crop damage, area-level productivity measures, and reductions in chemical or other inputs. A key system bottleneck is the tight coupling between perception uncertainty and irreversible actuation: because weeding tools can injure crops, uncertainty often forces conservative decision thresholds, which reduces weed removal efficacy or operating speed. This creates a precision–throughput–crop-safety tradeoff that must be addressed at the system level rather than by perception accuracy alone [
84].
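The effect of a conservative decision threshold can be made concrete with a toy calculation. The detections below are synthetic values invented for illustration, not measurements from any study, and the function name `treatment_rates` is hypothetical. Each detection carries the classifier's confidence that the plant is a weed and the true class.

```python
from typing import List, Tuple

Detection = Tuple[float, str]  # (weed confidence, true class: "weed" or "crop")


def treatment_rates(detections: List[Detection],
                    threshold: float) -> Tuple[float, float]:
    """Return (weed removal rate, crop injury rate) when every detection with
    confidence at or above the threshold is treated.

    Assumes that treating a plant always removes or injures it.
    """
    weeds = [score for score, label in detections if label == "weed"]
    crops = [score for score, label in detections if label == "crop"]
    if not weeds or not crops:
        raise ValueError("need at least one weed and one crop")
    removal = sum(score >= threshold for score in weeds) / len(weeds)
    injury = sum(score >= threshold for score in crops) / len(crops)
    return removal, injury


# Synthetic detections (illustrative only).
detections = [
    (0.95, "weed"), (0.90, "weed"), (0.80, "crop"), (0.70, "weed"),
    (0.60, "weed"), (0.40, "crop"), (0.30, "weed"), (0.10, "crop"),
]
for threshold in (0.5, 0.85):
    print(threshold, treatment_rates(detections, threshold))
```

In this toy case, raising the threshold from 0.5 to 0.85 removes the crop injury but halves the weed removal rate, which is the tradeoff described above in its simplest form.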
Table 4 summarises representative weeding approach categories and their reported evaluation characteristics, including typical sensing modalities, control strategies, implements, operational strengths and weaknesses, and commonly used performance metrics. To make intervention-relevant state variables more explicit,
Table 4 also identifies representative target-characterisation variables that condition robotic weeding performance, including weed species or functional group, growth stage, crop/weed position, protected zones, and density- or coverage-based quantities used to define treatment opportunity [
51,
73,
87,
88].
4.3. Spraying
Precision spraying aims to achieve effective deposition on target plant structures while reducing off-target drift and over-application, with important differences between orchard or vine systems and arable crops. In this review, spraying robots are classified into three categories, as summarized in
Table 5: (i) map-based variable-rate application, typically based on prescription maps derived from UAV or remote sensing data; (ii) real-time canopy-sensed variable-rate spraying, which relies on canopy measurements, such as LiDAR or ultrasonic sensing, coupled with PWM or flow control; and (iii) plant-level micro-spraying or spot spraying, which is typically enabled by vision-based nozzle control. Relevant standards define field measurement procedures for spray distribution in tree and bush crops and for drift assessment, emphasizing that deposition quality is inherently distributional and that meteorological conditions and measurement distances should be recorded to enable meaningful comparison across trials [
89,
90].
Variable-rate spraying systems commonly combine canopy sensing—for example, using light detection and ranging (LiDAR) or ultrasonic ranging—with flow and nozzle control. This couples perception of canopy presence or volume with localisation along rows, planning of dosage schedules that account for actuation delays, and execution through nozzle gating, droplet control, and, in some cases, shielding. Recent orchard studies report LiDAR-based variable-rate sprayers and corresponding evaluation approaches, typically emphasizing deposition distributions and drift-related proxies, although the completeness of reported protocols varies across the literature [
53,
71].
Sensing in precision and variable-rate spraying typically combines light detection and ranging (LiDAR) or ultrasonic ranging for canopy geometry with wheel encoders and an inertial measurement unit (IMU) for motion estimation, and it may add cameras when plant detection or disease targeting is required. Common failure modes include wind and turbulence that deflect droplets and canopy occlusion that leads to under- or over-estimation of target structure. A key mechanistic bottleneck is that deposition depends on coupled airflow, platform motion, and droplet dynamics; therefore, improved target detection alone does not guarantee improved deposition unless actuation and airflow control compensate for wind and motion effects [
91]. Cross-study comparability is also limited when evaluations omit critical parameters such as nozzle configuration, boom height, meteorological conditions, or standards-aligned measurement distances [
90]. As shown in
Table 5, representative spraying approaches differ in sensing and control structure, leading to distinct tradeoffs in dosing responsiveness, selectivity, drift behaviour, and evaluation metrics across agricultural settings. To improve engineering interpretability,
Table 5 makes explicit the minimum descriptors needed to compare spraying systems across studies, namely, the sensing stack, the control logic, the implement/nozzle specification, and the principal outcome metrics with units. In particular, implement reporting is expanded to include nozzle type/configuration, pressure, flow rate, boom or air-assist geometry, and timing-related quantities because deposition outcomes depend on actuation and spray physics, not only on target detection [
35,
53,
66,
70,
73].
4.4. Harvesting
Robotic harvesting requires detection, localisation, approach planning, and compliant detachment while preserving produce quality. Harvesting reviews emphasise that occlusion by foliage, variable illumination, and cluttered orchard backgrounds create persistent perception difficulty and propagate into grasp instability and collision risk [
91]. Perception in robotic harvesting commonly relies on deep-learning-based detection or instance segmentation, often fused with red-green-blue depth (RGB-D) sensing to estimate target pose. Typical failure modes include missed detections under occlusion, false positives in cluttered scenes, and pose uncertainty that produces unstable grasp planning and execution. End-effectors span rigid grippers, suction tools, and hybrid designs; soft robotic grippers are investigated to increase compliance and reduce bruising, while hybrid suction–finger mechanisms aim to balance contact area, stability, and robustness to target variability [
92,
93].
Table 7 summarises representative harvesting end-effector approach categories and their reported evaluation characteristics, including sensing modalities, control strategies, implement types, operational strengths and weaknesses, and commonly used performance metrics. To improve engineering interpretability,
Table 7 states more explicitly the sensing stack, the perception/planning or compliant-control strategy, the end-effector configuration, and the outcome metrics with units. This is important because harvesting performance is jointly determined by perception quality, detachment strategy, contact compliance, and damage-sensitive handling rather than by detection accuracy alone [
52,
94,
95,
96,
97].
Localisation strategies vary with the environment. In orchards, systems may use the global navigation satellite system (GNSS) with real-time kinematic (RTK) corrections for coarse positioning, but they still depend on local sensing for branch avoidance and accurate base placement. In greenhouses, navigation more often relies on simultaneous localisation and mapping (SLAM) or fiducial markers. Planning must explicitly account for collision constraints and uncertainty: when targets are only partially visible, the robot may need active perception through repositioning or adopt conservative approach strategies to maintain safety. Evaluation typically reports picking success rates, cycle times, and produce damage, yet cross-study comparability is limited by differences in orchard structure and occlusion prevalence, as well as inconsistent definitions of a “successful” harvest [
105]. A key mechanistic bottleneck is uncertainty propagation: because contact actions are largely irreversible, perception and pose errors can bruise produce or entangle branches, which encourages conservative planning and reduces throughput unless uncertainty-aware planning is paired with compliant control [
106].
6. Evaluation Practices and Benchmarking
Meaningful evaluation in agricultural robotics should be grounded in operational outcomes rather than perception accuracy alone. For monitoring, predictive accuracy should be complemented by measures of spatial registration, temporal consistency across repeated runs, and uncertainty estimates that directly affect downstream decisions. For weeding, performance should be reported in terms of weed suppression efficacy alongside crop injury rates and operational throughput, with crop spacing and weed density clearly specified. For spraying, evaluation should quantify deposition distributions and drift while reporting meteorological conditions, since these factors strongly influence outcomes. In this context, International Organization for Standardization (ISO) guidelines for spray distribution in tree and bush crops and for field measurement of spray drift provide measurement principles that can improve cross-study comparability when evaluation protocols are aligned with the standards [
89,
90]. Harvesting systems should report picking success, damage rates, and cycle time or throughput, with orchard or greenhouse conditions and occlusion prevalence clearly specified. Across tasks, reliability indicators such as uptime and failure-recovery frequency, as well as safety indicators such as near-miss events and human-proximity triggers, are also important for deployment relevance but remain inconsistently reported in the literature.
Table 9 summarises the core evaluation metrics used to assess agricultural field robots across tasks, together with their definitions, units, task relevance, and key measurement protocol notes.
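As a minimal sketch of how such outcome metrics might be computed from a trial log, the following snippet summarises harvesting attempts. The record fields, the function name, and the definitions are assumptions. In particular, success is defined here as detachment of the fruit and damage is counted only among detached fruit, whereas published definitions differ between studies.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class PickAttempt:
    detached: bool       # fruit was removed from the plant (assumed success criterion)
    damaged: bool        # fruit showed visible damage after picking
    cycle_time_s: float  # time from target selection to release


def summarise_attempts(attempts: List[PickAttempt]) -> Dict[str, float]:
    """Picking success rate, damage rate among picked fruit, mean cycle time."""
    if not attempts:
        raise ValueError("no attempts logged")
    picked = [a for a in attempts if a.detached]
    damage_rate = (sum(a.damaged for a in picked) / len(picked)
                   if picked else float("nan"))
    return {
        "success_rate": len(picked) / len(attempts),
        "damage_rate": damage_rate,
        "mean_cycle_time_s": mean(a.cycle_time_s for a in attempts),
    }


# Hypothetical log of four attempts.
log = [
    PickAttempt(True, False, 8.0),
    PickAttempt(True, True, 10.0),
    PickAttempt(False, False, 12.0),
    PickAttempt(True, False, 10.0),
]
print(summarise_attempts(log))
```

Stating the denominator of each rate alongside the result is what allows values from different studies to be compared.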
Fair comparison across studies requires that results be explicitly stratified by evaluation setting (e.g., laboratory, greenhouse, or open field) and by the environmental conditions under which data are acquired. Robustness-focused imaging reviews indicate that acquisition protocol choices and evaluation design can dominate apparent performance differences, and therefore, cross-study comparisons should not be treated as equivalent unless protocols and domains are aligned or the domain shift is explicitly modelled [
18]. Accordingly, studies should report key operational design domain properties, including crop type, growth stage, illumination regime, canopy structure, and soil or terrain conditions, and generalized claims should be avoided when evidence is limited to single-site or single-season trials.
Bias in experimental evaluation can be reduced through cross-season, cross-field, and cross-cultivar testing, ideally using leave-one-site-season holdouts.
Figure 5 illustrates the proposed benchmark ladder for agricultural robotics, showing the progression from component-level and laboratory testing to greenhouse, single-season field, and multi-season field evaluation with increasing system complexity and operational realism.
Dataset and benchmark studies demonstrate the value of longitudinal, multi-condition data collection. For example, field-robot datasets recorded across growth stages with calibrated sensors enable evaluation under changing plant morphology and occlusion [
85], while multi-location monitoring datasets support shift-aware assessment across sites and seasons. In harvesting, benchmark datasets for densely fruited crops are designed to capture production-like variability and enable more comparable method evaluation [
107]. Nevertheless, many robotics studies still rely on random train–test splits from a single field campaign, which mainly tests in-distribution performance and provides limited evidence of transferability [
85]. A community reporting checklist should therefore require four elements:
Autonomy-stack reporting should specify what is automated and what remains operator-assisted.
Protocol reporting should document sensor calibration, data splits, and the evaluation environment.
Safety reporting should summarise hazard-analysis elements, implemented safety mechanisms, and fail-safe behaviours.
Reproducibility reporting should state dataset and code availability and provide key parameter settings.
Quantitative reporting should also standardize units and benchmark bases by expressing cost in currency per hour and, where relevant, per hectare; endurance in minutes or hours per charge/tank together with normalized energy/fuel intensity; and all relative comparisons with explicit statements of comparator, payload, working width, speed, and field conditions [
9,
22,
29]. These requirements align with safety standards for autonomous agricultural machinery, where verification and validation principles are emphasised and safety claims are expected to be supported by explicit evidence and testing procedures rather than informal statements [
22].
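The link between cost per hour and cost per hectare depends on working width, speed, and field conditions, which is why these need to be stated. The sketch below uses the standard effective field capacity relation (width in metres times speed in kilometres per hour times field efficiency, divided by 10, giving hectares per hour). The function names and example values are hypothetical.

```python
def field_capacity_ha_per_h(working_width_m: float, speed_km_h: float,
                            field_efficiency: float = 1.0) -> float:
    """Effective field capacity in hectares per hour.

    1 m of width at 1 km/h covers 1000 m^2/h, i.e. 0.1 ha/h, hence the /10.
    field_efficiency (0 to 1) accounts for turns, overlap, and stops.
    """
    return working_width_m * speed_km_h * field_efficiency / 10.0


def cost_per_ha(cost_per_hour: float, working_width_m: float,
                speed_km_h: float, field_efficiency: float = 1.0) -> float:
    """Convert an hourly operating cost into a per-hectare cost."""
    capacity = field_capacity_ha_per_h(working_width_m, speed_km_h, field_efficiency)
    return cost_per_hour / capacity


def energy_per_ha(energy_kwh: float, area_ha: float) -> float:
    """Normalised energy intensity in kWh per hectare."""
    return energy_kwh / area_ha


# Hypothetical robot: 3 m working width, 4 km/h, 80 % field efficiency,
# 24 currency units per hour.
print(field_capacity_ha_per_h(3.0, 4.0, 0.8))
print(cost_per_ha(24.0, 3.0, 4.0, 0.8))
```

With these example values the capacity is about 0.96 ha/h and the cost about 25 currency units per hectare. Halving the working width would double the per-hectare figure at the same hourly cost.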
Table 10 presents a recommended reporting checklist for reproducible evaluation of agricultural field robots, outlining the key items to report, their rationale, suggested reporting practice, and common pitfalls to avoid.
7. Synthesis of Tradeoffs and Design Principles
The purpose of this section is to synthesise design tradeoffs and to distil design principles that generalise across tasks and platforms.
Figure 6 presents a qualitative radar chart comparing the conceptual tradeoffs among UGVs, UAVs, gantry systems, and greenhouse robots across key agricultural robotics performance dimensions. Across tasks, agricultural robots operate under coupled constraints that give rise to recurring tradeoffs. A key tradeoff exists between throughput, precision, and crop damage risk. This is particularly evident in harvesting, where higher throughput often requires faster motion and reduced sensing time, thereby increasing the likelihood of collision and produce damage under occlusion.
In contrast, more conservative operations improve safety and precision, but at the cost of reduced throughput [
105]. A similar coupling is observed in weeding, where mechanical or thermal actuation near crops demands high precision. Conservative decision thresholds reduce the risk of crop injury but may leave weeds untreated, whereas more aggressive thresholds can improve weed removal at the expense of increased crop injury risk. In spraying, an additional tradeoff arises between operational speed, drift, and deposition quality: increasing coverage speed or airflow can elevate drift risk unless deposition is actively controlled, which motivates standards-based evaluation of drift and spray distribution [
89,
90]. Monitoring presents a related tradeoff between coverage and calibration robustness, as high coverage may require less controlled acquisition conditions, thereby increasing protocol-induced variability and reducing comparability across sites and time [
106].
A second tradeoff arises between autonomy, supervision burden, and reliability. Systems operating in restricted domains, such as fixed-structure platforms or highly structured greenhouses, can often achieve higher autonomy with fewer exceptions, whereas open-field systems face greater environmental variability and therefore more frequently require operator intervention. Safety standards further indicate that increased autonomy should be supported by corresponding validation and risk assessment; otherwise, deployment risk is increased [
22]. A third tradeoff concerns energy, payload, and operating time, which is particularly important for unmanned ground vehicles (UGVs) carrying implements and for unmanned aerial vehicles (UAVs) limited by flight endurance. In both cases, energy constraints directly influence sensor selection and the range of feasible exploration and planning strategies.
Several design principles can be formulated from the cross-task synthesis. First, irreversible actuation requires uncertainty-aware decision-making. In weeding, spraying, and harvesting, actuation errors can result in crop damage or incorrect chemical application; accordingly, perception modules should provide uncertainty estimates, and planning should incorporate these estimates rather than relying only on point predictions [
18]. Second, evaluation should be aligned with the operational design domain. Component-level metrics alone do not reliably predict field-level performance under domain shift, and evaluation should therefore include cross-site and cross-season protocols together with clear reporting of environmental conditions [
18]. Third, safety functions should be treated as core autonomy modules rather than secondary additions. Safety standards emphasise verification and validation, which implies that emergency stop functions, proximity sensing, and fail-safe planning should be explicitly specified, tested, and reported alongside task performance [
22]. Fourth, platform choice should be treated as a primary design variable rather than an implementation detail, because platform constraints determine the feasible sensing, actuation, and safety envelope and therefore shape the interpretation of task-centric comparisons. These principles are intended as testable claims, since each can be linked to measurable outcomes under controlled changes in uncertainty modelling, evaluation protocol, safety validation, or platform archetype.
In addition to perception, evaluation, and safety, robust agricultural robot design also depends on mechanical embodiment and field reliability. For intervention tasks, tool-level parameters such as gripper force/stroke and compliance, laser power and spot diameter, and nozzle pressure and flow characteristics influence precision, selectivity, and repeatability, while reliability indicators such as uptime, failure frequency, wear or clogging, calibration drift, and maintenance burden determine sustained field usefulness [
22,
53,
86,
109,
110].
Table 11 summarises design principles for robust and comparable agricultural robotic systems, linking each principle to its rationale, practical design implication, representative task relevance, and supporting references while explicitly including both autonomy-stack and mechanical-engineering criteria.
A decision framework for selecting platform and toolchain can be formulated by mapping crop type and farm scale to task requirements and the associated autonomy-stack constraints. For large-scale field monitoring, unmanned aerial vehicles (UAVs) equipped with multispectral sensing are generally well suited, with primary emphasis placed on calibration, geo-referencing, and evaluation under domain shift. For under-canopy monitoring and intervention tasks, such as weeding and targeted spraying, unmanned ground vehicles (UGVs) are often more appropriate, with design priorities centred on row navigation, perception robustness under shadow and occlusion, and operational safety. For high-repeatability phenotyping or controlled interventions in research plots and specialised facilities, gantry and fixed-structure systems offer advantages through controlled sensor pose and improved longitudinal consistency. In greenhouse harvesting and inspection, the main constraints arise from structured but confined environments, where global navigation satellite system (GNSS)-denied localisation, reliable navigation, and safety in close proximity to workers become central. Hybrid systems are most suitable when monitoring and intervention are separated across time or space, although their effectiveness depends on reliable cross-platform calibration and data integration [
17].
9. Conclusions
This review examined agricultural robotics through four principal tasks (monitoring, weeding, spraying, and harvesting) using complementary platform-based and autonomy-stack-based perspectives. The platform type, including unmanned ground vehicles (UGVs), unmanned aerial vehicles (UAVs), gantry and fixed-structure systems, greenhouse robots, and hybrid systems, was shown to constrain feasible sensing, actuation, and safety envelopes, thereby shaping how autonomy claims should be interpreted. In parallel, mapping each task from sensing to action onto the autonomy stack, including perception, localisation, planning, control, actuation, safety, and human-robot interaction (HRI), enables a more mechanistic comparison of failure modes, bottlenecks, and design choices across application domains.
Across these tasks, several recurring design tradeoffs were identified. In weeding and harvesting, throughput, precision, and crop damage risk are tightly coupled because actions are local and often irreversible. In spraying, drift and deposition quality emerge as additional coupled outcomes, which makes standards-aligned measurement essential for meaningful comparison. The autonomy level was also found to be closely linked to supervision burden and reliability, and higher autonomy without validated safety mechanisms increases deployment risk. Taken together, these patterns support design principles that emphasize uncertainty-aware actuation, shift-aware evaluation, safety as a core autonomy function, and platform-aware interpretation of results.
Compared with industrial robotics, agricultural field robotics must operate in only partially structured and seasonally changing environments, where deformable biological targets, occlusion, illumination variability, GNSS degradation, and terrain-induced slip, sinkage, and vibration alter both autonomy design and the meaning of reported performance. These conditions make deployment readiness depend not only on component accuracy but also on robustness to domain shift, tool-specific actuation quality, supervision burden, and safety behaviour in realistic field operations. For this reason, the present review places particular emphasis on operational design domains, implement/tool specification, and evaluation protocols rather than on perception or control accuracy alone.
A further synthesis emerging from the reviewed literature is that agricultural robotic systems should be interpreted according to deployment maturity rather than only nominal functionality. In practice, the evidence base differs substantially between lab prototypes, field prototypes, pre-commercial pilots, and commercial systems. The review therefore argues that future studies should report deployment maturity explicitly together with the scope of field evidence, including the number of sites and seasons, operating hours, failure categories, supervision requirements, and safety procedures. Without such evidence, strong component-level results should not be interpreted as proof of commercial readiness or broad field robustness. In this context, laboratory concepts are generally supported by controlled-environment or component-level validation, whereas field prototypes are typically evaluated through task-specific trials with limited site or seasonal coverage. Pre-commercial pilots provide more integrated demonstrations under realistic operational conditions. Commercial readiness, however, should be interpreted more cautiously and should ideally be supported by sustained multi-site or multi-season evidence, explicit reliability reporting, clear supervision requirements, and safety-validated deployment procedures rather than task success alone.
A central bottleneck to progress remains evaluation practice. Many studies continue to report strong component-level results under controlled conditions, while providing limited evidence of transfer across sites, seasons, or cultivars. Robustness-focused imaging studies indicate that credible deployment claims require evaluation protocols with spatial and temporal holdouts, together with clear reporting of acquisition settings and metadata. At the same time, safety standards for autonomous agricultural machinery make clear that validation should address hazards, verification procedures, and evidence of safe operation, rather than task success alone. For this reason, the proposed benchmark ladder and reporting checklist are presented as practical steps toward greater comparability and stronger deployment relevance.
Compared with earlier reviews that are often organised primarily by platform, sensing modality, or individual application area, the main contribution of this review is the joint synthesis of agricultural robotics through task, platform, autonomy-stack, and evaluation-setting lenses. This integrated framing makes it possible to compare systems not only by what they do but also by how they sense, act, fail, and are validated under different deployment conditions. The most important challenges that remain are shift-aware generalisation across sites and seasons, reliable localisation and planning under occlusion and deformability, standardised reporting of tool and safety performance, long-duration reliability with explicit fault handling, and clearer evidence of maturity and deployment readiness in real agricultural practice.
Overall, the reviewed advances indicate that agricultural robotics can contribute not only to higher productivity and more sustainable food production, consistent with SDG 2, but also to more efficient resource use and reduced waste, consistent with SDG 12. Moreover, the transition toward uncertainty-aware, resource-efficient, and electrified robotic systems strengthens the relevance of this field to SDG 13, particularly in the broader context of net-zero agriculture.
A realistic path toward robust field autonomy is likely to be progressive rather than abrupt. In the near term, deployment relevance can be improved through shift-aware evaluation, protocol transparency, and systematic safety reporting. In the medium term, dependable long-duration autonomy will require maintainable calibration, fault detection, and uncertainty-aware planning under occlusion and deformability. In the long term, multi-season and multi-site operation will depend on scalable benchmarking infrastructure, formal safety cases, standards-aligned validation, and sustained field testing. Progress along this path would strengthen the role of robotics in precision agriculture by enabling more repeatable monitoring, safer and more selective intervention, and higher-quality harvesting under realistic field variability.