1. Introduction
Dynamic, heterogeneous network conditions are now a central bottleneck in scaling digital agriculture, especially as farms are pushed to produce more with fewer resources under tightening environmental constraints. The Food and Agriculture Organization (FAO) projects that global food production must increase substantially by 2050 while simultaneously reducing environmental impact, driving the demand for fine-grained, data-driven decision-making in the field [1]. Precision agriculture operationalizes this vision by tailoring inputs such as fertilizer, irrigation, and crop protection to spatial and temporal variability within and across fields [2]. However, realizing precision agriculture at scale requires continuously learning models from diverse, distributed sensor streams across many farms, which are often located in rural regions where connectivity is intermittent and infrastructure is sparse.
Over the past decade, machine learning and deep learning have shown considerable promise for yield prediction, soil and nutrient modeling, crop health monitoring, and management-zone delineation [3,4]. These approaches are increasingly reliant on heterogeneous data from on-board sensors, satellites, unmanned aerial vehicles (UAVs), and farm machinery, and they demand low-latency feedback to be actionable during narrow agronomic windows. Yet the most informative data in precision agriculture are often siloed on tractors, implements, and local farm servers, governed by strict privacy, regulatory, and commercial constraints. Centralizing these data in the cloud is frequently infeasible or undesirable due to limited rural backhaul, high bandwidth costs, and the sensitivity of agronomic and yield data, which can reveal proprietary management practices and economic performance [5,6].
Federated learning (FL) offers a compelling alternative by enabling collaborative model training across many data owners without sharing raw data [7]. In FL, edge clients train local models and periodically upload updates to one or more aggregators, which compute a global model and redistribute it for further refinement. Much of the emerging literature has focused on improving FL optimization under data heterogeneity (e.g., FedProx [8], SCAFFOLD [9], FedNova [10]) and on reducing communication with quantization and sparse participation (e.g., FedPAQ [11]). However, most existing designs implicitly assume relatively stable, high-quality connectivity (e.g., smartphones on WiFi or 4G), modest scale, and a single terrestrial aggregation tier. These assumptions break down in large, geographically dispersed farms where tractors may move in and out of coverage, switch between 3G/4G/5G, or rely on costly backhaul links, and where long training horizons must coexist with strict operational constraints.
Communication-efficient distributed learning has attacked bandwidth bottlenecks using aggressive gradient compression and sparsification, including deep gradient compression [12], stochastic quantization (QSGD) [13], and Top-k or threshold-based sparse communication [14]. Post-training low-bit quantization has further enabled compact deployment of models on embedded hardware [15]. While these techniques translate naturally to FL, they are typically evaluated in datacenter-like or smartphone settings, and are rarely co-designed with explicit models of dynamic rural network quality, contact windows, or multi-hop edge-to-space routing. As a result, they leave a substantial performance gap in regimes where link availability, latency, and bandwidth fluctuate on the timescale of individual training rounds.
In parallel, the wireless and networking communities have begun to study FL over bandwidth-limited and fading channels, including over-the-air and analog aggregation schemes that exploit the superposition property of the wireless medium [16]. At the same time, standardization bodies and industry have accelerated the integration of non-terrestrial networks (NTNs) into 5G and beyond, with Low Earth Orbit (LEO) and Geostationary Earth Orbit (GEO) satellites envisioned as integral parts of future broadband and IoT infrastructure [17]. LEO mega-constellations promise near-global coverage and reduced latency by deploying hundreds to thousands of satellites in low orbits [18]. These developments are particularly attractive for agriculture, where satellite connectivity can bridge coverage gaps, backhaul data from remote fields, and enable continuous operation during key agronomic windows. Yet, to date, there is limited work that tightly couples FL with a multi-orbit satellite hierarchy tailored to the needs of precision agriculture and its extreme heterogeneity in both data and connectivity.
Fairness and robustness are emerging concerns in FL, especially when clients differ substantially in terms of data quantity, data quality, and network conditions [6]. Naively optimizing a single global objective can systematically favor well-connected, data-rich clients and degrade performance for under-represented or connectivity-poor participants. Agnostic FL [19] and fairness-aware methods such as q-FFL [20] explicitly adjust the optimization objective to improve worst-case or distributionally robust performance, but they typically assume a single aggregation tier and do not account for systematic, topology-induced staleness arising from orbital dynamics or intermittent terrestrial coverage. Moreover, most existing fairness formulations operate at the level of individual devices rather than regions, whereas agricultural deployments must balance performance across entire agro-climatic zones that may differ dramatically in connectivity, farm size, and crop mix.
Finally, security and privacy remain non-negotiable in agricultural deployments. FL frameworks for sensitive domains commonly rely on modern symmetric cryptography, secure aggregation, and message authentication to prevent eavesdropping and tampering [5]. Stream ciphers such as Salsa20 provide high-throughput, software-friendly encryption with strong security margins, making them attractive for resource-constrained edge devices and satellite links alike [21]. However, the interplay between cryptographic protection, hierarchical aggregation, and dynamic connectivity (spanning tractors, cluster heads, LEO satellites, and GEO hubs) has not been systematically explored in the context of large-scale agricultural FL. Additional related work is discussed in Appendix A.
Against this backdrop, we propose a scalable, privacy-preserving federated learning architecture explicitly designed for precision agriculture operating under highly dynamic, heterogeneous network conditions. Our system tightly couples (i) ground-based intelligent farm nodes, where tractors act as edge clients that form clusters and perform network-quality-aware, similarity-gated communication; with (ii) a space-assisted hierarchical aggregation fabric, where LEO satellites perform regional aggregation and a GEO satellite performs staleness- and fairness-aware global aggregation over asynchronous regional models. Building on ideas from communication-efficient distributed learning [12,13,14,15], wireless FL [16], fairness-aware FL [6,19,20], and emerging non-terrestrial 5G architectures [17,18], the proposed design is tailored to the unique constraints of precision agriculture: highly mobile and resource-constrained tractors, intermittent 3G/4G/5G coverage, satellite contact windows, region-level fairness, and strong end-to-end security. As our experimental evaluation shows, this co-designed ground–space hierarchy can approach centralized performance while reducing uplink communication by an order of magnitude and substantially improving performance in connectivity-poor regions.
We present a deployable ground–space federated learning (FL) fabric for precision agriculture under highly dynamic rural connectivity. Our contributions are as follows:
Connectivity-aware FL control at tractors: We couple a normalized Network Quality Index (NQI) with similarity-based checkpointing to decide whether to transmit and how to encode updates (full vs. chunked vs. compressed), maximizing value-per-bit under intermittent 3G/4G/5G.
Orbit-aware hierarchical aggregation across LEO and GEO: We design a three-tier aggregation pipeline (drivers → LEO → GEO) with explicit staleness-aware weighting to handle orbit-induced asynchrony and buffering.
Region-level fairness for connectivity-disadvantaged areas: At GEO, we introduce a region-level weighting scheme that compensates for systematic connectivity losses, improving the worst-region performance and reducing inter-regional disparity.
An end-to-end secure protocol with an explicit threat model: We provide a concrete encryption + authentication design (Salsa20-family AEAD) with nonce/counter-based replay resistance suitable for multi-hop tractor–satellite links.
Attribution via ablations and system metrics: We report accuracy, communication bytes, staleness distributions, and fairness metrics, and we perform ablations to isolate the effect of NQI gating, checkpointing, compression, hierarchy, and fairness weighting.
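The counter-based replay resistance named in the secure-protocol contribution can be illustrated with a minimal sketch. It is not the protocol of this paper: it uses HMAC-SHA256 from the Python standard library in place of a Salsa20-family AEAD, performs authentication only (no encryption), and the names `seal`, `ReplayGuard`, and the 8-byte counter header are assumptions introduced for illustration.

```python
import hashlib
import hmac
import struct
from typing import Optional

TAG_LEN = 32  # length in bytes of an HMAC-SHA256 tag


def seal(key: bytes, counter: int, payload: bytes) -> bytes:
    """Prefix an 8-byte big-endian counter and append a MAC over counter + payload."""
    header = struct.pack(">Q", counter)
    tag = hmac.new(key, header + payload, hashlib.sha256).digest()
    return header + payload + tag


class ReplayGuard:
    """Receiver state: accept only authentic messages whose counter is fresh."""

    def __init__(self, key: bytes) -> None:
        self.key = key
        self.last_counter = -1  # no message accepted yet

    def open(self, message: bytes) -> Optional[bytes]:
        if len(message) < 8 + TAG_LEN:
            return None  # too short to hold header and tag
        header = message[:8]
        payload = message[8:-TAG_LEN]
        tag = message[-TAG_LEN:]
        expected = hmac.new(self.key, header + payload, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            return None  # tampered message or wrong key
        (counter,) = struct.unpack(">Q", header)
        if counter <= self.last_counter:
            return None  # replayed or outdated message
        self.last_counter = counter
        return payload
```

Because the counter is covered by the tag, an attacker cannot rewrite it to make an old message look fresh; a deployment would additionally encrypt the payload and keep one guard per sender and hop.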
3. Results
In this section, we empirically evaluate the proposed two-phase architecture—ground-based intelligent farm nodes (Phase 1) and the space-assisted hierarchical federated learning framework (Phase 2)—under realistic agricultural and networking conditions. Our goal is to answer the following questions:
Predictive performance: Does the proposed hierarchical federated pipeline achieve competitive or superior model accuracy compared to centralized and non-hierarchical FL baselines?
Communication and energy efficiency: To what extent do NQI-aware transmission, quantization, and sparsification reduce the uplink cost from tractors to LEO/GEO compared to vanilla FedAvg?
Robustness to dynamic networks: How robust is the system to intermittent connectivity, heterogeneous 3G/4G/5G coverage, and satellite staleness?
Fairness and regional equity: Does the staleness- and fairness-aware aggregation at GEO prevent overfitting to well-connected, dense regions and preserve performance in under-represented areas?
We first describe the experimental setup and baselines, followed by a detailed analysis of results along each dimension.
3.1. Experimental Setup
The data sources used for training and experimentation are detailed in the Data Availability statement below. In our evaluation, we concentrate on two representative precision-agriculture tasks aligned with Phase 1:
Nutrient prediction (Task A): a regression task predicting soil nutrient levels and the resulting optimal fertilizer application at the field-block level.
Crop health assessment (Task B): a classification or risk-scoring task predicting crop stress and recommending the pesticide spray schedule.
Each tractor is associated with a local dataset comprising sensor readings (nutrient probes, crop-health indices, environmental variables) and historical intervention logs. Table 1 summarizes the data partitions.
3.1.1. Dataset Sources, Preprocessing, and Feature Construction
To ensure reproducibility, Table 2 lists the exact public sources used and the role each plays in feature/label construction. All datasets are accessed via their official portals/APIs, and all preprocessing steps (spatial joins, temporal alignment, filtering, and normalization) are implemented in a deterministic pipeline with fixed random seeds.
Preprocessing Pipeline
We (i) spatially align remote-sensing grids to field blocks using centroid-based assignment; (ii) aggregate imagery to weekly summaries (mean/percentiles) to reduce cloud/noise sensitivity; (iii) standardize continuous features with training-set statistics; and (iv) construct labels using task-specific rules described below.
Labels and Tasks
Task A predicts nutrient targets derived from soil survey attributes and agronomic guidelines (regression). Task B predicts crop-stress risk (classification) using vegetation-index thresholds and/or historical condition indicators; labels are computed solely from public signals to avoid leakage of private farm outcomes.
Task B Label Rule (Crop Stress)
We define a deterministic binary stress label using only remote-sensing vegetation-index dynamics. For each field-block and week t, we compute cloud-masked weekly means of NDVI and EVI and define a historical baseline over the preceding 8 weeks: , , and . We label a sample as stress () if, simultaneously, (i) , (ii) , and (iii) is below the crop-type-specific 20th percentile estimated from the training set for that crop and region (to avoid over-labeling naturally low-vigor crops). Otherwise, . Features for Task B are computed strictly from weeks to t (lagged indices, deltas, and rolling statistics), so the label uses no future information.
Federated Partitioning
We emulate a federation by assigning each client (tractor) a disjoint slice of the dataset using a two-level hierarchy: (a) tractors are grouped by farm; (b) within each farm, tractors receive non-iid splits by crop type and soil class. This induces realistic statistical heterogeneity across clients while preserving a reproducible partitioning policy.
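A minimal sketch of such a two-level partitioning policy is given below. It assumes each sample carries hypothetical `farm`, `crop`, and `soil` fields, that tractor identifiers are unique across farms, and that each (crop, soil) stratum is handed whole to one tractor in round-robin order; the actual split rule used in the evaluation may differ.

```python
from collections import defaultdict


def partition(samples, tractors_by_farm):
    """Map each tractor id to a disjoint list of sample indices.

    samples: list of dicts with keys 'farm', 'crop', 'soil' (assumed schema).
    tractors_by_farm: dict mapping farm id -> list of tractor ids.
    """
    # Level (a): group sample indices by farm, then by (crop, soil) stratum.
    strata = defaultdict(lambda: defaultdict(list))
    for idx, sample in enumerate(samples):
        strata[sample["farm"]][(sample["crop"], sample["soil"])].append(idx)

    assignment = defaultdict(list)
    for farm in sorted(strata):
        tractors = tractors_by_farm[farm]
        # Level (b): whole strata go to single tractors, so each tractor
        # sees a skewed crop/soil mix (non-iid) within its farm.
        for k, stratum in enumerate(sorted(strata[farm])):
            assignment[tractors[k % len(tractors)]].extend(strata[farm][stratum])
    return dict(assignment)
```

Sorting farms and strata before assignment keeps the policy deterministic without a random seed, which matches the reproducibility goal stated above.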
3.1.2. Model Architectures and Training
For each task, we deploy lightweight models suitable for on-board execution:
Task A: a three-layer fully-connected network with 64 and 32 hidden units (approximately 95 k parameters).
Task B: a lightweight neural classifier (MLP) over multispectral indices and agronomic features (3 layers; hidden sizes 128 and 64; ReLU activations; dropout 0.1; approximately 120 k parameters).
Each global round consists of local epochs per tractor with learning rate and batch size . All methods are trained for global rounds using identical hyperparameters to ensure a fair comparison.
Rationale for Neural Models Under Federated Aggregation
Our system relies on weight-space operations (FedAvg, cosine-similarity checkpointing, Top-k sparsification, and low-bit quantization) that are naturally defined for differentiable models with vector parameters. For this reason, both Task A and Task B use compact neural networks whose parameter tensors admit consistent aggregation, compression, and similarity gating. This choice ensures that communication-efficiency mechanisms and the hierarchical aggregation stack operate on well-defined model updates.
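The weight-space operations listed above can be sketched on flat parameter vectors as follows. This is an illustrative sketch in plain Python, not the implementation evaluated here: the similarity threshold of 0.98, the direction of the gate (skip updates that are nearly parallel to the last one sent), and the symmetric uniform quantizer are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def should_send(update, last_sent, threshold=0.98):
    """Checkpointing gate: send unless the update is nearly parallel to the last one sent."""
    return last_sent is None or cosine_similarity(update, last_sent) < threshold


def top_k_sparsify(update, k):
    """Keep the k largest-magnitude entries as sorted (index, value) pairs."""
    order = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)
    return sorted((i, update[i]) for i in order[:k])


def quantize(values, bits=8):
    """Symmetric uniform quantization to signed integers; returns (ints, scale)."""
    levels = 2 ** (bits - 1) - 1
    peak = max((abs(v) for v in values), default=0.0)
    scale = peak / levels if peak > 0 else 1.0
    return [round(v / scale) for v in values], scale
```

A receiver would recover approximate values as `q * scale` for each quantized integer `q` and treat all indices not listed by `top_k_sparsify` as zero.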
3.1.3. Network and Orbit Simulation
To evaluate robustness under realistic connectivity conditions, we simulate spatially varying network profiles and satellite passes:
Terrestrial networks: Individual tractors are assigned 3G (31%), 4G (49%), or 5G (20%) profiles according to rural coverage statistics. Each profile yields a time-varying triplet of signal strength, bandwidth, and latency, from which the Network Quality Index (NQI) is computed.
LEO/GEO passes: We simulate LEO satellites in near-polar orbits and one GEO satellite. Cluster drivers maintain ephemeris tables; contact windows are discretized into 30 s slots.
Dropouts and outages: Terrestrial links fail independently with an outage probability that is swept in Section 3.5. LEO passes are dropped with a separate pass-failure probability, modeling weather and hardware issues.
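The outage model above amounts to independent Bernoulli draws per link and per pass. A minimal seeded sketch, with hypothetical function names and the probabilities passed in as parameters rather than fixed to the values used in the evaluation:

```python
import random

SLOT_SECONDS = 30  # contact windows are discretized into 30 s slots


def slots_in_window(window_seconds):
    """Number of whole 30 s slots that fit in a contact window."""
    return window_seconds // SLOT_SECONDS


def simulate_round(n_tractors, p_outage, p_pass_fail, rng):
    """Draw one round of link outcomes.

    Returns (ids of tractors whose terrestrial link was up, whether the LEO pass occurred).
    Each terrestrial link fails independently with probability p_outage;
    the LEO pass is dropped with probability p_pass_fail.
    """
    delivered = [i for i in range(n_tractors) if rng.random() >= p_outage]
    leo_pass_ok = rng.random() >= p_pass_fail
    return delivered, leo_pass_ok
```

Passing an explicit `random.Random(seed)` instance keeps runs repeatable, for example `simulate_round(50, 0.3, 0.1, random.Random(7))`.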
NQI Parameterization and Thresholds
To make NQI comparable across clients and radio types, we compute NQI from normalized link indicators using Equations (3) and (4). In the simulator, each tractor samples its signal strength, bandwidth, and latency from the profile-specific ranges shown in Table 3. Unless otherwise stated, we use a single set of NQI weights (bandwidth-dominant rural regime), selected via the validation grid search in Equation (5). A tractor attempts an uplink only if (i) the checkpointer accepts the update and (ii) its current NQI clears the minimum transmission threshold. For pipeline selection (Figure 3), the NQI band determines whether the update uses the compressed uplink, the scheduled/chunked uplink, or the full/low-compression uplink.
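The gating logic can be sketched as a weighted sum of normalized indicators followed by a band lookup. Everything numeric below is a placeholder: the normalization ranges, the weights (0.3, 0.5, 0.2), the thresholds, and the mapping of low, middle, and high NQI bands to the compressed, scheduled/chunked, and full pipelines are assumptions for illustration, not the values defined by Equations (3) to (5) or Table 3.

```python
def normalize(value, lo, hi, invert=False):
    """Min-max normalize to [0, 1], clipping out-of-range values."""
    x = (value - lo) / (hi - lo)
    x = min(1.0, max(0.0, x))
    return 1.0 - x if invert else x


def nqi(signal_dbm, bandwidth_mbps, latency_ms, weights=(0.3, 0.5, 0.2)):
    """Weighted sum of normalized signal, bandwidth, and (inverted) latency."""
    w_signal, w_bandwidth, w_latency = weights  # placeholder, bandwidth-dominant
    s = normalize(signal_dbm, -120.0, -60.0)
    b = normalize(bandwidth_mbps, 0.0, 100.0)
    l = normalize(latency_ms, 10.0, 500.0, invert=True)  # lower latency is better
    return w_signal * s + w_bandwidth * b + w_latency * l


def choose_pipeline(score, checkpoint_ok, tau_min=0.2, tau_mid=0.5, tau_high=0.8):
    """Pick an uplink pipeline from the NQI score (thresholds are placeholders)."""
    if not checkpoint_ok or score < tau_min:
        return "skip"
    if score < tau_mid:
        return "compressed"
    if score < tau_high:
        return "scheduled/chunked"
    return "full/low-compression"
```

For example, `choose_pipeline(nqi(-95.0, 12.0, 180.0), checkpoint_ok=True)` evaluates one link sample end to end under these placeholder settings.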
Limitations and Sensitivity to Real-World Effects
We emphasize that the current evaluation is conducted in a simulation testbed and does not yet include physical deployment on production agricultural machinery fleets or over operational satellite links. Real rural connectivity can exhibit non-ideal effects such as terrain-induced shadowing, foliage attenuation, weather-driven fading, tower congestion, antenna placement variability, and correlated regional outages. To mitigate sensitivity to any single parameter choice, we include robustness sweeps over key simulator parameters (e.g., terrestrial outage probability, satellite pass-failure probability, and bandwidth/latency distributions) and report additional system KPIs such as update delivery ratio and latency proxies. These sweeps demonstrate that the principal trends (communication savings, robustness under degraded connectivity, and fairness improvements in low-NQI regions) are consistent across a broad range of operating conditions.
3.1.4. Evaluation Metrics
We report both model-quality and system-level metrics in the following subsections:
Predictive accuracy: RMSE and MAE for Task A; AUC and F1-score for Task B.
Communication cost: total uplink bytes sent per tractor and per global round, separated into tractor→driver and driver→LEO transmissions.
Energy proxy: communication energy estimated as $E = P_{\mathrm{tx}} \, t_{\mathrm{air}}$, where $P_{\mathrm{tx}}$ is the transmit power and $t_{\mathrm{air}}$ is airtime; we report normalized values.
Staleness and fairness: average staleness (round lag) of incorporated updates, per-region accuracy, and the variance of performance across regions.
Update success rate (delivery ratio): fraction of scheduled model updates that successfully reach the intended aggregator tier (driver/LEO/GEO) within a deadline window.
Latency proxy: end-to-end delay from tractor update creation to incorporation at GEO (or to receipt of the next global model), decomposed into terrestrial uplink, buffering during satellite contact windows, and inter-orbit forwarding.
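The system-level metrics above reduce to a few simple computations, sketched below. The sketch assumes the energy proxy is the product of transmit power and airtime, normalizes against a caller-chosen reference, and uses the population standard deviation for cross-region spread; the function names are hypothetical.

```python
from statistics import mean, pstdev


def energy_proxy(p_tx_watts, airtime_s):
    """Communication energy proxy: transmit power times airtime (joules)."""
    return p_tx_watts * airtime_s


def normalize_to(values, reference):
    """Express each value relative to a reference (e.g., a baseline method)."""
    return [v / reference for v in values]


def delivery_ratio(n_scheduled, n_on_time):
    """Fraction of scheduled updates that reached their aggregator within the deadline."""
    return n_on_time / n_scheduled if n_scheduled else 0.0


def staleness_and_disparity(round_lags, per_region_scores):
    """Average staleness in rounds and spread of a metric across regions."""
    return mean(round_lags), pstdev(per_region_scores)
```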
3.1.5. System Overhead Microbenchmarks
To confirm that the proposed pipeline is feasible on edge and gateway hardware, we include indicative per-update overheads for key operations (Figure 8). Table 4 reports millisecond-level costs measured or estimated for representative embedded-class CPUs (tractor/gateway) and server-class aggregation (LEO/GEO emulation). These overheads are small relative to overnight local training and satellite contact-window timescales, and they scale linearly with model size and update payload bytes.
3.2. Baselines
We compare our full architecture against both classical and state-of-the-art federated learning methods, in addition to several ablated variants of our own system:
Centralized (Ideal Cloud): All data is centrally collected and trained in the cloud with no network constraints or privacy restrictions. This serves as an upper bound on achievable accuracy.
Vanilla FedAvg [25]: Standard FedAvg over tractors directly connected to a terrestrial server, without clustering, NQI-aware transmission, or satellite hierarchy.
FedProx [26]: FedAvg augmented with a proximal term to mitigate client drift under heterogeneous local objectives. We adapt FedProx to our setting but use the same communication pattern as FedAvg.
SCAFFOLD [9]: A variance-reduced FL method that uses control variates to correct client drift. This baseline targets better optimization under non-iid data but does not explicitly model network dynamics.
FedNova [10]: A method that normalizes local updates to decouple performance from the number of local steps and client sampling, improving convergence with unbalanced participation.
FedPAQ [11]: A communication-efficient FL algorithm that combines periodic aggregation with gradient quantization. We use this as a representative quantization-based baseline.
Over-the-Air FL (AirComp-FL) [27]: An over-the-air analog aggregation scheme that exploits the wireless medium for in situ model aggregation. We use a digital approximation that preserves its single-shot aggregation behavior but does not incorporate our NQI or satellite hierarchy.
Hierarchical FL w/o NQI: our clustering and LEO/GEO hierarchy, but with periodic transmissions and no NQI-based scheduling or similarity-based gating.
Ours (Ground only): Phase 1 ground architecture with clustering, NQI-aware pipelines, and checkpointing, but without LEO/GEO (all drivers talk directly to a terrestrial server).
Ours (Full Hierarchical): the complete two-phase architecture described in Section 2, including ground clustering, NQI-aware transmission, satellite hierarchy, and staleness/fairness-aware aggregation.
3.3. Overall Predictive Performance
Table 5 reports predictive performance on the held-out test sets for both tasks. All models are trained for the same number of global rounds with identical local hyperparameters.
3.4. Communication and Energy Efficiency
Table 6 and Table 7 report mean per-tractor uplink volume and the corresponding energy proxy over the global training rounds. The numbers for vanilla FedAvg are high by construction, since it uploads full, uncompressed model updates; our architecture achieves roughly an order-of-magnitude (10×) reduction in uplink usage.
Thus, the full hierarchical system communicates only about one-tenth of the bytes of vanilla FedAvg while achieving substantially higher accuracy.
Breakdown by Transmission Pipeline
Table 8 and Table 9 decompose the share of updates handled by each of the three LEO transmission pipelines.
Most updates flow through the heavily compressed proximity-driven and scheduled pipelines, while the fallback pipeline remains reserved for extreme NQI conditions, preventing congestion during poor connectivity.
3.5. Robustness to Dynamic Network Conditions
We sweep the terrestrial outage probability from 0 to 0.4 and track model performance. A complementary robustness view is provided in Table 10, which reports the distribution of update staleness and its impact on accuracy for the full hierarchical system.
Sensitivity Sweep over Outage and Satellite Pass Failures
Table 11 reports Task B AUC and communication under increasing terrestrial outage probability and two satellite pass-failure settings. As outages increase, the delivery ratio decreases and models become slightly more stale; however, staleness-aware weighting preserves most of the performance, while communication drops because the policy skips low-NQI attempts rather than repeatedly retransmitting.
Most of the model’s effective gradient signal comes from fresh updates (0–1 rounds stale), while highly stale updates contribute little due to the staleness-weighting mechanism.
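The down-weighting of stale updates can be illustrated with a simple exponential decay. This is a hypothetical weighting chosen for illustration; the decay form and the rate of 0.5 per round are assumptions and not the weighting rule used by the system.

```python
import math


def staleness_weights(staleness, decay=0.5):
    """Normalized weights that shrink exponentially with round lag."""
    raw = [math.exp(-decay * s) for s in staleness]
    total = sum(raw)
    return [r / total for r in raw]


def aggregate(models, staleness, decay=0.5):
    """Staleness-weighted average of equal-length parameter vectors."""
    weights = staleness_weights(staleness, decay)
    dim = len(models[0])
    return [sum(w * m[j] for w, m in zip(weights, models)) for j in range(dim)]
```

Under this assumed decay, an update that is four rounds stale receives about 13.5% of the raw weight of a fresh one (exp(-2)), so fresh updates dominate the aggregate while stale ones still contribute a little.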
3.6. Fairness and Regional Equity
We partition farms into three regions based on geography and climate and report per-region performance for Task B in Table 12.
The fairness-weighted aggregation at GEO markedly reduces regional disparity: the standard deviation of AUC across regions drops from 0.058 under vanilla FedAvg to 0.017 under the full hierarchical system. The largest absolute gains appear in the low-NQI region (0.74 → 0.88), showing that the architecture explicitly benefits connectivity-poor areas rather than merely optimizing for already well-connected farms.
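One way to picture region-level compensation is to scale each region's aggregation weight by the inverse of its delivery ratio, so that regions losing many updates to poor connectivity are not under-counted. The sketch below is a hypothetical scheme under that assumption (including the `floor` guard against division by very small ratios); it is not the weighting rule defined for the GEO aggregator.

```python
def region_weights(sample_counts, delivery_ratios, floor=0.05):
    """Normalized region weights: data volume boosted by inverse delivery ratio."""
    raw = [n / max(d, floor) for n, d in zip(sample_counts, delivery_ratios)]
    total = sum(raw)
    return [r / total for r in raw]


def fair_aggregate(regional_models, sample_counts, delivery_ratios):
    """Weighted average of regional parameter vectors using region_weights."""
    weights = region_weights(sample_counts, delivery_ratios)
    dim = len(regional_models[0])
    return [sum(w * m[j] for w, m in zip(weights, regional_models)) for j in range(dim)]
```

With equal data volumes and delivery ratios of 1.0 and 0.5, the poorly connected region receives twice the weight of the well-connected one under this scheme.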
3.7. Ablation Studies
To isolate the contribution of each architectural component, we conduct ablations by selectively disabling mechanisms (Table 13). Results for Task B are summarized in Table 14 and Table 15. Removing NQI or checkpointing nearly doubles communication relative to the full system, with only marginal changes in AUC, confirming that most filtered updates are indeed redundant. Disabling compression slightly increases AUC (0.92 → 0.93) but inflates communication by more than 7×, which is impractical in constrained environments. Finally, removing fairness weights leaves the mean AUC unchanged but enlarges regional gaps back toward the FedAvg regime, as seen by recomputing the standard deviation of per-region AUC.
Across all experiments, the numerical trends support the central claims of our architecture:
Near-centralized accuracy: For both tasks, Ours (Full Hierarchical) remains within roughly 1–2% of the centralized ideal, despite never aggregating raw data and operating under dynamic, often degraded connectivity. This shows that the additional constraints imposed by privacy and networking do not fundamentally limit achievable model quality.
Improvement over classical FL methods: All existing federated baselines—FedAvg, FedProx, SCAFFOLD, FedNova, FedPAQ, and AirComp-FL—benefit from more advanced optimization or compression strategies, but still fall short of our architecture. Compared to vanilla FedAvg, the full hierarchical system reduces RMSE on nutrient prediction from 8.63 to 7.71 (an improvement of about 11%) and increases the AUC for crop health from 0.82 to 0.92 (a gain of roughly 10 percentage points). Even the strongest algorithmic baselines (SCAFFOLD and FedProx) are consistently outperformed by our NQI- and hierarchy-aware design.
Benefit of space-assisted hierarchy: The jump from Ours (Ground only) to Ours (Full Hierarchical) is modest on aggregate (e.g., AUC 0.90 → 0.92), but this incremental improvement is achieved on top of already strong ground-level performance and becomes crucial for farms located in connectivity-poor regions. As shown in the fairness and robustness analyses, the satellite layer primarily helps under-served farms catch up to the performance of well-connected regions.
Communication-only baselines are not enough: Communication-efficient methods such as FedPAQ and AirComp-FL improve upon vanilla FedAvg in terms of bytes transmitted, but their accuracy remains closer to standard FedAvg than to our architecture. This highlights that communication reduction alone is insufficient; coupling compression with NQI-aware scheduling, checkpointing, and multi-orbit aggregation is necessary to obtain both high accuracy and low bandwidth usage.
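The headline improvements quoted in this list follow directly from the reported numbers; the short check below recomputes them (the helper name is ours).

```python
def relative_reduction(before, after):
    """Fractional reduction of a lower-is-better metric."""
    return (before - after) / before


# Task A: RMSE 8.63 (vanilla FedAvg) -> 7.71 (full hierarchical system)
rmse_gain = relative_reduction(8.63, 7.71)

# Task B: AUC 0.82 -> 0.92, expressed in percentage points
auc_gain_points = (0.92 - 0.82) * 100

print(f"RMSE reduction: {rmse_gain:.1%}")
print(f"AUC gain: {auc_gain_points:.0f} percentage points")
```

The RMSE reduction evaluates to about 10.7%, which rounds to the quoted 11%, and the AUC gain is 10 percentage points.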
Taken together, these findings demonstrate that the proposed system architecture is both practically viable and highly effective for large-scale, privacy-preserving, and connectivity-aware federated learning in precision agriculture.
4. Discussion
The proposed ground–space federated learning architecture is explicitly engineered for harsh, non-stationary environments, yet its behavior under edge-case conditions, at global scale, and across agronomic regimes raises several important considerations. In this section, we reflect on system performance under extreme network outages, the scalability of the heterogeneous FL framework, the reproducibility of our evaluation, failure recovery mechanisms, and performance deviations driven by seasonal and soil variability.
4.1. System Performance Under Extreme Network Outages
Our experiments demonstrate that the full hierarchical system maintains high predictive performance even as terrestrial outage probability increases and satellite passes are occasionally dropped, chiefly due to the combination of NQI-aware scheduling, similarity-based checkpointing, aggressive compression, and staleness-aware aggregation. Within the evaluated regime, the most effective gradient signal originates from updates that are at most one or two global rounds stale, while highly stale contributions are automatically down-weighted. This indicates that the architecture is robust to moderate asynchrony and short-lived connectivity disruptions. However, the same design choices expose fundamental trade-offs under more extreme outage patterns. When entire regions experience prolonged blackouts spanning many global rounds, the staleness-weighting mechanism will eventually suppress their influence to preserve stability of the global model. This behavior is desirable from a convergence standpoint but can gradually erode performance for severely isolated regions, effectively freezing their models near the last successful synchronization point. In practice, this suggests that the system requires a minimum level of “liveness”—infrequent but non-zero contact windows—to sustain both accuracy and fairness. Ultra-scarce connectivity regimes may require additional mechanisms, such as local replay buffers, opportunistic peer-to-peer relays between farms, or explicit “catch-up” phases in which temporarily over-weighted fresh updates from recovered regions accelerate their reintegration into the global model.
4.2. Scalability of the Heterogeneous Federated Learning Framework
The hierarchical design is intended to scale along multiple axes: the number of tractors, the geographic span of deployments, and the number of concurrent tasks. On the ground, clustering by task and geography, together with capability-score-based driver election, transforms a potentially flat federation of thousands of tractors into a more manageable set of cluster-level aggregators. This reduces both uplink fan-in and per-round aggregation complexity at the global level. In orbit, LEO satellites further partition the federation into regional footprints, each of which is responsible for aggregating a bounded number of cluster models during its pass, while the GEO satellite aggregates only a small set of regional models per global round. From a systems perspective, the dominant scaling bottlenecks become (i) control-plane complexity for maintaining ephemeris tables and contact schedules at cluster drivers; (ii) cryptographic overhead for Salsa20 + MAC over an increasing number of links; and (iii) the GEO satellite’s capacity to perform robust aggregation over many regions within its latency budget. The first two grow roughly linearly with the number of active clients and can be mitigated via batched key management, hierarchical keying, and lightweight broadcast schedules. The third is alleviated by the strong reduction in model count at the GEO layer: even if tens of thousands of tractors participate, the GEO typically aggregates only hundreds of regional models, which is well within the capability of modern satellite payloads or associated ground segments. Consequently, the architecture scales primarily in bandwidth and scheduling complexity, not intractable compute, and admits further sharding across multiple GEO or high-orbit coordinators if deployments expand to continental or global coverage.
4.3. Reproducibility and Experimental Transparency
Reproducibility is particularly challenging for systems that intertwine learning algorithms with complex, time-varying networks and orbital dynamics. To address this, our evaluation is structured around modular components: (i) ground-level FL logic (clustering, NQI computation, checkpointing, compression); (ii) a network simulator that injects realistic 3G/4G/5G traces, outage patterns, and NQI trajectories; and (iii) an orbit simulator that models LEO/GEO passes, contact windows, and pass failures. Each component is parameterized by explicit configuration files (e.g., satellite constellation parameters, outage probabilities, mobility patterns) and seeded pseudo-random generators to ensure that experimental runs are repeatable. From a methodological standpoint, the entire pipeline can be re-instantiated by other researchers using synthetic or anonymized real-world datasets, as long as they adhere to the same configuration interface and seeding scheme. Moreover, the hierarchical structure of the algorithms (edge round, cluster election, LEO aggregation, and GEO aggregation) facilitates independent verification: each stage can be unit-tested in isolation with controlled inputs (e.g., model staleness distributions, artificial NQI traces) before being integrated into the end-to-end system. Nonetheless, we acknowledge that true “drop-in” replication on production machinery fleets will require additional engineering details—hardware drivers, tractor scheduling constraints, and vendor-specific data formats—which are beyond the scope of this paper but should be documented in future open-source or industrial deployments.
4.4. Failure Recovery During Node and Driver Failures
The system incorporates multiple layers of resilience to node and driver failures, but their behavior under adversarial or correlated fault patterns merits careful discussion. At the ground layer, continuous health monitoring and capability-score-based re-election ensure that cluster drivers can be replaced when their capability scores degrade or when they disappear from the network. Because driver election is computed locally within each cluster using reported capability tuples, failover does not require centralized orchestration and can respond quickly to localized outages or hardware failures. Member tractors can also forward their encrypted payloads to neighboring nodes with higher NQI, effectively routing around temporary blind spots. At the orbital layer, buffering and retry policies at both LEO and GEO introduce temporal redundancy, allowing the system to tolerate isolated pass failures or short-lived inter-satellite outages without destabilizing global aggregation. However, correlated failures (such as a systemic failure of a subset of LEO satellites serving a particular region, or a persistent failure of the GEO downlink to specific clusters) may still introduce bias in the training process by systematically excluding certain regions. In such scenarios, the fairness-weighting mechanism must be complemented with explicit monitoring and alarms: large, persistent drops in regional contribution weights or participation rates should trigger reconfiguration (e.g., reassigning clusters to alternative LEOs, temporarily relaxing staleness penalties, or increasing local personalization weightings) to prevent long-term degradation for affected regions.
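Local driver failover of this kind can be sketched as a deterministic election over the tuples each member reports. The tuple fields (`nqi`, `battery`, `cpu`), the score weights, and the `min_score` retention threshold below are hypothetical; they stand in for whatever capability score the ground layer actually uses.

```python
def capability_score(node, weights=(0.5, 0.3, 0.2)):
    """Hypothetical capability score from a reported (nqi, battery, cpu) tuple."""
    w_nqi, w_battery, w_cpu = weights
    return w_nqi * node["nqi"] + w_battery * node["battery"] + w_cpu * node["cpu"]


def elect_driver(cluster, current=None, min_score=0.4):
    """Return the id of the cluster driver, re-electing only when needed.

    The current driver is kept while it is alive and its score stays at or
    above min_score; otherwise the best-scoring live node wins, with ties
    broken by node id so every member computes the same result locally.
    """
    alive = [n for n in cluster if n["alive"]]
    if not alive:
        return None
    by_id = {n["id"]: n for n in alive}
    if current in by_id and capability_score(by_id[current]) >= min_score:
        return current
    best = max(alive, key=lambda n: (capability_score(n), n["id"]))
    return best["id"]
```

Keeping a healthy incumbent avoids needless driver churn, while the deterministic tie-break means no coordinator is required for members to agree on the outcome.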
4.5. Performance Deviations Due to Seasonal and Soil Variability
Agricultural systems are intrinsically non-stationary: crop phenology, management practices, disease pressure, and soil moisture dynamics all vary across seasons and years. This non-stationarity manifests as covariate shift in the underlying data distributions and as concept drift in the mapping from sensor readings to optimal interventions. Our experiments, which span representative but finite windows of agronomic conditions, show that the global model and regional adapters maintain high performance across the evaluation period. Nevertheless, seasonal transitions (e.g., planting vs. harvest, wet vs. dry seasons) and slow-changing soil properties (e.g., organic matter buildup, salinity) can induce performance deviations that may accumulate over longer horizons than those simulated. The hierarchical architecture provides several hooks for mitigating such drift. First, continual federated updates across seasons allow the global and regional models to adapt as new data arrives, rather than remaining fixed. Second, the regional adapters can be extended to incorporate explicit season or crop-stage features, enabling the GEO to learn seasonally aware aggregation weights. Third, the on-tractor personalization term, which blends global, regional, and local models, can be modulated based on detected shifts in local error statistics: farms experiencing emerging drift can temporarily place more weight on recent local updates while the global model catches up. In future work, explicit drift detection and adaptive re-weighting across seasons and soil regimes could be added to the aggregation logic, providing a principled mechanism to preserve robustness in the face of long-term environmental and management changes.
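The third hook, drift-modulated personalization, can be sketched as follows. The base weights, the cap on the local weight, and the use of a relative error increase as the drift signal are illustrative assumptions rather than the rule used in our experiments.

```python
def blend_weights(recent_err, baseline_err, base=(0.5, 0.3, 0.2), max_local=0.8):
    """Return (global, regional, local) mixing weights that sum to one.

    The local weight grows with the relative increase of the recent local
    error over its baseline; the remaining mass keeps the base
    global-to-regional ratio.
    """
    g, r, l = base
    rel_increase = (recent_err - baseline_err) / max(baseline_err, 1e-12)
    drift = max(0.0, min(1.0, rel_increase))  # 0 = no drift, 1 = strong drift
    local = l + drift * (max_local - l)
    scale = (1.0 - local) / (g + r)
    return g * scale, r * scale, local


def personalize(w_global, w_regional, w_local, weights):
    """Blend three parameter vectors coordinate by coordinate."""
    a, b, c = weights
    return [a * x + b * y + c * z
            for x, y, z in zip(w_global, w_regional, w_local)]


stable = blend_weights(recent_err=1.0, baseline_err=1.0)
drifting = blend_weights(recent_err=1.5, baseline_err=1.0)
model = personalize([1.0], [2.0], [4.0], drifting)
print(stable, drifting, model)
```

When the local error returns to its baseline, the weights fall back to the base mixture, so the extra reliance on local updates is temporary.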
While the proposed architecture already exhibits strong robustness, scalability, and fault tolerance in our experiments, these discussion points highlight both its practical strengths and the boundaries of its applicability. They also suggest concrete avenues for future extensions, including peer-to-peer relaying for ultra-scarce connectivity regimes, multi-GEO coordination for continental-scale deployments, richer instrumentation for reproducibility, and explicit drift-aware adaptation to seasonal and pedological variability.
4.6. Deployment Feasibility in Real Agricultural Operations
While the proposed ground–space FL architecture is evaluated in simulation, its design choices were motivated by practical constraints in production agriculture. Below, we discuss feasibility factors that influence real deployments, including satellite backhaul costs, hardware integration, compatibility with existing machinery communication stacks, and multi-operator collaboration models.
4.6.1. Satellite Backhaul Cost Drivers and Mitigation
Satellite connectivity can be a significant operational expense, particularly when uplinking large payloads frequently or during peak demand windows. In our architecture, satellite airtime is treated as a scarce resource and is minimized through three system-level controls: (i) cluster-level aggregation (drivers uplink on behalf of many tractors), (ii) similarity-based checkpointing to suppress redundant updates, and (iii) adaptive compression (e.g., quantization and Top-k sparsification) under low NQI. These mechanisms reduce the number of uplinks and the bytes per uplink, directly lowering backhaul usage. Practically, this enables multiple deployment tiers: farms with reliable cellular coverage can operate the ground-only system, while satellite backhaul is reserved for connectivity-poor regions or critical agronomic windows.
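Controls (ii) and (iii) can be sketched on the client side as follows. This is a simplified illustration under stated assumptions: the cosine-similarity threshold, the low-NQI cutoff, the value of k, and the symmetric uniform 4-bit quantizer with levels in [-7, 7] are hypothetical choices, and all function names are introduced here for illustration.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(update, k):
    """Keep the k largest-magnitude entries as (index, value) pairs."""
    order = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)
    return [(i, update[i]) for i in sorted(order[:k])]


def quantize_4bit(values):
    """Symmetric uniform quantization to integer levels in [-7, 7]."""
    scale = max((abs(v) for v in values), default=0.0) / 7 or 1.0
    return scale, [round(v / scale) for v in values]


def dequantize(scale, levels):
    return [scale * q for q in levels]


def prepare_uplink(update, last_sent, nqi, sim_threshold=0.99, low_nqi=0.5, k=2):
    """Return None to skip a redundant update, otherwise a payload dict."""
    if last_sent is not None and cosine_similarity(update, last_sent) >= sim_threshold:
        return None                   # checkpointing: nothing new to send
    if nqi >= low_nqi:
        return {"dense": list(update)}  # good link: send uncompressed
    kept = top_k(update, k)             # poor link: sparsify, then quantize
    scale, levels = quantize_4bit([v for _, v in kept])
    return {"indices": [i for i, _ in kept], "scale": scale, "levels": levels}


update = [0.7, -0.02, 0.01, -0.28]
payload = prepare_uplink(update, last_sent=None, nqi=0.2)
skipped = prepare_uplink(update, last_sent=update, nqi=0.2)
print(payload, skipped)
```

Under a low NQI the payload carries only k indices, k 4-bit levels, and one scale factor, which is where the reduction in bytes per uplink comes from; the similarity check reduces the number of uplinks.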
4.6.2. Hardware and Terminal Requirements
A key feasibility advantage of our hierarchical design is that satellite terminals need not be installed on every tractor. Instead, satellite access can be concentrated at cluster drivers or farm gateways that already host telematics backhaul, edge compute, and external antennas. Tractors participate using existing cellular/WiFi radios for local connectivity within a farm or to a nearby driver node. This reduces hardware modification costs and simplifies certification and maintenance. In settings where direct tractor-to-satellite links are desirable, the same protocol remains applicable, but the terminal footprint and power budget become first-order constraints; our communication-efficient pipeline is explicitly designed to reduce airtime and transmission energy in such cases.
4.6.3. Compatibility with Existing Machinery Communication Systems
Modern farm machinery commonly uses standardized in-vehicle networks and implement buses (e.g., CAN/J1939 and ISOBUS) coupled with telematics control units (TCUs) or edge gateways. Our design does not require modifications to safety-critical control loops: model training and communication can be scheduled during idle windows (e.g., overnight) and executed within an isolated software container on the TCU/gateway, while sensor and operational data are accessed through existing data abstraction layers. The clustering and driver election logic can be hosted at the gateway or on a designated high-capability tractor, and the payload format is agnostic to the underlying bus as long as feature extraction is performed locally. These properties support incremental deployment alongside existing precision-agriculture workflows without disrupting machine operation.
4.6.4. Multi-Operator Collaboration and Governance Models
Cross-farm learning requires organizational agreements about governance, incentives, and data stewardship. Federated learning naturally supports multi-operator collaboration by keeping raw data local while sharing only protected model updates. In practice, several collaboration models are plausible: (i) cooperative or regional association deployments in which a trusted coordinator (or GEO-level orchestrator) aggregates regional models; (ii) OEM-led deployments where the equipment provider manages keying, orchestration, and updates; and (iii) service-provider models where an agricultural IoT platform operates the federated infrastructure as a managed service. Our region-level fairness weighting is particularly relevant in multi-operator settings because it mitigates dominance by well-connected or data-rich participants, which is often a governance concern. Future work will further explore incentive mechanisms and auditing interfaces that make participation transparent and economically viable.
Scaling with Client Count
Because aggregation fan-in is reduced at each tier (tractors→drivers→LEO→GEO), the GEO tier aggregates regional models rather than per-client models in each round. Its compute and bandwidth therefore scale primarily with the number of regions/orbital footprints rather than with the raw fleet size.
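A short worked example makes the scaling argument concrete. The fleet size, cluster size, and number of regions below are hypothetical values chosen only to illustrate the arithmetic.

```python
import math


def tier_fan_in(tractors: int, cluster_size: int, regions: int) -> dict:
    """Count the models each tier must aggregate per round (one LEO per region)."""
    drivers = math.ceil(tractors / cluster_size)
    return {
        "flat_server_models": tractors,               # non-hierarchical baseline
        "driver_models_per_leo": math.ceil(drivers / regions),
        "geo_models": regions,                        # independent of fleet size
    }


small = tier_fan_in(tractors=10_000, cluster_size=20, regions=8)
large = tier_fan_in(tractors=20_000, cluster_size=20, regions=8)
print(small)
print(large)
```

Doubling the fleet doubles the load on each LEO in this example, while the GEO still aggregates one model per region.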
4.6.5. Practical Takeaway
Overall, the architecture admits staged adoption: (1) ground-only deployment using existing cellular/WiFi backhaul and farm gateways; (2) satellite-assisted backhaul for regions with persistent coverage gaps; and (3) broader multi-operator federations with explicit governance and fairness controls. These considerations clarify the real-world pathway from simulation to production deployment.
4.7. Convergence Discussion of the Proposed Method
We consider the standard federated objective

$$\min_{w} F(w) = \sum_{k=1}^{K} p_k F_k(w),$$

where $F_k(w)$ is the expected loss on tractor/client $k$ and the weights satisfy $p_k \ge 0$, $\sum_k p_k = 1$. Our hierarchy induces three aggregation steps: intra-cluster (driver), regional (LEO), and global (GEO). Let $t$ denote the GEO global round. Each participating client performs $E$ local SGD steps starting from the last received model, producing an update $\Delta_k^{(t)}$. Communication may be compressed (quantization/Top-$k$) and delayed due to connectivity and satellite contact windows.

- (A1) $L$-smoothness. Each $F_k$ has $L$-Lipschitz gradients.
- (A2) Bounded stochastic gradient variance. The stochastic gradient $g_k$ of client $k$ satisfies $\mathbb{E}\|g_k(w) - \nabla F_k(w)\|^2 \le \sigma^2$.
- (A3) Bounded heterogeneity (client drift). $\|\nabla F_k(w) - \nabla F(w)\|^2 \le \zeta^2$ for all $k$.
- (A4) Bounded staleness. Any regional model used at GEO at round $t$ was computed from an iterate no older than $\tau_{\max}$ rounds, i.e., $0 \le \tau_r \le \tau_{\max}$.
- (A5) Compression regularity. The compressor $\mathcal{C}$ satisfies a standard unbiasedness/contraction property: $\mathbb{E}[\mathcal{C}(x)] = x$ and $\mathbb{E}\|\mathcal{C}(x) - x\|^2 \le \omega \|x\|^2$ for some $\omega \ge 0$ (equivalently, a bounded distortion/contraction form).

Let $w_r^{(t-\tau_r)}$ be the regional (LEO) model arriving with delay $\tau_r$. GEO forms a convex combination

$$w^{(t+1)} = \sum_{r} \alpha_r^{(t)}\, w_r^{(t-\tau_r)}.$$

Since $\alpha_r^{(t)} \ge 0$ and $\sum_r \alpha_r^{(t)} = 1$, GEO performs a convex aggregation of delayed regional iterates. This can be rewritten as an asynchronous weighted local-SGD update plus additive perturbations due to (i) delay, (ii) heterogeneity, and (iii) compression.

Under (A1)–(A5), with a diminishing step size $\eta_t$ (or sufficiently small constant $\eta$), the expected average gradient norm admits a standard non-convex convergence guarantee of the form

$$\frac{1}{T}\sum_{t=0}^{T-1} \mathbb{E}\big\|\nabla F(w^{(t)})\big\|^2 \le \mathcal{O}\!\left(\frac{F(w^{(0)}) - F^{\star}}{\eta T}\right) + \mathcal{O}\big(\eta\,\sigma^2\big) + \mathcal{O}\big(\eta^2 E^2 \zeta^2\big) + \mathcal{O}\big(\eta^2 \tau_{\max}^2\big) + \mathcal{O}\big(\eta\,\omega\big). \quad (18)$$

Thus, the method converges to a stationary point as $T \to \infty$, and the additional terms quantify how data heterogeneity ($\zeta$), local steps ($E$), bounded delay ($\tau_{\max}$), and compression distortion affect the limiting neighborhood. For convex/strongly convex objectives, analogous sublinear/linear rates follow from the same framework.

The argument follows standard analyses for local SGD/FedAvg and asynchronous optimization. The key steps are as follows: (i) Express each regional model $w_r^{(t-\tau_r)}$ as the result of averaging compressed local updates computed from an earlier iterate $w^{(t-\tau_r)}$; (ii) Use $L$-smoothness to relate $F(w^{(t+1)})$ to $F(w^{(t)})$ plus quadratic terms; (iii) Bound the deviation between delayed gradients $\nabla F(w^{(t-\tau_r)})$ and current gradients $\nabla F(w^{(t)})$ using the bounded-delay assumption (A4); (iv) Bound client drift using (A3) and stochastic noise using (A2); (v) Bound compression noise via (A5) (and via error-feedback, if enabled). Combining these inequalities and telescoping over $t = 0, \ldots, T-1$ yields (18).

Practically, the staleness weight applied at GEO further reduces the influence of large delays, tightening the delay-induced error term relative to unweighted asynchronous aggregation.
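The convex aggregation of delayed regional iterates can be sketched as follows. The polynomial staleness decay and the multiplicative fairness factor are assumed forms used only for illustration; the sketch shows that any non-negative weights, once normalized, give a convex combination in which older models count less.

```python
def geo_aggregate(regional_models, delays, fairness, decay=0.5):
    """Convex combination of delayed regional models.

    Unnormalized weight of region r: fairness[r] / (1 + delays[r]) ** decay
    (assumed form). Normalizing makes the weights non-negative and sum to one.
    """
    raw = [f / (1.0 + d) ** decay for f, d in zip(fairness, delays)]
    total = sum(raw)
    alphas = [x / total for x in raw]
    dim = len(regional_models[0])
    merged = [sum(a * m[j] for a, m in zip(alphas, regional_models))
              for j in range(dim)]
    return merged, alphas


models = [[1.0, 1.0], [3.0, 3.0]]   # two regional (LEO) models
merged, alphas = geo_aggregate(models, delays=[0, 3], fairness=[1.0, 1.0],
                               decay=1.0)
print(alphas, merged)
```

With equal fairness factors, the model that is three rounds stale receives a quarter of the unnormalized weight of the fresh one, so the aggregate stays closer to the fresh regional model.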
While Salsa20 + MAC provides confidentiality, integrity, authentication, and replay resistance for model updates in transit, it does not by itself prevent adversarial learning-layer behaviors such as model poisoning or malicious aggregation. Handling such threats requires complementary mechanisms (e.g., secure aggregation, Byzantine-robust aggregation, and anomaly detection), which we identify as important future work.
4.8. Future Work
While the proposed ground–space federated learning architecture already delivers strong gains in accuracy, communication efficiency, and fairness, its full potential will only be realized through sustained engineering, deployment, and co-design with real agricultural operations. We highlight several practical and immediately actionable directions.
4.8.1. Real-World Fleet Deployment and Operational Integration
A natural next step is a staged deployment on real machinery fleets operating across multiple regions and seasons. Practically, this entails:
Edge integration with existing controllers: Porting the edge pipeline (local training, NQI estimation, payload construction, Salsa20 + MAC) into embedded runtimes compatible with in-cab displays, telematics gateways, and implement controllers. This includes model-runtime integration, safe resource limits, and non-disruptive scheduling so that training never interferes with safety-critical control loops.
Operator-centric UX: Designing dashboards and workflows that explain model updates, predicted recommendations, and connectivity-driven behaviors to operators and agronomists (e.g., why an update was skipped or why a tractor was asked to park at a suggested high-NQI location).
Compliance and data governance: Working with legal, regulatory, and enterprise data-governance teams to formalize how model updates, metadata, and logs are stored, audited, and shared across OEMs, dealers, and growers, while maintaining the privacy guarantees inherent to FL.
Such deployments would not only validate the architecture under real operational constraints (weather, equipment downtime, human behavior), but also surface new requirements that are difficult to anticipate in simulations.
4.8.2. Adaptive Orchestration for Multi-Fleet and Multi-OEM Environments
In realistic ecosystems, multiple fleets from different OEMs, service providers, and cooperatives will share the same physical regions, often with distinct objectives, privacy policies, and contractual boundaries. Extending our system to such environments opens several practical avenues:
Multi-tenant orchestration: Developing a “federation-of-federations” layer in which different organizations can contribute models or submodels via the same LEO/GEO fabric while enforcing strict isolation of raw data and sensitive metadata.
Task-aware scheduling: Allowing LEO and GEO schedulers to jointly optimize contact windows and bandwidth across multiple tasks (nutrient modeling, disease risk, logistics optimization), using task priorities and agronomic urgency as first-class scheduling signals.
Economic incentives: Exploring pricing and incentive mechanisms that reward fleets for contributing high-value, diverse updates while accounting for their connectivity costs and operational constraints.
This line of work would make the architecture directly deployable in heterogeneous commercial landscapes rather than within a single vertically integrated stack.
4.8.3. On-Device Model Compression, Co-Processors, and Energy-Aware Training
Although the current system uses lightweight models and compressed communication, future deployments will face tighter energy and compute budgets, especially on smaller implements and battery-powered devices. Practical next steps include:
AutoML for edge constraints: Integrating neural architecture search and structured pruning to automatically generate task-specific models that respect per-tractor constraints on latency, memory footprint, and energy.
Hardware-aware co-design: Co-designing models with emerging accelerators (e.g., low-power NPUs on telematics modules), including quantization-aware training tailored to their supported formats.
Energy-aware FL policies: Extending NQI to an Energy- and Network Quality Index that explicitly balances the marginal value of an update against both uplink cost and remaining energy or fuel margins.
These improvements would allow the system to operate robustly on a much broader range of equipment, including retrofitted legacy machines and low-cost sensors.
4.8.4. Safety, Human-in-the-Loop Overrides, and Agronomic Validation
For many agronomic decisions (e.g., fertilizer rates, pesticide timing), the consequences of erroneous model recommendations are substantial. Turning the architecture into a production-grade decision-support system requires:
Human-in-the-loop workflows: Embedding explicit confirmation, override, and feedback mechanisms so that operators and agronomists can accept, modify, or reject recommendations. Their feedback can be fed back into local training as labeled corrections.
Safety envelopes and guardrails: Implementing agronomy-informed safety envelopes (e.g., minimum/maximum rates, “never spray” zones, environmental constraints) which the model cannot violate, even under distribution shift or partial failures.
Field trials and side-by-side comparisons: Conducting multi-season, multi-location field trials that compare model-driven practices against agronomist-designed or rule-based baselines, measuring not only yield and input use, but also environmental and economic outcomes.
Such work would bridge the gap between algorithmic performance and agronomic trustworthiness, a prerequisite for widespread adoption.
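A safety envelope of the kind listed above can be enforced as a thin layer between the model and the actuator. The sketch below is illustrative only: the rate units, the field names, and the choice to fall back to the minimum rate on a non-finite model output are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    min_rate: float  # e.g., kg/ha; units are an assumption
    max_rate: float


def apply_guardrails(recommended: float, envelope: Envelope,
                     in_no_spray_zone: bool) -> float:
    """Clamp a model recommendation to an agronomy-defined envelope."""
    if in_no_spray_zone:
        return 0.0                    # hard constraint overrides the model
    if not math.isfinite(recommended):
        return envelope.min_rate      # assumed conservative fallback
    return min(max(recommended, envelope.min_rate), envelope.max_rate)


limits = Envelope(min_rate=20.0, max_rate=120.0)
print(apply_guardrails(180.0, limits, in_no_spray_zone=False))
print(apply_guardrails(60.0, limits, in_no_spray_zone=True))
```

Keeping the envelope outside the learned model means the constraint holds regardless of distribution shift or a corrupted model update.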
4.8.5. Drift-Aware and Context-Aware Aggregation Across Seasons and Soil Types
Our architecture already provides hooks for regional adapters and personalization, but it does not yet fully exploit seasonality, crop rotations, and slow-changing soil properties. Practical extensions include:
Context-tagged updates: Attaching lightweight context tags (season, crop stage, soil texture class, management system) to updates so that LEO and GEO aggregators can learn specialized submodels or adapters for recurring contexts.
Online drift detection: Deploying simple, explainable drift detectors both on tractors and at the GEO layer to trigger adaptive re-weighting, re-initialization, or additional local fine-tuning when performance degradation is detected.
Cross-context transfer: Systematically studying when and how knowledge from one crop, season, or soil regime can be safely transferred to another, and encoding these transfer policies into the aggregation logic to avoid negative transfer.
Long-horizon memory: Maintaining compact, privacy-preserving summaries of historical context–performance relationships at GEO, enabling the system to “remember” how models behaved under similar conditions in previous years.
These mechanisms would help ensure that the system remains reliable across years, crop cycles, and climate variability, not just within a single experimental window.
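As an example of a simple, explainable drift detector, the sketch below compares a rolling mean of the absolute prediction error with a baseline taken from the first full window. The window length and the trigger ratio are hypothetical parameters, and this is one possible detector rather than a component of the evaluated system.

```python
from collections import deque


class ErrorDriftDetector:
    """Flag drift when the rolling mean error exceeds ratio x baseline."""

    def __init__(self, window: int = 20, ratio: float = 1.5):
        self.recent = deque(maxlen=window)
        self.ratio = ratio
        self.reference = None  # set once the first window is full

    def update(self, abs_error: float) -> bool:
        self.recent.append(abs_error)
        if len(self.recent) < self.recent.maxlen:
            return False                      # still warming up
        mean = sum(self.recent) / len(self.recent)
        if self.reference is None:
            self.reference = mean             # first full window is the baseline
            return False
        return mean > self.ratio * self.reference


detector = ErrorDriftDetector(window=5, ratio=1.5)
errors = [1.0] * 5 + [3.0] * 3               # error level jumps after 5 samples
flags = [detector.update(e) for e in errors]
print(flags)
```

A positive flag could then trigger adaptive re-weighting, re-initialization, or additional local fine-tuning, as described in the list above.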
4.8.6. Robustness to Adversarial and Systemic Failures
Finally, the architecture must be hardened against both benign systemic failures and adversarial behavior:
Byzantine-robust aggregation in orbit: Combining the current staleness- and fairness-aware weighting with robust aggregation rules (e.g., trimmed means, coordinate-wise median) at LEO and GEO to mitigate the impact of corrupted or malicious updates.
End-to-end observability: Building monitoring and alerting infrastructure that tracks regional participation, staleness distributions, fairness metrics, and anomaly indicators, surfacing actionable diagnostics to operators of the ground and satellite segments.
Disaster and black-swan scenarios: Stress-testing the system under scenarios such as coordinated satellite outages, extended regional connectivity loss, or rapid policy changes (e.g., new environmental regulations) and defining emergency operating modes (e.g., local-only, rule-based fallbacks).
These enhancements would move the system from a robust research prototype toward a resilient, safety-critical platform capable of operating under real-world uncertainty and risk. In combination, these future directions emphasize practicality over purely algorithmic novelty: integrating with real fleets and operators, scaling across organizations and continents, co-designing with hardware and energy budgets, embedding safety and human judgment, and explicitly managing long-term agronomic and environmental variability. Pursuing them will turn the proposed architecture into a foundation for a globally distributed, continuously learning agricultural intelligence network that can be safely entrusted with high-stakes decisions in the field.
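The robust aggregation rules named in Section 4.8.6 can be illustrated with a minimal sketch of the coordinate-wise median and the coordinate-wise trimmed mean. Updates are represented as plain lists of floats for clarity, and the trim count is a hypothetical parameter; combining these rules with staleness and fairness weights is left open here.

```python
import statistics


def coordinate_median(updates):
    """Median of each coordinate across all submitted updates."""
    return [statistics.median(col) for col in zip(*updates)]


def trimmed_mean(updates, trim=1):
    """Per coordinate, drop the `trim` smallest and largest values, then average."""
    result = []
    for col in zip(*updates):
        kept = sorted(col)[trim:len(col) - trim]
        if not kept:
            raise ValueError("trim is too large for the number of updates")
        result.append(sum(kept) / len(kept))
    return result


# Three honest updates and one corrupted update (last row).
updates = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [100.0, -50.0]]
naive = [sum(col) / len(col) for col in zip(*updates)]
print(naive)
print(coordinate_median(updates))
print(trimmed_mean(updates, trim=1))
```

The plain average is pulled far from the honest updates by a single outlier, whereas both robust rules stay near them.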
4.9. Small-Scale Pilot Validation Plan
While our results are obtained via simulation, the proposed architecture is designed to be implementable on current-generation edge gateways and telematics stacks. In future work, we plan to carry out a small-scale pilot deployment across 1–2 farms to validate key system indicators under real field conditions.
A minimal pilot would involve 5–10 edge nodes (tractors or tractor-mounted gateways) clustered into 1–2 task groups (e.g., nutrient prediction and crop health). Each node performs local training during idle windows and periodically attempts to transmit encrypted model updates to a local driver or farm gateway.
We will instrument the following deployment-level KPIs: (i) end-to-end model update delivery ratio (tractor→driver/LEO/GEO); (ii) latency distributions (from update creation to incorporation into the aggregate and to receipt of the global model); (iii) packet loss and retransmission counts; (iv) airtime/energy proxies for transmissions; and (v) model quality impact under real link variability.
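KPIs (i) and (ii) could be computed from pilot logs along the following lines. The log format, the latency values, and the nearest-rank percentile definition are assumptions made for illustration; no pilot data exist yet.

```python
import math


def delivery_ratio(sent: int, delivered: int) -> float:
    """Fraction of created updates that reached the aggregator."""
    return delivered / sent if sent else 0.0


def percentile(values, q: float) -> float:
    """Nearest-rank percentile, q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical latencies (seconds) from update creation to aggregation.
latencies = [12, 15, 11, 90, 14, 13, 300, 16, 12, 15]
ratio = delivery_ratio(sent=12, delivered=len(latencies))
p50 = percentile(latencies, 50)
p95 = percentile(latencies, 95)
print(ratio, p50, p95)
```

Reporting tail percentiles alongside the median matters here because contact windows and outages produce a few very late updates that an average would hide.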
The pilot can initially validate the ground layer using cellular/WiFi backhaul and optionally incorporate non-terrestrial links via available satellite IoT backhaul services or gateway-based satellite terminals where feasible. The same NQI computation and checkpointing logic applies, enabling direct comparison between measured link behavior and the simulator parameterization. These pilot measurements will calibrate the simulator and enable subsequent large-scale evaluations that more accurately reflect terrain, weather, and carrier-side dynamics in rural environments.
5. Conclusions
This work introduced a two-phase federated learning architecture that unifies ground-based intelligent farm nodes with a space-assisted hierarchical aggregation layer to address the core challenges of precision agriculture at scale: intermittent connectivity, strict privacy requirements, and heterogeneous edge capabilities. On the ground, tractors participate as edge learners in task- and geography-aware clusters, where NQI-guided scheduling, similarity-based checkpointing, and multi-level compression (4-bit quantization and Top-k sparsification) jointly suppress redundant traffic and adapt to dynamic 3G/4G/5G conditions. In orbit, LEO satellites act as regional aggregators and relay nodes, while a GEO satellite performs robust, staleness- and fairness-weighted aggregation that explicitly compensates for delayed updates and structurally disadvantaged regions, all under an end-to-end Salsa20 + MAC security envelope.
Extensive simulations on realistic nutrient prediction and crop health tasks show that the proposed system attains near-centralized accuracy while substantially lowering the cost of participation. Relative to vanilla FedAvg, the full hierarchical architecture lowers nutrient RMSE, raises crop-health AUC, and reduces the per-tractor uplink volume by about 90% (from 650 MB to 65 MB over 100 rounds), with a corresponding drop in the communication energy proxy. Ablation studies confirm that NQI-aware scheduling and checkpointing remove predominantly redundant updates, compression provides order-of-magnitude bandwidth savings with only marginal accuracy loss, and the satellite hierarchy notably strengthens performance in connectivity-poor regions. Fairness-oriented GEO aggregation further reduces inter-regional disparity and ensures that low-NQI regions benefit disproportionately from the global model.
Overall, our results indicate that a tightly integrated ground–LEO–GEO federated pipeline is not merely a communication optimization, but a viable systems blueprint for globally scalable, privacy-preserving agricultural intelligence. Future work will focus on deploying the architecture on real fleets of farm machinery, extending it to additional sensing and decision-support tasks, and co-designing orbit-aware resource allocation, security hardening, and model personalization techniques to support even larger and more diverse agricultural ecosystems.