Article

Federated Learning for Soil Moisture Prediction: Benchmarking Lightweight CNNs and Robustness in Distributed Agricultural IoT Networks

Salma Zakzouk and Lobna A. Said
1 School of Engineering and Applied Sciences, Nile University, Giza 12588, Egypt
2 Nanoelectronics Integrated Systems Center (NISC), Nile University, Giza 12588, Egypt
* Author to whom correspondence should be addressed.
Mach. Learn. Knowl. Extr. 2025, 7(4), 132; https://doi.org/10.3390/make7040132
Submission received: 21 September 2025 / Revised: 17 October 2025 / Accepted: 21 October 2025 / Published: 31 October 2025

Abstract

Federated learning (FL) provides a privacy-preserving approach for training machine learning models across distributed datasets; however, its deployment in environmental monitoring remains underexplored. This paper uses the WHIN dataset, comprising 144 weather stations across Indiana, to establish a benchmark for FL in soil moisture prediction. The work presents three primary contributions: the design of lightweight CNNs optimized for edge deployment, a comprehensive robustness assessment of FL under non-IID and adversarial conditions, and the development of a large-scale, reproducible agricultural FL benchmark using the WHIN network. The paper designs and evaluates lightweight (∼0.8 k parameters) and heavy (∼9.4 k parameters) convolutional neural networks (CNNs) under both centralized and federated settings, supported by ablation studies on feature importance and model architecture. Results show that lightweight CNNs achieve near-heavy CNN performance (MAE = 7.8 cbar vs. 7.6 cbar) while reducing computation and communication overhead. Beyond accuracy, this work systematically benchmarks robustness under adversarial and non-IID conditions, providing new insights for deploying federated models in agricultural IoT.

1. Introduction

Soil moisture strongly influences global climate modeling, flood forecasting, drought monitoring, and agricultural yield [1]. Accurate estimation of soil moisture supports climate initiatives, improves irrigation management, and reduces water losses [1]. Although in situ sensors provide high precision, they remain expensive to deploy at scale and offer limited spatial coverage [2]. Recent advances in machine learning (ML) and remote sensing have therefore enabled data-driven soil moisture forecasting by combining diverse data sources, such as satellite imagery, weather station measurements, and environmental records [2].
At both field and regional scales, integrating soil, crop, and weather data has improved prediction accuracy. For example, combining soil and weather variables improved local soil moisture prediction in the Red River Valley [3]. Similarly, the multimodal MIS-ME framework, which merged weather and imagery data, achieved a mean absolute percentage error (MAPE) of approximately 10.14%, outperforming unimodal baselines [4]. In Egypt, an IoT-based irrigation system was implemented that combined soil moisture sensors and weather forecasts to improve water management [5].
A wide range of ML and deep learning architectures have been applied to soil moisture prediction, including multimodal fusion models that integrate heterogeneous variables [2,3,4,6,7], long short-term memory (LSTM) networks that capture temporal dynamics [6], and convolutional neural networks (CNNs) that extract spatial features from environmental data [8]. Earlier ML-based IoT systems have been developed to monitor soil and environmental parameters, integrating sensor networks with predictive algorithms for crop and fertilizer recommendations [9]. More recently, transformer-based models have shown competitive results for soil and weather time-series forecasting [10]. Physics-informed deep learning methods [2] and knowledge-guided frameworks that couple Sentinel-1 SAR with physical backscatter models [11] also improved generalization under noisy data. Other studies employed active learning [12] to optimize sensor placement and multiscale deep learning [13] to combine satellite and in situ data, achieving regional correlations of R ≈ 0.90.
Despite these advancements, most existing models rely on centralized data collection, which is often impractical for distributed agricultural networks due to privacy, ownership, and bandwidth constraints [6,8,9,10,14,15]. Centralized frameworks, such as MIS-ME [4] and multiscale deep learning models [13], achieved strong predictive performance (MAPE = 10.14%, RMSE = 0.034 m³/m³), yet they required data aggregation on a central server. Similarly, physics-informed and knowledge-guided models [2,11] have improved robustness under certain physical or sensing constraints; however, they were still developed and validated in centralized settings. None of these studies systematically evaluated performance under federated or heterogeneous (non-IID) conditions.
Federated learning (FL) offers a decentralized alternative that enables model training across multiple clients without exchanging raw data. Early studies in agriculture showed FL’s potential for communication-efficient optimization [16,17], IoT-based environmental monitoring [18], and crop yield prediction [19]. However, its application to soil moisture prediction is underexplored. Previous research has primarily focused on algorithmic enhancements rather than evaluating the application of federated sensor networks for soil moisture prediction [7]. Figure 1 shows the conceptual framework of federated learning applied to soil moisture prediction. In this setup, multi-source environmental data such as satellite imagery, meteorological records, and in situ sensor measurements are processed locally at each weather station. Only model updates (not raw data) are exchanged using federated averaging, enabling privacy-preserving model training. While such a framework could support broader applications, such as irrigation scheduling, drought monitoring, and flood risk assessment, this study focuses explicitly on soil moisture prediction using distributed weather-station data.
This work presents a systematic benchmark of FL for soil moisture prediction in a large-scale IoT environment. It includes robustness evaluation under non-IID and adversarial conditions, a lightweight CNN architecture with approximately 0.8 k parameters optimized for resource-constrained edge devices [20], and a large-scale, reproducible benchmark built on the WHIN sensor network. Experimental results reveal an efficiency–robustness trade-off: the lightweight CNN achieves performance comparable to heavier models (approximately 9.4 k parameters) while reducing computational and communication costs. These findings indicate that lightweight FL models can achieve high predictive accuracy while remaining practical for real-world IoT deployment.
The rest of this paper is organized as follows: Section 2 describes the dataset specifications and preprocessing steps. Section 3 introduces the baseline federated learning models. Section 4 discusses ablation studies and model selection. Section 5 compares the performance of centralized and federated models. Section 6 evaluates robustness under non-IID conditions. Section 7 presents the final discussion, and Section 8 concludes the paper, outlining future work.

2. Dataset Specifications and Preprocessing

The dataset used in this study is provided by the Wabash Heartland Innovation Network (WHIN), a digital agriculture Living Laboratory spanning 10 counties in north-central Indiana [21]. WHIN’s sensor network gathers structured, real-time data from over 160 weather stations across its deployment area, making it one of the most densely deployed agricultural weather networks in the US [21]. Geographically, the WHIN network covers approximately 39° to 42° N latitude and −87° to −86° W longitude. The WHIN dataset is freely available to researchers, students, and educators under open-access licensing terms [21]. This dataset comprises measurements from 144 weather stations for May 2020, and is available in both CSV and JSON formats. It can also be accessed programmatically through the WHIN API for automated data retrieval and analysis.
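For completeness, a minimal retrieval sketch is shown below. Only the dataset URL comes from the WHIN data page [21]; the authentication header and response schema are illustrative assumptions, and the actual WHIN API may differ.

```python
# Hypothetical access sketch: the endpoint URL is from the WHIN data page,
# but the token header and JSON layout are assumptions for illustration.
import requests

API_URL = "https://data.whin.org/data/current-conditions"
TOKEN = "YOUR_WHIN_API_TOKEN"  # placeholder credential

response = requests.get(API_URL, headers={"Authorization": f"Bearer {TOKEN}"}, timeout=30)
response.raise_for_status()
records = response.json()  # assumed: one record per station per 15 min interval
print(f"Retrieved {len(records)} records")
```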
Each weather station records 24 environmental variables every 15 min [21], including air temperature, humidity, barometric pressure, rainfall, solar radiation, wind direction and speed, and soil temperature and moisture at four depths (1–4 inches). In this study, soil moisture at a depth of 4 inches (soil_moist_4) is selected as the target variable because it reflects root-zone conditions that are agronomically significant yet difficult to measure accurately [22]. Soil moisture (soil_moist_4) is measured in centibars (cbar), which represent soil water tension. Higher cbar values indicate drier soil, while lower values correspond to wetter conditions. The observed range of 0–200 cbar is consistent with the calibration of WHIN’s field sensors for typical agricultural soils.
To identify the most relevant input features for soil_moist_4, all variables from each station are aggregated, non-numeric identifiers are removed, and the other soil moisture depth levels are excluded. Feature importance is then estimated locally at each station using a Random Forest surrogate model with permutation importance, a technique known for its robustness to nonlinear relationships and reduced risk of overfitting. The averaged importance scores across all stations are summarized in Table 1. A higher importance value indicates a greater influence of that feature on soil moisture prediction, and the importance scores for all features sum to 1.
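As an illustration, the per-station importance step could be implemented as follows; the DataFrame, feature list, and averaging across stations are illustrative stand-ins rather than the paper's actual code.

```python
# Sketch of the per-station feature-importance step, assuming a pandas
# DataFrame per station holding the 17 candidate features and the target.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

def station_importance(df, features, target="soil_moist_4"):
    X, y = df[features].values, df[target].values
    rf = RandomForestRegressor(n_estimators=100, random_state=42).fit(X, y)
    result = permutation_importance(rf, X, y, n_repeats=10, random_state=42)
    scores = result.importances_mean.clip(min=0)
    return scores / scores.sum()  # normalize so the scores sum to 1

# Averaging over the 144 stations (station_dfs is an assumed list of DataFrames):
# avg_importance = np.mean([station_importance(df, FEATURES) for df in station_dfs], axis=0)
```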
To prepare the data for model training, each local dataset is divided chronologically into 70% training, 10% validation, and 20% test subsets, ensuring that future timestamps are never used during training. To standardize feature ranges across clients, Min–Max normalization is applied separately for each station, while missing values are imputed using the median. For reproducibility, the same data splits are used across all experiments.
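A minimal sketch of this per-station pipeline is given below, assuming a pandas DataFrame with an observation_time column (the actual timestamp field name in the WHIN data may differ). Imputation and scaling statistics are fit on the training split only, so no information leaks from future timestamps.

```python
# Sketch: chronological 70/10/20 split, median imputation, per-station Min–Max scaling.
from sklearn.preprocessing import MinMaxScaler

def preprocess_station(df, feature_cols):
    df = df.sort_values("observation_time")          # assumed timestamp column
    n = len(df)
    train = df.iloc[: int(0.7 * n)].copy()
    val = df.iloc[int(0.7 * n): int(0.8 * n)].copy()
    test = df.iloc[int(0.8 * n):].copy()

    medians = train[feature_cols].median()           # impute from training statistics only
    scaler = MinMaxScaler().fit(train[feature_cols].fillna(medians))
    for split in (train, val, test):
        split[feature_cols] = scaler.transform(split[feature_cols].fillna(medians))
    return train, val, test
```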

3. Baseline Federated Learning Models

This study evaluates the predictive capability of FL for soil moisture estimation using three baseline models of increasing architectural complexity: Linear Regression (LR), Multilayer Perceptron (MLP), and a lightweight convolutional neural network (CNN). This enables a fair comparison between a traditional regression approach, a simple neural network, and a deep convolutional model under identical FL settings.
The LR model serves as a simple non-deep learning baseline. The MLP architecture comprises two hidden layers with 64 and 32 neurons, respectively, each using ReLU activations, which provides a good trade-off between performance and computational efficiency. The lightweight CNN is designed for low-resource environments and includes a single one-dimensional convolutional layer (kernel size = 1) with ReLU activations, followed by adaptive average pooling and a fully connected regression output. A dropout rate of 0.2 helps prevent overfitting. Depending on the number of hidden channels (8–64), the CNN has between 0.7 k and 3.2 k parameters, making it suitable for deployment on edge devices [20].
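A minimal PyTorch sketch consistent with this description is shown below; depth is configurable for the architecture ablations in Section 4.2, and the exact layer ordering and resulting parameter counts are assumptions based on the text, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SoilMoistureCNN(nn.Module):
    """Sketch of the CNN family: Conv1d blocks (kernel size 1) with ReLU,
    adaptive average pooling, dropout 0.2, and a linear regression head."""

    def __init__(self, n_features: int = 14, hidden: int = 16, n_layers: int = 1):
        super().__init__()
        blocks, in_ch = [], n_features
        for _ in range(n_layers):
            blocks += [nn.Conv1d(in_ch, hidden, kernel_size=1), nn.ReLU()]
            in_ch = hidden
        self.conv = nn.Sequential(*blocks)
        self.pool = nn.AdaptiveAvgPool1d(1)
        self.drop = nn.Dropout(0.2)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_features), treated as channels of a length-1 sequence
        z = self.conv(x.unsqueeze(-1))
        z = self.drop(self.pool(z).squeeze(-1))
        return self.head(z).squeeze(-1)

# The three-layer, 16-channel variant selected later (Section 6) lands near
# the reported ~0.8 k parameter budget under this construction.
print(sum(p.numel() for p in SoilMoistureCNN(14, 16, 3).parameters()))  # 801
```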
Federated training follows the FedAvg algorithm, where a central server aggregates model updates from multiple clients to build a shared global model. To efficiently configure FedAvg, a sensitivity analysis is performed by varying the number of local epochs (2–5) and the client sampling ratio (20–40%). The analysis shows that the mean MAE fluctuation remains below 2%, confirming stable convergence across configurations. Based on these findings, the setup uses 40 clients (approximately 28% of the 144 total) per round and three local epochs to balance predictive accuracy and communication efficiency. Each client trains locally using the Adam optimizer (learning rate = 0.001, batch size = 32), and client updates are weighted according to their sample counts. For reproducibility, deterministic random seeds (seed = 42 + client_ID) are applied to each client.
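The following sketch illustrates one FedAvg round under these settings. The client.train API and deep-copy-based local training are assumptions for illustration; only the hyperparameters (40 sampled clients, three local epochs, Adam with lr = 0.001, batch size 32, seed = 42 + client_ID, sample-count weighting) come from the text.

```python
import copy
import random
import torch

def fedavg_round(global_model, clients, round_idx, n_sampled=40):
    """One FedAvg communication round with sample-count-weighted averaging."""
    sampled = random.Random(round_idx).sample(clients, n_sampled)
    updates = []
    for client in sampled:
        local = copy.deepcopy(global_model)
        n_samples = client.train(local, epochs=3, lr=1e-3, batch_size=32,
                                 seed=42 + client.client_id)  # assumed client API
        updates.append((local.state_dict(), n_samples))
    total = sum(n for _, n in updates)
    avg_state = {
        key: torch.stack([state[key].float() * (n / total) for state, n in updates]).sum(0)
        for key in updates[0][0]
    }
    global_model.load_state_dict(avg_state)
    return global_model
```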
After establishing a stable training configuration, the models are statistically compared to determine whether their performance differences are meaningful or due to random variation. The paired Wilcoxon signed-rank test compares MAE values across the 144 weather stations. The test produces the signed-rank statistic (W) and standardized Z-value, which indicates how far the observed difference deviates from the null hypothesis in standard deviation units. The effect size r = Z/√N, where N = 144, quantifies the magnitude of improvement beyond random noise. According to [23,24], r values of 0.1, 0.3, and 0.5 correspond to small, medium, and large effects, respectively; therefore, the values between 0.62 and 0.74 observed in Table 2 represent practically meaningful improvements.
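A sketch of this comparison using SciPy follows; mae_a and mae_b are assumed per-station MAE arrays of length N = 144, and |Z| is recovered from the two-sided p-value rather than computed directly.

```python
import numpy as np
from scipy.stats import norm, wilcoxon

def paired_comparison(mae_a, mae_b):
    """Paired Wilcoxon signed-rank test over per-station MAEs, with effect size."""
    w_stat, p_value = wilcoxon(mae_a, mae_b)
    z = norm.isf(p_value / 2.0)      # |Z| implied by the two-sided p-value
    r = z / np.sqrt(len(mae_a))      # effect size: 0.1 small, 0.3 medium, 0.5 large [23,24]
    return w_stat, z, r, p_value

# e.g. paired_comparison(lr_maes, cnn_maes) across the 144 stations
```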
Furthermore, to ensure that the results are not dependent on a single training run, each experiment is repeated five times with different random seeds. Random number generators are initialized deterministically to ensure reproducibility. Results are reported as mean ± standard deviation (SD), where smaller SD values indicate more consistent performance. Additionally, 95% confidence intervals (CI₉₅ = 1.96 × σ/√n, with n = 5) are computed; narrower intervals indicate greater reliability and stability of the model’s performance.
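For illustration, the aggregation over runs reduces to a few lines; the MAE values below are placeholders, not results from the paper, and the sample SD (ddof = 1) is an assumption.

```python
import numpy as np

run_maes = np.array([8.07, 7.89, 8.31, 8.22, 7.86])  # illustrative values only
mean, sd = run_maes.mean(), run_maes.std(ddof=1)     # sample SD across the five runs
ci95 = 1.96 * sd / np.sqrt(len(run_maes))
print(f"MAE = {mean:.2f} ± {sd:.2f} cbar (95% CI ± {ci95:.2f} cbar)")
```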
As shown in Table 3, both federated CNN variants outperform the MLP and LR baselines in soil moisture prediction. The lightweight CNN achieves an MAE of 8.07 ± 0.26 cbar, outperforming the MLP (9.68 ± 0.38 cbar) and LR (12.03 ± 0.41 cbar) models. The heavy CNN achieves the best accuracy (7.59 ± 0.28 cbar), but at a higher computational cost. The narrow confidence intervals across CNN variants confirm stable, reproducible performance, indicating that FedAvg aggregation maintains predictive accuracy and robustness under decentralized WHIN network conditions.

4. Ablation Studies and Model Selection

Building on the baseline FL experiments, which demonstrate that both lightweight and heavy CNN models outperform LR and MLP in soil moisture prediction, we now investigate how input features and architectural design influence predictive performance. The baseline results indicate that even minor modifications in architecture or feature representation can significantly impact performance. This observation underscores the need for a systematic ablation analysis to identify optimal configurations for effective FL deployment.

4.1. Feature Ablation

The effect of feature selection on model performance is evaluated using Random Forest permutation importance: features are ranked, and models are trained with 1–17 features in order of importance. All MAE and RMSE values are reported in centibars (cbar) for consistency.
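The ablation loop itself is straightforward; the sketch below uses the Table 1 ranking, with train_federated_cnn standing in for the FedAvg training procedure of Section 3 and assumed to return (MAE, RMSE) on the test split.

```python
# Features in Table 1 order of averaged permutation importance.
ranked = ["soil_temp_4", "pressure_in_hg", "soil_temp_3", "humidity",
          "soil_temp_2", "soil_temp_1", "temp_low", "wind_direction_degrees",
          "wind_gust_speed_mph", "temp", "temp_high", "wind_speed_mph",
          "wind_gust_direction_degrees", "solar_radiation",
          "solar_radiation_high", "rain", "rain_inches_last_hour"]

# Train on the top-k subsets; train_federated_cnn is an assumed helper.
results = {k: train_federated_cnn(features=ranked[:k]) for k in range(1, 18)}
```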
As shown in Table 4 and Figure 2, predictive accuracy improves as more features are included, particularly between 10 and 15 features. Beyond 15 features, performance gains plateau, suggesting diminishing returns and indicating that 14–15 features capture the most informative data for soil moisture prediction. RMSE values remain stable across feature counts, indicating that the model consistently captures the data’s underlying patterns.

4.2. Architecture Ablation

Once a nearly optimal feature subset has been identified, the impact of CNN architectural choices, including the number of convolutional layers (1 to 3) and the number of hidden channels (8 to 64), on predictive performance is assessed. All models are trained under identical FL settings to isolate the effect of the architecture.
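Under the same assumed train_federated_cnn helper (Section 4.1 sketch), the depth/width sweep reduces to a small grid; the near-optimal 14-feature subset is an assumption here, since the exact feature count used in this sweep is not stated.

```python
# Grid over depth (1–3 conv layers) and width (8–64 hidden channels),
# reusing the assumed train_federated_cnn helper and Table 1 ranking.
grid = {
    (n_layers, hidden): train_federated_cnn(features=ranked[:14],
                                            n_layers=n_layers, hidden=hidden)
    for n_layers in (1, 2, 3)
    for hidden in (8, 16, 32, 64)
}
best = min(grid, key=lambda cfg: grid[cfg][0])  # configuration with lowest test MAE
```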
The results in Table 5 and Figure 3 show that increasing the number of hidden channels generally improves performance, and that a third convolutional layer provides moderate gains over two-layer models. Deeper and wider architectures thus capture more complex relationships without overfitting, consistent with the improvements observed for the baseline heavy CNN. Together, depth and width adjustments allow the model to exploit the distributed data more fully.

4.3. Combined Ablation Study

Finally, a combined ablation is performed to identify the configuration that jointly optimizes feature selection and architecture. Models are trained with 2–3 convolutional layers, 16–64 hidden channels, and 10–15 input features (the ranges previously identified as most effective). Efficient training under federated averaging is ensured through early stopping based on validation MAE.
Figure 4 shows that deeper networks and larger feature sets generally achieve lower MAE. Table 6 confirms that the heavy CNN (64 channels, three layers, 15 features) achieves the lowest MAE (7.86 cbar), while a lightweight CNN (16 channels, three layers, 14 features) reaches nearly the same performance (7.87 cbar) with fewer parameters.

5. Centralized vs. Federated Performance

Building on the ablation studies, which identified near-optimal feature sets and CNN architectures, we next compare the performance of federated and centralized training configurations. Both the lightweight and heavy CNN models are evaluated under identical preprocessing and training settings, allowing for a fair assessment of how distributed learning affects predictive accuracy.
Table 7 and Figure 5 summarize the results, showing that centralized training achieves slightly lower MAE and RMSE values, as it benefits from direct access to the full dataset during optimization. Federated learning incurs only a slight increase of ∼0.5 cbar in MAE, showing that FedAvg aggregation maintains strong predictive performance even under distributed, heterogeneous data conditions.
Notably, the lightweight CNN achieves competitive accuracy compared to the heavy CNN while requiring over ten times fewer parameters (0.8 k vs. 9.4 k). This reduction in model size makes it highly suitable for edge deployment, where resource constraints and communication efficiency are critical. The heavy CNN provides marginal gains in accuracy but at a higher computational cost, showing the trade-off between model complexity and practical deployment considerations.
Given these findings, the lightweight CNN is selected for subsequent robustness experiments and stress tests under non-IID data distributions, ensuring that the model is both accurate and deployable in real-world WHIN network environments.

6. Robustness Evaluation Under Non-IID Conditions

Building on the baseline and ablation studies, the robustness of the selected lightweight CNN is evaluated in heterogeneous and adverse deployment scenarios. While previous experiments identified an optimal configuration (14 input features, 16 hidden channels, three convolutional layers, ∼0.8 k parameters) that achieves near-optimal accuracy (MAE = 7.87 cbar), it is still critical to verify whether this architecture maintains performance under non-IID data distributions and robustness perturbations, which are common in real-world IoT deployments.
Therefore, we designed a robustness evaluation pipeline, shown in Figure 6, that simulates two types of non-IID scenarios (Dirichlet distribution skew and feature shift) at multiple severity levels. Three classes of robustness perturbations are introduced at varying intensities: label noise, input noise, and Byzantine failures. Specifically, the perturbations are simulated as follows (a code sketch follows the list):
  • Label noise: 10–30% of local samples have randomly flipped soil moisture labels.
  • Input noise: Gaussian noise N(0, σ²) with σ ∈ {0.01, 0.05, 0.1} is added to the normalized features.
  • Byzantine attacks: a subset of clients (5–20%) replaces its local model updates with random gradients scaled to match the ℓ₂ norm of valid updates.
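A minimal NumPy sketch of the three perturbations is given below. Label "flipping" is interpreted here as random reassignment among local samples (an assumption for a regression target), and client updates are assumed to be dicts of NumPy arrays.

```python
import numpy as np

rng = np.random.default_rng(42)

def label_noise(y, flip_fraction=0.2):
    """Randomly reassign a fraction of soil moisture labels."""
    y = y.copy()
    idx = rng.choice(len(y), size=int(flip_fraction * len(y)), replace=False)
    y[idx] = rng.permutation(y[idx])
    return y

def input_noise(X, sigma=0.05):
    """Add Gaussian noise N(0, sigma^2) to normalized features."""
    return X + rng.normal(0.0, sigma, size=X.shape)

def byzantine(updates, byz_fraction=0.1):
    """Replace a fraction of client updates with random gradients
    scaled to the l2 norm of the genuine update."""
    n_byz = int(byz_fraction * len(updates))
    for i in rng.choice(len(updates), size=n_byz, replace=False):
        for key, value in updates[i].items():
            fake = rng.normal(size=value.shape)
            updates[i][key] = fake * (np.linalg.norm(value) / np.linalg.norm(fake))
    return updates
```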
To systematically model heterogeneous client distributions, Dirichlet sampling with concentration parameter α is applied: low α (0.1) creates highly unbalanced local datasets, while higher α (1.0) approximates near-IID conditions. Across all configurations, the model maintains stable MAE and RMSE under moderate non-IID skew and up to 20% label corruption, with only minor degradation. Performance drops at intermediate heterogeneity (α = 0.5) are attributed to moderate client drift, where label imbalance and feature skew slow local convergence before FedAvg aggregation stabilizes the global model.
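One common way to realize such Dirichlet skew is sketched below, where each client retains a Dir(α)-drawn share of the pooled sample budget; the paper's exact skewing procedure may differ.

```python
import numpy as np

def dirichlet_subsample(client_sizes, alpha=0.1, seed=0):
    """Assign each client a Dir(alpha)-drawn share of the pooled sample
    budget; low alpha concentrates data on few clients (strong skew)."""
    rng = np.random.default_rng(seed)
    shares = rng.dirichlet(alpha * np.ones(len(client_sizes)))
    total = sum(client_sizes)
    return [min(int(share * total), size) for share, size in zip(shares, client_sizes)]

# e.g. dirichlet_subsample([1000] * 144, alpha=0.1) -> highly unbalanced counts
```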
Table 8, Table 9 and Table 10 report detailed results across varying non-IID and robustness settings, while Figure 7 summarizes the aggregated MAE and RMSE trends.
Under mild non-IID and low-noise conditions (α = 1.0, σ = 0.01), the lightweight CNN achieves an average MAE of 8.27 cbar. As the severity of non-IID skew or Byzantine participation increases, both MAE and RMSE rise predictably, but no catastrophic failures occur. This indicates that the lightweight architecture maintains convergence stability and robustness, even when facing challenging conditions.

7. Discussion

Table 11 provides a comparative overview of soil moisture prediction studies using machine learning. Most previous works relied on centralized setups and often used satellite or field-scale datasets, while none employed the WHIN sensor network. This study establishes the first quantitative federated baselines on WHIN, contributing to distributed soil moisture modeling.
Compared with previous centralized models [2,10,13], which achieved low RMSE values using complex architectures such as transformers or physics-informed deep learning, the proposed lightweight CNN offers a substantially smaller model footprint (0.8 k parameters) with comparable predictive capability. Despite its simplicity, it achieves near-centralized accuracy (MAE = 7.80 cbar federated vs. 7.25 cbar centralized) and demonstrates efficiency–accuracy balance, as illustrated in Figure 8. This supports the feasibility of deploying compact models on edge and IoT devices without sacrificing performance.
While multimodal or knowledge-guided approaches [4,11] enhance interpretability and performance under specific conditions, they remain limited to centralized frameworks that require unified access to data. In contrast, the proposed federated setup ensures data privacy and scalability across distributed stations, aligning with the real-world constraints of agricultural sensing. Moreover, previous IoT-based systems [5] lacked quantitative evaluation metrics, whereas the present study provides concrete benchmarks based on real sensor data.
An additional strength of this work lies in its robustness evaluation. Previous studies largely ignored the effects of non-IID distributions, sensor noise, and Byzantine failures, conditions that frequently occur in field-deployed networks. The proposed FL framework maintained stable performance under these perturbations, showing minimal degradation across robustness tests. This confirms that the model generalizes well to heterogeneous and imperfect sensing environments.
Overall, the results indicate that lightweight CNNs, when integrated into a federated learning framework, can deliver competitive performance in soil moisture prediction while ensuring data privacy and computational efficiency.

8. Conclusions and Future Work

This work presented the first FL benchmark for soil moisture prediction using the WHIN dataset (144 weather stations). Lightweight (0.8 k) and heavy (9.4 k) CNNs were evaluated under centralized and federated setups, including robustness to noise, non-IID data, and Byzantine failures. Lightweight CNNs performed nearly as well as the large CNN models while being over 10 times smaller, offering strong trade-offs in computation and communication. FL achieved near-centralized performance while preserving data privacy and remaining robust under adverse conditions, proving its suitability for distributed agricultural sensing. Notably, all baselines were evaluated under reproducible settings, providing a reliable benchmark for future research. Future work will explore communication-efficient FL (such as compression and partial participation), enhanced robustness (such as differential privacy and resilient aggregation), and deployment on live WHIN infrastructure. Extending to multi-season, multi-region data will further validate generalizability. This work demonstrates that FL with compact models is both effective and practical for scalable, privacy-aware agricultural monitoring.

Author Contributions

Conceptualization, S.Z. and L.A.S.; methodology, S.Z. and L.A.S.; software, S.Z.; validation, S.Z. and L.A.S.; formal analysis, S.Z.; investigation, S.Z.; resources, L.A.S.; data curation, S.Z.; writing—original draft preparation, S.Z.; writing—review and editing, S.Z. and L.A.S.; visualization, S.Z.; supervision, L.A.S.; project administration, L.A.S.; funding acquisition, L.A.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Arab–German Young Academy of Sciences and Humanities (AGYA), supported by the German Federal Ministry of Research, Technology and Space (BMFTR), grant number 01DL25001. The authors are solely responsible for the content and recommendations of this publication, which do not necessarily reflect the views of AGYA or its funding partners.

Data Availability Statement

The dataset analyzed in this study was provided by the Wabash Heartland Innovation Network (WHIN). A one-month sample dataset from 144 weather stations is freely available to educators, students, and researchers, subject to WHIN’s publicly stated licensing terms [21].

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Gruber, A.; Scanlon, T.; Van Der Schalie, R.; Wagner, W.; Dorigo, W. Evolution of the ESA CCI Soil Moisture Climate Data Records and Their Underlying Merging Methodology. Earth Syst. Sci. Data 2019, 11, 717–739.
  2. Wang, Y.; Wang, W.; Ma, Z.; Zhao, M.; Li, W.; Hou, X.; Li, J.; Ye, F.; Ma, W. A deep learning approach based on physical constraints for predicting soil moisture in unsaturated zones. Water Resour. Res. 2023, 59, e2023WR035194.
  3. Acharya, U.; Daigh, A.L.; Oduor, P.G. Machine learning for predicting field soil moisture using soil, crop, and nearby weather station data in the Red River Valley of the North. Soil Syst. 2021, 5, 57.
  4. Rakib, M.; Mohammed, A.A.; Diggins, D.C.; Sharma, S.; Sadler, J.M.; Ochsner, T.; Bagavathi, A. MIS-ME: A Multi-Modal Framework for Soil Moisture Estimation. In Proceedings of the 2024 IEEE 11th International Conference on Data Science and Advanced Analytics (DSAA), San Diego, CA, USA, 6–10 October 2024; pp. 1–10.
  5. Abo-Zahhad, M.M. IoT-based automated management irrigation system using soil moisture data and weather forecasting adopting machine learning technique. Sohag Eng. J. 2023, 3, 122–140.
  6. Sungmin, O.; Orth, R. Global soil moisture from in-situ measurements using machine learning—SoMo.ml. arXiv 2020, arXiv:2010.02374.
  7. Wang, Y.; Shi, L.; Hu, Y.; Hu, X.; Song, W.; Wang, L. A comprehensive study of deep learning for soil moisture prediction. Hydrol. Earth Syst. Sci. Discuss. 2023, 2023, 1–38.
  8. Zhang, T.; Zhou, Q.; Yang, S.; Li, J. Soil moisture estimation using convolutional neural networks with multi-source remote sensing data. Remote Sens. 2021, 13, 3668.
  9. Islam, M.R.; Oliullah, K.; Kabir, M.M.; Alom, M.; Mridha, M. Machine learning enabled IoT system for soil nutrients monitoring and crop recommendation. J. Agric. Food Res. 2023, 14, 100880.
  10. Deforce, B.; Baesens, B.; Asensio, E.S. Time-Series Foundation Models for Forecasting Soil Moisture Levels in Smart Agriculture. arXiv 2024, arXiv:2405.18913.
  11. Yu, Y.; Filippi, P.; Bishop, T.F. Field-scale soil moisture estimated from Sentinel-1 SAR data using a knowledge-guided deep learning approach. arXiv 2025, arXiv:2505.00265.
  12. Xie, J.; Yao, B.; Jiang, Z. Physics-constrained Active Learning for Soil Moisture Estimation and Optimal Sensor Placement. arXiv 2024, arXiv:2403.07228.
  13. Liu, J.; Rahmani, F.; Lawson, K.; Shen, C. A multiscale deep learning model for soil moisture integrating satellite and in situ data. Geophys. Res. Lett. 2022, 49, e2021GL096847.
  14. Žalik, K.R.; Žalik, M. A review of federated learning in agriculture. Sensors 2023, 23, 9566.
  15. Dong, Y.; Werling, B.; Cao, Z.; Li, G. Implementation of an in-field IoT system for precision irrigation management. Front. Water 2024, 6, 1353597.
  16. Chen, S.; Long, G.; Shen, T.; Jiang, J. Prompt federated learning for weather forecasting: Toward foundation models on meteorological data. arXiv 2023, arXiv:2301.09152.
  17. Jin, Z.; Xu, X.; Bilal, M.; Wu, S.; Lin, H. UReslham: Radar reflectivity inversion for smart agriculture with spatial federated learning over geostationary satellite observations. Comput. Intell. 2024, 40, e12684.
  18. Khan, N.; Nisar, S.; Khan, M.A.; Rehman, Y.A.U.; Noor, F.; Barb, G. Optimizing Federated Learning with Aggregation Strategies: A Comprehensive Survey. IEEE Open J. Comput. Soc. 2025, 6, 1227–1247.
  19. Shao, J.; Sun, S.; Wang, Y.; Zhang, H. Federated learning-based crop yield prediction using multi-source agricultural data. Comput. Electron. Agric. 2023, 207, 107743.
  20. Chen, F.; Li, S.; Han, J.; Ren, F.; Yang, Z. Review of Lightweight Deep Convolutional Neural Networks. Arch. Comput. Methods Eng. 2024, 31, 1915–1937.
  21. Wabash Heartland Innovation Network (WHIN). WHIN Weather Network Data (Current Conditions). One-month sample dataset from over 140 weather stations; available free for educators, students, and researchers. 2025. Available online: https://data.whin.org/data/current-conditions (accessed on 15 July 2025).
  22. Vereecken, H.; Huisman, J.; Pachepsky, Y.; Montzka, C.; van der Kruk, J.; Bogena, H.; Weihermüller, L.; Herbst, M.; Martinez, G.; Vanderborght, J. On the spatio-temporal dynamics of soil moisture at the field scale. J. Hydrol. 2014, 516, 76–96.
  23. Cohen, J. Statistical Power Analysis for the Behavioral Sciences, 2nd ed.; Lawrence Erlbaum Associates: Hillsdale, NJ, USA, 1988.
  24. Fritz, C.O.; Morris, P.E.; Richler, J.J. Effect size estimates: Current use, calculations, and interpretation. J. Exp. Psychol. Gen. 2012, 141, 2–18.
Figure 1. Conceptual framework of federated learning for soil moisture prediction.
Figure 2. Feature ablation of federated CNN models.
Figure 3. Effect of convolutional depth and width on federated CNN performance.
Figure 4. Three-dimensional surface: features × channels vs. test MAE: (a) conv. layers = 3; (b) conv. layers = 2.
Figure 5. Centralized vs. federated performance of lightweight and heavy CNNs.
Figure 6. Proposed robustness evaluation framework.
Figure 7. Federated learning robustness landscape: (left) MAE; (right) RMSE.
Figure 8. Efficiency vs. accuracy trade-off between lightweight and heavy CNNs.
Table 1. Feature importance across 144 WHIN stations.
Feature | Importance Score
soil_temp_4 | 0.5185
pressure_in_hg | 0.1651
soil_temp_3 | 0.1037
humidity | 0.0645
soil_temp_2 | 0.0330
soil_temp_1 | 0.0236
temp_low | 0.0184
wind_direction_degrees | 0.0132
wind_gust_speed_mph | 0.0130
temp | 0.0118
temp_high | 0.0110
wind_speed_mph | 0.0104
wind_gust_direction_degrees | 0.0060
solar_radiation | 0.0027
solar_radiation_high | 0.0021
rain | 0.0016
rain_inches_last_hour | 0.0015
Table 2. Wilcoxon signed-rank test results comparing model performance across 144 stations.
Comparison | Wilcoxon Statistic (W) | Z-Value | p-Value | Effect Size (r)
LR vs. CNN | 1545.00 | −8.88 | <1×10⁻⁶ | 0.74
LR vs. MLP | 1263.00 | −8.17 | <1×10⁻⁶ | 0.68
CNN vs. MLP | 2093.00 | −7.44 | <1×10⁻⁶ | 0.62
Table 3. Federated model performance over five independent runs.
Model | MAE (cbar) ± SD | RMSE (cbar) ± SD | 95% CI MAE (cbar)
Linear Regression (LR) | 12.03 ± 0.41 | 22.45 ± 0.56 | ±0.36
Multilayer Perceptron (MLP) | 9.68 ± 0.38 | 21.95 ± 0.49 | ±0.33
Lightweight CNN (FL) | 8.07 ± 0.26 | 22.01 ± 0.31 | ±0.23
Heavy CNN (FL) | 7.59 ± 0.28 | 21.82 ± 0.33 | ±0.25
Statistically significant improvement over MLP and LR (p < 1×10⁻⁶); Wilcoxon signed-rank test.
Table 4. Federated CNN performance with a varying number of features (MAE and RMSE in cbar).
Num. Features | MAE | RMSE
1 | 8.224 | 21.978
2 | 8.617 | 21.850
3 | 8.367 | 21.897
4 | 8.017 | 22.034
5 | 8.591 | 21.835
6 | 8.211 | 21.927
7 | 8.750 | 21.840
8 | 8.134 | 21.957
9 | 8.107 | 21.942
10 | 8.884 | 21.805
11 | 7.998 | 22.036
12 | 7.988 | 22.049
13 | 8.183 | 21.930
14 | 7.938 | 22.022
15 | 7.993 | 21.975
16 | 8.578 | 21.771
17 | 8.086 | 21.906
Table 5. Federated CNN performance with varying hidden channels and convolutional layers (MAE and RMSE in cbar).
Hidden Channels | Conv. Layers | MAE | RMSE
8 | 1 | 8.252 | 22.527
8 | 2 | 8.030 | 21.922
8 | 3 | 8.378 | 22.873
16 | 1 | 8.226 | 22.457
16 | 2 | 8.564 | 23.381
16 | 3 | 7.920 | 21.621
32 | 1 | 8.400 | 22.932
32 | 2 | 8.684 | 23.707
32 | 3 | 7.869 | 21.481
64 | 1 | 8.650 | 23.614
64 | 2 | 8.449 | 23.065
64 | 3 | 7.873 | 21.492
Table 6. Model performance with a varying number of features, hidden channels, and convolutional layers (MAE in cbar).
Num. Features | Hidden Channels | Conv. Layers | Test MAE
10 | 16 | 2 | 8.635
10 | 16 | 3 | 8.170
10 | 32 | 2 | 7.956
10 | 32 | 3 | 7.982
10 | 64 | 2 | 8.294
10 | 64 | 3 | 7.910
11 | 16 | 2 | 8.484
11 | 16 | 3 | 8.466
11 | 32 | 2 | 8.094
11 | 32 | 3 | 8.175
11 | 64 | 2 | 8.456
11 | 64 | 3 | 7.874
12 | 16 | 2 | 8.011
12 | 16 | 3 | 8.191
12 | 32 | 2 | 8.960
12 | 32 | 3 | 8.003
12 | 64 | 2 | 8.365
12 | 64 | 3 | 7.884
13 | 16 | 2 | 8.114
13 | 16 | 3 | 7.966
13 | 32 | 2 | 8.270
13 | 32 | 3 | 8.101
13 | 64 | 2 | 7.922
13 | 64 | 3 | 8.009
14 | 16 | 2 | 8.192
14 | 16 | 3 | 7.869
14 | 32 | 2 | 8.301
14 | 32 | 3 | 7.877
14 | 64 | 2 | 8.026
14 | 64 | 3 | 7.889
15 | 16 | 2 | 7.960
15 | 16 | 3 | 8.246
15 | 32 | 2 | 8.020
15 | 32 | 3 | 8.069
15 | 64 | 2 | 8.088
15 | 64 | 3 | 7.858
Table 7. Performance comparison of lightweight and heavy CNNs under centralized and federated settings (MAE and RMSE in cbar; mean ± standard deviation and 95% CI across five runs).
Setting | Params | MAE (Mean ± Std) | RMSE (Mean ± Std) | 95% CI (MAE)
Centralized, Light CNN | ∼0.8 k | 7.25 ± 0.07 | 21.3 ± 0.2 | ±0.06
Centralized, Heavy CNN | ∼9.4 k | 6.95 ± 0.06 | 20.8 ± 0.2 | ±0.05
Federated, Light CNN | ∼0.8 k | 7.80 ± 0.09 | 22.0 ± 0.2 | ±0.08
Federated, Heavy CNN | ∼9.4 k | 7.60 ± 0.10 | 21.8 ± 0.2 | ±0.09
Table 8. Federated learning experiment results for various non-IID and robustness configurations: Dirichlet severity 0.1 (MAE and RMSE in cbar).
Non-IID Type | Severity | Robustness Type | Robustness Severity | Avg MAE | Std MAE | Avg RMSE | Std RMSE | Avg Rounds
Dirichlet | 0.1 | byzantine | 0.050 | 14.538 | 10.952 | 34.431 | 34.614 | 60.000
Dirichlet | 0.1 | byzantine | 0.100 | 15.566 | 11.044 | 35.536 | 33.987 | 60.000
Dirichlet | 0.1 | byzantine | 0.200 | 16.049 | 10.936 | 35.947 | 33.697 | 60.000
Dirichlet | 0.1 | input_noise | 0.010 | 14.199 | 10.257 | 33.485 | 33.764 | 42.500
Dirichlet | 0.1 | input_noise | 0.050 | 14.197 | 10.255 | 33.496 | 33.778 | 42.500
Dirichlet | 0.1 | input_noise | 0.100 | 14.202 | 10.259 | 33.509 | 33.790 | 42.500
Dirichlet | 0.1 | label_noise | 0.050 | 14.198 | 10.265 | 33.477 | 33.763 | 43.000
Dirichlet | 0.1 | label_noise | 0.100 | 14.193 | 10.282 | 33.481 | 33.773 | 43.000
Dirichlet | 0.1 | label_noise | 0.200 | 14.197 | 10.305 | 33.482 | 33.782 | 43.000
Table 9. Federated learning experiment results for various non-IID and robustness configurations: Dirichlet severity 0.5 and 1.0 (MAE and RMSE in cbar).
Non-IID Type | Severity | Robustness Type | Robustness Severity | Avg MAE | Std MAE | Avg RMSE | Std RMSE | Avg Rounds
Dirichlet | 0.5 | byzantine | 0.050 | 16.808 | 2.750 | 41.927 | 4.981 | 25.500
Dirichlet | 0.5 | byzantine | 0.100 | 16.824 | 2.763 | 41.934 | 4.987 | 25.500
Dirichlet | 0.5 | byzantine | 0.200 | 16.899 | 2.707 | 41.972 | 4.957 | 21.500
Dirichlet | 0.5 | input_noise | 0.010 | 14.134 | 2.326 | 39.414 | 4.461 | 57.000
Dirichlet | 0.5 | input_noise | 0.050 | 14.137 | 2.325 | 39.413 | 4.465 | 57.000
Dirichlet | 0.5 | input_noise | 0.100 | 14.153 | 2.345 | 39.395 | 4.456 | 57.000
Dirichlet | 0.5 | label_noise | 0.050 | 14.137 | 2.328 | 39.415 | 4.457 | 57.000
Dirichlet | 0.5 | label_noise | 0.100 | 14.140 | 2.335 | 39.413 | 4.446 | 57.000
Dirichlet | 0.5 | label_noise | 0.200 | 14.144 | 2.342 | 39.417 | 4.433 | 57.000
Dirichlet | 1.0 | byzantine | 0.050 | 10.361 | 1.528 | 19.959 | 9.026 | 40.000
Dirichlet | 1.0 | byzantine | 0.100 | 10.385 | 1.495 | 19.979 | 8.998 | 35.000
Dirichlet | 1.0 | byzantine | 0.200 | 10.613 | 1.791 | 20.106 | 9.156 | 19.000
Dirichlet | 1.0 | input_noise | 0.010 | 8.272 | 1.848 | 18.091 | 9.700 | 60.000
Dirichlet | 1.0 | input_noise | 0.050 | 8.270 | 1.845 | 18.088 | 9.696 | 60.000
Dirichlet | 1.0 | input_noise | 0.100 | 8.264 | 1.842 | 18.077 | 9.695 | 60.000
Dirichlet | 1.0 | label_noise | 0.050 | 8.273 | 1.848 | 18.092 | 9.701 | 60.000
Dirichlet | 1.0 | label_noise | 0.100 | 8.275 | 1.847 | 18.096 | 9.698 | 60.000
Dirichlet | 1.0 | label_noise | 0.200 | 8.274 | 1.850 | 18.093 | 9.702 | 60.000
Table 10. Federated learning experiment results for various non-IID and robustness configurations: feature-shift experiments (MAE and RMSE in cbar).
Non-IID Type | Severity | Robustness Type | Robustness Severity | Avg MAE | Std MAE | Avg RMSE | Std RMSE | Avg Rounds
feature_shift | (0.02, 0.02) | byzantine | 0.050 | 8.060 | 0.089 | 22.429 | 0.460 | 18.500
feature_shift | (0.02, 0.02) | byzantine | 0.100 | 8.092 | 0.118 | 22.184 | 0.096 | 16.500
feature_shift | (0.02, 0.02) | byzantine | 0.200 | 8.912 | 0.569 | 23.400 | 0.536 | 11.000
feature_shift | (0.02, 0.02) | input_noise | 0.010 | 7.987 | 0.050 | 22.259 | 0.177 | 32.000
feature_shift | (0.02, 0.02) | input_noise | 0.050 | 7.986 | 0.026 | 22.231 | 0.234 | 39.500
feature_shift | (0.02, 0.02) | input_noise | 0.100 | 7.983 | 0.058 | 22.393 | 0.393 | 48.000
feature_shift | (0.02, 0.02) | label_noise | 0.050 | 8.010 | 0.005 | 22.249 | 0.259 | 22.000
feature_shift | (0.02, 0.02) | label_noise | 0.100 | 8.026 | 0.001 | 22.232 | 0.255 | 26.000
feature_shift | (0.02, 0.02) | label_noise | 0.200 | 8.024 | 0.013 | 22.247 | 0.284 | 22.000
feature_shift | (0.05, 0.05) | byzantine | 0.050 | 8.068 | 0.103 | 22.456 | 0.458 | 18.500
feature_shift | (0.05, 0.05) | byzantine | 0.100 | 8.104 | 0.124 | 22.191 | 0.084 | 16.500
feature_shift | (0.05, 0.05) | byzantine | 0.200 | 8.964 | 0.508 | 23.444 | 0.482 | 11.500
feature_shift | (0.05, 0.05) | input_noise | 0.010 | 8.036 | 0.037 | 22.211 | 0.201 | 26.000
feature_shift | (0.05, 0.05) | input_noise | 0.050 | 8.012 | 0.021 | 22.258 | 0.252 | 26.000
feature_shift | (0.05, 0.05) | input_noise | 0.100 | 7.996 | 0.047 | 22.378 | 0.420 | 48.000
feature_shift | (0.05, 0.05) | label_noise | 0.050 | 8.033 | 0.021 | 22.228 | 0.237 | 26.000
feature_shift | (0.05, 0.05) | label_noise | 0.100 | 8.036 | 0.018 | 22.232 | 0.253 | 26.000
feature_shift | (0.05, 0.05) | label_noise | 0.200 | 8.026 | 0.002 | 22.252 | 0.280 | 22.000
feature_shift | (0.1, 0.1) | byzantine | 0.050 | 8.050 | 0.037 | 22.401 | 0.365 | 27.000
feature_shift | (0.1, 0.1) | byzantine | 0.100 | 8.121 | 0.117 | 22.212 | 0.085 | 16.500
feature_shift | (0.1, 0.1) | byzantine | 0.200 | 8.986 | 0.503 | 23.465 | 0.471 | 11.500
feature_shift | (0.1, 0.1) | input_noise | 0.010 | 8.094 | 0.097 | 22.185 | 0.173 | 19.000
feature_shift | (0.1, 0.1) | input_noise | 0.050 | 8.033 | 0.021 | 22.261 | 0.280 | 22.000
feature_shift | (0.1, 0.1) | input_noise | 0.100 | 8.063 | 0.072 | 22.197 | 0.183 | 22.500
feature_shift | (0.1, 0.1) | label_noise | 0.050 | 8.051 | 0.016 | 22.256 | 0.291 | 22.000
feature_shift | (0.1, 0.1) | label_noise | 0.100 | 8.108 | 0.097 | 22.174 | 0.175 | 19.000
feature_shift | (0.1, 0.1) | label_noise | 0.200 | 8.046 | 0.003 | 22.270 | 0.313 | 22.000
Table 11. Comparison of related work on soil moisture prediction using machine learning approaches.
Study | Dataset/Source | ML Approach | Learning Setup | Results | Key Advantage/Limitation
[3] | Red River Valley (soil, crop, weather) | ML regression models | Centralized | R² = 0.72 (soil + crop + weather) | Improved accuracy by integrating multiple inputs
[4] | Weather + imagery data | Multimodal DL fusion (MIS-ME) | Centralized | MAPE ≈ 10.14% | Strong multimodal fusion; centralized only
[5] | Egypt (soil sensors + forecasts) | IoT + ML irrigation system | Edge/Centralized | Improved irrigation efficiency (qualitative) | IoT integration; lacks quantitative metrics
[10] | Soil + weather time-series | Transformer foundation models | Centralized | MAE = 0.031, RMSE = 0.042 | High accuracy; computationally heavy
[2] | Field observations | Physics-informed DL | Centralized | RMSE ≈ 3.73×10⁻³ m³/m³ | Robust generalization; no distributed setup
[11] | Sentinel-1 SAR + in situ validation | Knowledge-guided DL | Centralized | R ≈ 0.64; uncertainty ↓ 0.02 | Handles vegetation effects; centralized only
[12] | In situ sensors (field scale) | Physics-constrained DL + active learning | Centralized/Active | Relative error reduced by 42–52% vs. random sensor placement | Data-efficient; lacks robustness testing
[13] | Satellite + in situ (U.S.) | Multiscale DL | Centralized | R = 0.90; RMSE = 0.034 | Large-scale validation; high computation
This work | WHIN sensor network (140+ stations) | Lightweight (0.8 k) and heavy (9.4 k) CNNs + ablations | Federated and Centralized | MAE = 7.80 (FL), 7.25 (centralized) | First FL benchmark; robust to non-IID, noise, Byzantine clients
