Federated Transfer Learning for Tomato Leaf Disease Detection Using Neuro-Graph Hybrid Model

Cristea, Ana-Maria; Dobre, Ciprian

doi:10.3390/agriengineering7120432

Open AccessArticle

Federated Transfer Learning for Tomato Leaf Disease Detection Using Neuro-Graph Hybrid Model

by

Ana-Maria Cristea

^1,*

and

Ciprian Dobre

^1,2

¹

Faculty of Automatic Control and Computer Science, National University of Science and Technology Politehnica Bucharest, 060042 Bucharest, Romania

²

National Institute for Research and Development in Informatics, 011455 Bucharest, Romania

^*

Author to whom correspondence should be addressed.

AgriEngineering 2025, 7(12), 432; https://doi.org/10.3390/agriengineering7120432

Submission received: 24 October 2025 / Revised: 6 December 2025 / Accepted: 11 December 2025 / Published: 15 December 2025

Download

Browse Figures

Versions Notes

Abstract

Plant diseases are currently a major threat to agricultural economies and food availability, having a negative environmental impact. Despite being a promising line of research, current approaches struggle with poor cross-site generalization, limited labels and dataset bias. Real-field complexities, such as environmental variability, heterogeneous varieties or temporal dynamics as are often overlooked. Numerous studies have been conducted to address these challenges, proposing advanced learning strategies and improved evaluation protocols. Synthetic data generation and self-supervised learning reduce dataset bias, while domain adaptation, hyperspectral, and thermal signals improve robustness across sites. However, a large portion of current methods are developed and validated mainly on clean laboratory datasets, which do not capture the variability of real-field conditions. Existing AI models often lead to imperfect detection results when dealing with field images complexities, such as dense vegetation, variable illumination or changing symptom expression. Although augmentation techniques can approximate real-world conditions, incorporating field data represents a substantial enhancement in model reliability. Federated transfer learning offers a promising approach to enhance plant disease detection, by enabling collaborative training of models across diverse agricultural environments, using in-field data but without disclosing the participants data to each others. In this study, we collaboratively trained a hybrid Graph–SNN model using federated learning (FL) to preserve data privacy, optimized for efficient use of participant resources. The model achieved an accuracy of 0.9445 on clean laboratory data and 0.6202 exclusively on field data, underscoring the considerable challenges posed by real-world conditions. Our findings demonstrate the potential of FL for privacy preserving and reliable plant disease detection under real field conditions.

Keywords:

federated learning; transfer learning; field conditions; federated transfer learning

1. Introduction

Plant pathology is currently a promising area of research due to technological developments, as well as climatic and economic changes. Among the many crops affected by diseases, tomatoes are a particularly relevant case study due to their worldwide high consumption rates and economic importance. According to [1], the production of tomatoes worldwide for 2024 reached 45.8 million metric tons (MT), surpassing the total of 2023 by 1.43 million MT.

Tomatoes are highly susceptible to several foliar diseases, such as early blight, late blight, bacterial spots, and others, leading to expensive yield losses. Understanding and automatically detecting such diseases through technology is essential to protect the crops of millions of farmers and ensure a stable supply.

Over time, numerous approaches have been developed to address the detection of diseases in tomato leaves, encompassing traditional machine learning (ML) methods, such as K-Nearest Neighbors (k-NN), Support Vector Machine (SVM), Naive Bayes (NB), and more advanced mechanisms, such as convolutional neural networks (CNN). Although these ML techniques have been extensively explored, most approaches require large and diverse training datasets to achieve robust performance. At the same time, most studies remain restricted to laboratory datasets that do not capture real world variability.

The consistency and variety of the dataset represented a major limitation in the study of plant diseases. Due to the worldwide scale of tomato cultivation, the evolution of diseases is often shaped by regional factors such as climate, soil type, temperature, and pest populations; therefore, it is difficult to develop a broadly applicable and consistent dataset.

Beyond the need for more representative datasets, a further challenge lies in the high costs associated with generating and annotating such data. In addition, training models on large datasets requires substantial computational resources, inherently resulting in higher costs. Moreover, potential conflicts of interest could arise between the entities that contributed with data to the creation of the dataset. However, it is unrealistic to think that a model trained once would yield perfect results under all the conditions mentioned above. These challenges have encouraged the exploration of alternative strategies, among which federated learning has emerged as a promising approach.

Federated learning is an increasingly utilized technique in plant disease research, offering promising new directions of study. According to [2], FL is an innovative approach that enables decentralized training of ML models between multiple organizations, ensuring that data is retained exclusively on local devices and not transferred externally. Rather than transmitting raw data to a central server, each participant trains the model locally on its own dataset and shares only the model updates with the central server. The central server aggregates these updates to improve the overall model, ensuring the privacy of sensitive data. This decentralized framework not only enhances data security, but also enables the integration of diverse datasets, leading to improved model accuracy and generalization.

FL obviates the need to transfer data from individual devices to a centralized entity for training purposes, ensuring compliance with data privacy regulations such as GDPR. This approach allows organizations to collaborate with models without violating privacy laws, as raw data remains on local devices. In addition, FL addresses challenges associated with traditional data handling methods, which have become increasingly impractical in modern contexts. Due to stringent administrative procedures, data collection and transfer became almost impossible even within different departments of the same organization. Therefore, FL provides a viable and efficient alternative to secure, decentralized use of data, as demonstrated by [3].

In this paper, we applied FL to detect tomato diseases under complex field conditions. This approach seeks to overcome the limitations of centralized learning by exploiting FL advantages in preserving data privacy, scalability, and efficient use of computational resources. Our work was inspired by the valuable contribution of [4], who provided significant information on the challenge of occlusion and small lesions in the detection of tomato leaf disease. Their study offered important insights, which we extend here through the integration of biological inspiration and the application of FL. This is significant because it supports the development of more generalizable models for real-world deployment and reduces reliance on centralized infrastructures. To our knowledge, this is the first study in this direction.

This paper is structured as follows. Section 2 reviews the work on plant disease detection, while Section 3 describes the materials and methods used. Section 4 presents the experimental results we achieved and Section 5 discusses their implications. Finally, Section 6 concludes the study and outlines directions for future research.

2. Related Work

Given the challenges of developing reliable and field-ready detection systems, numerous studies have explored a range of data acquisition strategies and ML algorithms. In this chapter, we review the most relevant contributions, with a particular focus on approaches applied to the detection of tomato disease.

Existing research relevant to this study can be divided into two main directions. The first focuses on data generation, including the creation of synthetic datasets, augmentation strategies, and field data collection protocols. The second addresses methodological innovations, with particular attention to federated learning and related collaborative training.

An important strand of research has concentrated on data generation, because ML models require training on diverse and consistent datasets to achieve reliable performances. To support research in this area, Ref. [5] presents an open source dataset of plant images, of which 18,162 depict tomato leaves. A similar dataset is presented in [6]. Such datasets were actively used by several authors for training models and inferring plant diseases. Ref. [7] introduced PlantXViT, a hybrid Vision-Transformer CNN model that achieved strong performance on several plant diseases, although their evaluation was limited to publicly available datasets.Ref. [8] developed a generic system for tomato disease detection and severity estimation, demonstrating strong performance but on a private greenhouse dataset, limiting both reproducibility and field applicability. Ref. [9] introduced a non-invasive diagnostic system for tomato late blight based on leaf volatile organic compound profiling with a smartphone-integrated sensor array. In [10], the authors examined how the size and diversity of the dataset influence the performance of deep learning in the classification of plant diseases, showing that larger and more varied datasets significantly improve accuracy, although his experiments remained centered on existing public datasets rather than challenging real field conditions. More recently, a lightweight hybrid model was proposed for the classification of plant leaf disease, reporting high accuracy on the public dataset [5], but its validation remained limited to controlled conditions.

In order to increase the diversity of the dataset, augmentation techniques are applied to generate synthetic images, which help to improve the robustness and accuracy of the models. Ref. [11] proposed a CycleGAN-based translation approach to generate additional diseased leaf images from healthy ones. They combined this data augmentation with [5] and improved classification performance. However, such strategies still face important limitations when transferred to real life conditions. Although [12] achieved an impressive accuracy on [5], their dependence on controlled images highlights the gap between laboratory success and practical deployment, which motivates our focus on model evaluation with real field data.

Building on limitations identified in conventional centralized training, our second research direction explores the use of federated learning, as well as the integration of biological principles in the architecture of the model. Multiple approaches have led to promising results regarding the use of FL for the detection of leaf lesions. Ref. [13] proposed a CNN federated learning model to detect and classify tomato leaf disease at five severity levels, facilitating targeted interventions to minimize yield loss. Ref. [14] introduces a FL approach that preserves data privacy and ensures robustness by implementing a lightweight VGG16-based CNN enhanced with Efficient Channel Attention (ECA), optimized for edge devices. In [15], a decentralized FL framework takes advantage of each client validation loss as a criterion for selectively sharing the model. However, previous approaches have used publicly available datasets where, despite the inclusion of some field images, most of the photos were captured under ideal laboratory conditions.

To enhance the reliability of systems in handling real-world data, E-TomatoDet was proposed in [4], which is a transformer-enhanced, multi-scale feature detection network. E-TomatoDet addresses the challenge of disease detection in real field images, where leaf occlusion and small disease areas have a significant impact on performance. They replaced the Convolutional Neural Network (CNN) backbone with a Cross-Shaped Window Transformer (CSWinTransformer) module. CSWintransformer was originally used by [16], to enhance the ability of the model to capture global contextual information throughout the image while keeping computational demands manageable. A Comprehensive Multi-Kernel Model (CMKM) was used to extract local features at multiple scales.

Existing studies have shown significant progress in the detection of plant disease, yet most approaches still rely on laboratory datasets. These limitations highlight the need for approaches that can generalize across heterogeneous environments while respecting privacy constraints. Motivated by this gap, our work introduces a federated learning framework for tomato leaf disease detection, biologically inspired by the inherent adaptability and diversity of plant-pathogen interactions, with the aim of delivering models that are both field-ready and robust.

3. Materials and Methods

This section briefly describes the dataset employed for training and evaluation, including pre-processing and augmentation steps. In addition, this chapter subsequently outlines the complete workflow, with particular emphasis on the federated learning setup and the model training procedure.

3.1. Dataset

This article includes both public datasets and a proprietary dataset containing real-field images. The public datasets used in our approach, owned by [17,18], contain images for 9 different types of diseases, as well as images of healthy leaves. Most of the photos were taken with focus on a single leaf, with well-defined lesions, but there are also some images included in the dataset that do not focus solely on the leaf, incorporating backgrounds with imperfections.

In addition to the public datasets used, a proprietary dataset was created containing images collected in the field. Our dataset was built with images collected from different geographical areas, both for outdoor and greenhouse crops. The photos were taken periodically, to capture several stages of leaf development, as well as at different times of the day, under various lighting conditions. Moreover, the dataset contains pictures taken in multiple harvests for the same greenhouses. The images contain complex backgrounds, the leaves are not perfectly centered, and they contain traces of pesticides, and even pests. Also, some leaves were photographed on both sides because the effects of the disease could also be visible on the back of the leaf. A sample of the field dataset is shown in Figure 1.

Since the field dataset is limited in size, certain diseases are represented by only a few images. In order to increase the robustness, we first implemented data augmentation techniques, such as rotation, horizontal and vertical flipping, randomized zooming, and brightness adjustment. After applying augmentation techniques, the size of the dataset increased severalfold. Five additional images were generated based on a single reference: one with brightness adjustment, one with randomized zooming, one with channel adjustment, one flipped, and one rotated. Figure 2 shows a sample of data generated after the augmentation of a single image.

As the images in our own dataset were insufficient and did not contain all disease classes, but only five of them, the public datasets were used to pre-train a ML model. The structure of the private data set is shown in Table 1, while the structure of the public data set is presented in Table 2.

The dataset containing field images was used to test the accuracy of the model and validate its feasibility in real situations. For training, in our study all images were resized to a standard resolution of 320 × 320 pixels. Up to this point, our contribution is limited to what previous authors have accomplished. The main innovation consists of the unique combination of every possible augmentation technique for trained images, coping with all imperfection and potential plan disease conditions.

Although the self-collected field dataset provides a realistic representation of the conditions of tomato leaf encountered in practice, it also exhibits several structural limitations that influence model training and evaluation. The dataset is relatively small compared to the public dataset, reflecting the difficulty of acquiring large amounts of labeled images under real farming conditions, where disease occurrences are seasonal and unevenly distributed across fields. As a result, the number of samples per class varies substantially: diseases such as Septoria leaf spot and Early blight are represented by only a handful of original images prior to augmentation, whereas others, like Bacterial spot and Late blight, are more frequently observed and therefore more abundant in the dataset. The inherent imbalance limits the intra-class variability that the model can observe during training and reduces its ability to form robust, class-specific representations for the rarer conditions. Minority classes may exhibit considerable visual overlap with more common diseases, such as similar lesion shapes, colorations, or progression patterns, further increasing the likelihood for misclassification. Moreover, because the augmented data is derived from a small set of original samples, it can only partially compensate for the lack of diversity appearances in the raw dataset.

3.2. Federated Transfer Learning

In traditional ML, train data and test data usually have the same feature space and the same data distribution. According to [19], when the data distribution is different between the train dataset and the test dataset, the prediction can be affected. In real-life scenarios, we often deal with such small amounts of labeled data that ML models cannot be built reliably.

Transfer learning is a learning technique that is used to exploit the invariance between a resource-rich source domain and a resource-short target domain to transfer knowledge from the first to the other, as [20] mentioned. Federated Transfer Learning (FTL) was introduced by [21] to transfer knowledge in a federated manner, without compromising user privacy.

This paper takes advantage of collaboratively training a hybrid CNN model to implement FTL. We used a hybrid CNN that combines two biologically inspired design principles: Spiking Neural Network (SNN) for local, sparse, edge-aware feature detection and Graph-inspired receptive fields for spatial reasoning. The model is trained on the clean laboratory dataset to learn general visual perception features. To demonstrate the efficiency of the proposed model, we tested both on a sample of the lab dataset and on our own field dataset.

3.3. Methodology

Our approach consists of initializing a global model on a central server, training it on clean laboratory data, and spreading across multiple clients responsible for training the received model with local data, thus solving the security issue of data transmission. In this way, the client local data remains on the device, and only the model weights keep being transmitted periodically to the central server.

In our experiment, the main process can be divided into two phases: centralized transfer learning pretraining and federated learning across decentralized clients. For the first phase, a Graph–SNN model is trained on the laboratory dataset in order to learn general-purpose features such as lesion textures, edge patterns, and vein structures. The resulting weights serve as a shared initialization point among the clients. After the training is finished, in this phase the model accuracy is tested on both real field dataset and public available dataset.

In the second phase, the resulted pretrained model from phase 1 is deployed across multiple clients, each containing a unique subset of the training data. The subsets may differ in environmental variations, class distribution or sample counts, in order to simulate a non-IID data scenario. This phase can also be divided into two sub-phases: the transmission of model parameters and the model aggregation, as can be seen in Figure 3.

For the first sub-phase, the central server initializes the pre-trained global Graph-SNN model from the previous phase and sends some of its parameters to the clients who join the training. In order to facilitate the FL experiment, the dataset was split into client-specific subsets, simulating decentralized data distribution. After receiving the parameters from the server, each client initializes a local model and starts training on its own private data. Once the local training is over, the clients send the resulted parameters back to the server.

When all participants complete the training, the model aggregation sub-phase starts, where the received parameters are combined and used as input for the initial hosted model. The server uses its own test dataset to validate the novel aggregated model.

In the context of distributed agricultural image analysis, the adoption of FL requires adherence to establish data-protection standards, including GDPR principles. Although agricultural images may appear non-sensitive at first glance, field-deployment environments frequently involve data collected on private farms, greenhouses, or commercial facilities, and therefore fall under regulatory expectations regarding data ownership, processing transparency, and cross-device data transfer. The training pipeline implemented in this study is inherently privacy-preserving, as all raw images remain stored on the client side and only the model parameter updates are exchanged with the server.

3.4. Model Description

The model used in this paper to implement FTL was inspired by recent efforts of [22,23] and represents a hybrid CNN that integrates biomimetic edge filtering mechanisms with graph-like spatial reasoning. Since standard CNNs struggle to generalize subtile disease symptoms, to address this limitation, our proposed model incorporates design principles from SNN and Graph Neural Networks (GNN).

In our work, the middle block of the architecture is described as graph-like due to the usage of dilated convolutions, which enlarge the receptive field in a manner that is conceptually similar to neighborhood expansion mechanisms found in graph-based processing. In a standard convolution, each output activation aggregates information from a compact 3 × 3 region. Conversely, a dilated convolution with dilation rate r samples pixels at intervals of r, thereby incorporating information from non-adjacent spatial positions within a broader spatial neighborhood. This introduces a sparse, non-local connectivity pattern across the image grid that resembles fixed graph adjacency relationships. Although the model does not explicitly construct or learn a graph, the dilated kernel provides an efficient grid-based approximation of graph-style receptive field propagation. Consequently, the term graph-like is used in a functional sense to denote the broadened contextual aggregation afforded by dilatation while retaining the inductive biases and computational efficiency of conventional CNNs.

The first convolutional block is described as SNN-inspired because it emphasizes local edge and contrast cues, reflecting the functional behavior typically associated with early stages of spiking and biological visual systems.The analogy refers to the effect of using shallow convolutions with small receptive fields and ReLU activation, which produce sparse, high-contrast activation maps highlighting local intensity changes. This response patterns resembles the edge-selective filtering observed in simplified SNN front-ends and early biological vision pathways. Therefore, the term is intended to denote functional affinity with these early processing mechanisms, rather than an explicit implementation of spiking dynamics.

Unlike previous work, the principles were adapted within a lightweight convolutional framework tailored for image classification and federated learning on plant disease datasets. Figure 4 offers a visual representation of the described model. The input images were reshaped to 320 × 320 × 3 in order to preserve spatial and color details, essential for the analysis of leaf disease. The following layer performs initial downsampling to reduce spatial resolution while maintaining dominant visual features. Dilated Convolution layer records long-range spatial dependencies across the leaf by simulating graph-like receptive fields. The next layer is the SNN inspired convolution, which enables the model to respond selectively to the changes of the lesion texture and boundaries. At this point, the global average pooling encapsulates spatial information into a compact vector. The next dense layer learns high-level representations for disease classification, and the last layer produces a probability distribution over the ten given disease classes.

Although the architectural components of the proposed model are based on established convolutional operations, the novelty of this work lies in the integration and evaluation of a lightweight Graph–SNN-inspired feature extractor within a federated learning framework for the identification of plant disease. The compact model design, the receptive-field structure and the front-end edge emphasis were specifically selected to balance discriminative power with computational and communication constraints inherent to federated training. Empirical analysis across decentralized, federated, and domain-adapted phases further demonstrates how this architecture interacts with non-IID data, limited field sample diversity, and post-federation fine-tuning. Thus, the contribution of the work is not primarily the introduction of new convolutional operations, but the demonstration and characterization of this model class within a realistic FL deployment pipeline.

3.5. Federated Transfer Learning Pipeline

The FTL pipeline we implemented consists of a central server, which initializes the model described in the previous section, and four clients. The number of clients is realistic for this research area, many published experiments, especially in agriculture and healthcare, using 2 to 5 clients to simulate heterogeneous conditions. Ref. [24] used 3 clients to detect presence of population heterogeneity in federated settings, Ref. [25] implements a FL multi-location evapotranspiration estimation using data from 3 clients, Ref. [26] uses 3 clients to implement federated vision-language for crop disease and pest detection.

All participants own a consistent dataset derived from the one described previously. The pipeline allows collaborative learning without exposing client data and can be structured as follows: global model initialization, local client training, model upload, aggregation and synchronization.

3.5.1. Global Model Initialization

This stage represents the transfer learning component of the framework, where the global model is initialized using parameters pre-trained on the centralized dataset. At this point, general representation features, such as edge structure, shape, texture, and lesion patterns are learned.

In this stage, the server initializes the global model with pre-trained weights resulting from centralized training of on the laboratory dataset. The pre-trained weights represent the foundational knowledge that limits the cold-start problem.

The resulting global model will be used by the participants in the federated training process. To reduce communication overhead, the server sends only the model weights to the clients, instead of the entire model.

3.5.2. Clients Local Training

Upon receiving the model weights from the server, each client loads them into its own local independent instance of the model. The clients training configuration uses a batch size of 32, 3 epochs per communication round. In this stage, the model adapts its parameters to the specific client data distribution while preserving data privacy and locality.

3.6. Model Transmission

After the local training is completed, each client sends the local model parameters to the central server. Raw data is not shared between the process entities, but only model weights, so privacy constraints are ensured. This stage represents the federated update step, where the client knowledge is transferred without exposing sensitive data, via secure communication.

Model Aggregation

Each time the server receives weights from all participants, a training round is finished. In our experiments, the model was trained for 5 rounds, and the aggregation process was triggered after each round. The server performs model aggregation using the Federated averaging algorithm (FedAvg), introduced by [27]. In this stage, the global model parameters are updated as a weighted average of the received client models.

We define K as the total number of clients and

w_{t}^{(k)}

the model weights of client k after local training in round t. Let

n_{k}

be the number of training samples on client k. We consider N the total number of samples among all clients, defined as

N = \sum_{k = 1}^{K} n_{k}

, and

w_{t + 1}

the updated weights received by the global model for round t + 1.

The weights of the global model are aggregated using FedAvg [27] as follows:

w_{t + 1} = \sum_{k = 1}^{K} \frac{n_{k}}{N} \cdot w_{t}^{(k)}

In this way, we consider that the contribution of each client to the global model is proportional to the amount of data it holds, thus ensuring fairness among participants in the training process, especially if the data distribution differs significantly.

3.7. Synchronization

After the aggregation process is over, the resulted global model is redistributed to the participants. At this point, synchronization of the training states is necessary so that each client can benefit from the knowledge accumulated by the model following aggregation.

In this stage, each client aligns its local model with the global model, so that all participants start the next training round from the same global model state. In this way, consistency, stability, and the propagation of knowledge are ensured.

4. Results

This section presents the experimental results obtained from the proposed approach. We report the performance of the model on both laboratory and field data, followed by an analysis of accuracy, generalization, and limitations.

In our experiments, the entire model described in Section 3.5 was stored on the server and pre-trained in a centralized manner on the dataset that contains clear laboratory data. At the end of this phase, we tested the accuracy of the model on both public and private datasets.

Using the public data set, predominantly composed of images captured under perfect laboratory conditions, we obtained an accuracy of 88% as shown in Table 3, which is a good starting point for federated training. Since the public test dataset has the same distribution as the training dataset, the centralized model generalizes well.

Following centralized training on the public dataset, the model was evaluated on the real field dataset to assess its generalization capability. The results indicate that, despite the application of augmentation techniques to increase diversity, the accuracy obtained is lower than that achieved on the public dataset. As we can see in Table 4, at this point, the model struggles to adapt to field domain, the performance of small classes being particularly weak.

The confusion matrix summarizes the performance of a model by comparing the predicted labels with the true labels, according to [28]. Our model shows strong performance on the clean public dataset, but struggles on the real field one, as we can see, in Figure 5 and Figure 6.

While the confusion matrix quantifies the classification performance across disease classes, additional insight into the underlying feature space can be gained by visualizing the learned representations. We implemented PCA and t-SNE to project the high-dimensional features into two dimensions and assess the degree of class separability. Figure 6 confirms that train, validation and test sets share consistent feature distributions for the public dataset.

In the federated training process, each client performed 3 epochs of training, meaning the model completely passed through the entire training dataset 3 times before sending the resulted weights back to the server for aggregation. The training process consists of five rounds, which means that each client performs 3 epochs of training 5 times consecutively. After each round, when all clients finish the local training, the server aggregates the parameters obtained from each using FedAvg.

After the FTL training is over and the model is aggregated at the server level, we tested its performance on both the public dataset and the private dataset. As shown in Table 5, the global model achieved 94.45% accuracy with balanced precision, recall, and F1-scores across all classes on the public dataset, demonstrating that it successfully generalized the test data.

As we can see in Figure 7, the model improved its performance after federated training. It has learned strong disease features under controlled conditions, each category being largely correctly predicted.

Our model demonstrated consistently high precision, recall, and F1 scores across most tomato leaf disease classes, indicating robust and reliable classification performances.

The feature embedding visualizations obtained through PCA and t-SNE are shown in Figure 8 and Figure 9. As we can observe, train, validation, and test sets demonstrate highly consistent distributions under the FTL setup. Compared to the centralized model, where the embeddings appeared to be more dispersed, the FL approach yields a more compact and heterogeneous feature space. This indicates that FL training preserves representativeness across splits and enhances the alignment of feature embeddings.

When evaluating the private field dataset, the performance dropped significantly due to real-world noise, illumination changes, and occlusions. Since the field dataset is much smaller than the lab dataset and has fewer classes, training from scratch would overfit. Moreover, using the model as it is, it does waste knowledge, because its head expects 10 classes. When the model was evaluated directly on the field dataset, performance was limited due to the domain shift between real field conditions and controlled lab images. The model achieved only 48,6% accuracy and a macro average F1-score of 0.39, which means weak performance on minority classes such as Septoria. Although bacterial spot has a high precision of 0.88, the recall was modest, only 0.54, meaning frequent missclasifications.

As observed in Table 6, the global model obtained after FL performs poorly on the field dataset relative to the centrally trained mode. Although this outcome may initially appear unexpected, it is well aligned with the documented behavior of FedAvg in the presence of heterogeneous and non-IID client data. Ref. [29] demonstrated that FedAvg can exhibit substantial accuracy degradation relative to a centralized model when trained on non-IID data, attributing this gap to client drift and the dominance of locally biased gradient updates during aggregation. Similarly, Ref. [30] systematically benchmarked FedAvg across several image-classification datasets and showed that federated models frequently underperform centralized baselines when clients have imbalanced or distributionally skewed data partitions. This behavior has been further confirmed by [31], who demonstrated that non-IID data substantially reduce FL accuracy in image classification tasks, prompting the development of GAN-based strategies to mitigate distributional divergence. These findings align closely with our experimental results: although the model converges well on clean and homogeneous laboratory domain, the aggregated representations generalize poorly to the visually complex and highly variable field images, This behavior reinforces the sensitivity of FedAvg to distributed mismatch and highlights the need for post-aggregation adaptation strategies, such as the fine-tuning step implemented in the third phase of our experiment.

In order to adapt the global model to field conditions, a fine-tuning stage was applied after the final federated aggregation. This step was carried out centrally on the server and uses the complete proprietary dataset, which contains only five classes: bacterial spot, early blight, healthy, late blight, and Septoria leaf. We retained the backbone as a fixed feature extractor and removed the final classification layer. A new classification head with five output neurons and softmax activation was attached and trained on our private dataset. To avoid overfitting, all backbone layers were frozen and only the new head was optimized.

After applying this strategy, as shown in Table 7, the adapted model reached 62% accuracy and a macro F1 score of 0.56. Visible improvements were produced for the healthy class, where the F1 score increased to 0.80 and Septoria, to 0.43. The confusion matrix highlights that misclassifications now occur between visually similar conditions, such as bacterial spot and late blight. By applying fine-tuning, the pre-trained features effectively realign to the field domain and enhance sensitivity to minority classes.

The comparative analysis of the confusion matrices presented in Figure 5 and Figure 10 further illustrates the evolution of class-specific performance across the five field classes. Before fine-tuning, the predictions were broadly dispersed, with each class exhibiting substantial misclassification into multiple alternative categories. Bacterial spot showed the highest concentration of correct predictions, but also notable confusion with healthy and late blight. Early blight showed a diffuse error pattern, with samples distributed across all four other classes. Healthy images were frequently misassigned as Late Blight, while Late Blight itself showed pronounced overlap with bacterial spot and healthy. The minority class, Septoria leaf spot, exhibited minimal correct predictions and a high degree of scattering across the confusion matrix.

After the fine-tuning stage, the overall structure of the confusion matrix becomes more coherent, with a clearer diagonal trend, indicating an increased proportion of correctly classified samples across most classes. However, Figure 10 also shows that errors persist in characteristic ways that reflect the inherent difficulty of the field domain. Misclassifications remain common between visually similar conditions, and the two minority classes, Septoria leaf spot and Early blight, although showing slight improvement, continue to exhibit widely dispersed error patterns. These observations point to underlying challenges such as the subtile visual distinctions between certain diseases, the limited diversity and imbalance of the field dataset, and the residual effects of the domain shift between laboratory and field images. The patterns visible in the fine-tuned confusion matrix provide a clear starting point for a broader analysis of model limitations, data constraints, and adaptation needs, as examined in detail in the following section.

5. Discussion

The current research explored a novel FTL approach for tomato disease detection using a biologically inspired Graph-CNN hybrid convolutional architecture. In the broader context of lightweight models used for the detection of plant-disease in distributed or federated settings, MobileNetV2 [32] has emerged as one of the most widely adopted backbones due to its favorable balance between accuracy and computational efficiency. Recent studies have demonstrated its suitability for classification of crop-diseases within federated pipelines. Ref. [33] report that MobileNetV2 performs competitively in federated frameworks for the detection of rice-leaf disease in multiple cross-silo clients, while [34] show that it achieves strong performance and communication efficiency in federated learning applied to multi-crop datasets. In contrast, the model proposed in our work follows a different architectural rationale. Rather than relying solely on depthwise separable convolutions for efficiency, it integrates an edge-sensitive front-end inspired by simplified spiking-processing behavior together with dilated graph-like convolutions designed to capture both fine-grained texture cues and expanded spatial context. This design targets the specific challenges encountered in real-world scenarios, particularly the non-IID nature and strong domain shift between laboratory and field environments. Our methodology was performed in three phases: centralized training, federated training, and fine-tuning stage. Each phase provided valuable insight into the generalization and efficiency of model deployment.

The study proposed by [34] offers a valuable FL benchmark for comparison, as their experimental setup is structurally similar to ours. Importantly, the authors evaluate a broad range of pretrained deep-architectures, including VGG-16, ResNet-50, DenseNet-121, MobileNet-V2, Inception-V3, and Vision Transformer, and report consistently high results on multi-crop PlantVillage dataset [18]. The dataset on which they performed the experiments consists of clean leaf images captured under controlled laboratory conditions with uniform backgrounds and well-balanced class distributions. Their CNN baseline achieved approximately 99.55% accuracy, but also consistently strong precision, recall, and F1-scores, often exceeding 0.98 for most disease categories. In our study, the proposed model, also CNN based, shows comparably robust behavior on the laboratory tomato dataset, achieving 94.45% accuracy and class-level precision, recall, and F1-scores typically in the 0.90–0.97 range. These results align with the performance trends observed by [34] and reinforce that, under curated conditions, both conventional and hybrid CNN models achieve high discriminative performance regardless of architectural variations. However, on our real-field dataset, the model attains significantly low values, accuracy of 0.48 after FL and 0.62 after fine-tuning, alongside severe reductions in class level precision and recall, particularly for minority diseases such as Septoria leaf spot and Early blight. The strong performance gap across precision, recall, and F1-score confirms that FL behaves very differently when deployed on real agricultural imagery affected by nature variations, occlusions, pesticides and inconsistent illumination, conditions which are absent in the dataset used by [34].

A second relevant point of comparison is the work of [33], who proposed an FTL framework for rice-leaf disease classification using MobileNet-V2 and EfficientNet-B3 as backbone models. Unlike our field dataset, which contains severely underrepresented disease categories, most notably Septoria leaf spot with only 28 original samples, their dataset includes substantially more samples per class, offering a far more balanced distribution across the four target categories. Although our dataset achieves numerically similar class sizes only after augmentation, the augmented images cannot compensate for the limited morphological diversity present in the original samples. The difference in dataset composition directly affects the performance, as their models achieve approximately 98% accuracy on IID data and around 90% on non-IID partitions, while our model, trained and evaluated on more imbalanced tomato field imagery, reaches 62.02% accuracy even after fine-tuning. Furthermore, Ref. [33] allow each client to perform its own customized preprocessing pipeline, including individual choices of data cleaning, normalization, resizing, and augmentation. This flexibility introduces a degree of client-specific data curation that tends to reduce inter-client variability. Our results highlight the challenges inherent in FL under true domain shift and severe class imbalance, conditions closer to real-world agricultural deployments.

The centralized experiments revealed a clear divergence in model performance between laboratory and field datasets. When trained on publicly available datasets, our model achieved an accuracy of 88% and demonstrated balanced precision and recall across all ten classes. However, this phase resulted in a substantially lower accuracy on the field dataset. Despite applying data augmentation, several classes, such as Septoria or Early blight, remained challenging for the model. The reduced performance of the model in this scenario highlights the difficulty of direct transfer techniques that perform well on curated laboratory dataset to real-world agricultural settings.

The results obtained in the second phase of the experiment show a distinct improvement on the lab dataset, where the accuracy increased to approximately 94%. This improvement indicates that the federated setup effectively enriched the global model without compromising the generalization capabilities learned from clean laboratory data. However, the field dataset exhibited the opposite trend, with the accuracy declining from the centralized baseline of about 60% to roughly 49% after federated learning. This reduction can be explained by the non-IID nature of the distributed dataset. The field images differ markedly from the lab images in their visual complexity, noise characteristic, and overall data distribution. During federated aggregation, updates originating from the larger and visually cleaner lab dataset have a stronger influence on the global model. Consequently, the shared model becomes increasingly aligned with the representations that are well suited to the lab domain, but less effective for capturing the variable conditions of real world environments. This reflects a well known challenge FL, where client heterogeneity and distributional mismatch can induce domain drift and gradient conflicts, ultimately leading to performance degradation on minority or more complex data sources, as shown by [35].

Although our final field dataset accuracy of 62% does not reach the values often cited in leaf-disease classification literature, the level of performance is reasonable given the substantial domain shift and the limited amount of real field data available in our setting. Most high-accuracy reports originate from controlled or semi-controlled datasets, where models are trained and tested on visually homogeneous conditions. As noted by [36], many of the results obtained on laboratory datasets are optimistic because they do not represent the full complexity of real-world scenarios. In contrast, the images in our field dataset were collected under genuinely heterogeneous cultivation conditions, including both greenhouses and open-field plots. Photographs were taken at different times of the day and therefore exhibit substantial variation in illumination, shadow patterns, and color temperature. In addition, many leaves have traces of pesticide residues, dust, water drops, or the presence of insects, and symptoms are often occluded by overlapping foliage or stems. These factors introduce complex background clutter and visual noise while altering the appearance and textures of symptoms. Even if data augmentation increases the number of training samples, it cannot replace the diversity contained in truly distinct real-world scenarios. Rotations, flipping, brightness adjustments, or zooming operations generate transformed versions of existing images, but do not introduce new biological or environmental variability. In our case, the field dataset contains a limited number of original images, particularly for minority classes such as Septoria, and these images already exhibit constrained symptom diversity. Since augmentation only reproduces variations along predefined transformation axes and cannot generate new disease manifestations, the ability of the model to generalize remains bounded by the limited diversity of the original dataset.

Our results also align with previous findings in the plant disease domain, where field-acquired datasets remain scarce and yield lower performance compared to controlled laboratory benchmarks. For example, Ref. [37] highlights that the leaf classification performance decreases when moving away from controlled settings. In addition, the recent review by [36] emphasizes that only a small subset of studies employ true field-level tomato datasets, and many of the frequently cited accuracies claims rely on more idealized conditions. In our study, the centralized model reached approximately 60% accuracy on the field dataset, which dropped to about 49% during federated training due to domain shift and client heterogeneity, but subsequently recovered to 62% following the fine-tuning stage. This progression underscores both the inherent challenge of the field domain and the practical effectiveness of our federated adaptation strategy. Under these conditions, an accuracy of 62% constitutes a realistic benchmark for deployment in agricultural environments.

Fine-tuning played a critical role in recovering the performance on field dataset. The procedure represents a post-federation domain adaptation stage, in which federated backbone is frozen and a new classification head containing five classes is trained exclusively on field data. This approach allows the model to retain the representations learned collaboratively during federated training while adjusting the decision boundaries to the specific characteristics of the field domain.

Despite the improvements obtained through head-only fine-tuning, the minority classes in the field dataset-most notably Septoria leaf and Early blight-continued to demonstrate weak performances. Although Septoria improved from F1-score of 0.22 before fine-tuning to 0.42 afterward, and Early blight reached an F1-score of 0.28, these values remain substantially lower than those achieved by majority classes. This gap indicates that the model faces considerable difficulty in learning the subtle visual representations associated with these diseases under field conditions. The challenge is compounded by the severe class imbalance present in the dataset. For example, only 28 original Septoria images were available prior augmentation, restricting the diversity of authentic symptom presentations on which the model could train.

Although augmentation increased the number of training samples, it did not generate additional biological variability or novel lesion morphologies and therefore could not fully compensate for the limited diversity of the original images from the minority-class. In addition, the fine-grained and sometimes overlapping characteristics of Septoria and Early blight lesions are often obscured by field-related factors, such as variable illumination, surface residues, and background clutter. These environmental complexities further affect the ability of the model to learn robust discriminative boundaries for minority classes.

In addition to domain shift issues, our FL setup exhibited practical challenges frequently encountered in real deployments, such as occasional missing client updates and limited class overlap between the training and evaluation datasets. Missing updates occurred when one or more clients were unable to complete a training round or upload their local model parameters. We handled these cases by performing aggregation using only the available client updates during that round, following the standard FedAvg protocol. Clients who missed a round automatically rejoined the subsequent round by downloading the latest global model. This approach prevents training interruption while maintaining robustness to intermittent client availability, a behavior widely considered essential in operational FL environments. The second issue concerned the test dataset rather than the federated clients. Although in the second phase, all clients had access to the full set of ten laboratory classes, the real field dataset used for the evaluation in phase 3 contained only 5 classes. To reconcile the discrepancy, we performed a centralized head-only fine-tuning step to align the decision boundaries of the model with the reduced class structure of the target domain, thereby mitigating the impact of limited class overlap between FL training distribution and the downstream field evaluation scenario.

To further enhance sensitivity and recall for minority classes, additional measures beyond head-only optimization are likely necessary. A first potential strategy includes the acquisition of more diverse field images for rare classes, which would increase the true biological variability available for training. Similarly, more advanced loss-balancing techniques, such as focal loss, class-balanced loss, or hard-mining, could be implemented to place greater emphasis on underrepresented samples during training, thereby encouraging the model to better learn minority-class patterns. Another promising direction is to allow selective unfreezing of deeper backbone layers during fine-tuning, enabling the feature extractor itself to adapt directly to the particularities of field-domain minority classes rather than relying solely on the classifier head for adjustment. In addition, the usage of other federated strategies, such as FedProx [35] or related variants, could help mitigate domain drift and cross-client imbalance by allowing clients with scarce or highly variable data to retain locally adapted model components.

6. Conclusions

The present research introduces a federated transfer learning framework for tomato leaf disease classification that incorporates a biologically inspired Graph–SNN hybrid architecture. The system enables collaborative learning by distributing a pre-trained model across four simulated clients, without sharing raw images. Each client owns distinct non-identical distributed data (non-IID). Our federated strategy captures the variability of diverse real-field images while preserving data privacy, making it suitable for deployment in agricultural scenarios with limited data access. Although the federated experiments were conducted on a limited scale, the results highlight the potential of the framework to be extended to larger, more heterogeneous deployments.

The hybrid model architecture combines dilated convolutions that simulate spatial reasoning with sparse edge convolutional filters inspired by SNNs. This approach ensures both contextual understanding within leaf images and local lesion detection. In this way, a more robust classification under real-world occlusions, noise, and lightning variations is conducted. The system is suitable for deployment on low-power edge devices because the global average pooling and shallow dense layers ensure computational efficiency.

Experimental results show that centralized training provides strong initialization, while federated training improves performance over multiple rounds of local training and global aggregation. Key challenges are highlighted in this paper, such as uneven performance across disease classes, missing client updates, and limited class overlap in test dataset. Our study demonstrates that by combining FTL with biologically inspired model architectures, a scalable privacy-preserving plant disease detection system could be used in distributed agricultural environments.

Author Contributions

Conceptualization, A.-M.C. and C.D.; methodology, A.-M.C.; software, A.-M.C.; validation, A.-M.C. and C.D.; formal analysis, A.-M.C.; investigation, A.-M.C.; resources, A.-M.C.; data curation, A.-M.C.; writing—original draft preparation, A.-M.C.; writing—review and editing, C.D. and A.-M.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset generated and analyzed during the current study is not publicly available due to confidentiality considerations but is available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

The Morning Star Company. 2024 Post Season Global Tomato Crop Update. 2024. Available online: https://www.morningstarco.com/2024-post-season-global-tomato-crop-update/ (accessed on 2 March 2025).
McMahan, H.B.; Moore, E.; Ramage, D.; Hampson, S.; Agüera y Arcas, B. Communication-Efficient Learning of Deep Networks from Decentralized Data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), Lauderdale, FL, USA, 20–22 April 2017; pp. 1273–1282. [Google Scholar]
Yang, Q.; Liu, Y.; Cheng, Y.; Kang, Y.; Chen, T. Federated Learning; Synthesis Lectures on Artificial Intelligence and Machine Learning; Morgan & Claypool Publishers: San Rafael, CA, USA, 2020. [Google Scholar] [CrossRef]
Sun, H.; Fu, R.; Wang, X.; Wu, Y.; Al-Absi, M.A.; Cheng, Z.; Chen, Q.; Sun, Y. Efficient deep learning-based tomato leaf disease detection through global and local feature fusion. BMC Plant Biol. 2025, 25, 311. [Google Scholar] [CrossRef]
Hughes, D.P.; Salathé, M. An open access repository of images on plant health to enable the development of mobile disease diagnostics. arXiv 2015, arXiv:1511.08060. [Google Scholar]
Gehlot, M.; Saxena, R.K.; Gandhi, G.C. “Tomato-Village”: A dataset for end-to-end tomato disease detection in a real-world environment. Multimed. Syst. 2023, 29, 3305–3328. [Google Scholar] [CrossRef]
Thakur, P.S.; Khanna, P.; Sheorey, T.; Ojha, A. Explainable vision transformer enabled convolutional neural network for plant disease identification: PlantXViT. arXiv 2022, arXiv:2207.07919. [Google Scholar] [CrossRef]
Wspanialy, P.; Moussa, M. A detection and severity estimation system for generic diseases of tomato greenhouse plants. Comput. Electron. Agric. 2020, 178, 105701. [Google Scholar] [CrossRef]
Li, Z.; Paul, R.; Ba Tis, T.; Saville, A.C.; Hansel, J.C.; Yu, T.; Ristaino, J.B.; Wei, Q. Non-invasive plant disease diagnostics enabled by smartphone-based fingerprinting of leaf volatiles. Nat. Plants 2019, 5, 856–866. [Google Scholar] [CrossRef] [PubMed]
Barbedo, J.G.A. Impact of dataset size and variety on the effectiveness of deep learning and transfer learning for plant disease classification. Comput. Electron. Agric. 2018, 153, 46–53. [Google Scholar] [CrossRef]
Min, B.; Kim, T.; Shin, D.; Shin, D. Data augmentation method for plant leaf disease recognition. Appl. Sci. 2023, 13, 1465. [Google Scholar] [CrossRef]
Kanda, P.; Xia, K.; Kyslytysna, A.; Owoola, E. Tomato leaf disease recognition on leaf images based on fine-tuned residual neural networks. Plants 2022, 11, 2935. [Google Scholar] [CrossRef] [PubMed]
Mehta, S.; Kukreja, V.; Yadav, R. A Federated Learning CNN Approach for Tomato Leaf Disease with Severity Analysis. In Proceedings of the 2023 Second International Conference on Augmented Intelligence and Sustainable Systems (ICAISS), Trichy, India, 23–25 August 2023; pp. 309–314. [Google Scholar] [CrossRef]
Piccialli, F.; Della Bruna, C.; Chiaro, D.; Qi, P.; Savoia, M. AGRIFOLD: AGRIculture Federated learning for Optimized Leaf disease Detection. Expert Syst. Appl. 2025, 289, 128371. [Google Scholar] [CrossRef]
Kabala, D.M.; Hafiane, A.; Bobelin, L.; Canals, R. Loss-Guided Model Sharing and Local Learning Correction in Decentralized Federated Learning for Crop Disease Classification. arXiv 2025, arXiv:2505.23063. [Google Scholar] [CrossRef]
Dong, X.; Bao, J.; Chen, D.; Zhang, W.; Yu, N.; Yuan, L.; Chen, D.; Guo, B. CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 12124–12134. [Google Scholar]
Motwani, A. Tomato Dataset. Available online: https://www.kaggle.com/datasets/ashishmotwani/tomato (accessed on 24 February 2025).
Mohanty, S.P.; Hughes, D.P.; Salathé, M. Using deep learning for image-based plant disease detection. Front. Plant Sci. 2016, 7, 1419. [Google Scholar] [CrossRef]
Shimodaira, H. Improving predictive inference under covariate shift by weighting the log-likelihood function. J. Stat. Plan. Inference 2000, 90, 227–244. [Google Scholar] [CrossRef]
Scheck, T.; Seidel, R.; Hirtz, G. Learning from theodore: A synthetic omnidirectional top-view indoor dataset for deep transfer learning. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Snowmass Village, CO, USA, 1–5 March 2020; pp. 943–952. [Google Scholar]
Liu, Y.; Kang, Y.; Xing, C.; Chen, T.; Yang, Q. A Secure Federated Transfer Learning Framework. IEEE Intell. Syst. 2020, 35, 70–82. [Google Scholar] [CrossRef]
Li, J.; Yu, Z.; Zhu, Z.; Chen, L.; Yu, Q.; Zheng, Z.; Tian, S.; Wu, R.; Meng, C. Scaling up dynamic graph representation learning via spiking neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 8588–8596. [Google Scholar]
Wang, B.; Jiang, B. Spiking gats: Learning graph attentions via spiking neural network. arXiv 2022, arXiv:2209.13539. [Google Scholar] [CrossRef]
Nolte, D.; Bazgir, O.; Ghosh, S.; Pal, R. Federated learning framework integrating REFINED CNN and Deep Regression Forests. Bioinform. Adv. 2023, 3, vbad036. [Google Scholar] [CrossRef] [PubMed]
Tausif, M.; Iqbal, M.W.; Bashir, R.N.; AlGhofaily, B.; Elyassih, A.; Khan, A.R. Federated learning based reference evapotranspiration estimation for distributed crop fields. PLoS ONE 2025, 20, e0314921. [Google Scholar] [CrossRef]
Li, L.; Li, J.; Chen, D.; Pu, L.; Yao, H.; Huang, Y. VLLFL: A Vision-Language Model Based Lightweight Federated Learning Framework for Smart Agriculture. arXiv 2025, arXiv:2504.13365. [Google Scholar] [CrossRef]
Sun, T.; Li, D.; Wang, B. Decentralized Federated Averaging. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 4289–4301. [Google Scholar] [CrossRef] [PubMed]
Evidently AI. Confusion Matrix—Classification Metrics Guide. 2025. Available online: https://www.evidentlyai.com/classification-metrics/confusion-matrix (accessed on 5 March 2025).
Zhu, H.; Xu, J.; Liu, S.; Jin, Y. Federated learning on non-IID data: A survey. Neurocomputing 2021, 465, 371–390. [Google Scholar] [CrossRef]
Casella, B.; Esposito, R.; Cavazzoni, C.; Aldinucci, M. Benchmarking fedavg and fedcurv for image classification tasks. arXiv 2023, arXiv:2303.17942. [Google Scholar]
Chuenbubpha, T.; Boonchoo, T.; Haga, J.; Rattanatamrong, P. Solving non-IID in federated learning for image classification using GANs. In Proceedings of the 2023 20th International Joint Conference on Computer Science and Software Engineering (JCSSE), Phitsanulok, Thailand, 28 June–1 July 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 333–338. [Google Scholar]
Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar]
Aggarwal, M.; Khullar, V.; Goyal, N.; Gautam, R.; Alblehai, F.; Elghatwary, M.; Singh, A. Federated transfer learning for rice-leaf disease classification across multiclient cross-silo datasets. Agronomy 2023, 13, 2483. [Google Scholar] [CrossRef]
Mamba Kabala, D.; Hafiane, A.; Bobelin, L.; Canals, R. Image-based crop disease detection with federated learning. Sci. Rep. 2023, 13, 19220. [Google Scholar] [CrossRef]
Li, T.; Sahu, A.K.; Zaheer, M.; Sanjabi, M.; Talwalkar, A.; Smith, V. Federated optimization in heterogeneous networks. Proc. Mach. Learn. Syst. 2020, 2, 429–450. [Google Scholar]
Jelali, M. Deep learning networks-based tomato disease and pest detection: A first review of research studies using real field datasets. Front. Plant Sci. 2024, 15, 1493322. [Google Scholar] [CrossRef] [PubMed]
Singh, D.; Jain, N.; Jain, P.; Kayal, P.; Kumawat, S.; Batra, N. PlantDoc: A dataset for visual plant disease detection. In Proceedings of the 7th ACM IKDD CoDS and 25th COMAD, Hyderabad, India, 5–7 January 2020; pp. 249–253. [Google Scholar]

Figure 1. Field images.

Figure 2. Field images after augmentation.

Figure 3. Federated process description.

Figure 4. Model architecture.

Figure 5. Results before starting federated training. (a) Confusion matrix before federated training on public dataset. (b) Confusion matrix before federated training on private dataset.

Figure 6. Results before starting federated training. (a) PCA for centralized model after training on public dataset. (b) t-SNE for centralized model after training on public dataset.

Figure 7. Confusion matrix after federated training on public dataset.

Figure 8. Evaluation curves of the federated Graph SNN model on the lab dataset.

Figure 9. Results after federated training. (a) PCA for model after federated learning final aggregation. (b) t-SNE for model after federated learning final aggregation.

Figure 10. Confusion matrix after fine-tuning.

Table 1. Own dataset specifications.

Host	Class	Before Augmentation	After Augmentation
Tomato	Bacterial spot	374	2244
Tomato	Early blight	60	360
Tomato	Healthy	107	643
Tomato	Late blight	186	1116
Tomato	Septoria leaf	28	168

Table 2. Public dataset specifications.

Host	Class	Train	Valid
Tomato	Bacterial spot	3355	425
Tomato	Early blight	2501	480
Tomato	Healthy	4461	481
Tomato	Late blight	4230	463
Tomato	Leaf mold	2834	470
Tomato	Mosaic virus	2611	448
Tomato	Septoria leaf spot	3952	436
Tomato	Target spot	3688	457
Tomato	Two-spotted spider mite	3852	435
Tomato	Yellow leaf curl virus	5659	490

Table 3. Classification report on public dataset at centralized training.

Class	Precision	Recall	F1-Score	Support
Bacterial_spot	0.88	0.90	0.89	200
Early_blight	0.94	0.77	0.85	200
Healthy	0.85	0.92	0.88	200
Late_blight	0.78	0.80	0.79	200
Leaf_mold	0.88	0.89	0.88	200
Mosaic_virus	0.96	0.98	0.97	200
Septoria_leaf	0.87	0.94	0.90	200
Target_spot	0.90	0.80	0.85	200
Two_spotted_spider_mite	0.85	0.91	0.87	200
Yellow_leaf_curl	0.94	0.93	0.93	200

Approximate accuracy: 0.88.

Table 4. Classification report on field dataset at centralized training.

Class	Precision	Recall	F1-Score	Support
Bacterial_spot	0.86	0.71	0.78	2244
Early_blight	0.57	0.28	0.38	360
Healthy	0.40	0.55	0.46	643
Late_blight	0.62	0.61	0.62	1116
Septoria_leaf	0.00	0.00	0.00	168
Micro avg	0.66	0.60	0.63	4531
Macro avg	0.49	0.43	0.44	4531
Weighted avg	0.68	0.60	0.63	4531

Approximate accuracy: 0.6036.

Table 5. Classification report on public dataset at federated training.

Class	Precision	Recall	F1-Score	Support
Bacterial_spot	0.95	0.92	0.93	200
Early_blight	0.96	0.92	0.94	200
Healthy	0.99	0.92	0.95	200
Late_blight	0.84	0.82	0.83	200
Leaf_mold	0.87	0.98	0.92	200
Mosaic_virus	0.97	0.99	0.98	200
Septoria_leaf	0.95	0.97	0.96	200
Target_spot	0.94	0.94	0.94	200
Two_spotted_spider_mite	0.97	0.98	0.97	200
Yellow_leaf_curl	0.96	0.99	0.97	200

Approximate accuracy: 0.9445.

Table 6. Classification report on field dataset after federated training, before fine-tuning.

Class	Precision	Recall	F1-Score	Support
Bacterial_spot	0.87	0.53	0.66	2244
Early_blight	0.35	0.30	0.32	360
Healthy	0.20	0.27	0.23	643
Late_blight	0.46	0.55	0.50	1116
Septoria_leaf	0.14	0.53	0.22	168
Micro avg			0.48	4531
Macro avg	0.40	0.44	0.39	4531
Weighted avg	0.61	0.48	0.52	4531

Approximate accuracy: 0.4862.

Table 7. Classification report on field dataset after federated training, after fine-tuning.

Class	Precision	Recall	F1-Score	Support
Bacterial_spot	0.85	0.64	0.73	2244
Early_blight	0.19	0.50	0.28	360
Healthy	0.74	0.86	0.79	643
Late_blight	0.63	0.46	0.53	1116
Septoria_leaf	0.31	0.66	0.42	168
Micro avg			0.62	4531
Macro avg	0.54	0.62	0.55	4531
Weighted avg	0.71	0.62	0.64	4531

Approximate accuracy: 0.6202.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Cristea, A.-M.; Dobre, C. Federated Transfer Learning for Tomato Leaf Disease Detection Using Neuro-Graph Hybrid Model. AgriEngineering 2025, 7, 432. https://doi.org/10.3390/agriengineering7120432

AMA Style

Cristea A-M, Dobre C. Federated Transfer Learning for Tomato Leaf Disease Detection Using Neuro-Graph Hybrid Model. AgriEngineering. 2025; 7(12):432. https://doi.org/10.3390/agriengineering7120432

Chicago/Turabian Style

Cristea, Ana-Maria, and Ciprian Dobre. 2025. "Federated Transfer Learning for Tomato Leaf Disease Detection Using Neuro-Graph Hybrid Model" AgriEngineering 7, no. 12: 432. https://doi.org/10.3390/agriengineering7120432

APA Style

Cristea, A.-M., & Dobre, C. (2025). Federated Transfer Learning for Tomato Leaf Disease Detection Using Neuro-Graph Hybrid Model. AgriEngineering, 7(12), 432. https://doi.org/10.3390/agriengineering7120432

Article Menu

Federated Transfer Learning for Tomato Leaf Disease Detection Using Neuro-Graph Hybrid Model

Abstract

1. Introduction

2. Related Work

3. Materials and Methods

3.1. Dataset

3.2. Federated Transfer Learning

3.3. Methodology

3.4. Model Description

3.5. Federated Transfer Learning Pipeline

3.5.1. Global Model Initialization

3.5.2. Clients Local Training

3.6. Model Transmission

Model Aggregation

3.7. Synchronization

4. Results

5. Discussion

6. Conclusions

Author Contributions

Funding

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI