Article

FA-Unet: A Deep Learning Method with Fusion of Frequency Domain Features for Fruit Leaf Disease Identification

1
School of Rail Transportation, Shandong Jiaotong University, Jinan 250357, China
2
Institute of Automation, Chinese Academy of Sciences, Beijing 100000, China
3
PRIME-REL Electronics Technology Co., Ltd., Weihai 264315, China
*
Author to whom correspondence should be addressed.
Horticulturae 2025, 11(7), 783; https://doi.org/10.3390/horticulturae11070783
Submission received: 14 May 2025 / Revised: 19 June 2025 / Accepted: 26 June 2025 / Published: 3 July 2025
(This article belongs to the Section Plant Pathology and Disease Management (PPDM))

Abstract

In the recognition of fruit leaf diseases, image recognition technology based on deep learning has received increasing attention. However, deep learning models often perform poorly in complex backgrounds, and in some cases, they are even outperformed by traditional algorithms. To address this issue, this paper proposes a Frequency-Adaptive Attention (FA-attention) mechanism that leverages the significance of frequency-domain features in fruit leaf disease regions. By enhancing the processing of frequency domain features, the recognition performance in complex backgrounds is improved. Specifically, FA-attention combines Fourier transform with the attention mechanism to extract frequency domain features as key features. Then, this mechanism is integrated with the Unet model to obtain feature maps strongly related to frequency domain features. These feature maps are fused with multi-scale convolutional feature maps and then used for classification. Experiments were conducted on the Plant Village (PV) dataset and the Plant Pathology (PP) dataset with complex backgrounds. The results indicate that the proposed FA-attention mechanism is effective at learning frequency domain features. Our model achieves a recognition accuracy of 99.91% on the PV dataset and 89.59% on the PP dataset. At the same time, the convergence speed is significantly improved, reaching 94% accuracy with only 20 epochs, demonstrating the effectiveness of this method. Compared with classical models and state-of-the-art (SOTA) models, our model performs better on complex background datasets, demonstrating strong generalization capabilities.

1. Introduction

Fruit is a vital economic crop in agriculture, but annual losses from fruit diseases can reach up to 30% [1]. Early disease prevention is crucial for plant protection, as it helps avoid significant yield losses before severe infection occurs. Most plant diseases first manifest visible symptoms on leaves, allowing for early detection through visual observation or computer vision techniques. Therefore, accurate recognition of fruit leaf diseases has become a key component in effective fruit disease management.
Vision-based fruit leaf disease recognition has been widely studied because of its cost-effectiveness and efficiency. A range of methods have been explored, including traditional image processing techniques [2], pattern recognition [3], and artificial intelligence (AI) technologies [4]. Among these, AI, particularly deep learning, has shown promising outcomes [5,6]. However, the performance of deep learning in fruit leaf disease recognition degrades under real-world conditions such as uneven lighting, diverse viewpoints, and background interference. To address these challenges, researchers have proposed various improvement strategies.
In [7], the authors proposed a preprocessing method that combines denoising and hybrid contrast enhancement based on Convolutional Neural Networks (CNNs), which achieved a 99.4% accuracy rate when applied to fruit leaf disease classification. Reference [8] combines VGG16 with AlexNet to design a deep learning model for the classification and recognition of pepper leaf and fruit diseases, which accelerates the computational speed on the basis of performance improvement. In [9], the author designed an improved ResNetV2 to identify apple leaf diseases. By comparing it with various models, such as VGG16, InceptionV3, MobileNetV2, etc., the effectiveness of the improved network structure was proven. These methods mainly aim to improve the recognition accuracy for fruit leaf disease scenarios through data augmentation, preprocessing modules, and the design of improved deep learning models.
Meanwhile, researchers have discovered that through the refined processing and screening of deep features, they can extract features that are even more crucial to classification results. The literature [10] demonstrates that by extracting and integrating deep information from various scales, important semantic features can be provided. Subsequently, through embedded attention mechanisms, non-essential information is suppressed, and important information is enhanced to improve recognition accuracy. In deep information processing, strategies such as spatial and channel attention [11] are commonly employed to enhance critical information, thereby increasing the contribution of key features to classification results.
With the analysis of these deep features, researchers have found that some key deep features share similarities with those extracted by traditional image processing methods, such as texture features [12], morphological features [13], and frequency domain features [14].
In [15], the authors propose FcaNet, which integrates channel attention into the frequency domain, combining Discrete Cosine Transform (DCT) with deep learning models. This novel approach leverages frequency domain features to introduce a new deep learning model that has demonstrated strong performance on the ImageNet and COCO datasets. Considering the importance of frequency features in fruit disease analysis, we designed the FA-attention module, which integrates Fourier transform methods with attention mechanisms. This module autonomously extracts frequency domain features from images and combines them with the U-Net network to create the FA-Unet network. This approach emphasizes the key frequency domain features of diseased regions through feature selection to enhance accuracy. The main contributions of this paper are as follows:
  • By combining Fourier transform with the attention mechanism, we propose the FA-attention method and integrate it with the U-Net network. Compared to the baseline without this module, the accuracy is improved by 3.18%, and the model starts to converge as early as the 20th epoch, showing faster convergence.
  • Considering the fusion of frequency-domain features and multi-scale convolutional features, we developed AFF4 by extending the Attentional Feature Fusion (AFF) method. This module effectively integrates frequency-domain and multi-scale features, serving as the core component of our proposed FA-UNet network. Based on this architecture, we achieve significant improvements in classification accuracy.
  • The generalization ability of the model was validated on a dataset of complex background images, and based on the results, the process of frequency domain feature extraction in our model and the applicability of the FA-attention mechanism in complex background images were analyzed.

2. Materials and Methods

2.1. Experimental Data Preparation

The Plant Village (PV) dataset [16] and the Plant Pathology (PP) dataset [17,18] were chosen for our study. The Plant Village dataset is a public dataset of plant disease images, covering 14 plant species and 26 disease classes, with a total of 54,000 images. It includes common crops and fruits, such as maize, tomato, and apple. The dataset was manually collected, classified, and annotated by botanists and horticulturists, and it is widely used for plant disease research and agricultural monitoring. Because the images in the PV dataset were taken in a laboratory setting, they are not sufficient for evaluating the recognition of plant leaf diseases in real-world scenarios. Therefore, this study also selected the PP dataset to validate generalization capability. The PP dataset is a competition dataset from FGVC7 and FGVC8, comprising 18,632 images of apple leaf diseases with complex backgrounds. The dataset composition used in this study is shown in Table 1.

2.2. Model Establishment

We employed Unet as the backbone to validate the proposed FA-attention module. Combining the structural characteristics of the Unet network, we designed an improved model called FA-Unet. Additionally, we incorporated a data fusion mechanism to enhance the integration of frequency domain features with deep convolutional features.
As shown in Figure 1, the framework of the method adopted in this study mainly includes the following parts: data input and transformation (step 1), data augmentation (step 2), frequency information extraction (step 3), feature extraction based on FA-Unet (step 4), feature fusion (step 5), and disease classification (step 6).
In Step 1, the original images are transformed using the Fast Fourier Transform (FFT). The raw images are then fed into the data augmentation module, while the FFT results are used for frequency-domain feature extraction.
In Step 2, to enhance the model’s ability to recognize various factors, such as different lighting and shooting angles in real-world data, we performed data augmentation on the original images. Techniques such as brightness adjustment, rotation, scaling, and Gaussian filtering were employed to expand the training dataset, improve sample diversity, and enhance the model’s generalization capability.
In Step 3, frequency domain distribution information was simultaneously introduced as an additional input into the model. We performed FFT transformation on the images to extract frequency domain distribution information. This information undergoes convolution operations at different scales and is introduced into four FA-attention modules.
In Step 4, the images processed through data augmentation were input into the FA-Unet network. Building upon the Unet architecture, the FA-Unet incorporates the FA-attention module between the contracting and expansive paths by operating on the correspondingly cropped feature maps from the contracting path during concatenation. This integration enhances the model’s ability to process frequency domain attention information, effectively extracting critical frequency domain features from images and adaptively adjusting the representation capacity of image features.
In Step 5, the feature maps outputted by the Unet expansive path are fed into an AFF4 feature fusion module for feature abstraction and fusion.
In Step 6, the fused features are then passed through fully connected layers and an output layer, where the number of neurons in the output layer corresponds to the number of disease categories. Applying the softmax function to the output layer’s outputs computes the probability scores for each category, determining the disease type to which the image belongs.
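The softmax step described in Step 6 can be illustrated with a minimal sketch. The logits and class names below are hypothetical placeholders, not values from the paper; the function only shows how output-layer scores become probability scores and a predicted category.

```python
import math

def softmax(logits):
    """Convert raw output-layer scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(logits, class_names):
    """Return the most probable class and the full probability vector."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs

# Hypothetical logits for three illustrative classes (not from the paper).
label, probs = predict([1.2, 3.4, 0.3], ["healthy", "scab", "rust"])
```

The output layer would have one logit per disease category, so `class_names` would have the same length as the number of categories in the dataset.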

2.3. Data Augmentation

In real-world images, variations in lighting, shooting angles, and image clarity significantly influence recognition outcomes. Therefore, to enhance the model’s generalization capability, we performed data augmentation on the original images. The data augmentation methods employed in this design include random brightness adjustments, noise addition, rotation, and scaling.

2.4. Acquisition and Introduction of Frequency Domain Information

There are various frequency domain transformation methods for images, including Fourier transform, Laplace transform, and others. Among them, Fourier transform is a classical method widely used in image processing applications such as frequency domain filtering [19], frequency domain feature extraction [20], and edge detection [21]. In this study, Fourier transform was employed to convert images into the frequency domain. Subsequently, an attention mechanism was applied to automatically select frequency domain information that is more valuable for disease classification.
Fourier transform can be applied to both continuous and discrete signals. For images represented as discrete matrices, the transformation process can be described by the following formula:
$$F(u, v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y) \cdot e^{-j 2\pi \left( \frac{ux}{M} + \frac{vy}{N} \right)}$$
In this formula, $f(x, y)$ represents the pixel value of the image at position $(x, y)$, $F(u, v)$ represents the transformed frequency domain representation, and $M$ and $N$ are the number of rows and columns of the image, respectively.
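As a worked illustration of the discrete transform above, the following sketch evaluates the double sum directly on a tiny image. It assumes the usual forward-transform sign convention (negative exponent) and is written for clarity rather than speed; the function name `dft2` and the 2 × 2 test image are illustrative only.

```python
import cmath

def dft2(f):
    """Naive 2-D DFT of an M x N image given as a list of rows."""
    M, N = len(f), len(f[0])
    F = [[0j] * N for _ in range(M)]
    for u in range(M):
        for v in range(N):
            acc = 0j
            for x in range(M):
                for y in range(N):
                    # Exponent -j*2*pi*(u*x/M + v*y/N) from the formula.
                    angle = -2 * cmath.pi * (u * x / M + v * y / N)
                    acc += f[x][y] * cmath.exp(1j * angle)
            F[u][v] = acc
    return F

image = [[1, 2], [3, 4]]
spectrum = dft2(image)
# spectrum[0][0] is the sum of all pixels (the zero-frequency term).
```

The four nested loops cost O(M²N²) operations, which is why practical implementations use the FFT instead.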
To achieve faster computation speed and more stable numerical performance, DFT (Discrete Fourier Transform) typically utilizes the FFT (Fast Fourier Transform) method. We denote a feature map matrix X of size H × W and perform FFT operations on it:
$$X_F = \mathrm{FFT}(X)$$
After transforming, we divide the frequency domain matrix into 4 × 4 patches, essentially partitioning it into different frequency ranges. We then compute amplitude-frequency information within each of these 16 regions to derive frequency domain distribution details. Considering the multi-scale up-sampling process in UNet, we convolve these distribution details at different stages of up-sampling to accommodate multi-scale requirements. Then, we feed the results after multi-scale convolution into FA-attention modules at different stages, as shown in Figure 2.
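The 4 × 4 partitioning can be sketched as follows. This is a minimal sketch under stated assumptions: the text does not specify how the amplitude information is aggregated within each region or whether the spectrum is centred first, so the mean amplitude and the `fftshift` call are both assumptions, and `frequency_distribution` is a hypothetical name.

```python
import numpy as np

def frequency_distribution(image, grid=4):
    """Mean FFT amplitude in each cell of a grid x grid partition."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))  # centring is an assumption
    amplitude = np.abs(spectrum)
    h, w = amplitude.shape
    ph, pw = h // grid, w // grid           # patch size; remainders are cropped
    cropped = amplitude[:ph * grid, :pw * grid]
    patches = cropped.reshape(grid, ph, grid, pw)
    return patches.mean(axis=(1, 3))        # aggregation by mean is an assumption

rng = np.random.default_rng(0)
dist = frequency_distribution(rng.random((64, 64)))  # shape (4, 4)
```

Each of the 16 entries summarizes one frequency range; these summaries would then be convolved at different scales before entering the FA-attention modules.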

2.5. FA-Attention

In the frequency domain, abrupt changes in pixel values in images typically manifest as high-frequency features, while regions with gradual pixel variations are characterized by low-frequency features. Therefore, common frequency domain filters such as low-pass, band-pass, and Butterworth filters are used to selectively process specific frequency regions in images. However, these methods rely on a limited number of image analyses or expert knowledge, which can be subjective and lack flexibility.
In this study, we propose an FA-attention module designed to autonomously select frequency domain data matrices, aiming to highlight key features that are more effective for classification outcomes.
As shown in Figure 3, the matrix $X_F$ obtained by FFT is multiplied by a set of self-trained weights $W$, thereby enhancing different frequency domain intervals to varying extents. This is represented by the following equation:
$$X'_F(u, v) = X_F(u, v) \odot W(u, v)$$
where $\odot$ denotes the Hadamard product (element-wise multiplication). Each element of $X_F(u, v)$ is multiplied by the corresponding element in $W(u, v)$, achieving enhancement or suppression in the frequency domain.
Then, the enhanced matrix $X'_F$ is subjected to inverse FFT, and a new feature map $Z$ is obtained, as shown in the following formula:
$$Z(x, y) = \mathrm{IFFT}[X'_F(u, v)]$$
Due to the enhanced frequency domain features in the output feature map, we expect that during the training process, it will increasingly resemble the frequency characteristics of fruit leaf disease areas.
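The three operations of this section (FFT, element-wise weighting, inverse FFT) can be sketched in a few lines. In the model the weights W are learned during training; here they are fixed arrays chosen only to show the effect, and returning the real part of the inverse transform is an assumption, since the text does not say how complex outputs are handled.

```python
import numpy as np

def fa_attention(x, w):
    """Reweight the spectrum of feature map x with weights w, then invert."""
    x_f = np.fft.fft2(x)             # X_F = FFT(X)
    x_f_enhanced = x_f * w           # Hadamard product with W(u, v)
    z = np.fft.ifft2(x_f_enhanced)   # back to the spatial domain
    return z.real                    # keeping the real part is an assumption

rng = np.random.default_rng(0)
x = rng.random((8, 8))

# All-ones weights leave the feature map unchanged.
z_identity = fa_attention(x, np.ones((8, 8)))

# Keeping only the zero-frequency term yields a constant map equal to the mean.
w_dc = np.zeros((8, 8))
w_dc[0, 0] = 1.0
z_dc = fa_attention(x, w_dc)
```

The two fixed weight matrices bracket the behaviour: weights near one pass a frequency through, weights near zero suppress it, and training would be expected to place W somewhere in between for each frequency.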

2.6. Feature Fusion Based on FA-Attention

The operation of the FA-attention module on matrix frequency domain features can be designed either at the input layer of the model to extract frequency domain features from the original image or in the deep part of the model to enhance the frequency domain features of the deep features. In this design, we have conducted a frequency domain feature enhancement operation, specifically targeting the concatenation between the contracting path and the expansive path across different scales.
As shown in Figure 4, we have added a feature fusion mechanism to the expansive path of the U-Net. This mechanism performs feature fusion on the feature maps obtained from different scales in the expansive path to enhance frequency-domain features under different receptive fields.
We designed the AFF4 feature fusion module based on the AFF feature fusion method proposed in reference [22], as well as the up-sampling method in the U-Net model’s expansive path.
Four different scales of feature maps are sequentially fused through the AFF4 module. In this mechanism, the feature maps from smaller scales are up-sampled to restore spatial resolution, addressing cross-scale feature propagation. These up-sampled features then pass through the AFF module for multi-scale fusion. The fused feature maps repeat this process and further integrate with higher-scale features.
By utilizing the AFF feature fusion mechanism, we optimize the original up-sampling method in the U-Net model, thereby enhancing the aggregation capability of the expansive path for features from different scales.
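The sequential fusion described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: the gate inside `aff` is a parameter-free stand-in for the learned attention module of the published AFF method, all feature maps are assumed to share one channel count, and each scale is assumed to have twice the resolution of the previous one.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x up-sampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def aff(x, y):
    """Gated fusion: m * x + (1 - m) * y, with the gate m computed from x + y."""
    s = x + y
    # Stand-in gate: per-pixel value plus per-channel global average.
    # The published AFF uses learned convolutions here instead.
    gate_in = s + s.mean(axis=(1, 2), keepdims=True)
    m = 1.0 / (1.0 + np.exp(-gate_in))
    return m * x + (1.0 - m) * y

def aff4(features):
    """Fuse four maps ordered from smallest to largest spatial scale."""
    fused = features[0]
    for larger in features[1:]:
        fused = aff(upsample2x(fused), larger)
    return fused

rng = np.random.default_rng(0)
# Hypothetical shapes: same channel count, resolution doubling at each stage.
feats = [rng.random((8, s, s)) for s in (4, 8, 16, 32)]
out = aff4(feats)  # shape (8, 32, 32)
```

Because the gate weights the two inputs with m and 1 − m, the fused map is always a per-element convex combination of the up-sampled features and the larger-scale features.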

3. Experiments and Results

The experiments were conducted using Python 3.7 and the PaddlePaddle 2.4.1 deep learning framework in a Linux environment, on a system equipped with a 4-core CPU, 32 GB of RAM, and a Tesla V100 GPU with 32 GB of video memory.

3.1. Data Augmentation

This paper conducted training and testing on the PV dataset and PP dataset separately, comparing the model’s performance on datasets with simple and complex backgrounds. We split the dataset into training, validation, and test sets at a ratio of 0.8:0.1:0.1.
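A 0.8:0.1:0.1 split can be sketched as below. The text does not state whether the split was random or stratified by class, so a seeded random shuffle is assumed here, and `split_dataset` is a hypothetical helper name.

```python
import random

def split_dataset(items, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle items and split them into train, validation, and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    n_train = int(len(shuffled) * ratios[0])
    n_val = int(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]       # remainder goes to the test set
    return train, val, test

train, val, test = split_dataset(range(1000))
```

Splitting before augmentation keeps augmented copies of a training image out of the validation and test sets.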
In this study, we used Gaussian filtering, brightness adjustment, rotation, and scaling methods, as shown in Figure 5. Figure 5a shows an original image of an apple leaf disease from the PP dataset. Figure 5b–d display the same image after Gaussian filtering, brightness adjustment, rotation, and scaling.
We exclusively applied data augmentation to the training set. Each original training image was transformed through three operations: Gaussian blur, scaling and rotation, and brightness adjustment. Through data augmentation, the training set expanded to four times its original size.
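The fourfold expansion can be sketched as one original image plus three augmented variants. The parameter values below (blur sigma, brightness factor, a quarter-turn rotation, a 2× zoom) are illustrative assumptions, since the text does not give them, and the sketch handles a single-channel square image in the [0, 1] range for brevity.

```python
import numpy as np

def adjust_brightness(img, factor):
    """Scale pixel intensities and clip to the valid [0, 1] range."""
    return np.clip(img * factor, 0.0, 1.0)

def gaussian_blur(img, sigma=1.0, radius=2):
    """Separable Gaussian blur of an (H, W) image with edge padding."""
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-offsets**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")

def rotate_and_scale(img, quarter_turns=1, zoom=2):
    """Rotate by quarter turns, zoom in, and centre-crop to the input size."""
    rotated = np.rot90(img, k=quarter_turns)
    zoomed = rotated.repeat(zoom, axis=0).repeat(zoom, axis=1)
    h, w = img.shape
    top = (zoomed.shape[0] - h) // 2
    left = (zoomed.shape[1] - w) // 2
    return zoomed[top:top + h, left:left + w]

def augment(img):
    """Return the original image plus three augmented variants."""
    return [
        img,
        gaussian_blur(img),
        rotate_and_scale(img),
        adjust_brightness(img, 1.3),
    ]

rng = np.random.default_rng(0)
image = rng.random((32, 32))
variants = augment(image)  # four images per original
```

Applying `augment` to every training image yields a training set four times the original size, matching the expansion described above.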

3.2. Training and Testing Parameter Settings

In our experiments, we set the batch size to 32, and we trained the model for 100 epochs. The CrossEntropyLoss function is used to assess the accuracy of predictions by measuring the discrepancy between the predicted probabilities and true labels.
We employed a combination of SGD (Stochastic Gradient Descent) and Adam optimizers to leverage their respective strengths. The training begins with the Adam optimizer using an initial learning rate of 0.001. During the training process, if the validation loss fails to decrease for 10 consecutive epochs, the optimizer switches to SGD with a fixed learning rate of 0.0001. A weight decay of 0.0005 is applied throughout the training to mitigate overfitting by penalizing large weights.
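The switching rule can be sketched independently of any deep learning framework. The class below only tracks which optimizer and learning rate would be active; it assumes that "fails to decrease" means the validation loss did not improve on its best value so far, and the class name and the loss curve are hypothetical.

```python
class OptimizerSchedule:
    """Start with Adam; switch to SGD once the validation loss stalls."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best_loss = float("inf")
        self.stalled_epochs = 0
        self.name, self.lr = "adam", 1e-3   # initial optimizer and learning rate
        self.weight_decay = 5e-4            # applied throughout training

    def step(self, val_loss):
        """Record one epoch's validation loss and return the active optimizer."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.stalled_epochs = 0
        else:
            self.stalled_epochs += 1
        if self.name == "adam" and self.stalled_epochs >= self.patience:
            self.name, self.lr = "sgd", 1e-4   # fixed learning rate after the switch
        return self.name

schedule = OptimizerSchedule()
# Hypothetical loss curve: improves for 5 epochs, then plateaus for 10.
losses = [1.0, 0.8, 0.6, 0.5, 0.45] + [0.45] * 10
history = [schedule.step(loss) for loss in losses]
```

In a real training loop, the returned name would select which framework optimizer object performs the next epoch's updates.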

3.3. Training, Testing, and Generalization Ability Verification Experiments

We conducted training and testing using the PV dataset, and the results are presented in Figure 6. After 100 epochs of training, the proposed model successfully classified and identified various types of fruit leaf diseases. The model achieved an accuracy of 99.91% on the test set, with a loss of approximately 0.002. Notably, the model surpassed 94% accuracy around the 20th epoch, indicating strong convergence and stability throughout the training process.
As shown in Figure 7, we validated the generalization ability using the PP dataset, achieving a maximum accuracy of 89.59%.
As shown in Figure 8 and Figure 9, we used the t-Distributed Stochastic Neighbor Embedding (t-SNE) method to project the original data and the features extracted by our model into a two-dimensional space for both datasets. The t-SNE visualization analysis indicates that the chaotic distribution of the original signals improved significantly after feature extraction by our model, resulting in better classification performance. Similarly, on the PP dataset, the different categories of data were also well differentiated.
As shown in Figure 10, we plotted the testing results in a confusion matrix. Our model achieved 99.91% test accuracy on the simple-background PV dataset. It maintained a robust 89.59% accuracy on the significantly more challenging complex-background PP dataset. Notably, the model showed reduced effectiveness for categories with limited training samples, a limitation we plan to address in future improvements.

3.4. Ablation Experiment

To validate the effectiveness of each component of our model, we designed ablation experiments for various modules of the model. The ablation experiments include Unet lacking data augmentation, Unet with data augmentation, Unet with data augmentation and FA-attention, and Unet with data augmentation, FA-attention, and feature fusion. The experimental results are shown in Table 2.
Based on the ablation experiment results, we found that data augmentation contributes significantly to accuracy improvement, especially for datasets with complex backgrounds, enhancing accuracy by 0.49% and 9.18% on the PV and PP datasets, respectively. FA-attention has a substantial effect on increasing model accuracy, achieving 2.18% and 8.19% improvements on the PV and PP datasets, respectively. After incorporating feature fusion, there was a notable performance boost on the PP dataset, reaching 1.56%, while the PV dataset only saw a 0.51% improvement.

3.5. Comparative Experiment and Analysis

We conducted comparative experiments using both classic models and state-of-the-art (SOTA) models. The classic models selected for fruit leaf disease classification include fine-tuned Darknet53 [7], a VGG16-based model [23], EfficientNet [24], and CNN [25]. The SOTA models include a YOLOv5-CA backbone [26], PDDNet-LVE [27], DINOV2 [28], and HSSNet [29].
We reproduced Darknet53, VGG16, EfficientNet, YOLOv5-CA, PDDNet-LVE, and HSSNet, conducting training and testing experiments on both the PV and PP datasets. We cite the published results of the CNN and DINOV2 models, which were only evaluated on the PV dataset.
The experimental results are shown below.
As shown in Table 3, for the PV dataset, where the images were taken in a controlled experimental environment with a uniform background, all models achieved high accuracy, with our proposed model attaining 99.91%. On the PP dataset, which has complex backgrounds, our proposed model achieved an accuracy of 89.59%, outperforming both the classic and SOTA models. It should also be noted that on the simple-background PV dataset, HSSNet achieved a 0.02% higher accuracy than our model. However, on the complex-background PP dataset, it underperformed against our model by 1.01%. Our model therefore exhibits the better overall performance when handling complex backgrounds.
This demonstrates that our proposed model has a notable advantage in generalization capability. Based on the results of training, testing, generalization validation, and comparative experiments, our model not only achieved high accuracy on the PV dataset but also showed better generalization performance on the PP dataset. Moreover, during training, our model exhibited faster convergence. This highlights the importance of frequency domain features for expressing the key characteristics of diseases and their invariance under complex backgrounds.

4. Analysis and Discussion

4.1. Analysis of Frequency Domain Feature Invariance for Leaf Disease

We selected fruit leaves under different rotation angles, lighting conditions, translations, and scaling, and we used the proposed model to predict them. We then extracted and visualized the feature maps after FFT and the feature maps output by FA-attention.
Figure 11a shows an image from the PV dataset, which has been subjected to Gaussian noise addition, brightness adjustment, rotation, and scaling, respectively. Figure 11b shows the results of FFT visualization obtained by performing FFT transformation on the feature map matrix. It can be observed that when the target fruit leaf undergoes translation or rotation, there is only a minor change in its frequency domain features. Figure 11c displays the attention heatmap generated by overlaying the FA-attention output feature map matrix with the original image. The figure demonstrates that as the original image undergoes rotation, scaling, or brightness adjustment, the focus of attention does not change significantly and remains concentrated on the diseased areas of the leaf.
In fact, rotating an image rotates its spectrum by the same angle, and translating it changes only the phase, so the amplitude distribution of the frequency domain features is largely preserved. This enables our model to better cope with variations in factors such as rotation, scaling, and lighting in complex background images.
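Part of this invariance argument can be checked numerically. The sketch below uses a random array as a stand-in for an image and covers only the cases where the property holds exactly for the discrete transform: circular translation, uniform brightness scaling, and a quarter-turn rotation. Real-world shifts, arbitrary rotation angles, and rescaling involve boundary and interpolation effects, so for those the invariance is only approximate.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((16, 16))           # stand-in for a grayscale leaf image
amplitude = np.abs(np.fft.fft2(image))

# Circular translation changes only the phase, so the amplitude is identical.
shifted = np.roll(image, shift=(3, 5), axis=(0, 1))
amplitude_shifted = np.abs(np.fft.fft2(shifted))

# Uniform brightness scaling multiplies every amplitude by the same factor.
amplitude_bright = np.abs(np.fft.fft2(1.5 * image))

# A quarter-turn rotation only rearranges the amplitude values.
amplitude_rotated = np.abs(np.fft.fft2(np.rot90(image)))
same_values = np.allclose(
    np.sort(amplitude_rotated, axis=None), np.sort(amplitude, axis=None)
)
```

The rotation check compares sorted values because the rotated spectrum holds the same amplitudes at different positions.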
Therefore, we chose frequency domain transformation combined with the attention mechanism to automatically select key frequency domain features. This forms the core module of our proposed model, which can counteract factors such as rotation, translation, and scaling of images in real-world scenarios, thus performing well even in complex backgrounds.

4.2. Attention Analysis of Disease Regions in Complex Backgrounds

We selected images with complex backgrounds, extracted deep feature maps from the model, and visualized them to analyze the attention on fruit leaf disease areas. We used the classic self-attention module from the transformer model and compared it with FA-attention, as shown in Figure 12.
Images with complex backgrounds typically have higher resolutions, and the fruit disease areas exhibit more fine-grained effects within the image. Moreover, complex background images are more susceptible to interference from background lighting, edges, and other leaves in the background, which makes the classification of such images more challenging.
In our comparison experiment, we selected four images with complex backgrounds. Each of these images presented a challenging condition: lighting shadow interference, cluttered background leaves, leaf occlusion and curling, or background light spot interference. The comparison results show that the proposed model demonstrates stronger attention toward the disease areas when dealing with complex backgrounds.

4.3. Discussion

Frequency domain features play an important role in fields such as digital signal processing [30] and image recognition [31]. Fruit leaf disease areas often exhibit significant frequency domain specificity [32], and these features show a certain invariance under complex scene variations (see Figure 11). If an attention mechanism can be designed to select the specific frequency domain features corresponding to the disease area from the frequency domain information, it would have a positive impact on the recognition of fruit leaf diseases.
This study combines frequency domain transformation and attention mechanisms to design the FA-attention mechanism. This mechanism enhances frequency domain attention and guides the model to focus on the typical frequency domain features of fruit leaf disease areas, thereby giving more attention to the fruit disease regions (see Figure 11 and Figure 12).
To further enhance the guiding effect of this mechanism on deep learning models, we selected U-Net as the backbone network and embedded the FA-attention mechanism into the skip connection section. In the feature concatenation of the corresponding layers in the contracting and expansive paths, the extraction and fusion of frequency domain features are enhanced. Meanwhile, to address the issue of feature resolution fusion in each layer of the expansive path, we designed a feature fusion module based on AFF, fully utilizing the structural characteristics of the multi-scale features of U-Net.
We conducted training and testing experiments (see Figure 7 and Figure 10), ablation experiments (see Table 2), and comparison experiments (see Table 3) on the simple-background PV dataset and the complex-background PP dataset, further validating the effectiveness of this mechanism. At the same time, we also identified the limitations of this method when dealing with small sample categories, which will become the focus of our next research steps.
Through a series of experiments in this study, we found that frequency domain features are more effective than other features when dealing with factors such as lighting changes, angle variations, and interference from a complex background.

5. Conclusions

This study focuses on the problem of leaf disease recognition against complex backgrounds. Considering that the frequency-domain features exhibit greater invariance compared to spatial-domain features in complex scenarios, we propose an FA-attention mechanism to improve recognition accuracy. We employ UNet as the backbone network, integrate AFF4 for feature fusion, and incorporate the FA-attention mechanism to design the FA-UNet architecture.
Experimental validation utilized both the simple-background Plant Village (PV) dataset and complex-background Plant Pathology (PP) dataset, achieving 99.91% and 89.59% accuracy, respectively. To enable deep analysis, we conducted ablation experiments, comparative experiments, and frequency-domain attention visualization analysis. The experimental results demonstrate that while our model achieves high accuracy on simple-background datasets, it exhibits even stronger performance in complex-background scenarios.
Overall, this study combines Fourier transform with deep learning methods, utilizing the attention mechanism to focus on key frequency domain features, along with multi-scale feature fusion, which enables the model to perform better on complex background datasets. In future work, we will collect more complex scene data and fully consider methods to address small sample categories, further optimizing the model.

Author Contributions

Conceptualization, X.L. and Z.L.; methodology, Z.L. and X.L.; software, Z.L.; validation, Z.L. and S.G.; investigation, W.W. and X.L.; data curation, W.W. and F.Z.; writing—original draft preparation, X.L. and Z.L.; writing—review and editing, Z.L. and W.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Shandong Provincial Key R&D Program, Grant No. 2024TSGC0932.

Data Availability Statement

Data is available upon request due to privacy.

Conflicts of Interest

Author Wenliang Zhang was employed by the company PRIME-REL Electronics Technology Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Li, Z.; Paul, R.; Ba Tis, T.; Saville, A.C.; Hansel, J.C.; Yu, T.; Ristaino, J.B.; Wei, Q. Non-invasive plant disease diagnostics enabled by smartphone-based fingerprinting of leaf volatiles. Nat. Plants 2019, 5, 856–866.
  2. Iqbal, Z.; Khan, M.A.; Sharif, M.; Shah, J.H.; ur Rehman, M.H.; Javed, K. An automated detection and classification of citrus plant diseases using image processing techniques: A review. Comput. Electron. Agric. 2018, 153, 12–32.
  3. Nagasubramanian, G.; Sakthivel, R.K.; Patan, R.; Sankayya, M.; Daneshmand, M.; Gandomi, A.H. Ensemble Classification and IoT-Based Pattern Recognition for Crop Disease Monitoring System. IEEE Internet Things J. 2021, 8, 12847–12854.
  4. Yağ, İ.; Altan, A. Artificial Intelligence-Based Robust Hybrid Algorithm Design and Implementation for Real-Time Detection of Plant Diseases in Agricultural Environments. Biology 2022, 11, 1732.
  5. Liu, J.; Wang, X. Plant diseases and pests detection based on deep learning: A review. Plant Methods 2021, 17, 22.
  6. Chen, Y.; Huang, Y.; Zhang, Z.; Wang, Z.; Liu, B.; Liu, C.; Huang, C.; Dong, S.; Pu, X.; Wan, F.; et al. Plant image recognition with deep learning: A review. Comput. Electron. Agric. 2023, 212, 108072.
  7. Rehman, S.; Khan, M.; Alhaisoni, M.; Armghan, A.; Alenezi, F.; Alqahtani, A.; Vesal, K.; Nam, Y. Fruit Leaf Diseases Classification: A Hierarchical Deep Learning Framework. Comput. Mater. Contin. 2023, 75, 1179–1194.
  8. Bezabh, Y.A.; Salau, A.O.; Abuhayi, B.M.; Mussa, A.A.; Ayalew, A.M. CPD-CCNN: Classification of pepper disease using a concatenation of convolutional neural network models. Sci. Rep. 2023, 13, 15581.
  9. Alsayed, A.; Alsabei, A.; Arif, M. Classification of Apple Tree Leaves Diseases using Deep Learning Methods. Int. J. Comput. Sci. Netw. Secur. 2021, 21, 324–330.
  10. Dai, G.; Tian, Z.; Fan, J.; Sunil, C.K.; Dewi, C. DFN-PSAN: Multi-level deep information feature fusion extraction network for interpretable plant disease classification. Comput. Electron. Agric. 2024, 216, 108481.
  11. Nawaz, M.; Nazir, T.; Javed, A.; Tawfik Amin, S.; Jeribi, F.; Tahir, A. CoffeeNet: A deep learning approach for coffee plant leaves diseases recognition. Expert Syst. Appl. 2024, 237, 121481.
  12. Anitha, K.; Srinivasan, S. Feature Extraction and Classification of Plant Leaf Diseases Using Deep Learning Techniques. Comput. Mater. Contin. 2022, 73, 233–247.
  13. Yang, B.; Wang, Z.; Guo, J.; Guo, L.; Liang, Q.; Zeng, Q.; Zhao, R.; Wang, J.; Li, C. Identifying plant disease and severity from leaves: A deep multitask learning framework using triple-branch Swin Transformer and deep supervision. Comput. Electron. Agric. 2023, 209, 107809.
  14. Xu, P.; Fu, L.; Xu, K.; Sun, W.; Tan, Q.; Zhang, Y.; Zha, X.; Yang, R. Investigation into maize seed disease identification based on deep learning and multi-source spectral information fusion techniques. J. Food Compos. Anal. 2023, 119, 105254.
  15. Qin, Z.; Zhang, P.; Wu, F.; Li, X. FcaNet: Frequency Channel Attention Networks. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 763–772.
  16. Hughes, D.P.; Salathe, M. An open access repository of images on plant health to enable the development of mobile disease diagnostics. arXiv 2015, arXiv:1511.08060.
  17. Kaeser-Chen, C.; Pathology, F.; Maggie; Dane, S. Plant Pathology 2020—FGVC7. Available online: https://kaggle.com/competitions/plant-pathology-2020-fgvc7 (accessed on 1 November 2024).
  18. Thapa, R.; Zhang, K.; Snavely, N.; Belongie, S.; Khan, A. Plant Pathology 2021–FGVC8. Available online: https://kaggle.com/competitions/plant-pathology-2021-fgvc8 (accessed on 1 November 2024).
  19. Patanè, G. Fourier-Based and Rational Graph Filters for Spectral Processing. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 7063–7074.
  20. Moon, J.H.; Lee, G.; Lee, S.M.; Ryu, J.; Kim, D.; Sohn, K.A. Frequency Domain Deep Learning With Non-Invasive Features for Intraoperative Hypotension Prediction. IEEE J. Biomed. Health Inform. 2024, 28, 5718–5728.
  21. Wu, Q.Y.; Yang, J.Z.; Hong, J.Y.; Meng, Z.; Zhang, A.N. An edge detail enhancement strategy based on Fourier single-pixel imaging. Opt. Lasers Eng. 2024, 172, 107828.
  22. Dai, Y.; Gieseke, F.; Oehmcke, S.; Wu, Y.; Barnard, K. Attentional Feature Fusion. In Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2021; pp. 3559–3568.
  23. Sujatha, R.; Krishnan, S.; Chatterjee, J.M.; Gandomi, A.H. Advancing plant leaf disease detection integrating machine learning and deep learning. Sci. Rep. 2025, 15, 11552.
  24. Sandhya Devi, R.S.; Vijay Kumar, V.R.; Sivakumar, P. EfficientNetV2 Model for Plant Disease Classification and Pest Recognition. Comput. Syst. Sci. Eng. 2022, 45, 2249–2263.
  25. Xiao, J.R.; Chung, P.C.; Wu, H.Y.; Phan, Q.H.; Yeh, J.L.A.; Hou, M.T.K. Detection of Strawberry Diseases Using a Convolutional Neural Network. Plants 2021, 10, 31.
  26. Zhang, Z.; Qiao, Y.; Guo, Y.; He, D. Deep Learning Based Automatic Grape Downy Mildew Detection. Front. Plant Sci. 2022, 13, 872107.
  27. Shafik, W.; Tufail, A.; De Silva Liyanage, C.; Apong, R.A.A.H.M. Using transfer learning-based plant disease classification and detection for sustainable agriculture. BMC Plant Biol. 2024, 24, 136.
  28. Bai, C.; Zhang, L.; Gao, L.; Peng, L.; Li, P.; Yang, L. DINOV2-FCS: A model for fruit leaf disease classification and severity prediction. Front. Plant Sci. 2024, 15, 1475282.
  29. Gao, X.; Tang, Z.; Deng, Y.; Hu, S.; Zhao, H.; Zhou, G. HSSNet: A End-to-End Network for Detecting Tiny Targets of Apple Leaf Diseases in Complex Backgrounds. Plants 2023, 12, 2806.
  30. Sun, J.; Chang, J.; Wei, Y.; Lin, S.; Wang, Z.; Mao, M.; Wang, F.; Zhang, Q. Feature Domain Transform Filter for the Removal of Inherent Noise Bound to the Absorption Signal. Anal. Chem. 2022, 94, 14290–14298.
  31. Xiong, G.; Wang, F.; Yu, W.; Truong, T.K. Singularity-Exponent-Domain Image Feature Transform. IEEE Trans. Image Process. 2021, 30, 8510–8525.
  32. Lin, H.; Tse, R.; Tang, S.K.; Qiang, Z.; Pau, G. Few-Shot Learning for Plant-Disease Recognition in the Frequency Domain. Plants 2022, 11, 2814.
Figure 1. Overview of the proposed method.
Figure 2. Frequency domain information. (a) Original image. (b) Frequency spectrum after FFT. (c) Three-dimensional visualization of the frequency spectrum. (d) Spectrum density plot. (e) Spectrum density plot simplified to 16 frequency intervals.
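The quantities named in the Figure 2 caption (an FFT spectrum reduced to a density over 16 frequency intervals) can be illustrated with a minimal NumPy sketch. This is not the authors' code: the equal-width radial binning, the function name `spectrum_density`, and the synthetic test image are assumptions made here for illustration only.

```python
import numpy as np

def spectrum_density(image, n_bins=16):
    """Return the share of spectral energy in n_bins radial frequency intervals."""
    # 2-D FFT with the zero-frequency component moved to the centre.
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    power = np.abs(spectrum) ** 2

    # Radial distance of every frequency sample from the spectrum centre.
    h, w = image.shape
    yy, xx = np.indices((h, w))
    radius = np.hypot(yy - h // 2, xx - w // 2)

    # Assign each sample to one of n_bins equal-width radial intervals
    # (assumed binning scheme; the paper may define its intervals differently).
    edges = np.linspace(0.0, radius.max(), n_bins + 1)
    bin_index = np.clip(np.digitize(radius, edges) - 1, 0, n_bins - 1)

    # Sum the power per interval and normalise so the shares add up to 1.
    energy = np.bincount(bin_index.ravel(), weights=power.ravel(), minlength=n_bins)
    return energy / energy.sum()

rng = np.random.default_rng(0)
# Hypothetical stand-in for a grayscale leaf image: smooth gradient plus noise.
gradient = np.linspace(0.0, 1.0, 64)[None, :] * np.ones((64, 1))
image = gradient + 0.05 * rng.standard_normal((64, 64))
density = spectrum_density(image)
```

For a smooth image like this one, most of the energy falls in the lowest-frequency interval. Fine textures such as lesion edges would shift energy toward the higher intervals.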
Figure 3. FA-attention mechanism.
Figure 4. Feature Fusion.
Figure 5. Data augmentation. (a) Original image. (b) Gaussian filtering. (c) Brightness adjustment. (d) Rotation and scaling.
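The three augmentations listed in the Figure 5 caption (Gaussian filtering, brightness adjustment, rotation and scaling) can be sketched with NumPy alone. The parameter values, function names, and nearest-neighbour resampling below are illustrative assumptions; this section does not state the settings the authors used.

```python
import numpy as np

def gaussian_filter(image, sigma=1.0):
    """Blur a 2-D image with a separable Gaussian kernel."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()
    # Convolve rows, then columns; mode="same" keeps the image size.
    rows = np.apply_along_axis(np.convolve, 1, image, kernel, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="same")

def adjust_brightness(image, factor=1.2):
    """Scale intensities and clip back to the [0, 1] range."""
    return np.clip(image * factor, 0.0, 1.0)

def rotate_and_scale(image, angle_deg=15.0, scale=1.1):
    """Rotate and scale about the image centre with nearest-neighbour sampling."""
    h, w = image.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = np.deg2rad(angle_deg)
    yy, xx = np.indices((h, w))
    # Inverse mapping: for each output pixel, find the source coordinate.
    dy, dx = (yy - cy) / scale, (xx - cx) / scale
    src_y = np.rint(cy + dy * np.cos(theta) - dx * np.sin(theta)).astype(int)
    src_x = np.rint(cx + dy * np.sin(theta) + dx * np.cos(theta)).astype(int)
    # Pixels that map outside the source image are left at zero.
    valid = (src_y >= 0) & (src_y < h) & (src_x >= 0) & (src_x < w)
    out = np.zeros_like(image)
    out[valid] = image[src_y[valid], src_x[valid]]
    return out

rng = np.random.default_rng(0)
image = rng.random((32, 32))  # hypothetical grayscale image with values in [0, 1)
blurred = gaussian_filter(image)
brighter = adjust_brightness(image)
rotated = rotate_and_scale(image)
```

Each function returns an array of the same shape as its input, so the outputs can be added to a training set alongside the original image.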
Figure 6. Training loss and test accuracy on the PV dataset.
Figure 7. Training loss and test accuracy on the PP dataset.
Figure 8. t-SNE feature visualization on the PV dataset. (a) Original data. (b) Results of feature extraction by our model.
Figure 9. t-SNE feature visualization on the PP dataset. (a) Original data. (b) Results of feature extraction by our model.
Figure 10. Confusion matrix. (a) Confusion matrix for experiments on the PV dataset. (b) Confusion matrix for experiments on the PP dataset.
Figure 11. FFT visualization and feature maps after FA-attention. (a) Original image. (b) FFT visualization. (c) Attention heatmap of feature maps after FA-attention.
Figure 12. FA-attention visualization. (a) Original image with a complex background. (b) Attention heatmap of feature maps after self-attention. (c) Attention heatmap of feature maps after FA-attention.
Table 1. Dataset preparation used for classification.

| Class Label | Disease Type | Number of Simple-Background Samples (PV Dataset) | Number of Complex-Background Samples (PP Dataset) |
|---|---|---|---|
| 0 | apple, black rot | 621 | - |
| 1 | apple, cedar rust | 275 | 1860 |
| 2 | apple, scab | 630 | 4826 |
| 3 | apple, healthy | 1645 | 4624 |
| 4 | apple, frog eye leaf spot | - | 3181 |
| 5 | apple, powdery mildew | - | 1184 |
| 6 | apple, powdery mildew complex | - | 1602 |
| 7 | cherry, powdery mildew | 1052 | - |
| 8 | cherry, healthy | 854 | - |
| 9 | grape, black rot | 1180 | - |
| 10 | grape, black measles | 1383 | - |
| 11 | grape, blight spot | 1076 | - |
| 12 | grape, healthy | 423 | - |
| 13 | peach, bacterial spot | 2297 | - |
| 14 | peach, healthy | 360 | - |
Table 2. Ablation experiment.

| No. | Model | Test Accuracy (PV Dataset) | Generalization Validation Accuracy (PP Dataset) | Params |
|---|---|---|---|---|
| 0 | Unet | 96.73% | 70.66% | 18,473,100 |
| 1 | Unet + Data Augmentation | 97.22% | 79.84% | 18,473,100 |
| 2 | Unet + Data Augmentation + FA-attention | 99.40% | 88.03% | 18,998,740 |
| 3 | Unet + Data Augmentation + FA-attention + Feature Fusion | 99.91% | 89.59% | 18,474,452 |
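One way to read the ablation rows of Table 2 is as incremental gains in percentage points over the previous configuration. The short Python sketch below does that arithmetic. The accuracy values are copied from the table, while the step labels and the helper name `incremental_gains` are shorthand introduced here.

```python
# Accuracy values (%) copied from Table 2; the step names are shorthand labels.
ablation = [
    ("Unet", 96.73, 70.66),
    ("+ data augmentation", 97.22, 79.84),
    ("+ FA-attention", 99.40, 88.03),
    ("+ feature fusion", 99.91, 89.59),
]

def incremental_gains(rows):
    """Return (step, PV gain, PP gain) in percentage points versus the previous row."""
    gains = []
    for (_, pv_prev, pp_prev), (name, pv, pp) in zip(rows, rows[1:]):
        gains.append((name, round(pv - pv_prev, 2), round(pp - pp_prev, 2)))
    return gains

gains = incremental_gains(ablation)
for name, pv_gain, pp_gain in gains:
    print(f"{name}: PV {pv_gain:+.2f} pp, PP {pp_gain:+.2f} pp")
```

Read this way, most of the PP-dataset improvement comes from data augmentation (+9.18 points) and FA-attention (+8.19 points), with feature fusion adding a further 1.56 points.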
Table 3. Performance comparison of the methods.

| Model | Method | Accuracy (PV Dataset) | Accuracy (PP Dataset) |
|---|---|---|---|
| Darknet [7] | fine-tuned Darknet53 | 99.31% | 85.10% |
| VGG16 [23] | VGG16-based model | 99.79% | 81.27% |
| EfficientNet [24] | EfficientNet | 97.50% | 80.69% |
| CNN [25] | Customized CNN | 99.60% | - |
| YOLOv5-CA [26] | YOLOv5-CA backbone with coordinate attention | 99.81% | 89.55% |
| PDDNet-LVE [27] | PDDNet with lead voting ensemble | 99.70% | 85.11% |
| DINOV2 [28] | DINOV2 with Class-Patch Feature Fusion Module | 99.67% | - |
| HSSNet [29] | YOLOv7 backbone with H-SimAM feature fusion and SP-BiFormer block | 99.93% | 88.58% |
| Proposed | Unet with FA-attention and feature fusion | 99.91% | 89.59% |

Share and Cite

MDPI and ACS Style

Li, X.; Wu, W.; Zhu, F.; Guan, S.; Zhang, W.; Li, Z. FA-Unet: A Deep Learning Method with Fusion of Frequency Domain Features for Fruit Leaf Disease Identification. Horticulturae 2025, 11, 783. https://doi.org/10.3390/horticulturae11070783

Chicago/Turabian Style

Li, Xiaowei, Wenlin Wu, Fenghua Zhu, Shenhao Guan, Wenliang Zhang, and Zheng Li. 2025. "FA-Unet: A Deep Learning Method with Fusion of Frequency Domain Features for Fruit Leaf Disease Identification" Horticulturae 11, no. 7: 783. https://doi.org/10.3390/horticulturae11070783
