1. Introduction
With the deepening of agricultural mechanization, people can produce enough cash crops to meet daily needs. However, crop production is still threatened by many factors, such as climate change and plant diseases [1]. Plant diseases, in particular, can seriously affect the agricultural economy: mild cases affect only a few plants, while severe cases cause a significant decrease in crop yield across entire fields [2]. Traditional methods for detecting plant diseases rely on on-site judgments by agricultural experts or on identification by farmers based on their own experience. These subjective methods are labor-intensive, and inexperienced farmers may misjudge a disease and use pesticides indiscriminately, which allows the disease to spread further. Environmental factors and planting methods also easily interfere with growers’ recognition of diseases. To tackle these challenges, the use of deep learning for automatic plant disease recognition has emerged as a prominent agricultural trend. Quick, accurate, and effective identification of plant diseases and pests is crucial for improving crop yield and quality.
Researchers have proposed various techniques for detecting and classifying plant diseases in practical agricultural applications [3,4]. For example, the authors of [5] used pre-processing to segment lesion regions from diseased leaves. After feature extraction, they combined stepwise discriminant analysis, Bayesian discriminant analysis, principal component analysis, and Fisher discriminant analysis to classify diseased strawberry leaves, reaching a classification accuracy of 94.71%. Choi et al. [6] classified five apple foliage diseases using eight features (color, texture, shape) with a BP neural network, which achieved an average recognition accuracy of 92.6%. Both methods rely on traditional manual feature extraction, such as principal component analysis and texture feature extraction. Manual feature extraction can achieve very good results on specific problems, but it also has obvious drawbacks: experts must invest considerable time and effort in designing and adjusting features, the process is easily influenced by expert experience, and important features may be overlooked. These disadvantages have greatly limited the application of manual feature extraction methods.
Deep learning techniques have made remarkable advances in plant disease recognition research in recent years. Dwivedi et al. [7] employed an attention mechanism within a residual module to learn disease features and detect three diseases in tomato plants using the PlantVillage dataset; their approach achieved a detection accuracy of 98% across the three types of diseases. Luo et al. [8] used a residual network to identify maize diseases and used the TEL-Resnet network to identify and classify different leaves with pests and diseases. Ferentinos et al. [9] presented GLDDN, a grape leaf disease detection network that utilizes dual attention mechanisms for feature evaluation, detection, and classification. Experimental results on the dataset show that it achieves 99.93% accuracy in detecting three types of grape leaf disease. In the work of Gadekallu et al. [10], a hybrid principal component analysis–whale optimization algorithm was employed to extract essential features from the dataset. These features were then input into a deep neural network for tomato disease classification, resulting in an accuracy of 94%. Zhang et al. [11] proposed a dual attention semantic segmentation network for corn recognition, which improved the model’s mean intersection over union and mean pixel accuracy. In addition, generalization and regularization in deep learning are also important for improving recognition accuracy [12]. Zheng et al. [13] used MobileViT, a lightweight neural network, for real-time automatic modulation classification; this method is robust and accurate.
Traditional manual feature extraction and deep learning-based methods are currently the most commonly used approaches to plant disease recognition. They can achieve good results on specific problems, but some difficulties remain in practical applications: (1) Training datasets. The performance of deep learning frameworks largely depends on the quality of the training dataset. High-quality datasets improve the accuracy and robustness of models by providing comprehensive and diverse data, thereby achieving better generalization and reducing bias. However, collecting images and creating datasets requires a significant amount of manpower and resources. (2) Noise interference. Images taken in natural settings often contain many interfering factors, which make it difficult to extract disease features. Training directly on these images leads to learning difficulties and low recognition accuracy.
Therefore, in order to improve the speed and accuracy of plant disease identification, it is necessary to optimize existing identification methods. Image enhancement methods can solve the problems of training datasets and noise interference encountered during the recognition process. We can improve the accuracy of feature extraction by removing noise from images through image segmentation and enhancement. The dataset can also be expanded through methods such as rotation, brightness adjustment, and perspective transformation.
Images collected from nature commonly exhibit blurred features due to lighting issues. Therefore, images must be processed before they are input into the training model, so that disease features become more prominent and recognition accuracy improves. The Retinex method is a classic image enhancement method that enhances image contrast through logarithmic operations and Gaussian filtering [14]. On this basis, multi-scale analysis techniques are introduced to remove background noise. Wang et al. [15] proposed an underwater image enhancement framework based on metalantis. Their method has three stages: it first performs virtual underwater image synthesis, then estimates the depth map of the underwater image, and finally uses reinforcement learning for underwater image enhancement. Jin et al. [16] introduced a new zero-reference color self-calibration framework for enhancing low-light images. It emphasizes the channel representation containing fine-grained color information and achieves natural results in a progressive manner.
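As a rough illustration of the logarithm-plus-Gaussian-filter idea behind Retinex mentioned above, the following sketch implements a single-scale variant in NumPy. It is not taken from [14]: the function names, the value of sigma, and the +1 offset inside the logarithm (used here to avoid log(0)) are assumptions made for this example.

```python
import numpy as np

def gaussian_blur(channel, sigma):
    """Separable Gaussian blur of a 2D array with reflect padding."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()  # normalise so flat regions keep their value
    padded = np.pad(channel, radius, mode="reflect")
    # Filter along rows, then along columns.
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def single_scale_retinex(channel, sigma=2.0):
    """log(image) - log(blurred image): removes the smooth illumination estimate."""
    channel = channel.astype(np.float64)
    return np.log1p(channel) - np.log1p(gaussian_blur(channel, sigma))

# Toy input: a horizontal brightness ramp standing in for one image channel.
ramp = np.tile(np.arange(16, dtype=np.float64) * 10.0, (16, 1))
enhanced = single_scale_retinex(ramp, sigma=1.0)
```

A multi-scale version would typically average this output over several values of sigma; that extension is left out to keep the sketch short.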
Currently, most researchers performing tomato plant disease classification use only image datasets from laboratory environments and do not include data from other domains. Moreover, the imbalanced distribution of samples in a dataset can hinder model generalization and accuracy, resulting in suboptimal performance on real-world test data. To address this issue, data augmentation techniques such as flipping and rotation are widely used to enrich datasets and improve model robustness. However, when sample data are insufficient, traditional image enhancement techniques cannot generate new image features within a category. Generative adversarial networks (GANs) have been applied so that models can learn new features from enhanced images, but GANs still have limitations, such as the inability to apply changes to specific targets only.
Ref. [17] used several basic image enhancement techniques, such as image rotation, brightness adjustment, perspective transformation, and affine transformation, to enhance their dataset and train the model to achieve better results. However, these basic image augmentation techniques tend to generate highly correlated samples. They are usually applied to one image at a time, and overuse may lead to overfitting because the enhanced image features lack variability.
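To make the point about correlated samples concrete, the sketch below applies three basic augmentations (rotation, flipping, and brightness adjustment) to a single image with NumPy. The function name, the brightness factor, and the random stand-in image are assumptions for illustration; perspective and affine transformations are omitted.

```python
import numpy as np

def augment(image, brightness=1.2):
    """Return simple variants of one H x W x 3 uint8 image."""
    rotated = np.rot90(image, k=1)   # 90-degree rotation in the image plane
    flipped = np.fliplr(image)       # horizontal flip
    # Scale in float, then clip back to the valid 8-bit range.
    brighter = np.clip(image.astype(np.float32) * brightness, 0, 255).astype(np.uint8)
    return {"rotated": rotated, "flipped": flipped, "brighter": brighter}

rng = np.random.default_rng(0)
leaf = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)  # stand-in for a leaf photo
variants = augment(leaf)
```

Every variant is a deterministic function of the same source pixels, which is why such samples remain highly correlated with the original image.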
The generative adversarial network (GAN) proposed in [18] has demonstrated remarkable capabilities in diverse image synthesis tasks. GANs unlock additional information about image features, resulting in more varied generated images and subsequently enhancing the performance of deep learning models. The primary learning goal of generative adversarial networks is to create synthetic samples with feature distributions similar to those of the training images.
Ref. [19] used DCGAN to increase the original dataset by two times to verify whether GANs could help improve the learning capabilities of the model. They trained an InceptionV3 model and found that accuracy on the enhanced dataset improved by about 20% compared with the original dataset. They showed that GAN-based data augmentation is beneficial for enhancing model learning performance.
Ref. [20] used Cycle-GAN to convert healthy apple images into diseased fruit images, which improved the F1-score of their proposed plant disease detection model by about 5%. Although Cycle-GAN has achieved excellent performance in image synthesis and style transfer, it has no mechanism to mark specific objects in the image to be transformed. This means that background regions of no interest may also change during the process, so the generated images may contribute little to the robustness of the model in plant disease diagnosis.
In addition, some researchers have expanded existing large datasets to alleviate the problem of insufficient data. Nevertheless, a model’s performance can be limited by the significant disparity between newly collected natural environment pictures and existing laboratory environment pictures, in particular the notable difference between field and laboratory backgrounds [21].
In summary, traditional image augmentation techniques lack feature changes and tend to generate highly correlated examples, which can lead to overfitting during training. Existing studies based on generative adversarial networks, in turn, enhance images within the same category, so the new features gained after enhancement are limited. Thus, how to add as much distinctive feature information as possible to images is a major challenge for image enhancement methods and a key problem for improving recognition accuracy.
To solve the above problems, we introduce a segmentation algorithm based on Cycle-GAN so that the network can transform a specified region across different categories of images. Our study presents a deep learning framework that learns image features associated with tomato diseases by training on a dataset of tomato disease images. With this approach, even farmers in remote areas can capture images of possibly diseased leaves with a mobile device and determine the type of disease without relying on professional technicians. In our proposed method, we use image augmentation, image segmentation, and image transformation to enhance the dataset and then fine-tune the pre-trained MobileNet [22] model on the enhanced dataset to distinguish multiple diseases in leaf images. Our contributions can be summarized as follows:
(1) This paper presents an automatic leaf segmentation algorithm (AISG) based on EISeg, which is designed to address the color characteristics of crop leaves in real-world environments. The AISG algorithm effectively removes environmental noise from leaves, enhancing the practical performance of CNN and Cycle-GAN networks.
(2) We combine the leaf segmentation algorithm with the Cycle-GAN network so that the network can transform specific regions of the image. This image enhancement method realizes transformation between different categories, enriches the image features, and has practical application value.
(3) We collected crop images in the natural setting to augment the PlantVillage dataset, creating a new dataset. Transfer learning was employed for training the models. The capability of the method was evaluated through comparative experiments, thus validating its performance and potential for crop disease recognition.
The study is structured as follows: 
Section 2 introduces the image enhancement methods and the classification model system used. 
Section 3 provides details on the dataset, experimental design, result analysis, and comparisons with other methods. 
Section 4 presents the conclusion and future work. 
  2. Methodology
We propose a cash crop disease recognition system to overcome the challenges arising from complex background compositions in crop leaf images taken in field environments, the diverse range of crops, and the uneven distribution of samples in existing disease datasets. The proposed method combines the advantages of image processing and deep learning techniques to overcome constraints such as complex backgrounds and uneven samples, thereby achieving accurate disease identification. The flowchart of the proposed method is shown in Figure 1.
Our work can be divided into three main parts. First, an automatic crop leaf segmentation method based on the EISeg tool [23] separates diseased leaves from the environmental background, which helps the model learn the significant features of the disease. Then, the Cycle-GAN network performs mutual transformation between different categories, and the small-sample data are augmented to improve the model’s generalization ability. Finally, the pre-trained MobileNet model is fine-tuned to classify crop diseases.
  2.1. The AISG Algorithm
During image acquisition in practical applications, images inevitably contain features of other objects, such as the photographer’s hands, non-diseased leaves, and soil, which can significantly affect the model’s classification results. Image segmentation is therefore essential: it mitigates the impact of these objects on the model’s discrimination, reduces computational overhead, and accelerates the classification of each image. Because the EISeg interactive segmentation method depends on human–computer interaction, this paper proposes an improved version of it that removes this dependence.
EISeg is a point-and-click interactive segmentation method. In practical applications, click-based methods commonly use positive and negative user clicks. Positive clicks mark the target object (foreground), while negative clicks mark non-target regions (background). This method requires only a few clicks to complete the object segmentation task. Figure 2 is an example of this method. The green dots in the picture represent positive clicks, indicating that the selected part is the foreground. The red dots represent negative clicks, indicating that the user has selected the background. The EISeg algorithm has simple steps, but it requires a person to supply the clicks, which prevents full automation and greatly limits its application.
To address this, we add several image processing steps before calling the EISeg model to compensate for the absence of manual interaction, so that it can meet the needs of automatic, practical application. Before feeding the crop disease image into the model, we use the super-green (excess green) factor to replace the manual labeling of leaf and background pixels. For most crop leaves, the green component (G) in the RGB color space is much larger than the other components. Accordingly, when the mask image is initialized, the super-green factor (2G-R-B) is computed for each pixel; pixels whose super-green factor is below the threshold T are marked as background, and the rest are marked as foreground. However, if only the super-green factor is used for labeling, missegmentation occurs when the background contains many leaves, as shown in Figure 3.
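The super-green thresholding described above can be sketched in a few lines of NumPy. The function name, the RGB channel order, and the threshold value of 20 are assumptions for illustration; the paper does not state the value of T.

```python
import numpy as np

def excess_green_mask(image, threshold=20):
    """Mark pixels with 2G - R - B >= threshold as foreground (True)."""
    rgb = image.astype(np.int32)  # widen first: uint8 arithmetic would overflow
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    exg = 2 * g - r - b
    return exg >= threshold       # below the threshold means background

# One leaf-like green pixel and one soil-like brown pixel (made-up values).
pixels = np.array([[[30, 200, 40], [120, 100, 80]]], dtype=np.uint8)
mask = excess_green_mask(pixels)
```

Note that images loaded with OpenCV are in BGR order by default, so the channel indices would need to be swapped in that case.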
Based on the preceding analysis, we propose the following steps to improve the segmentation method. The algorithm’s schematic diagram is shown in Figure 4:
Step 1: Use the rectangle function of OpenCV to select the region of the target image. The rectangle’s sides are shorter than those of the original image by one-fourth of the side length, which gives an approximate location for the target object.
Step 2: Apply the super-green factor inside and outside the rectangle. Outside the rectangle, green pixels are marked as possible background and non-green pixels as background. Inside the rectangle, the points representing the foreground are marked according to the super-green factor to generate the mask map. The blue dots inside the rectangle indicate that the foreground is selected. The red and yellow dots outside the rectangle indicate that the background is selected: the yellow dots are green pixels, while the red dots are non-green pixels.
Step 3: Call the EISeg algorithm for image segmentation.
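Steps 1 and 2 above can be sketched as the construction of a label map, shown here with NumPy only. Several details are assumptions made for this example: the rectangle is taken to be centered with a margin of one-eighth of the side length on each side, the label codes are hypothetical, non-green pixels inside the rectangle are simply left unlabeled, and the threshold is arbitrary. Step 3 (the EISeg call) is not shown because its interface is not described here.

```python
import numpy as np

# Hypothetical label codes for this sketch only.
BACKGROUND, FOREGROUND, POSSIBLE_BACKGROUND, UNLABELED = 0, 1, 2, 3

def initial_mask(image, threshold=20):
    """Build an initial label map from a centred rectangle and the 2G - R - B factor."""
    h, w = image.shape[:2]
    rgb = image.astype(np.int32)
    green = (2 * rgb[..., 1] - rgb[..., 0] - rgb[..., 2]) >= threshold

    # Step 1 (assumed reading): sides reduced by a quarter, i.e. an eighth per margin.
    top, left = h // 8, w // 8
    inside = np.zeros((h, w), dtype=bool)
    inside[top:h - top, left:w - left] = True

    # Step 2: label pixels according to position and greenness.
    mask = np.full((h, w), UNLABELED, dtype=np.uint8)
    mask[~inside & ~green] = BACKGROUND
    mask[~inside & green] = POSSIBLE_BACKGROUND
    mask[inside & green] = FOREGROUND
    return mask

green_image = np.zeros((8, 8, 3), dtype=np.uint8)
green_image[..., 1] = 200  # an all-green toy image
labels = initial_mask(green_image)
```

In this toy case the border pixels become possible background and the interior becomes foreground, which mirrors the yellow and blue dots described in Step 2.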
  2.2. Cycle-GAN as Synthetic Image Generator
Training a deep learning model requires enough labeled data to achieve good performance. In practice, however, scarce or imbalanced data are common in the agricultural field, and labeled data are expensive or difficult to collect. Traditional image enhancement algorithms can only produce variations within a specific category, whereas richer variations can improve performance [24]. Based on this analysis, we introduce an image enhancement method based on Cycle-GAN that can transform images from one category to another. The Cycle-GAN model inherits the adversarial training idea of GANs and, through its dual training and learning mode, learns the mapping between the source domain and the target domain without paired samples. This feature allows Cycle-GAN to transfer between domains without paired datasets.
Cycle-GAN is a generative adversarial network variant that maps images from one domain to another without requiring matched pairs between the two domains. Specifically, Cycle-GAN splits the translation between two domains into two mappings: one model transforms images from domain A to domain B, while the other transforms images from domain B to domain A. These mappings are designed to invert each other, so Cycle-GAN can enforce consistency by converting an image to the other domain and back. The network structure of Cycle-GAN is shown in Figure 5. By training these mappings, Cycle-GAN can generate high-quality images, enabling applications such as style transfer, image conversion, and image enhancement.
Table 1 displays the architecture of the generator, while Table 2 presents the configuration of the discriminator in Cycle-GAN.
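The requirement that the two mappings invert each other is usually enforced with a cycle-consistency term. The sketch below computes such a term with an L1 distance, as in the original Cycle-GAN formulation; the paper does not spell out its loss, so this is an assumption, and the two toy "generators" are hypothetical stand-ins for trained networks.

```python
import numpy as np

def cycle_consistency_loss(x_a, x_b, g_ab, g_ba):
    """Mean L1 error after the round trips A -> B -> A and B -> A -> B."""
    forward_cycle = np.mean(np.abs(g_ba(g_ab(x_a)) - x_a))
    backward_cycle = np.mean(np.abs(g_ab(g_ba(x_b)) - x_b))
    return forward_cycle + backward_cycle

# Hypothetical stand-ins for the generators: a brightness shift and its inverse.
def to_b(x):
    return x + 0.5

def to_a(x):
    return x - 0.5

rng = np.random.default_rng(0)
batch_a = rng.random((2, 8, 8, 3))        # images from domain A
batch_b = rng.random((2, 8, 8, 3)) + 0.5  # images from domain B

consistent = cycle_consistency_loss(batch_a, batch_b, to_b, to_a)
inconsistent = cycle_consistency_loss(batch_a, batch_b, to_b, lambda x: x)
```

When the two mappings are exact inverses the term is approximately zero, and it grows when the reverse mapping fails to undo the forward one.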
In this study, we found that the image transformation achieved by Cycle-GAN is often global, so irrelevant background information is also transformed, which harms model learning. Therefore, before Cycle-GAN is used to generate images of a different category, each image is first segmented and the unwanted background is removed, so that the transformation is confined to the target area.
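A minimal sketch of the background-removal step, assuming a boolean leaf mask is already available from the segmentation stage. The function name and the choice of a black fill value are assumptions for illustration.

```python
import numpy as np

def remove_background(image, leaf_mask, fill=0):
    """Keep only the pixels inside leaf_mask; set everything else to `fill`."""
    out = np.full_like(image, fill)
    out[leaf_mask] = image[leaf_mask]
    return out

# Toy 2 x 2 RGB image with a diagonal "leaf" region.
image = np.arange(12, dtype=np.uint8).reshape(2, 2, 3) + 10
leaf_mask = np.array([[True, False], [False, True]])
segmented = remove_background(image, leaf_mask)
```

The segmented image, rather than the raw photograph, would then be passed to the generator, so that no field background is available to be altered.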
  2.3. Disease Detection in Plant with MobileNet
MobileNet is a lightweight, efficient convolutional model that is well suited to mobile devices, so we chose it as the training model. We apply a MobileNet pre-trained on the ImageNet dataset to tomato crop disease recognition through transfer learning. The MobileNet architecture used is described as follows:
MobileNet has gone through three generations. MobileNetV1 uses depthwise separable convolutions to build lightweight networks [25]. MobileNetV2 introduces an inverted residual unit with a linear bottleneck, which improves network accuracy and speed despite the increased number of layers. MobileNetV3 takes advantage of both machine learning techniques and manual fine-tuning to construct a more efficient and lightweight network.
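To show why depthwise separable convolutions make the network lightweight, the sketch below compares weight counts for a standard convolution and its depthwise separable counterpart. The layer sizes are arbitrary example values, and bias terms are ignored.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard convolution: one k x k x c_in filter per output channel."""
    return kernel * kernel * c_in * c_out

def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise (k x k per input channel) plus pointwise (1 x 1) pair."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

# Example layer: 3 x 3 kernel, 64 input channels, 128 output channels.
standard = standard_conv_params(3, 64, 128)          # 73728
separable = depthwise_separable_params(3, 64, 128)   # 576 + 8192 = 8768
ratio = separable / standard                         # equals 1/128 + 1/9
```

For this example layer the separable form needs roughly 12% of the weights of the standard convolution.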
The model used in this paper is MobileNetV3, and the description of each layer is shown in Table 3. It comprises 15 Bneck layers, one standard convolutional layer, and three pointwise convolutional layers, taking 224 × 224-pixel images as input. The first convolutional layer takes the 224 × 224 × 3 input image, and the 15 Bneck layers in the middle learn the image features. They are followed by a pooling layer and two BN convolutional layers. Each Bneck contains two pointwise convolutional layers and one depthwise convolutional layer.