Article

VLDNet: An Ultra-Lightweight Crop Disease Identification Network

College of Information Engineering, Northwest A&F University, Xianyang 712100, China
* Author to whom correspondence should be addressed.
Agriculture 2023, 13(8), 1482; https://doi.org/10.3390/agriculture13081482
Submission received: 13 June 2023 / Revised: 18 July 2023 / Accepted: 21 July 2023 / Published: 26 July 2023

Abstract

Existing deep learning methods usually adopt deeper and wider network structures to achieve better performance. However, we found that this rule does not apply well to crop disease identification tasks, which prompted us to rethink the design paradigm of disease identification models. Crop disease symptoms are fine-grained features without obvious patterns, and deeper, wider network structures cause the loss of this detailed information, harming identification efficiency. Based on this observation, this paper designs a very lightweight disease identification network called VLDNet. The basic module of VLDNet, VLDBlock, extracts intrinsic features through 1 × 1 convolution and uses cheap linear operations to supplement redundant features, improving feature extraction efficiency. During inference, reparameterization is used to further reduce the model size and improve inference speed. VLDNet achieves state-of-the-art (SOTA) latency-accuracy trade-offs on self-built and public datasets; for example, it matches the performance of Swin-Tiny with a parameter size of 0.097 MB and 0.04 G floating point operations (FLOPs), reducing parameter size and FLOPs by 297 and 111 times, respectively. In actual testing, VLDNet can recognize 221 images per second, far faster than models of similar accuracy. This work is expected to further promote the application of deep learning-based crop disease identification methods in practical production.

1. Introduction

Traditional crop disease identification relies mainly on the long-term accumulated experience of farmers, as the symptoms of diseases are complex and varied, requiring high levels of professional knowledge from agricultural producers. Manual observation and judgment of disease types may be subject to strong subjectivity and is also time-consuming. Therefore, using modern information technology to achieve efficient and accurate crop disease identification is of great importance. Traditional image processing methods require manual disease spot segmentation, feature extraction, and classifier construction, which consume a lot of time in terms of data preprocessing and are greatly affected by objective conditions, making feature extraction difficult [1,2].
In recent years, the rapid development of deep learning has provided new solutions for agricultural disease identification. Convolutional neural networks (CNNs) have powerful feature extraction capabilities and have been successfully applied to image classification [3], object detection [4], semantic segmentation [5], and other fields. In agriculture, some scholars have successfully applied deep learning to crop disease identification. For example, ref. [6] used healthy and diseased leaf images to train CNN models for disease detection and diagnosis, achieving an identification accuracy of 99.53% when classifying 17,548 plant leaf images using VGGNet [7]. Ref. [8] used more than 40,000 images to train the GoogleNet model and obtained identification accuracies ranging from 75% to 100% on different plants. Ref. [9] proposed a new deep neural network structure consisting of two sub-models that first separate the leaves from the background in the original image; various popular pre-trained models were then used to extract features and classify diseases, achieving an 87.45% identification accuracy in the 2019 AI Challenger competition. Early research in crop disease identification used relatively simple CNN structures, which had large numbers of parameters and lower identification accuracy in complex environments. Some studies have also used the recently popular Vision Transformer [10] for disease identification and achieved good results. For example, ref. [11] proposed a method that combines CNN and Transformer structures for kiwi disease identification, achieving an identification accuracy of 98.78% on a self-built dataset. Ref. [12] proposed PlantXViT, based on a traditional CNN and the Vision Transformer, for apple, corn, and rice disease identification, with average identification accuracies exceeding 93.55%, 92.59%, and 98.33%, respectively.
Analyzing the above deep learning-based crop disease recognition studies, we found that they usually adopt, directly or indirectly, models that perform well on the publicly available ImageNet dataset. These models achieve excellent identification performance on ImageNet by designing very deep and wide networks to learn different feature patterns for various objects, including cats and dogs. As the input image passes through the network, detailed information gradually decreases, and the model makes its final decision using high-level semantic information. However, the features of crop leaf diseases are usually discrete, mutually similar, often small, and lack obvious patterns, so models that perform well on ImageNet cannot necessarily improve crop disease identification simply by stacking network layers and increasing model width. On the contrary, doing so may lead to the loss of detailed disease features and undoubtedly increases the model's parameters and FLOPs, ultimately leading to a decrease in identification accuracy.
The disease recognition models above also suffer from large parameter counts and heavy computation. To reduce model parameters and FLOPs, some researchers have proposed lightweight deep learning algorithms that allow models to be deployed on different edge devices. For example, ref. [13] combined the advantages of Transformers and CNNs to propose a lightweight apple disease recognition model and obtained competitive results. Ref. [14] developed WearNet, a lightweight convolutional neural network, to enable automatic scratch detection on contact sliding parts such as metal moldings. Compared with existing networks, WearNet achieves an excellent classification accuracy of 94.16% with a smaller model size and faster detection speed. Ref. [15] proposed a lightweight sheep face recognition model, SheepFaceNet, which achieved 97.75% recognition accuracy with 0.60 MB of parameters. Ref. [16] used depthwise separable convolutions to construct a lightweight CNN model for plant disease leaf classification; compared with traditional CNN models, the network has fewer parameters. Ref. [17] proposed a lightweight SimpleNet model that performs well in automatic wheat spike disease identification. These works greatly reduce the number of model parameters and FLOPs, contributing to the deployment of deep learning models on edge devices. However, related research shows that these indicators may not correlate well with a model's inference speed: efficiency indicators such as FLOPs do not consider memory access costs and parallelism, which can have a significant impact on latency during inference. In response, ref. [18] implemented real-time disease identification using Faster R-CNN and YOLOv4, which is well suited to edge devices, but its identification accuracy cannot meet the needs of actual production.
In this context, we need to rethink the design paradigm of crop disease identification models and study algorithms that are computationally efficient and highly accurate, in order to serve the development of smart agriculture. This paper proposes a relatively shallow and narrow lightweight disease identification network, VLDNet. The basic structure of VLDNet, VLDBlock, extracts intrinsic features through 1 × 1 convolution and uses cheap linear operations to supplement redundant features, improving feature extraction efficiency. During inference, reparameterization techniques are used to reduce model parameters and computational costs and to increase inference speed. VLDNet achieves SOTA latency-accuracy trade-offs on self-built and publicly available datasets, achieving performance comparable to Swin-Tiny with 0.097 MB of parameters and 0.04 G FLOPs while reducing parameters and FLOPs by 297 and 111 times, respectively. In actual testing, VLDNet can recognize 221 images per second, far faster than models of similar accuracy. This work is expected to further promote the application of crop disease identification in actual production.
The contributions of this paper include:
  • We found no strict correlation between the depth and width of a disease identification model and its performance.
  • We proposed a shallow, narrow crop disease identification model: VLDNet.
  • VLDNet achieved a good latency-accuracy tradeoff.

2. Materials and Methods

2.1. Datasets

2.1.1. PlantVillage Dataset

PlantVillage [19] is a publicly available dataset that contains 54,306 images of 38 classes of diseased and healthy crop leaves. The images show complete leaves against a uniform background, without interference from occlusions or background factors, which makes PlantVillage a good data foundation for disease identification research. Information on the crop species and disease categories covered by this dataset is given in Table A1. In this paper, the dataset is divided into training, validation, and test sets in a ratio of 7:2:1. Owing to the large size of the dataset, no data augmentation was performed. Some examples of this dataset are shown in Figure 1.
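For illustration, the following is a minimal sketch of a 7:2:1 random split of an ImageFolder-style copy of PlantVillage in PyTorch; the directory path and the fixed seed are assumptions, not details taken from the paper.

```python
# Hedged sketch: 7:2:1 train/validation/test split of an ImageFolder dataset.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

dataset = datasets.ImageFolder("data/PlantVillage", transform=transform)  # assumed path

n_total = len(dataset)                      # 54,306 images across 38 classes
n_train = int(0.7 * n_total)
n_val = int(0.2 * n_total)
n_test = n_total - n_train - n_val          # remaining ~10%

train_set, val_set, test_set = random_split(
    dataset, [n_train, n_val, n_test],
    generator=torch.Generator().manual_seed(42),  # assumed seed for reproducibility
)
```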

2.1.2. Building Our Own Dataset

This dataset was collected from the apple and kiwifruit experimental stations of Northwest A&F University in Shaanxi Province, China. Healthy and diseased leaves were photographed using BM-500GE/BB-500GE color digital cameras (JAI, Copenhagen, Denmark). The images have a resolution of 2456 × 2058 pixels, and a total of 4180 images were obtained. The dataset is divided into training, validation, and test sets in a ratio of 7:2:1. It includes six classes of apple leaf images (spot, brown spot, flower leaf, gray spot, rust, and healthy) and four classes of kiwi leaf disease images (brown spot, flower leaf, anthracnose, and leaf blight). Examples of the images can be found in Figure 2.
To expand the dataset and improve the identification performance of the model, the necessary data augmentation strategies were applied to the training set, including random cropping, brightness adjustment, rotation, flipping, and the addition of salt-and-pepper and Gaussian noise to simulate the effect of the imaging equipment on the captured images. The augmented training set contains a total of 14,600 images. Detailed information on each class of image before and after augmentation is provided in Table 1. To reduce training time, the images were resized from 2456 × 2058 to 224 × 224.
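As an illustration, the listed operations could be assembled into a torchvision-style pipeline as in the following sketch; the probabilities, crop scale, rotation range, and noise levels are assumptions, since the paper does not report the exact augmentation parameters.

```python
# Hedged sketch of an augmentation pipeline covering the operations named above.
import torch
from torchvision import transforms

def add_gaussian_noise(img, std=0.05):
    # img is a CxHxW tensor in [0, 1]; assumed noise level
    return (img + torch.randn_like(img) * std).clamp(0.0, 1.0)

def add_salt_pepper_noise(img, amount=0.01):
    mask = torch.rand_like(img[:1])           # one HxW mask shared across channels
    img = img.clone()
    img[:, mask[0] < amount / 2] = 0.0         # pepper
    img[:, mask[0] > 1 - amount / 2] = 1.0     # salt
    return img

augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),  # random cropping + resize to 224x224
    transforms.ColorJitter(brightness=0.3),                # brightness adjustment
    transforms.RandomRotation(degrees=30),                 # rotation
    transforms.RandomHorizontalFlip(),                     # flipping
    transforms.RandomVerticalFlip(),
    transforms.ToTensor(),
    transforms.RandomApply([transforms.Lambda(add_gaussian_noise)], p=0.5),
    transforms.RandomApply([transforms.Lambda(add_salt_pepper_noise)], p=0.5),
])
```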

2.2. Methods

2.2.1. Do Deeper and Wider Networks Have Better Performance?

In this section, experiments were conducted on the PlantVillage dataset to preliminarily validate whether deeper and wider networks can achieve better performance. EfficientNet, PVTv2, and ResNet series models, each with different depths and widths, were selected for the experiment. The experimental results are shown in Figure 3. There is little difference in identification accuracy between the EfficientNet models, and their identification accuracy curves almost overlap. The loss curves of different sizes of EfficientNet models are also very consistent, with smaller models converging faster. For the PVTv2 model, the identification accuracy curves of models of different sizes are not stable, showing a fluctuating trend, but the final identification accuracy is almost the same. Moreover, the relatively smaller PVTv2-b1 obtained better identification results, which confirms our hypothesis. For the ResNet model, the identification accuracy of small-sized models is basically the same as or even higher than that of large-sized models. The convergence speed is also not significantly different, which further verifies our conjecture that deeper and wider models cannot achieve better results for crop disease identification tasks. Even if deeper and wider models can achieve a slight identification accuracy advantage, this comes at the cost of increasing parameter size and computational complexity several times, which is unacceptable for edge devices.

2.2.2. VLDNet

To further investigate the relationship between network width, depth, and model performance, we propose a lightweight disease recognition model called VLDNet. The overall structure of VLDNet is shown in Figure 4. It includes a VLDBlock, four VLDBottlenecks, an average pooling layer, and a fully connected layer. The VLDBlock is built upon the MobileNet-V1 building block: a 3 × 3 depthwise convolution followed by a 1 × 1 pointwise convolution, with each operation repeated four times. It also introduces reparameterizable skip connections and Batch Normalization (BN) and uses the ReLU activation function. The VLDBlock has different structures during training and inference: during training, it has an extra branch consisting of a 1 × 1 pointwise convolution and batch normalization; during inference, all reparameterizable branches are merged away through reparameterization. The VLDBlock forms the feature extraction backbone of VLDNet, enhancing both the effectiveness and the efficiency of feature extraction. Based on the VLDBlock, we propose the VLDBottleneck, following the idea of ResNet. A VLDBottleneck consists of two stacked VLDBlocks: the first serves as an expansion layer that increases the number of channels, and the second reduces the number of channels to match the residual connection. A residual connection is added between the input of the first and the output of the second VLDBlock. Each VLDBlock is followed by a BN layer and a ReLU activation function, except for the second VLDBlock. A depthwise convolution with stride 2 is inserted between the two VLDBlocks for downsampling. The VLDBottleneck extracts disease features while reducing the model's parameter count. The average pooling layer mainly reduces computation and extracts essential features, and the fully connected layer transforms the feature map into a vector representation of the disease and outputs the disease category.
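To make the wiring above concrete, here is a rough PyTorch sketch of a VLDBottleneck-style block. It is a simplified illustration, not the authors' implementation: the VLDBlock stand-in keeps only the depthwise 3 × 3 plus pointwise 1 × 1 path (the reparameterizable branches are covered later in this section), and the shortcut projection and channel choices are assumptions.

```python
# Hedged sketch of the VLDBottleneck wiring: expansion block, stride-2 depthwise
# downsampling, reduction block, and a residual shortcut.
import torch.nn as nn

class VLDBlock(nn.Module):
    """Stand-in for the paper's VLDBlock: 3x3 depthwise conv + 1x1 pointwise conv."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.dw = nn.Conv2d(in_ch, in_ch, 3, 1, 1, groups=in_ch, bias=False)
        self.pw = nn.Conv2d(in_ch, out_ch, 1, bias=False)

    def forward(self, x):
        return self.pw(self.dw(x))

class VLDBottleneck(nn.Module):
    def __init__(self, in_ch, mid_ch, out_ch, stride=1):
        super().__init__()
        # First VLDBlock expands the channels; BN + ReLU follow it.
        self.block1 = nn.Sequential(
            VLDBlock(in_ch, mid_ch), nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True))
        # Stride-2 depthwise convolution between the two VLDBlocks for downsampling.
        self.down = (nn.Conv2d(mid_ch, mid_ch, 3, stride, 1, groups=mid_ch, bias=False)
                     if stride == 2 else nn.Identity())
        # Second VLDBlock reduces channels to match the residual connection (no ReLU).
        self.block2 = VLDBlock(mid_ch, out_ch)
        # Assumed shortcut: identity when shapes match, otherwise a light projection.
        if stride == 1 and in_ch == out_ch:
            self.shortcut = nn.Identity()
        else:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, in_ch, 3, stride, 1, groups=in_ch, bias=False),
                nn.BatchNorm2d(in_ch),
                nn.Conv2d(in_ch, out_ch, 1, bias=False),
                nn.BatchNorm2d(out_ch))

    def forward(self, x):
        return self.block2(self.down(self.block1(x))) + self.shortcut(x)
```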
VLDBlock is a structural reparameterization of the Ghost module [20], shown in Figure 5b. Deep convolutional neural networks often consist of many convolutional layers, which leads to significant computational costs. Although recent works such as MobileNet [21] and ShuffleNet [22] have introduced depthwise convolutions or shuffle operations to construct efficient convolutional neural networks using smaller convolutional kernels, the remaining 1 × 1 convolutional layers still consume a considerable amount of memory and parameters. This process is illustrated in Figure 5a and can be expressed mathematically as follows:
Y = X * F + B  (1)
The input X ∈ R^{c×h×w}, where c, h, and w denote the number of channels, height, and width, respectively. The symbol * denotes the convolution operation, and B is the bias term. Y ∈ R^{h′×w′×n} is the output feature map with n channels, where h′ and w′ are the output height and width, and F ∈ R^{c×k×k×n} denotes the convolutional kernels of this layer, with k × k being the kernel size. The FLOPs of this process are typically very large.
In fact, the redundancy produced when computing feature maps is necessary for the performance of the network; as seen in Figure 6, there are many similar feature maps. However, it is not necessary to generate each of these redundant feature maps with a large number of parameters and FLOPs. Therefore, this paper uses simple linear transformations to generate the redundant feature maps from the intrinsic ones, as shown in Figure 5b.
The intrinsic feature maps and the redundant feature maps together constitute the full set of output feature maps. The generation of the m intrinsic feature maps Y′ ∈ R^{h′×w′×m} is shown in Equation (2); the intrinsic feature maps are what remains after the redundant feature maps are subtracted from the total feature maps.
Y′ = X * F′  (2)
This is an ordinary (primary) convolution whose kernel hyper-parameters are identical to those of the convolution in Equation (1), where F′ ∈ R^{c×k×k×m} denotes the convolution kernels used. The generation of the n redundant feature maps from the m intrinsic feature maps using inexpensive linear transformations is described by Equation (3):
y_{ij} = Φ_{i,j}(y′_i),   i = 1, …, m,   j = 1, …, s  (3)
Here, y′_i denotes the i-th intrinsic feature map in Y′, and Φ_{i,j} is the j-th linear transformation used to generate the feature map y_{ij} in Equation (3). Using Equation (3), we obtain n = m·s feature maps Y = [y_{11}, y_{12}, …, y_{ms}]. The linear transformations operate on each channel, and their computational cost is much lower than that of ordinary convolutions. The structure of the linear transformations is shown in Figure 5b.
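As a concrete illustration of Equations (1)-(3), the following PyTorch sketch implements a Ghost-module-style layer: a primary convolution produces the intrinsic maps Y′, and cheap channel-wise (depthwise) convolutions play the role of the linear operations Φ. The ratio s = 2 and the 3 × 3 kernel of the cheap operation follow the GhostNet defaults [20] and are assumptions here rather than the paper's exact settings.

```python
# Hedged sketch of the Ghost-module idea behind VLDBlock's feature generation.
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    def __init__(self, in_ch, out_ch, ratio=2, dw_kernel=3):
        super().__init__()
        m = out_ch // ratio                        # intrinsic channels, m = n / s
        self.primary = nn.Sequential(              # ordinary convolution -> Y' (Eq. 2)
            nn.Conv2d(in_ch, m, 1, bias=False),
            nn.BatchNorm2d(m), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(                # channel-wise linear operations Phi (Eq. 3)
            nn.Conv2d(m, out_ch - m, dw_kernel, 1, dw_kernel // 2, groups=m, bias=False),
            nn.BatchNorm2d(out_ch - m), nn.ReLU(inplace=True))

    def forward(self, x):
        y_intrinsic = self.primary(x)
        y_ghost = self.cheap(y_intrinsic)
        return torch.cat([y_intrinsic, y_ghost], dim=1)   # Y = [intrinsic, redundant]

# Example: map 16 channels to 32, half of them produced by cheap operations.
# x = torch.randn(1, 16, 56, 56); print(GhostModule(16, 32)(x).shape)  # (1, 32, 56, 56)
```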
Although reducing FLOPs and parameters can lower the computational complexity of a model, ref. [23] has shown that these metrics are not well correlated with the efficiency of the model, because metrics such as FLOPs do not take into account memory access costs and parallelism, which can have a significant impact on latency during inference [24]. Therefore, this paper reparameterizes the structure in Figure 5b to build VLDBlock, the basic structure of VLDNet, in order to further reduce the cost and inference time of the model.
The structural reparameterization technique [25,26] is an effective neural network technique that decouples training and inference, greatly facilitating the deployment of deep neural networks in practical applications. During training, for a given backbone network, the structural reparameterization technique increases the model’s representational power by adding multiple branches or specific layers with various neural network components to the backbone network. During inference, the added branches or layers can be merged into the backbone network’s parameters through equivalent transformations, significantly reducing the number of parameters or computational costs without affecting performance and accelerating inference.
In this paper, during training, for a convolution layer with kernel size K = {1, 3}, C_in input channels, and C_out output channels, the weight matrix is W ∈ R^{C_out×C_in×K×K} and the bias is B ∈ R^{C_out}. The BatchNorm (BN) layer has accumulated mean μ, standard deviation σ, scaling factor γ, and bias β. Since convolution and BN are both linear operations during inference, they can be merged: the corresponding weight is Ŵ = W · γ/σ and the bias is B̂ = (B − μ) · γ/σ + β. For skip connections, the BN layer is merged into an identity 1 × 1 kernel with zero padding. After merging BN into each branch, the final weight matrix is W = Σ_{i=1}^{M} Ŵ_i and the bias is B = Σ_{i=1}^{M} B̂_i, where M is the number of branches, as shown in Figure 7. In this way, the number of parameters and the computational cost of the model are significantly reduced, and the inference speed is increased.
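The merging described above can be sketched as follows: each branch's convolution and BN are fused into a single kernel and bias using Ŵ = W · γ/σ and B̂ = (B − μ) · γ/σ + β, and the fused branches are then summed. The epsilon term and the zero-padding of 1 × 1 kernels to 3 × 3 are standard implementation details assumed here; handling of the BN-only identity branch is omitted for brevity.

```python
# Hedged sketch of conv+BN fusion and branch merging for structural reparameterization.
import torch
import torch.nn as nn
import torch.nn.functional as F

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d):
    """Fuse a convolution and its BatchNorm into an equivalent (weight, bias) pair."""
    sigma = torch.sqrt(bn.running_var + bn.eps)       # accumulated standard deviation
    scale = bn.weight / sigma                         # gamma / sigma, shape (C_out,)
    w_hat = conv.weight * scale.reshape(-1, 1, 1, 1)
    b = conv.bias if conv.bias is not None else torch.zeros_like(bn.running_mean)
    b_hat = (b - bn.running_mean) * scale + bn.bias
    return w_hat, b_hat

def merge_branches(branches, kernel_size=3):
    """Sum fused (conv, bn) branches into one KxK kernel and bias; 1x1 kernels are
    zero-padded to KxK so that all branches collapse into a single convolution."""
    w_sum, b_sum = 0.0, 0.0
    for conv, bn in branches:
        w, b = fuse_conv_bn(conv, bn)
        pad = (kernel_size - w.shape[-1]) // 2
        w = F.pad(w, [pad, pad, pad, pad])            # e.g. pad 1x1 -> 3x3
        w_sum, b_sum = w_sum + w, b_sum + b
    return w_sum, b_sum
```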

3. Results

3.1. Evaluation Indicators and Experimental Parameter Settings

3.1.1. Evaluation Indicators

The evaluation metrics used in this paper include accuracy, balanced accuracy, recall, precision, F1 score, geometric mean, parameters, and FLOPs. They are calculated as follows:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 Score = 2 / ((1 / Precision) + (1 / Recall))
FLOPs = 2hw × (C_in × K² + 1) × C_out
Specificity = TN / (TN + FP)
Balanced Accuracy = (1/n) Σ_{i=1}^{n} Accuracy_i
Geometric Mean = √(Specificity × Recall)
TP denotes true positives, FP false positives, TN true negatives, and FN false negatives. Accuracy is the proportion of correctly classified samples. Balanced accuracy measures the proportion of correctly classified data on imbalanced datasets, where n is the number of classes. Recall is the proportion of all positive samples that the classifier correctly identifies. Precision is the proportion of samples classified as positive that are actually positive. The F1 score takes both precision and recall into account. The geometric mean combines specificity and recall to account for the model's predictive power on different classes. Parameters refer to the number of adjustable parameters in a model, including weights and biases; fewer parameters mean lower hardware requirements. FLOPs represent the computational complexity of a model, with lower values indicating simpler computation. In the FLOPs formula, h, w, and C_in denote the height, width, and number of channels of the input feature maps, C_out is the number of output channels, and K is the convolution kernel width. By controlling FLOPs and parameters, we can reduce the size of a model while maintaining its performance.
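As an illustration, the following sketch computes these metrics from predicted and true labels using scikit-learn, plus a small one-vs-rest loop for specificity and the geometric mean; macro averaging is an assumption where the paper does not state the averaging mode.

```python
# Hedged sketch of the evaluation metrics defined above.
import numpy as np
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             precision_score, recall_score, f1_score)

def evaluate(y_true, y_pred, n_classes):
    acc = accuracy_score(y_true, y_pred)
    bal_acc = balanced_accuracy_score(y_true, y_pred)       # mean per-class recall
    prec = precision_score(y_true, y_pred, average="macro", zero_division=0)
    rec = recall_score(y_true, y_pred, average="macro", zero_division=0)
    f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)

    # Per-class one-vs-rest counts, then G-mean = sqrt(specificity * recall),
    # averaged over classes.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    gmeans = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        tn = np.sum((y_pred != c) & (y_true != c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        spec = tn / (tn + fp) if (tn + fp) else 0.0
        sens = tp / (tp + fn) if (tp + fn) else 0.0
        gmeans.append(np.sqrt(spec * sens))
    return {"accuracy": acc, "balanced_acc": bal_acc, "precision": prec,
            "recall": rec, "f1": f1, "g_mean": float(np.mean(gmeans))}
```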

3.1.2. Experimental Parameter Setting

The experiments were conducted on Ubuntu 20.04 on a Dell T5820 graphics workstation (Dell, Round Rock, TX, USA) with an Intel Core i9-10900X processor, 48 GB of RAM, and two GeForce RTX 3090 GPUs (NVIDIA, Santa Clara, CA, USA). The deep learning framework used was PyTorch with CUDA 11.1. Please refer to Table 2 for the other settings.
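A minimal training-loop sketch matching the Table 2 configuration (Adam, cross-entropy loss, initial learning rate 1e-4, weight decay 0.05, batch size 64, cosine decay over 250 epochs) is given below; `model` and `train_set` are placeholders assumed to be defined elsewhere.

```python
# Hedged sketch of the training setup in Table 2.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

device = "cuda" if torch.cuda.is_available() else "cpu"
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=0.05)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=250)  # cosine decay
train_loader = DataLoader(train_set, batch_size=64, shuffle=True, num_workers=4)

for epoch in range(250):
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```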

3.2. Experimental Results on PlantVillage Dataset

To validate the effectiveness of our proposed VLDNet model, we conducted experiments on the PlantVillage dataset and compared the results with those of recent studies. As shown in Table 3, our model achieved an impressive identification accuracy of 99.26%, far surpassing a range of large-scale models. For instance, compared to VGG16, reported by [27], VLDNet achieved an accuracy that was 17.43% higher, with an almost negligible parameter count. Compared to DECA-ResNet18 [28], which achieved the highest identification accuracy on this dataset, our model’s identification accuracy was only 0.64% lower, but our model’s parameter count was reduced by 499 times. This demonstrates the advantage of VLDNet in model lightweighting. VLDNet achieves high identification accuracy and has fewer parameters, making it ideal for deployment on resource-limited edge devices.

3.3. Experimental Results on Self-Built Dataset

3.3.1. Comparison between the Proposed Model and Lightweight SOTA

To further validate the performance of the VLDNet model, we conducted experiments on a self-built dataset. As shown in Table 4, VLDNet achieved an identification accuracy of 98.32%, outperforming a range of widely used lightweight CNNs such as MobilenetV2 (96.17%) and MobilenetV3 (96.70%) [37] by 2.15% and 1.62%, respectively, while having a smaller model size. Compared to Swin-Tiny, which has the best identification accuracy (98.77%), VLDNet's identification accuracy was only 0.45% lower, while its parameter count and FLOPs were reduced by 297 and 111 times, respectively. This once again demonstrates the advantage of VLDNet in lightweighting. Compared to the smallest model, MobileViT-XXS [38] (97.97%), VLDNet achieved an identification accuracy 0.35% higher with only 7.6% of its parameters and 16% of its FLOPs. It can be seen that VLDNet achieves a good balance between computational efficiency and identification accuracy.

3.3.2. Comparison between the Proposed Model and Heavyweight SOTA

Table 5 shows the comparative experimental results of VLDNet and other heavyweight SOTA models on our self-built dataset. Even when facing models with far larger parameter counts and computational complexity, VLDNet's identification performance remains competitive. For example, it achieved higher identification accuracy than a range of large-scale models, such as ViT-base, PVTV2-b5 [39], and ConViT-Base [40]. This validates our hypothesis that deeper and wider network structures do not necessarily achieve better results in crop disease identification tasks. Compared to the best-performing model, VGG16, VLDNet's identification accuracy was only 1.06% lower, but VGG16 attains this at a very high cost in resources and computation, which is unacceptable for edge devices. In contrast, VLDNet achieves high identification accuracy with very little resource consumption and computation, which is very friendly to edge devices.

3.4. Five-Fold Cross-Validation on Self-Built Dataset

In order to further validate the performance of VLDNet, we conducted a five-fold cross-validation on our self-built dataset. The experimental parameters were set to the default values provided in Table 2. A total of 50 epochs of experiments were performed, and the results of the test set are shown in Figure 8. F1–F5 represent models trained using different validation sets. It can be observed that the models trained with different validation sets converge quickly. The final identification accuracy rate is consistently above 98%, which is essentially consistent with the previous experimental results. This demonstrates the excellent performance of VLDNet.
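For reference, a minimal sketch of the five-fold protocol is shown below; the dataset object and the per-fold training routine are placeholders assumed for illustration.

```python
# Hedged sketch of the five-fold cross-validation split (F1-F5).
import numpy as np
from sklearn.model_selection import KFold

indices = np.arange(len(full_train_set))        # full_train_set assumed defined elsewhere
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

for fold, (train_idx, val_idx) in enumerate(kfold.split(indices), start=1):
    print(f"Fold F{fold}: {len(train_idx)} train / {len(val_idx)} validation samples")
    # train_one_model(full_train_set, train_idx, val_idx)   # placeholder training call
```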

3.5. Do Deeper and Wider Networks Lead to Better Identification Results?

The experimental results in Section 2.2.1 and Section 3.3.2 have already shown that deeper and wider networks do not necessarily improve disease identification accuracy and may sometimes even reduce it. For example, on our self-built dataset, PVTV2-b5 had a lower identification accuracy than PVTV2-b0. To further examine this issue, we conducted additional experiments in this section. The experiments were conducted on our self-built dataset using VLDNet, with the network width controlled by the parameter α = {1, 2, 4}, which scales the number of channels in each layer by α while keeping the network depth constant. The experimental results are shown in Figure 9: there is no difference in identification accuracy between models of different widths, and the narrower networks actually converge faster. This once again confirms our hypothesis and demonstrates the effectiveness of VLDNet's design.

3.6. Model Inference Time Testing

In this section, we tested the inference time of different models on an NVIDIA GTX 1650 GPU and an Intel(R) Core(TM) i7-10700 CPU @ 2.90 GHz. As shown in Figure 10, the red arrow points to the VLDNet model, which has the fastest inference speed while maintaining a high identification accuracy of 98.32%. In actual measurements, VLDNet can recognize 221 images per second, averaging 4.52 ms per image, which is more than 37% faster than ResNet18 (6.2 ms), the model with the closest inference speed. Compared to PVTV2-b5 (98.24%), which has a similar identification accuracy, VLDNet's inference speed is 11.6 times higher. Compared to VGG16 (99.38%), which has the best identification performance, VLDNet's identification accuracy is 1.06% lower, but its inference speed is 8.2 times higher. VLDNet not only has fewer parameters and lower FLOPs but also a fast inference speed, thanks to the use of inexpensive operations to supplement redundant features and to reparameterization.
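Latency numbers such as these can be measured with a simple timing loop like the following sketch; batch size 1, the warm-up and iteration counts, and the synchronization points are assumptions about the measurement protocol rather than details from the paper.

```python
# Hedged sketch of single-image latency measurement (ms per image, images per second).
import time
import torch

@torch.no_grad()
def measure_latency(model, device="cuda", n_warmup=50, n_iters=500):
    model.eval().to(device)
    x = torch.randn(1, 3, 224, 224, device=device)   # single 224x224 image
    for _ in range(n_warmup):                          # warm-up iterations
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_iters):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    ms_per_image = (time.perf_counter() - start) / n_iters * 1000
    return ms_per_image, 1000.0 / ms_per_image         # (ms/image, images/second)
```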

3.7. Ablation Experiments

In order to validate the necessity and effectiveness of each design in VLDNet, this section conducted ablation experiments on our self-built dataset. The experimental results are shown in Table 6. Even with a shallow and narrow ordinary model without using linear operations to supplement redundant features and reparameterization, an identification accuracy as high as 98.33% can be achieved while keeping the model parameters and FLOPs low. This result again validates our hypothesis about the relationship between model depth, width, and performance. When linear operations are used to supplement redundant features, the number of model parameters and FLOPs is reduced to half of the original, while the identification accuracy is not significantly affected, indicating the effectiveness of this operation. Furthermore, based on this, using structural reparameterization reduces the number of model parameters and FLOPs to 12% and 11% of the original, respectively. The identification accuracy of the model also did not change significantly. The ablation experiment verifies that each design in VLDNet is effective and necessary.

3.8. Visual Display of Identification Results

In this section, the Grad-CAM method was used for visualization to observe the classification basis of the VLDNet model; the results are shown in Figure 11. Grad-CAM [41] is a gradient-based localization method for visualizing deep neural networks. It computes the weight of each feature map in the last convolutional layer with respect to the image category, takes the weighted sum of the feature maps, and maps the result onto the original image as a heatmap to explain the classification basis of the model. Figure 11 shows that VLDNet accurately focuses on the area where the disease occurs in each disease image, which is consistent with our own judgment basis. This indicates that VLDNet performs well in crop disease classification.
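For readers who want to reproduce the visualization, a compact hook-based Grad-CAM sketch is given below. It follows the description above (channel-wise gradient averaging, weighted sum of the last convolutional feature maps, upsampling to the input size); the choice of target layer is an assumption that depends on the model.

```python
# Hedged sketch of Grad-CAM using forward/backward hooks on a chosen target layer.
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer, class_idx=None):
    feats, grads = {}, {}
    h1 = target_layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))

    logits = model(image.unsqueeze(0))                 # image: 3xHxW tensor
    if class_idx is None:
        class_idx = logits.argmax(dim=1).item()
    model.zero_grad()
    logits[0, class_idx].backward()                    # gradient of the class score
    h1.remove(); h2.remove()

    weights = grads["a"].mean(dim=(2, 3), keepdim=True)            # per-channel weights
    cam = F.relu((weights * feats["a"]).sum(dim=1, keepdim=True))   # weighted sum + ReLU
    cam = F.interpolate(cam, size=image.shape[1:], mode="bilinear", align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)        # normalize to [0, 1]
    return cam[0, 0]                                                # HxW heatmap
```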

4. Discussion

This paper rethinks the relationship between the depth and width of disease identification models and their performance, and finds that the common paradigm of designing wide and deep structures to improve identification accuracy does not apply to disease identification tasks. Based on this finding, this paper proposes the lightweight disease identification model VLDNet, which achieves a good balance between efficiency and accuracy. The paper first experiments with ResNet and other models on the public PlantVillage dataset to verify the hypothesis that there is no strict correlation between the depth and width of a disease identification model and its performance. Building on this, the paper proposes VLDNet, whose basic module VLDBlock extracts intrinsic features using 1 × 1 convolutions and then supplements redundant features using inexpensive linear operations, thereby improving the efficiency of feature extraction. The paper also uses structural reparameterization during inference to reduce the number of parameters and FLOPs and to accelerate inference. VLDNet achieves good identification results on both the PlantVillage and self-built datasets, with fewer parameters, simpler computation, and faster inference. In addition, ablation experiments verify the necessity and effectiveness of each design in VLDNet, and the visualizations show that VLDNet accurately focuses on diseased areas.
Compared to large-scale models such as [28,33,34], our model performs better than or on par with them in terms of identification accuracy. Although the differences in accuracy are small, our model has far fewer parameters and FLOPs, and its inference speed is faster. Some works use strategies such as pruning, quantization, and distillation to obtain lightweight models [27,42]. Although these methods can reduce the number of parameters and the computational complexity, the identification accuracy may decrease, and even after applying these strategies, large-scale models are still difficult to make fully lightweight. In contrast, our model matches the identification accuracy of large-scale models while having far fewer parameters and FLOPs. Other works design dedicated lightweight models to reduce parameters and FLOPs, such as [16,17], and achieve high identification accuracy, but their actual inference speed remains to be verified. In addition, the lightweight network designed by [16] achieved high recognition accuracy in complex backgrounds, but its parameter count and FLOPs still cannot compare with ours. Ref. [18] and our model both achieve real-time identification, but the evidence suggests that our model has higher inference efficiency, which means lower device requirements and a wider range of applications. Our model uses structural reparameterization for optimization, which requires redesigning and adjusting the network structure and parameters and is therefore more complex than traditional neural network models. Additionally, our model's support for large-scale data still needs further improvement.
This paper finds that the paradigm of designing models with wide and deep structures to improve identification accuracy is not applicable to disease identification tasks. We hope that researchers will further verify this finding and use it to guide the design of disease identification models. The proposed lightweight identification model VLDNet has very low requirements for deployment devices, which will help accelerate the deployment of deep learning-based disease identification models on edge devices and promote the development of smart agriculture.

5. Conclusions

This paper proposes a rethinking of the design paradigm for crop disease identification models based on deep learning. The experiment verifies that there is a non-strict correlation between the depth and width design of disease identification models and their performance. Based on this, the paper designs the VLDNet, a lightweight disease identification model that achieves a good balance between efficiency and performance. VLDNet performed well on both public and self-built datasets. This discovery is crucial for the efficient design of models, especially for the deployment of deep learning-based identification methods on edge devices in smart agriculture. Our future research direction is to apply structure reparameterization techniques to Vision Transformers. This is because Vision Transformers have stronger expressive power, which has the potential to further enhance the performance of disease recognition tasks based on deep learning. We will continue to explore efficient model design methods, improve and optimize VLDNet, and better serve the development of smart agriculture.

Author Contributions

Conceptualization, X.L. and Y.Z.; methodology, X.L.; software, X.L.; validation, X.L. and Y.Z.; formal analysis, X.L. and Y.P.; investigation, Y.Z. and Y.P.; resources, Y.Z.; data curation, Y.Z.; writing—original draft preparation, X.L.; writing—review and editing, X.L.; visualization, Y.Z.; supervision, S.L.; project administration, S.L.; funding acquisition, S.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Key Research and Development Program of China (Grant Nos. 2020YFD1100600 and 2020YFD1100601).

Data Availability Statement

The data presented in this study are available upon request from the corresponding author. The data are not publicly available due to the privacy policy of the authors’ institution.

Acknowledgments

We thank all of the funders.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. The PlantVillage dataset Label ID and corresponding Label Name.
Label ID | Label Name | Label ID | Label Name
1 | Apple_Apple_scab | 21 | Potato_Early_blight
2 | Apple_Black_rot | 22 | Potato_healthy
3 | Apple_Cedar_apple_rust | 23 | Potato_Late_blight
4 | Apple_healthy | 24 | Raspberry_healthy
5 | Blueberry_healthy | 25 | Soybean_healthy
6 | Cherry_(including_sour)_healthy | 26 | Squash_Powdery_mildew
7 | Cherry_(including_sour)_Powdery_mildew | 27 | Strawberry_healthy
8 | Corn_(maize)_Cercospora_leaf_spot Gray_leaf_spot | 28 | Strawberry_Leaf_scorch
9 | Corn_(maize)_Common_rust_ | 29 | Tomato_Bacterial_spot
10 | Corn_(maize)_healthy | 30 | Tomato_Early_blight
11 | Corn_(maize)_Northern_Leaf_Blight | 31 | Tomato_healthy
12 | Grape_Black_rot | 32 | Tomato_Late_blight
13 | Grape_Esca_(Black_Measles) | 33 | Tomato_Leaf_Mold
14 | Grape_healthy | 34 | Tomato_Septoria_leaf_spot
15 | Grape_Leaf_blight_(Isariopsis_Leaf_Spot) | 35 | Tomato_Spider_mites Two-spotted_spider_mite
16 | Orange_Haunglongbing_(Citrus_greening) | 36 | Tomato_Target_Spot
17 | Peach_Bacterial_spot | 37 | Tomato_Tomato_mosaic_virus
18 | Peach_healthy | 38 | Tomato_Tomato_Yellow_Leaf_Curl_Virus
19 | Pepper_bell_Bacterial_spot | |
20 | Pepper_bell_healthy | |

References

  1. Sharif, M.; Khan, M.A.; Iqbal, Z.; Azam, M.F.; Lali, M.I.U.; Javed, M.Y. Detection and classification of citrus diseases in agriculture based on optimized weighted segmentation and feature selection. Comput. Electron. Agric. 2018, 150, 220–234. [Google Scholar] [CrossRef]
  2. Patil, J.K.; Kumar, R. Analysis of content based image retrieval for plant leaf diseases using color, shape and texture features. Eng. Agric. Environ. Food 2017, 10, 69–78. [Google Scholar] [CrossRef]
  3. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
  4. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
  5. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015. [Google Scholar]
  6. Ferentinos, K.P. Deep learning models for plant disease detection and diagnosis. Comput. Electron. Agric. 2018, 145, 311–318. [Google Scholar] [CrossRef]
  7. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  8. Barbedo, J.G.A. Plant disease identification from individual lesions and spots using deep learning. Biosyst Eng. 2019, 180, 96–107. [Google Scholar] [CrossRef]
  9. Huang, S.; Liu, W.; Qi, F.; Yang, K. Development and validation of a deep learning algorithm for the recognition of plant disease. In Proceedings of the 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), Changsha, China, 10–12 August 2019. [Google Scholar]
  10. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Houlsby, N. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
  11. Li, X.; Chen, X.; Yang, J.; Li, S. Transformer helps identify kiwifruit diseases in complex natural environments. Comput. Electron. Agric. 2022, 200, 107258. [Google Scholar] [CrossRef]
  12. Thakur, P.S.; Khanna, P.; Sheorey, T.; Ojha, A. Explainable vision transformer enabled convolutional neural network for plant disease identification: PlantXViT. arXiv 2022, arXiv:2207.07919. [Google Scholar]
  13. Li, X.; Li, S. Transformer Help CNN See Better: A Lightweight Hybrid Apple Disease Identification Model Based on Transformers. Agriculture 2022, 12, 884. [Google Scholar] [CrossRef]
  14. Li, W.; Zhang, L.; Wu, C.; Cui, Z.; Niu, C. A new lightweight deep neural network for surface scratch detection. Int. J. Adv. Manuf. Technol. 2022, 123, 1999–2015. [Google Scholar] [CrossRef]
  15. Li, X.; Zhang, Y.; Li, S. SheepFaceNet: A Speed–Accuracy Balanced Model for Sheep Face Recognition. Animals 2023, 13, 1930. [Google Scholar] [CrossRef] [PubMed]
  16. Kamal, K.C.; Yin, Z.; Wu, M.; Wu, Z. Depthwise separable convolution architectures for plant disease classification. Comput. Electron. Agric. 2019, 165, 104948. [Google Scholar]
  17. Bao, W.; Yang, X.; Liang, D.; Hu, G.; Yang, X. Lightweight convolutional neural network model for field wheat ear disease identification. Comput. Electron. Agric. 2021, 189, 106367. [Google Scholar] [CrossRef]
  18. Khan, A.I.; Quadri, S.M.K.; Banday, S.; Shah, J.L. Deep diagnosis: A real-time apple leaf disease detection system based on deep learning. Comput. Electron. Agric. 2022, 198, 107093. [Google Scholar] [CrossRef]
  19. Hughes, D.; Salathé, M. An open access repository of images on plant health to enable the development of mobile disease diagnostics. arXiv 2015, arXiv:1511.08060. [Google Scholar]
  20. Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. Ghostnet: More features from cheap operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, DC, USA, 14–19 June 2020. [Google Scholar]
  21. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
  22. Zhang, X.; Zhou, X.; Lin, M.; Sun, J. Shufflenet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018. [Google Scholar]
  23. Dehghani, M.; Arnab, A.; Beyer, L.; Vaswani, A.; Tay, Y. The efficiency misnomer. arXiv 2021, arXiv:2110.12894. [Google Scholar]
  24. Ma, N.; Zhang, X.; Zheng, H.T.; Sun, J. Shufflenet v2: Practical guidelines for efficient cnn architecture design. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018. [Google Scholar]
  25. Ding, X.; Guo, Y.; Ding, G.; Han, J. Acnet: Strengthening the kernel skeletons for powerful cnn via asymmetric convolution blocks. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019. [Google Scholar]
  26. Ding, X.; Zhang, X.; Ma, N.; Han, J.; Ding, G.; Sun, J. Repvgg: Making vgg-style convnets great again. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021. [Google Scholar]
  27. Too, E.C.; Yujian, L.; Njuki, S.; Yingchun, L. A comparative study of fine-tuning deep learning models for plant disease identification. Comput. Electron. Agric. 2019, 161, 272–279. [Google Scholar] [CrossRef]
  28. Gao, R.; Wang, R.; Feng, L.; Li, Q.; Wu, H. Dual-branch, efficient, channel attention-based crop disease identification. Comput. Electron. Agric. 2021, 190, 106410. [Google Scholar] [CrossRef]
  29. Wang, G.; Sun, Y.; Wang, J. Automatic image-based plant disease severity estimation using deep learning. Comput. Intell. Neurosci. 2017, 2017, 2917536. [Google Scholar] [CrossRef] [Green Version]
  30. Gandhi, R.; Nimbalkar, S.; Yelamanchili, N.; Ponkshe, S. Plant disease detection using CNNs and GANs as an augmentative approach. In Proceedings of the 2018 IEEE International Conference on Innovative Research and Development (ICIRD), Bangkok, Thailand, 11–12 May 2018. [Google Scholar]
  31. Elhassouny, A.; Smarandache, F. Smart mobile application to recognize tomato leaf diseases using Convolutional Neural Networks. In Proceedings of the 2019 International Conference of Computer Science and Renewable Energies (ICCSRE), Agadir, Morocco, 22–24 July 2019. [Google Scholar]
  32. Chen, J.; Chen, J.; Zhang, D.; Sun, Y.; Nanehkaran, Y.A. Using deep transfer learning for image-based plant disease identification. Comput. Electron. Agric. 2020, 173, 105393. [Google Scholar] [CrossRef]
  33. Mohameth, F.; Bingcai, C.; Sada, K.A. Plant disease detection with deep learning and feature extraction using plant village. J. Comput. Commun. 2020, 8, 10–22. [Google Scholar] [CrossRef]
  34. Zhao, Y.; Sun, C.; Xu, X.; Chen, J. RIC-Net: A plant disease classification model based on the fusion of Inception and residual structure and embedded attention mechanism. Comput. Electron. Agric. 2022, 193, 106644. [Google Scholar] [CrossRef]
  35. Thakur, P.S.; Sheorey, T.; Ojha, A. VGG-ICNN: A Lightweight CNN model for crop disease identification. Multimed. Tools Appl. 2023, 82, 497–520. [Google Scholar] [CrossRef]
  36. Li, E.; Wang, L.; Xie, Q.; Gao, R.; Su, Z.; Li, Y. A novel deep learning method for maize disease identification based on small sample-size and complex background datasets. Ecol. Inform. 2023, 75, 102011. [Google Scholar] [CrossRef]
  37. Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Adam, H. Searching for mobilenetv3. In Proceedings of the IEEE/CVF International Conference on Computer Vision (CVPR), Long Beach, CA, USA, 16–20 June 2019. [Google Scholar]
  38. Mehta, S.; Rastegari, M. Mobilevit: Light-weight, general-purpose, and mobile-friendly vision transformer. arXiv 2021, arXiv:2110.02178. [Google Scholar]
  39. Wang, W.; Xie, E.; Li, X.; Fan, D.P.; Song, K.; Liang, D.; Shao, L. Pvt v2: Improved baselines with pyramid vision transformer. Comput. Vis. Media 2022, 8, 415–424. [Google Scholar] [CrossRef]
  40. D’ascoli, S.; Touvron, H.; Leavitt, M.L.; Morcos, A.S.; Biroli, G.; Sagun, L. Convit: Improving vision transformers with soft convolutional inductive biases. In Proceedings of the International Conference on Machine Learning (ICML), Online, 18–24 July 2021. [Google Scholar]
  41. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017. [Google Scholar]
  42. Arun, R.A.; Umamaheswari, S. Effective multi-crop disease detection using pruned complete concatenated deep learning model. Expert Syst. Appl. 2023, 213, 118905. [Google Scholar] [CrossRef]
Figure 1. Examples of PlantVillage dataset.
Figure 2. Example of the self-built dataset.
Figure 3. (a,d) are the experimental results of EfficientNet on the PlantVillage dataset, (b,e) are the experimental results of PVTv2 on the PlantVillage dataset, and (c,f) are the experimental results of ResNet on the PlantVillage dataset.
Figure 4. Overall architecture of VLDNet.
Figure 5. Traditional convolution and Ghost module [20].
Figure 6. Redundant feature maps generated during the convolution operation.
Figure 7. Reparameterization process.
Figure 8. (a) is the accuracy on the test set. (b) is the loss on the test set.
Figure 9. Identification accuracy and loss of VLDNet models with different widths.
Figure 10. Inference time—identification accuracy for different models.
Figure 11. Visual display of identification results.
Table 1. Statistics of self-built dataset.
Label ID | Type of Disease | Original Training Set | Augmented Training Set
1 | Apple Black rot | 370 | 1850
2 | Apple Brown spot | 435 | 2175
3 | Apple Healthy | 475 | 2375
4 | Kiwi Anthracnose | 186 | 930
5 | Kiwi Brown spot | 70 | 350
6 | Kiwi Leaf ulcer | 99 | 495
7 | Kiwi Mosaic leaf | 61 | 305
8 | Apple Mosaic leaf | 375 | 1875
9 | Apple Rust | 438 | 2190
10 | Apple Spotted leaf fall | 411 | 2055
Total | | 2926 | 14,600
Table 2. Experimental parameter configuration.
Config | Value
Optimizer | Adam
Loss function | CrossEntropyLoss
Initial learning rate | 0.0001
Momentum | 0.0005
Weight decay | 0.05
Dropout | 0.6
Batch size | 64
Learning rate schedule | cosine decay
Training epochs | 250
Image resolution | 224 × 224
Table 3. Comparison of results between VLDNet and other studies on PlantVillage dataset.
Study | Year | Dataset | Method | Accuracy (%) | Parameters (MB)
[29] | 2017 | PlantVillage | Inception-V3 | 80 | 23.83
[30] | 2018 | PlantVillage | MobileNet | 92 | 3.3
[27] | 2019 | PlantVillage | VGG16 | 81.83 | 138.3
[31] | 2019 | Tomato leaf disease | MobileNet | 88.4 | 3.3
[32] | 2020 | PlantVillage | INC-VGGN | 91.83 | -
[33] | 2020 | PlantVillage | VGG16 | 97.82 | 138.3
[33] | 2020 | PlantVillage | GoogleNet | 95.3 | 6.62
[33] | 2020 | PlantVillage | Resnet50 | 95.38 | 25.5
[28] | 2021 | PlantVillage | DECA-ResNet18 | 99.74 | 48.6
[34] | 2022 | Part of PlantVillage | RIC-Net | 99.55 | 19.1
[35] | 2022 | PlantVillage | VGG-ICNN | 99.16 | 6
[36] | 2023 | PlantVillage | MobileNetV2 | 99.10 | 2.3
[36] | 2023 | PlantVillage | MDCDenseNet | 99.40 | 7.3
Ours | 2023 | PlantVillage | VLDNet | 99.26 | 0.097
Table 4. Comparison of identification results with other lightweight models on self-built datasets.
Model | Parameters (MB) | FLOPs (G) | Acc (%) | Balanced Acc | Precision (%) | Recall (%) | F1 Score | G-Mean
Resnet18 | 11.5 | 1.71 | 98.32 | 0.9823 | 98.29 | 98.32 | 0.9831 | 0.9833
MobilenetV2 | 2.23 | 0.32 | 96.17 | 0.9608 | 96.15 | 96.17 | 0.9616 | 0.9617
MobilenetV3 | 5.4 | 0.22 | 96.70 | 0.9661 | 96.68 | 96.72 | 0.9670 | 0.9671
EfficientNet-B0 | 5.3 | 0.41 | 96.30 | 0.9621 | 96.57 | 96.30 | 0.9643 | 0.9631
EfficientNet-B1 | 7.73 | 0.56 | 96.82 | 0.9673 | 96.65 | 96.82 | 0.9673 | 0.9683
DeiT-Tiny | 5.68 | 1.05 | 96.47 | 0.9638 | 96.36 | 96.47 | 0.9641 | 0.9648
ViT-Tiny | 9.70 | 1.06 | 96.91 | 0.9682 | 96.78 | 96.90 | 0.9684 | 0.9691
Swin-Tiny | 29 | 4.5 | 98.77 | 0.9848 | 98.80 | 98.71 | 0.9875 | 0.9875
PVT-Tiny | 12.33 | 1.82 | 97.71 | 0.9752 | 97.65 | 97.71 | 0.9768 | 0.9772
PVTV2-b0 | 3.67 | 0.52 | 98.41 | 0.9812 | 98.39 | 98.46 | 0.9842 | 0.9847
PVTV2-b1 | 14.01 | 1.99 | 98.73 | 0.9864 | 98.76 | 98.79 | 0.9878 | 0.9879
MobileViT-XXS | 1.27 | 0.25 | 97.97 | 0.9774 | 98.01 | 98.01 | 0.9801 | 0.9802
MobileViT-XS | 2.31 | 0.69 | 98.68 | 0.9849 | 98.63 | 98.70 | 0.9866 | 0.9870
MobileViT-S | 5.57 | 1.39 | 98.73 | 0.9834 | 98.74 | 98.73 | 0.9874 | 0.9873
MobileViTV2-50 | 1.36 | 0.35 | 97.18 | 0.9707 | 97.19 | 97.22 | 0.9721 | 0.9723
MobileViTV2-75 | 2.85 | 0.78 | 97.71 | 0.9732 | 97.63 | 97.67 | 0.9765 | 0.9768
MobileViTV2-100 | 4.88 | 1.38 | 98.68 | 0.9839 | 98.65 | 98.67 | 0.9866 | 0.9868
ConViT-Ti | 9.5 | 0.98 | 96.91 | 0.9622 | 96.96 | 96.91 | 0.9693 | 0.9691
Ours | 0.097 | 0.04 | 98.32 | 0.9813 | 98.30 | 98.32 | 0.9831 | 0.9833
Table 5. Comparison of identification results with other heavyweight models on self-built dataset.
Model | Parameters (MB) | FLOPs (G) | Acc (%) | Balanced Acc | Precision (%) | Recall (%) | F1 Score | G-Mean
ResNet-101 | 44.55 | 7.68 | 98.78 | 0.9859 | 98.86 | 98.71 | 0.9875 | 0.9874
VGG16 | 138 | 15.5 | 99.38 | 0.9919 | 99.35 | 99.42 | 0.9938 | 0.9943
Densenet121 | 7.9 | 2.77 | 97.35 | 0.9706 | 97.38 | 97.26 | 0.9732 | 0.9728
DeiT-Base | 86.56 | 16.47 | 96.21 | 0.9612 | 96.27 | 96.08 | 0.9617 | 0.9673
ViT-base | 86.56 | 16.47 | 96.38 | 0.9619 | 96.37 | 96.36 | 0.9638 | 0.9637
PVT-Large | 61.4 | 9.8 | 98.41 | 0.9822 | 98.48 | 98.32 | 0.9844 | 0.9843
PVTV2-b4 | 62.56 | 9.59 | 98.94 | 0.9856 | 98.91 | 98.93 | 0.9892 | 0.9893
PVTV2-b5 | 81.96 | 11.12 | 98.24 | 0.9815 | 98.25 | 98.24 | 0.9825 | 0.9825
Swin-Base | 87.7 | 14.81 | 98.79 | 0.9830 | 98.81 | 98.73 | 0.9877 | 0.9875
ConViT-Base | 86.39 | 16.42 | 97.35 | 0.9726 | 97.38 | 97.26 | 0.9732 | 0.9729
Ours | 0.097 | 0.04 | 98.32 | 0.9813 | 98.30 | 98.32 | 0.9831 | 0.9833
Table 6. Results of ablation experiments (check marks follow the ablation order described in Section 3.7).
Linear Operation | Reparameterization | Parameters (MB) | FLOPs (G) | Accuracy (%)
– | – | 0.747 | 0.354 | 98.35
✓ | – | 0.373 | 0.176 | 98.33
✓ | ✓ | 0.097 | 0.040 | 98.32

