Plant-Leaf Disease Detection Based on Texture Enhancement Using ATD-Net

Li, Yuheng; Zhang, Xiafen

doi:10.3390/agriengineering8050160

Open Access Article

by Yuheng Li * and Xiafen Zhang

College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China

* Author to whom correspondence should be addressed.

AgriEngineering 2026, 8(5), 160; https://doi.org/10.3390/agriengineering8050160

Submission received: 15 February 2026 / Revised: 3 April 2026 / Accepted: 8 April 2026 / Published: 22 April 2026

(This article belongs to the Section Computer Applications and Artificial Intelligence in Agriculture)

Abstract

Early detection and timely control of plant leaf diseases are important for agricultural yield and stability. However, manual labor can hardly monitor leaf health 24 h a day, and existing detection approaches do not adequately enhance texture features. This paper therefore proposes a new detection approach, ATD-Net, which applies transformations at three layers: the convolutional layer, the attention mechanism layer, and the loss function layer. First, ADown is used to extract fine-grained texture features from suspected leaves while reducing the computational load. Second, Gabor texture enhancement is proposed to extract and enhance the contours and directional textures of suspected areas using multi-directional filtering, and it is combined with a Transformer to strengthen global context modeling. Third, a dynamic boundary loss function (DBL) is employed to dynamically adjust the probability distribution of bounding box regression through an adaptive temperature coefficient and information entropy, thereby improving the localization accuracy of the detection box. Experiments show that ATD-Net achieves a mean average precision (mAP50) of 87.42% and an accuracy of 85.96%, with a computational complexity of 6.5 GFLOPs. Visualization results and ablation experiments show that the proposed modules work together to significantly improve detection robustness in complex backgrounds, early-stage diseases, and small-target scenes. Compared with the original model, ATD-Net improves mAP50 by 1.1% and speed by 17.7%, while the model size remains almost unchanged at 5.2 MB. It is an efficient and promising solution for future real-time disease recognition in complex agricultural environments.

Keywords:

plant-leaf disease; ADown; Gabor; dynamic boundary loss; texture enhancement

1. Introduction

Leaves are the main plant organs that carry out photosynthesis, respiration, and transpiration, and their physiological and metabolic state directly reflects the plant’s general health. Diseases that infect leaves can spread and cause large financial losses. Using the Padwick technique, A Y Bandara et al. calculated the economic losses caused by 23 main soybean diseases in 28 major soybean-producing US states between 1996 and 2016, estimating the overall damages at around 95.48 billion USD [1]. According to research by Shi N et al. [2], tea anthracnose, which is caused by a complex of fungal infections, leads to a yield loss of 30% to 50%. Plant leaf diseases can result from a number of causes, such as abnormal light exposure, inappropriate temperatures, poor soil quality, and bacterial infections. Leaf area can also be measured quickly and precisely using automatic digital image analysis [3]. Manual inspection requires thorough, one-by-one on-site examinations, and complex outdoor farmland can be difficult for people to reach. Human inspection is also time-consuming and error-prone, particularly for subtle, early-stage leaf diseases. As a result, cameras have been deployed to automatically photograph susceptible leaves. Nonetheless, the automatic identification of plant leaf diseases from such images remains an open problem, and solving it is essential for maintaining steady agricultural output and advancing agricultural growth.

In the context of leaf disease detection, Sahu AK et al. [4] applied Gaussian filtering to grayscale images for feature extraction and disease classification in medicinal plants; however, the data available to the feature classifier was quite limited. Rachmad et al. [5] investigated the classification of maize leaf diseases using fine-grained image features based on Local Binary Patterns (LBP), but this approach lacks versatility and requires manual parameter tuning. Although these semi-automated detection techniques save time and reduce financial losses, they require significant manual parameter adjustment and classifier screening [6]. Moreover, they are frequently impractical for identifying minute targets such as minor texture-, color-, or dust-related disease markers. Shrestha et al. [7] collected over 3000 damaged leaves from 15 different plant species in order to investigate the relationship between various degrees of Convolutional Neural Networks (CNNs). Despite achieving 88.8% accuracy, the method has long feature extraction times. In response, Yin et al. [8] proposed an SSD detector for jujube tree disease detection that uses texture features acquired by transfer learning in the backbone network, cutting the detection time to 0.14 s. However, this approach increased the model depth by adding many more preset anchor boxes to the feature maps. To address this, Wu et al. developed a spatial collaborative attention detection model based on DETR (DS-DETR), which uses pre-trained Transformer structures to extract disease features [9]. The model evaluates disease severity by comparing the ratio of plant leaf area to lesion area and achieves an average accuracy gain of 9.52% over the baseline DETR model.

In the realm of deep learning-based detection, semantic segmentation, image classification, and object detection have all played significant roles. Semantic segmentation provides precise boundary information for leaf lesions, making it particularly suitable for irregularly shaped disease spots. Ref. [10] proposed a lightweight unsupervised learning framework that automatically generates apple disease cues by using an improved high-frequency attention mechanism and contrastive learning. Another study constructed a pixel-level annotated leaf disease dataset and employed both supervised and weakly supervised learning for the semantic segmentation of lesions, together with a fusion segmentation network that combines CNN and Vision Transformer (ViT) architectures [11]. Ref. [12] used patch segmentation, histogram-based lesion localization, and ROF filtering for noise reduction to accurately identify plant leaf diseases. A DbneAlexnet classifier was used for classification after lesions were segmented with a U-Net network optimized by a Gradient-Golden Search Optimization technique. Another method confirmed the effectiveness of an integrated framework for disease segmentation and classification of tomato plant leaves [13]. Despite these advantages, semantic segmentation has drawbacks, including high annotation costs and difficulty with edge-device deployment. Another way to identify plant leaf diseases is image classification. By combining logistic regression with hue moment data and a ResNet-50 network to create bounding boxes for regions of interest (ROIs), research on infected wheat plants obtained 99.8% classification accuracy [14,15]. A multivariate GrabCut algorithm was used to handle image occlusion and accomplish accurate segmentation, and an enhanced INC-VGGN network linked with a Kohonen learning layer was created. However, image classification is difficult to use directly in precision spraying settings because it cannot provide location information about the disease.

Object detection methods, on the other hand, provide higher computing efficiency and more accurate disease localization. With the addition of unsupervised pre-training, spatially modulated co-attention, and relative position encoding, DETR, a Transformer-based detection technique, performed well in tomato leaf disease segmentation tasks [16]. An effective method for accurately diagnosing rice diseases was developed using a Faster R-CNN model [17]. The MEAN-SSD model, which reconstructs Inception modules, has shown robust detection performance on both simple and complex apple leaf diseases [18]. YOLO-JD improved the feature extraction module through spatial pyramid pooling, achieving significant progress in multi-scale feature extraction and fusion [19]. He et al. proposed YOLOv11-RCDWD, which uses the RepLKNet module as its backbone and optimizes the attention mechanism, achieving an 85.4% recall rate [20]. YOLOv11 has demonstrated improvements in detection speed and accuracy, with enhanced feature extraction capability, inference speed, and task versatility [21]. In its C3K2 module, the input feature map is divided into two parts, which are subsequently concatenated to fuse features [22]; this is enabled by depthwise separable convolutions (DWC) [23] and model pruning for a lightweight design [24]. In the Neck, optimization of the C2PSA module allows it to excel in detection performance in complex backgrounds or small-target scenarios [25].

The aforementioned approaches usually select diseased plant leaves under specific conditions, such as clean backgrounds and indoor plant diseases, and therefore perform poorly in real-world settings. This research presents the ATD-Net network to efficiently improve the detection performance of plant diseases across various scenarios, conditions, and categories. Our approach balances detection accuracy and model complexity by combining downsampling convolution, Gabor filters, and a dynamic boundary loss that adjusts the regression boxes. The primary contributions are as follows: (1) A downsampling convolution (ADown) is adopted to feed the features of interest to the Gabor module and project them onto the Transformer; (2) An innovative dynamic boundary loss function is proposed to dynamically adjust disease prediction boundaries during the backpropagation of multi-layer features, further improving detection accuracy; (3) Experiments are conducted on multiple datasets, and ATD-Net is designed under edge-device compatibility constraints (model size 5.2 MB, 6.5 GFLOPs, and an architecture aligned with NVIDIA Jetson Xavier NX requirements), providing a lightweight solution suitable for future deployment in agricultural applications.

2. Data Acquisition

The dataset is crucial for testing our methodology. In actual agricultural production, plant disease monitoring must be continuous and timely, which requires sufficient data and high-quality samples. However, obtaining high-quality disease sample data is quite difficult. According to agricultural standards, early-stage diseases require more frequent monitoring (such as daily inspections). During high-risk disease seasons (such as warm and humid periods), relevant personnel should visit the fields every three to five days, take photos, and record disease data. Automated systems installed on drones or fixed stations can perform high-frequency, large-scale monitoring, whereas manual inspection cannot meet this frequency requirement. Therefore, we obtained data through both online search and self-capture, which provides the necessary support for model detection.

2.1. Self-Built Dataset

The self-built dataset was sourced from an outdoor wild tea field in Shanghai, China, and from online resources. We took the pictures with a standard mobile phone; all are in JPG format, have a resolution of 640 × 640, and are around 40 KB in size. We gathered 8302 images of plant diseases, comprising 14 disease categories. Each disease is represented by a number of images in the dataset, which was annotated after data cleaning. The species in the dataset are roughly uniformly distributed.

In real-world application settings, the device’s input image resolution should be at least 640 × 640 pixels, and a camera with at least 5 million pixels should be used, with care taken to prevent overexposure or excessive shadows. The model handles 1080p video in real time at 30 frames per second on the NVIDIA Jetson Xavier NX. For real-time on-site inspection on mobile devices, an inference time of less than 50 ms per image is advised. Once the model has been trained, it can be used on drones or stationary monitoring stations to process the gathered photos. GPU acceleration is advised to maintain inference speed. For the experiments, the datasets were divided into training, testing, and validation sets in the proportions 8:1:1. These datasets provide a sound foundation for evaluating whether the model can detect multiple plant diseases in real-world field scenarios, which strengthens its applicability.
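The 8:1:1 division can be illustrated with a short sketch. It is only an assumption about how such a split might be done: the function name `split_dataset`, the fixed random seed, and the placeholder file names are hypothetical, and the paper does not state how remainders were assigned.

```python
import random


def split_dataset(items, ratios=(8, 1, 1), seed=0):
    """Shuffle items and split them into train/test/val parts by integer ratios."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded so the split is reproducible
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_test = len(items) * ratios[1] // total
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    val = items[n_train + n_test:]  # the remainder goes to validation
    return train, test, val


# Hypothetical file names standing in for the 8302 self-built images.
image_names = [f"leaf_{i:05d}.jpg" for i in range(8302)]
train, test, val = split_dataset(image_names)
print(len(train), len(test), len(val))
```

Shuffling before slicing keeps each subset close to the overall category distribution when the categories are roughly balanced; a stratified split would be the safer choice for unbalanced data.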

To further enhance the quality and reliability of the dataset, and with the assistance of Professor Guo Jianwei’s team at Yunnan Forestry University, we manually annotated the disease areas and bounding boxes using the LabelImg (1.8.6) tool and confirmed the accuracy and completeness of the disease classification [26]. Figure 1 shows examples of common plant leaf diseases from our self-built dataset, covering several kinds of disease spots, complicated scenes, and lighting conditions. Figure 2 displays two distributions from our own dataset.

2.2. PlantVillage Dataset

PlantVillage offers a wealth of standardized leaf data covering fourteen different economic crop varieties. Its 54,306 images of plant diseases span 26 distinct diseases. The data, which comes from the official Kaggle page, has been annotated by experts. Although PlantVillage is large in scale, high in quality, and fairly balanced across categories, it is limited by monotonous backgrounds [27].

2.3. PlantDoc Dataset

In contrast to PlantVillage, PlantDoc emphasizes the complexity of real-world leaves, with background noise from weeds and dirt. It includes 13 species in 18 categories, 17 of which are diseased and one of which is healthy [28]. However, its total of 2598 images is not enough to satisfy the quantity and distribution requirements of the model.

To make it easier to describe the disease distribution across the three datasets, we gathered statistics and plotted them as a line graph (Figure 2).

3. System Architecture

This study proposes ATD-Net, which comprises three layers: the convolutional layer, the attention mechanism layer, and the loss function layer. The goal is to enhance the textural characteristics of plant leaf diseases.

This paper uses YOLOv11n (the nano variant) as the baseline model, selected for its lightweight design (model size 5.2 MB, 2.58 M parameters) and suitability for deployment on resource-constrained devices. In the convolutional layer, the YOLO series is used to extract image features. To preserve YOLO’s lightweight characteristics in the fine-grained setting of plant leaf disease detection, this study rebuilds its convolution structure as an ADown convolution. ADown is a downsampling module that combines pooling with multiple convolutions, and it replaces the conventional regular convolution when extracting features for the model.

The attention mechanism layer weighs the discriminative textural characteristics of diseased leaves, such as round, irregular, water-stained, and fuzzy lesion edges, as well as the roughness, dryness, concavity, and convexity of the diseased region. After collecting the downsampled information, ATD-Net additionally takes the textural features of plant leaf diseases into account, in line with the goal of multi-feature network design. The baseline YOLOv11 uses the Transformer attention mechanism to acquire local positional information. However, it struggles with long-distance features, multi-scale flexibility, and adequate spatial relationship modeling, particularly for the fine-grained texture features of plant leaf diseases.

In the loss function layer, the texture features from the two layers above are connected to the detection head, and the loss parameters for bounding box regression are dynamically adjusted to produce more precise bounding box localization. Although the basic YOLOv11 optimizes the detection method of region proposal reclassification and uses the Intersection over Union (IoU) loss, it still has trouble dynamically adjusting bounding boxes with fixed weights, which leads to comparatively low localization accuracy. To solve this issue, and motivated by Generalized Focal Loss (GFL), a single-stage detector prediction branch is used to estimate the precise location of crop diseases and to construct the focal loss through three stages: quality estimation, classification, and localization. Experiments have shown that this method helps improve plant disease detection performance.

Following this design philosophy, we optimized and rebuilt the original YOLOv11 model. The modified convolutional layers collect features, Gabor filtering enhances the textural features, and the loss function is then optimized, while the other structures of the original model are retained. Figure 3 illustrates this model optimization approach, which provides theoretical support for the later experimental verification.

4. Texture Feature Enhancement

After optimizing and reconstructing the model, the next step is to execute feature extraction, feature fusion, and feature enhancement.

4.1. Feature Extraction

In this work, important textural characteristics of leaf diseases are extracted and preserved using adaptive downsampling (ADown) [29]. The input image is converted to tensor form with the input channel count set to 16, and the output is also a tensor with the same number of channels. This is mainly due to ADown’s dual-branch parallel structure, which alleviates the loss of texture features in conventional sampling procedures and more precisely captures the texture features that vary with disease characteristics. ADown is shown in Figure 4.

ADown first applies average pooling with a 2 × 2 kernel, which provides a stable input for the sampling branches and performs minor smoothing and denoising in order to suppress local peak responses. The same spatial data is then divided into two functionally different paths, which extract convolutional semantic features and pooled saliency features, respectively. This design helps control computational complexity. In the convolutional path of the left branch, a convolution with a stride of two performs downsampling and semantic extraction. In the pooling path of the right branch, salient areas such as borders, textures, and small spots are highlighted. Finally, SoftMax smoothing is applied, followed by feature fusion and output.

Suppose the input is a four-dimensional tensor $X \in R^{B \times C \times H \times W}$, where $B$ is the batch size, $C$ the number of channels, $H$ the image height, and $W$ the image width. The symbol ⊕ denotes the summation of feature channels.

The parameter count consists of two parts: weights and bias terms. To compare ADown with standard convolution, we count the parameters of the standard convolution and the ADown convolution separately.

For $\text{Conv}_{3 \times 3}$,

\text{Params}_{\text{standard}} = (C \times C \times 3 \times 3) + C

For $\text{ADConv}_{3 \times 3}$, the left branch and the right branch are calculated separately:

\begin{matrix} \text{Params}_{\text{adown}}(\text{left}) = (C / 2 \times C / 2 \times 3 \times 3) + C / 2 \\ \text{Params}_{\text{adown}}(\text{right}) = (C / 2 \times C / 2 \times 1 \times 1) + C / 2 \end{matrix}

Simplifying gives:

\begin{matrix} \text{Params}_{\text{standard}} = 9 C^{2} + C \\ \text{Params}_{\text{adown}} = (5 C^{2} / 2) + C \end{matrix}
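The two parameter counts can be checked numerically. The sketch below simply evaluates the branch formulas given above; the function names are illustrative and the channel values are arbitrary even numbers chosen for the example.

```python
def params_standard(c, k=3):
    """Weights plus biases of a k x k convolution with c input and c output channels."""
    return c * c * k * k + c


def params_adown(c):
    """Sum of the two ADown branches, each operating on c/2 channels."""
    half = c // 2
    left = half * half * 3 * 3 + half   # 3 x 3 convolution branch
    right = half * half * 1 * 1 + half  # 1 x 1 convolution branch
    return left + right


for c in (16, 32, 64):
    std, ad = params_standard(c), params_adown(c)
    print(f"C={c}: standard={std}, ADown={ad}, ratio={ad / std:.3f}")
```

For even C, the branch sum matches the closed form 5C²/2 + C, so the ADown count stays below roughly 28% of the standard count as C grows.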

FLOPs are another important indicator, and comparing them with those of standard convolution reveals how much the cost is reduced. The standard convolution uses a 3 × 3 kernel with 64 input channels and 128 output channels, and its input and output feature maps are of size 160. The ADown convolution uses a 3 × 3 kernel with 32 input channels and 32 output channels, and its input and output feature maps are also of size 160.

\text{FLOPs}_{\text{Conv}} = 2 \times C_{in} \times C_{out} \times K_{h} \times K_{w} \times H_{out} \times W_{out}

Substituting the data into the equation gives $\text{FLOPs}_{\text{Conv}}$ = 3.775 GFLOPs.

\begin{matrix} \text{FLOPs}_{\text{ADown}} = 2 \times C_{left\_in} \times C_{left\_out} \times K_{left\_h} \times K_{left\_w} \times H_{left\_out} \times W_{left\_out} \\ + 2 \times C_{right\_in} \times C_{right\_out} \times K_{right\_h} \times K_{right\_w} \times H_{right\_out} \times W_{right\_out} \end{matrix}

Substituting the data into the equation gives $\text{FLOPs}_{\text{ADown}}$ = 0.524 GFLOPs. Compared with $\text{FLOPs}_{\text{Conv}}$, the GFLOPs reduction rate of $\text{FLOPs}_{\text{ADown}}$ is 86.12%.
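The FLOPs figures can be reproduced with a few lines of arithmetic. The sketch assumes that the right ADown branch uses a 1 × 1 kernel, as in the parameter count of the right branch, and that both branches produce 160 × 160 outputs with 32 input and 32 output channels.

```python
def conv_flops(c_in, c_out, k_h, k_w, h_out, w_out):
    """FLOPs of one convolution, counting a multiply-add as two operations."""
    return 2 * c_in * c_out * k_h * k_w * h_out * w_out


# Standard 3 x 3 convolution: 64 -> 128 channels on a 160 x 160 output map.
flops_conv = conv_flops(64, 128, 3, 3, 160, 160)

# ADown: 3 x 3 left branch plus an assumed 1 x 1 right branch, 32 -> 32 channels each.
flops_adown = conv_flops(32, 32, 3, 3, 160, 160) + conv_flops(32, 32, 1, 1, 160, 160)

reduction = 1 - flops_adown / flops_conv
print(f"Conv:  {flops_conv / 1e9:.3f} GFLOPs")
print(f"ADown: {flops_adown / 1e9:.3f} GFLOPs")
print(f"Reduction: {reduction:.2%}")
```

With unrounded values the reduction comes out at about 86.1%, which agrees with the reported rate up to rounding of the intermediate GFLOPs figures.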

It is evident that ADown reduces the computational parameters of the model and facilitates the extraction of important feature information, since the calculated number of ADown parameters is smaller than that of ordinary convolution.

4.2. Texture Fusion

The original YOLOv11 does not work well here because of an uneven distribution of attention weights, a loss of fine-grained texture features during model training, and the submergence of key features. Therefore, this paper employs a Gabor filtering module to enhance the texture features of leaves, which strengthens the biomimetic visual characteristics of diseased plant leaves and better simulates the human visual system’s perception of texture [30]. In addition, Gabor filtering has multi-scale and multi-directional analysis capabilities and can generate various combinations of filtering kernels, thus filling a gap in deep learning. This paper rebuilds the Gabor algorithm and incorporates it as a separate module before the Transformer. In this way, the long-distance features that YOLOv11 fails to capture can be captured, and large receptive fields can be provided [31]. Figure 5 shows the detailed processing structures and steps.

The difference between the two branches is that branch two extracts more texture features, as it smooths the features extracted from the improved backbone layer before passing them to the Transformer. This strengthens multi-directional texture analysis and provides dependencies for the Transformer to extract long-range features later [32]. To achieve multi-directional texture, the processed features are split by SPPF: some pass directly to the Transformer for feature capture, while the others undergo fine-grained texture feature extraction through Gabor filtering.

Suppose the number of sampling points is $N = H \times W$. If $C > 1$, the input is converted to a grayscale value $X_{gray}$. The pixel coordinates are $(x, y)$, and the kernel center is at the origin. The key steps for implementing the Gabor filtering module are as follows.

Firstly, calculate the feature information in the direction $θ_{i}$ on the $x_{θ}$ and $y_{θ}$ coordinate axes, which completes the coordinate transformation.

θ_{i} = \frac{i π}{N_{θ}}, \quad i = 0, 1, 2, \dots, N_{θ} - 1

(1)

x_{θ} = x \cos θ + y \sin θ

(2)

y_{θ} = - x \sin θ + y \cos θ

(3)

After obtaining the directional sampling points and rotated coordinates, the parameters are applied to the spatial-domain Gabor kernel. By performing a two-dimensional Fourier transform on $g_{θ}$ and centering it, the Gabor kernel can be used as convolution weights. The Gabor parameters shown in Figure 5 were determined by a grid search on the mAP50 metric on the validation set of the self-built dataset. A combination search was conducted within a reasonable range for the scale σ, the wavelength λ, and the kernel size, and the parameter set that optimized detection performance on the validation set was ultimately selected.

In the following formula, $σ$ is the filtering scale, $λ$ is the wavelength, and $ψ$ is the phase.

g_{θ}(x, y) = \exp\left(- \frac{x_{θ}^{2} + y_{θ}^{2}}{2 σ^{2}}\right) \cos\left(\frac{2 π}{λ} x_{θ} + ψ\right)

(4)
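Equations (1)–(4) can be turned into a small kernel generator. The following is a minimal sketch in plain Python rather than the authors' implementation; the kernel size, σ, λ, and number of directions used in the example are hypothetical values, since the grid-searched parameters are only given in Figure 5.

```python
import math


def gabor_kernel(size, theta, sigma, lam, psi=0.0):
    """Build one size x size Gabor kernel centred at the origin (Eqs. (2)-(4))."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            x_t = x * math.cos(theta) + y * math.sin(theta)    # Eq. (2)
            y_t = -x * math.sin(theta) + y * math.cos(theta)   # Eq. (3)
            envelope = math.exp(-(x_t ** 2 + y_t ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi / lam * x_t + psi)
            row.append(envelope * carrier)                      # Eq. (4)
        kernel.append(row)
    return kernel


def gabor_bank(n_theta, size, sigma, lam, psi=0.0):
    """One kernel per orientation theta_i = i * pi / n_theta (Eq. (1))."""
    thetas = [i * math.pi / n_theta for i in range(n_theta)]
    return [gabor_kernel(size, t, sigma, lam, psi) for t in thetas]


# Hypothetical settings: 4 directions, 5 x 5 kernels, sigma = 2, lambda = 4.
bank = gabor_bank(n_theta=4, size=5, sigma=2.0, lam=4.0)
print(len(bank), len(bank[0]), len(bank[0][0]))
```

With ψ = 0 every kernel equals 1 at its centre and is symmetric under a 180° rotation, which gives a quick sanity check on the coordinate transformation.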

Next come the spatial grayscale transformation, multi-directional convolution, normalization, and the activation function. $X_{gray}$ is the grayscale image of the disease image. Then, $T$ sums the convolutions of $X_{gray}$ with the term of each direction. Finally, $T'$ is obtained by applying batch normalization followed by smoothing with an activation function.

X_{gray} = \frac{1}{C} \sum_{c = 1}^{C} X(:, c, :, :)

(5)

T = X_{gray} * x_{θ_{0}} + X_{gray} * x_{θ_{1}} + \dots + X_{gray} * x_{θ_{N_{θ} - 1}}

(6)

T' = \text{ReLU}(\text{BN}(T))

(7)

where * denotes convolution.
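The flow of Eqs. (5)–(7) can be sketched for a single C × H × W sample in plain Python. Several simplifications are assumptions made for the example: the per-direction terms are taken to be small 2D kernels, two hand-written 3 × 3 kernels stand in for the Gabor bank, zero padding keeps the spatial size, and batch normalization is replaced by a plain standardization over the map without learned scale or shift.

```python
import math


def to_gray(x):
    """Eq. (5): average one C x H x W sample over its channel axis."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    return [[sum(x[ch][i][j] for ch in range(c)) / c for j in range(w)]
            for i in range(h)]


def conv2d_same(img, kernel):
    """Zero-padded sliding-window product (cross-correlation, as in deep
    learning layers) that keeps the spatial size; an odd kernel is assumed."""
    h, w, k = len(img), len(img[0]), len(kernel)
    half = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for u in range(k):
                for v in range(k):
                    ii, jj = i + u - half, j + v - half
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += img[ii][jj] * kernel[u][v]
            out[i][j] = acc
    return out


def texture_response(x, kernels, eps=1e-5):
    """Eq. (6) sums the directional responses; Eq. (7) normalizes and applies ReLU."""
    gray = to_gray(x)
    h, w = len(gray), len(gray[0])
    t = [[0.0] * w for _ in range(h)]
    for kern in kernels:
        resp = conv2d_same(gray, kern)
        for i in range(h):
            for j in range(w):
                t[i][j] += resp[i][j]
    flat = [v for row in t for v in row]
    mean = sum(flat) / len(flat)
    var = sum((v - mean) ** 2 for v in flat) / len(flat)
    # Standardization stands in for BN here (no learned parameters), then ReLU.
    return [[max(0.0, (v - mean) / math.sqrt(var + eps)) for v in row] for row in t]


# Toy 3-channel 4 x 4 input with a vertical edge, and two stand-in kernels.
x = [[[0, 0, 1, 1] for _ in range(4)] for _ in range(3)]
k_vertical = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
k_horizontal = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]
t_prime = texture_response(x, [k_vertical, k_horizontal])
print(t_prime)
```

In a trained network the normalization statistics and affine parameters would come from the batch normalization layer itself, so the values here only illustrate the order of operations.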

Secondly, the original features are fused with the Gabor texture features through a $1 \times 1$ convolution, and linear mapping is performed based on a multi-head Transformer:

X' = \text{ReLU}(\text{BN}(\text{Conv}_{1 \times 1}([X, T'])))

(8)

X' \xrightarrow{\text{Conv}_{1 \times 1}} [Q, K, V] \longrightarrow \left\{ \begin{matrix} Q \in R^{B \times H_{heads} \times d_{k} \times N}, \\ K \in R^{B \times H_{heads} \times d_{k} \times N}, \\ V \in R^{B \times H_{heads} \times d_{h} \times N} \end{matrix} \right.

(9)

In the aforementioned formula, $Q$, $K$, and $V$ represent the query, key, and value in the Transformer, respectively. The dimension of each attention head is $d_{h} = \frac{C}{H_{heads}}$, and the dimension of the key or query is $d_{k} = ⌊ α d_{h} ⌋$, where $α$ denotes the scaling factor of the attention head.

Finally, combine the information from each attention head and output the projection results through value aggregation and positional encoding.

\begin{matrix} Y = \text{Conv}_{1 \times 1} (\text{MergeHead} (\text{Reshape} (V * \text{Softmax} (\frac{Q^{T} K}{\sqrt{d_{k}}}))) \\ + \text{DWConv}_{3 \times 3} (\text{Reshape} (V))) \end{matrix}

(10)
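The head dimensions and the attention term of Eq. (10) can be illustrated for a single head with plain lists. This is a sketch under assumptions: the softmax is taken over the keys for each query, which Eq. (10) does not state explicitly, the value α = 0.5 is a hypothetical example, and the depthwise convolution branch, head merging, and final 1 × 1 convolution are omitted.

```python
import math


def head_dims(channels, num_heads, alpha):
    """d_h = C / H_heads and d_k = floor(alpha * d_h)."""
    d_h = channels // num_heads
    d_k = math.floor(alpha * d_h)
    return d_h, d_k


def softmax(row):
    """Numerically stable softmax of a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def single_head_attention(q, k, v):
    """q, k are d_k x N and v is d_h x N; returns a d_h x N output."""
    d_k, n = len(q), len(q[0])
    scale = 1.0 / math.sqrt(d_k)
    attn = []
    for i in range(n):  # one row of weights per query position
        scores = [scale * sum(q[d][i] * k[d][j] for d in range(d_k))
                  for j in range(n)]
        attn.append(softmax(scores))
    # Each output position is a weighted sum of the value columns.
    return [[sum(row[j] * attn[i][j] for j in range(n)) for i in range(n)]
            for row in v]


print(head_dims(channels=64, num_heads=4, alpha=0.5))

# With all-zero queries and keys the weights are uniform, so every output
# position is the mean of the corresponding value row.
q = [[0.0, 0.0, 0.0]]
k = [[0.0, 0.0, 0.0]]
v = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
print(single_head_attention(q, k, v))
```

Choosing α below 1 makes the query and key projections narrower than the value projection, which reduces the cost of the N × N score computation.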

Figure 6 expresses the above mathematical formulation in algorithmic form, making the Gabor module easier to understand.

Based on the theoretical analysis above, a technique for improving texture features is applied. To show the textural properties of plant disease regions, this method uses principal component analysis (PCA) and statistical analysis to generate feature maps from the intermediate layers of the deep neural network. Figure 7 shows the experimental results, which indicate that this technique can successfully capture the textural characteristics of diseased regions, offering a crucial foundation for the identification of plant diseases.

4.3. Bounding Box Regression Optimization

Inspired by [33], we design a dynamic boundary loss module with an emphasis on dynamic bounding box adjustment. Based on the fused features, it adapts to the detection requirements of fine-grained, multi-scale plant disease characteristics. To complete the repositioning of coordinates and bounding boxes, the model must be able to dynamically modify the discrete probability distribution. After adjusting the probability distribution of each bounding box coordinate based on the input coordinates, which involves entropy calculation, entropy normalization, and adaptive EMA momentum, the model recalculates the regression values using temperature coefficients and weighted convolution.

From the perspective of the theoretical design objectives, the obtained feature tensor x is split into dimensional features that conform to YOLO object detection, and the basic probability distribution is then calculated. To embody the ‘dynamic’ idea, we introduce a temperature coefficient into the probability distribution calculation to control its smoothness. $P_{i}$ is the probability value of the $i$th bin after temperature adjustment.

P_{i} = \frac{e^{z_{i} / T}}{\sum_{j = 1}^{c_{1}} e^{z_{j} / T}}

(11)

where $z_{i}$ is the logit of the $i$th bin, $T$ is the temperature parameter, and $T = c_{1} = 16$. From these probabilities, the information entropy can be calculated to estimate the difference between the predicted probability and the true probability, followed by entropy normalization and power transformation:

H(P) = - \sum_{i = 1}^{c_{1}} P_{i} \log P_{i}

(12)

\hat{H} = \left[\frac{H(P)}{\log(c_{1})}\right]^{γ}

(13)

where $γ$ controls the degree of nonlinearity influenced by entropy, and $γ$ is randomly reselected from the set $\{0.5, 0.8, 1.1, 1.6, 2.0\}$ in each batch and is only used for inference. If $γ \geq 1$, then it amplifies the influence of entropy, resulting in a smaller adjustment coefficient for high-entropy samples. If $γ < 1$, then it diminishes the influence of entropy, leading to a relatively larger adjustment coefficient for high-entropy samples.
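Equations (11)–(13) can be evaluated directly on a vector of bin logits. The sketch below assumes 16 bins and uses made-up logits; the function names are illustrative, and the example only shows how the temperature and γ shape the normalized entropy, not how the loss is wired into training.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Eq. (11): softmax over bins after dividing the logits by T."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Eq. (12): Shannon entropy, skipping zero-probability bins."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def normalized_entropy(probs, gamma):
    """Eq. (13): entropy divided by its maximum log(c1), raised to gamma."""
    return (entropy(probs) / math.log(len(probs))) ** gamma


# Hypothetical logits for 16 bins with one dominant bin.
logits = [0.0] * 15 + [8.0]
for temperature in (1.0, 16.0):
    p = softmax_with_temperature(logits, temperature)
    print(temperature, round(normalized_entropy(p, gamma=1.6), 4))
```

A larger temperature flattens the distribution and pushes the normalized entropy towards 1, while a sharply peaked distribution gives a value near 0.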

Thus, the dynamic coefficient can be adjusted by combining the number of training rounds ($t$) with the average entropy of the current batch ($b$), $\hat{H}_{batch}$, which completes the calculation of the adaptive EMA mechanism.

H_{EMA}^{(t)} = H_{EMA}^{(t - 1)} + 1\% \cdot \hat{H}_{batch}

(14)

Finally, the probabilities are adjusted again to complete the calculation of weights and regression values, which ultimately determines the coordinate output and repositioning.

P' = P_{i} \cdot (1 - H_{EMA}^{(t)}) + Δ \text{target}

(15)

The result is shown in Figure 8.

5. Disease Detection

The model training phase follows the framework design, the investigation of the core module mechanisms, and the verification of the mathematical derivations. Based on the dataset we gathered, the model is trained for 100 rounds using sensible training techniques and hyperparameter tuning. The trained model is then used to predict the test set data and to measure important metrics, including model accuracy and generalization capacity.

5.1. Experiment Environment

The ATD-Net model is run on multi-class datasets under Linux Ubuntu 22.04 LTS with a 7-core Intel(R) Xeon(R) CPU and an NVIDIA GeForce RTX 3090 (24 GB) GPU. The software environment consists of PyTorch 2.2.1, CUDA 12.1, the PyCharm (2024.3.6) IDE, and Anaconda 23.5.2.

The optimizer is SGD, and the input resolution is 640 × 640. The weight decay coefficient is 0.0005, the momentum factor is 0.937, the batch size is fixed at 16, and the initial learning rate is 0.01. Repeated testing showed that ATD-Net’s prediction results on the dataset converged after about 78 rounds.

5.2. Detection Results

The experimental data comes from eight different models and three types of samples, evaluated with the trained weights (best.pt); the samples include complex backgrounds, clean backgrounds, and small lesions. In order to systematically evaluate the true detection performance of ATD-Net and visually demonstrate the performance differences between the other models and the proposed ATD-Net, the YOLO prediction results are calculated based on confidence scores.

\text{Confidence} = α \times P_{obj} \times \text{IOU}_{prediction}^{truth}

(16)

where α is the task alignment coefficient, used to balance the contribution weights of classification and localization, so that the confidence level can reflect the comprehensive quality of the prediction box better.
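Equation (16) can be made concrete with a small IoU helper. The sketch assumes axis-aligned boxes in (x1, y1, x2, y2) format; the box coordinates, α, and the objectness value below are made-up numbers for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def confidence(alpha, p_obj, box_pred, box_truth):
    """Eq. (16): alpha * P_obj * IoU between prediction and ground truth."""
    return alpha * p_obj * iou(box_pred, box_truth)


# Hypothetical prediction and ground-truth boxes in pixel coordinates.
pred = (10.0, 10.0, 60.0, 60.0)
truth = (20.0, 20.0, 70.0, 70.0)
print(round(iou(pred, truth), 4))
print(round(confidence(alpha=1.0, p_obj=0.9, box_pred=pred, box_truth=truth), 4))
```

Because the score is a product, a box with high objectness but poor overlap is ranked low, which is what allows the confidence to reflect localization quality as well as classification.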

Figure 9 compares the detection results of the models, ordered from the left of the first row: Faster R-CNN, YOLOv5, YOLOv8, YOLOv10, YOLOv11, YOLOv12, DETR, and ATD-Net. Figure 9a–c shows the detection performance of the models in complex backgrounds, clean backgrounds, and early disease detection, respectively. These bounding boxes are useful for forecasting the course of a disease. Each box identifies one disease. When more than one box appears on a single leaf, the prediction box with the highest confidence is chosen.
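The rule of keeping only the most confident box on each leaf can be sketched as follows. The sketch assumes that every detection has already been assigned to a leaf; the dictionary keys `leaf_id`, `label`, and `confidence` are hypothetical names introduced for illustration.

```python
def best_box_per_leaf(detections):
    """Keep, for each leaf, the detection with the highest confidence.

    `detections` is a list of dicts with the hypothetical keys
    'leaf_id', 'label', and 'confidence'.
    """
    best = {}
    for det in detections:
        current = best.get(det["leaf_id"])
        if current is None or det["confidence"] > current["confidence"]:
            best[det["leaf_id"]] = det
    return best


# Illustrative detections only (not taken from the experiments).
detections = [
    {"leaf_id": 1, "label": "Tea-rust", "confidence": 0.66},
    {"leaf_id": 1, "label": "Tea-mold", "confidence": 0.81},
    {"leaf_id": 2, "label": "Tomato-late-blight", "confidence": 0.74},
]
selected = best_box_per_leaf(detections)
print({leaf: det["label"] for leaf, det in selected.items()})
```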

Faster R-CNN clearly has the lowest confidence scores and suffers from both false positives and false negatives; where it produces multiple detection boxes, the maximum confidence score is just 0.66. Within the YOLO series, YOLOv11 performs best because its robust backbone extraction network combines feature information more effectively, whereas YOLOv5 lacks feature alignment techniques. Overall, despite YOLOv11’s notable advances in detection performance, ATD-Net outperforms it in terms of accuracy and recall.

A confidence heatmap is a spatial visualization of the target confidence/category probability output that intuitively shows how likely each position in the image is to contain a target. Heatmap generation consists of three steps: numerical extraction, normalization, and color mapping. First, the confidence score P_{obj} \times {IOU}_{prediction}^{truth} is extracted for each feature map location. Then, numerical normalization can be written as:

M_{norm}(i, j) = \frac{M(i, j) - \min(M)}{\max(M) - \min(M)}

(17)

where \min(M) is the minimum value in matrix M, \max(M) is the maximum value in matrix M, and M_{norm}(i, j) is the normalized response value, which lies strictly within the range [0, 1]. Finally, M_{norm}(i, j) is rendered through a color mapping function by calling the hot algorithm integrated in Python (3.9).
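The normalization and color mapping steps can be sketched in plain Python. The function `hot_like_color` below is a simple black-red-yellow-white ramp written for illustration; it is an assumed stand-in for a "hot" colormap and is not the routine used in the experiments. The score matrix is likewise made up.

```python
def min_max_normalize(matrix):
    """Equation (17): rescale every entry of `matrix` to [0, 1]."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # constant map: avoid division by zero
        return [[0.0 for _ in row] for row in matrix]
    return [[(v - lo) / (hi - lo) for v in row] for row in matrix]


def hot_like_color(value):
    """Map a value in [0, 1] to an (R, G, B) tuple on a black-red-yellow-white ramp."""
    v = min(max(value, 0.0), 1.0)
    r = min(1.0, 3.0 * v)
    g = min(1.0, max(0.0, 3.0 * v - 1.0))
    b = min(1.0, max(0.0, 3.0 * v - 2.0))
    return tuple(int(round(255 * c)) for c in (r, g, b))


# Illustrative confidence scores only (not taken from the experiments).
scores = [[0.10, 0.40], [0.70, 0.90]]
normalized = min_max_normalize(scores)
heatmap = [[hot_like_color(v) for v in row] for row in normalized]
print(heatmap)
```

The lowest response maps to black and the highest to white, so lesion regions with high confidence stand out against a dark background.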

Unlike Figure 9, Figure 10 compares the heatmaps of the models, using color gradients to represent data density and distribution. The models are ordered from the left of the first row: Faster R-CNN, YOLOv5, YOLOv8, YOLOv10, YOLOv11, YOLOv12, DETR, and ATD-Net. Fine-grained lesions are often missed during feature extraction because of their small pixel ratio, the ambiguous texture and geometric properties of the chosen samples, and background information. Heatmaps can therefore better show disease severity through variations in color concentration while ignoring background noise interference, which enables agricultural professionals to detect diseases promptly and respond effectively. The heatmap response of ATD-Net is more robust and concentrated than that of conventional YOLO-series models such as YOLOv10 and YOLOv12n: key feature information is retained through downsampling and the fine-grained texture features of the disease are enhanced, which makes localization of the disease area more dynamic and accurate.

6. Experiment and Discussion

6.1. Evaluation Metrics

To evaluate model performance, precision (P), recall (R), F1-Score, and mAP are used. The confusion matrix, also known as the error matrix, compares the classification results with the actual situation and yields performance scores on a series of test data [34]. By comparing the model’s predictions with the real labels, it intuitively displays disease classification in terms of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). P, R, F1-Score, AP, and mAP are then calculated as follows:

P(Precision) = \frac{TP}{TP + FP}

(18)

R(Recall) = \frac{TP}{TP + FN}

(19)

F1\text{-}Score = \frac{2PR}{P + R}

(20)

AP = \sum_{i = 1}^{N} P_{i}

(21)

mAP = \frac{1}{N} \sum_{i = 1}^{N} {AP}_{i}

(22)
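Equations (18)–(20) and (22) translate directly into code. The sketch below uses hypothetical function names and made-up counts; the per-class AP values are assumed to be given and are not computed here.

```python
def precision(tp, fp):
    """Equation (18): TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Equation (19): TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p, r):
    """Equation (20): harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def mean_average_precision(ap_per_class):
    """Equation (22): mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


# Illustrative counts and AP values only (not taken from the experiments).
tp, fp, fn = 80, 20, 20
p, r = precision(tp, fp), recall(tp, fn)
f1 = f1_score(p, r)
m_ap = mean_average_precision([0.9, 0.8, 0.7])
print(p, r, round(f1, 4), round(m_ap, 4))
```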

In addition to the metrics derived from the confusion matrix, GFLOPs is an important metric for model evaluation. GFLOPs expresses, in units of one billion (10^{9}), the number of floating-point operations (FLOPs) that a model performs during one forward inference, and thus reflects the magnitude of the model’s computational load. It can be used to compare the complexity of different models, or of the same model under different input sizes. For a convolutional layer with C_{in} input channels, C_{out} output channels, a K_{h} \times K_{w} kernel, and an H_{out} \times W_{out} output feature map, the FLOPs of the layer and the GFLOPs summed over all layers i can be written as:

FLOPs = 2 \times C_{in} \times C_{out} \times K_{h} \times K_{w} \times H_{out} \times W_{out}

(23)

GFLOPs = \frac{\sum_{i} {FLOPs}_{i}}{10^{9}}

(24)
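As a worked example of Equations (23) and (24), the following sketch sums the FLOPs of two convolutional layers. The layer shapes are invented for illustration and do not describe the ATD-Net architecture; the function names are likewise hypothetical.

```python
def conv_flops(c_in, c_out, k_h, k_w, h_out, w_out):
    """Equation (23): FLOPs of one convolutional layer."""
    return 2 * c_in * c_out * k_h * k_w * h_out * w_out


def gflops(layers):
    """Equation (24): total FLOPs over all layers, divided by 1e9."""
    return sum(conv_flops(**layer) for layer in layers) / 1e9


# Hypothetical layer shapes, chosen only to illustrate the arithmetic.
layers = [
    {"c_in": 3, "c_out": 16, "k_h": 3, "k_w": 3, "h_out": 320, "w_out": 320},
    {"c_in": 16, "c_out": 32, "k_h": 3, "k_w": 3, "h_out": 160, "w_out": 160},
]
total = gflops(layers)
print(total)
```

The first layer contributes 88,473,600 FLOPs and the second 235,929,600 FLOPs, for a total of about 0.324 GFLOPs.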

6.2. Evaluation Results

6.2.1. Ablation Experiment

We ran ablation experiments on a self-built dataset to confirm the efficacy of the individual modules in the ATD-Net network. By methodically removing modules while keeping the experimental conditions and settings unchanged, the contribution of each module and each module combination to overall model performance can be assessed. This also confirms the model’s effectiveness and enhances its interpretability. The performance and complexity indices of ATD-Net under the ablation tests are shown in Table 1, where ADown is labeled as A, Gabor as G, and DBL as D. The labels “+” and “−” denote the addition and removal of a module, respectively.

The experimental findings show how the network is affected by both individual and combined modules. On the same dataset, applying the ADown module alone yielded an accuracy of 83.3% with a parameter count of 2,502,763, which suggests that ADown effectively lowers model complexity through downsampling. With the Gabor module alone, the model obtained an accuracy of 84.2%, a recall rate of 79.64%, and a mAP50 of 86.88%. Gabor provides attention weights for the self-attention process and, building on the downsampling convolution, improves the model’s capacity to collect fine-grained texture data. Its accuracy falls between that of C3K2 and Transformer. This allows the model to attend to significant local texture characteristics, particularly in early leaf disease detection. Nevertheless, the extra Gabor module raised the ATD-Net model’s parameter count to 2,610,234 and its size to 5.3 MB, increasing the model’s computational demand and complexity. The ATD-Net model achieves an accuracy of 83.7% and a mAP of 86.92% when DBL is excluded. Because the predicted bounding box must be dynamically adjusted, the number of frames processed per second decreases somewhat. However, DBL still performs better than the baseline model.

The combination of modules highlights the cooperation and constraints among components. Combining ADown with Gabor improves both the recall rate and the accuracy, the latter reaching 84.8%. The parameter count drops considerably because of ADown’s downsampling convolution. This suggests that ADown and Gabor work together to improve the extraction of multi-level feature information while significantly lowering model complexity. When ADown and DBL are coupled, the performance metrics improve somewhat, but the long per-frame processing time persists and might cause detection delays on edge devices. Simultaneous use of Gabor and DBL significantly improves the complexity index while increasing the model accuracy and mAP. In addition to the detection gains over the baseline model, this further validates the efficacy of texture feature perception and dynamic boundary adjustment in identifying plant leaf diseases in real scenarios. YOLOv11+Gabor+DBL has slightly higher accuracy, mAP50, and mAP50–95 than YOLOv11+ADown+Gabor, but somewhat slower inference speed. The DBL module is therefore still appropriate and necessary. When ADown, Gabor, and DBL operate together, the indicators are further optimized and a balance between detection accuracy and model complexity is achieved. The frame rate is also higher than that of the baseline model, reaching 303.23 FPS, which satisfies the real-time requirement for accurate plant disease detection in the field and demonstrates quick response and efficient processing. This method achieves comprehensive and more accurate disease identification for edge devices with limited computing power.

In summary, with all three enhanced modules enabled, ATD-Net achieves an average accuracy (mAP50) of 87.42%, a recall rate of 78.99%, and an F1-Score of 79.91% on the test set. It keeps the parameter count at 2,634,332 and the computational cost at 6.5 GFLOPs, striking a balance between performance and complexity. For real-world tasks in edge computing applications, the ATD-Net network offers a high-quality, lightweight solution.

The performance differences are reflected intuitively in how the indicators of each module combination vary over training. Figure 11a shows the variation in accuracy and Figure 11b the variation in recall rate. The mAP50 and mAP50–95 curves of the module combinations over the training iterations are displayed in Figure 11c and Figure 11d, respectively. Compared with the YOLOv11 baseline model and every other module combination, ATD-Net performs best on accuracy (P), recall (R), and average accuracy (mAP50, mAP50–95). This suggests that the ATD-Net network model performs better overall in plant disease target identification tasks.

6.2.2. Comparison Experiment

We compared ATD-Net with other deep detection models, including VGG, ResNet, Faster R-CNN, and YOLO, to further confirm the efficacy of the proposed approach. We also verified the model’s domain-specific disease detection capability on two publicly available datasets. The comparison experiment (Group 1) on the tomato disease data in the PlantVillage dataset is summarized in Table 2, which lists key assessment metrics including accuracy, GFLOPs, and mAP50.

Table 2 shows that two traditional deep convolutional neural network models, VGG [35] and ResNet [35], obtain comparable mAP50 values of 96.5% and 97%, respectively. We also compared against YOLOv10m, the heavyweight YOLO model [37]. They use more resources for a real-time plant disease detection system with 2.885 M parameters and 9.6 GFLOPs, but have somewhat greater accuracy and mAP50 than YOLOv11n [37]. Refs. [35,38,39] are variants of the YOLO model designed by different researchers that detect tomato leaf diseases accurately. Although the relevant data are limited, a comparison of key indicators shows that YOLO-LeafNet achieves good accuracy and mAP50, at 98.5% and 99%, respectively, while NCA’s mAP50 is slightly higher than that of YOLO-LeafNet. RRDN’s accuracy, however, is just 95%. With an accuracy of 99.29%, ATD-Net outperforms all other models in identifying diseases in the PlantVillage dataset. Furthermore, both its mAP50 and mAP50–95 are superior to those of the baseline model and the other models, although our model has 0.2% more parameters and 0.2 more GFLOPs than the YOLOv11 baseline model. This shows its usefulness in resource-limited contexts and indicates that it successfully resolves the conflict between computation and efficiency.

Table 3 compares ATD-Net with deep learning models of various scales on the PlantDoc dataset under the same conditions.

We also evaluated ATD-Net on the PlantDoc dataset and compared it with other models to further demonstrate its efficacy. Although Faster R-CNN [42], a common two-stage detection model, has shown promise in several domains, its test results for the plant disease categories in the PlantDoc dataset are subpar, with a mAP50 of 45%. RTDETR-L [43], by contrast, is more frequently utilized for infrared remote control detection. Because of its high computational complexity, it still performs well in plant disease identification, with an accuracy of up to 63%, but at the expense of more resources. With a parameter size of 2.705 M and an accuracy of 38.2%, YOLOv10 [44] detects less well on this task than YOLOv11. With a recall rate of up to 67.1%, the PPD-YOLO model showed strong benefits in lowering missed detections. Nevertheless, its 23.882 M parameter size makes it unsuitable for real-time detection on edge devices. WD-YOLO offers a respectable range of characteristics when compared to [44], but its accuracy, recall, and mAP50 may still be improved. ATD-Net remains built on two pillars: lightweight deployment and effective detection. In addition, the recall rate of ATD-Net is 66.9%, which is higher than that of [45], reflecting that the ATD-Net model can capture more comprehensive disease data.

Table 3 shows that ATD-Net’s precision on the PlantDoc dataset is 52.2%, which is somewhat less than YOLOv11n’s 53.6%. Based on our investigation, we attribute this phenomenon mainly to the model design goals and the characteristics of the PlantDoc dataset. The PlantDoc collection mostly comprises images with complicated backgrounds, noise interference, and low quality. The goal of ATD-Net’s Gabor texture enhancement module is to improve recall and mean average precision (mAP) by increasing sensitivity to early lesions and weak texture characteristics. However, this enhancement mechanism may also cause background textures to be misidentified in complex backgrounds, so that some negative samples are incorrectly identified as positive samples, which in turn slightly reduces precision. In the current task, ATD-Net therefore pays a small price in precision while improving detection coverage.

6.3. Discussion

To address the low efficiency, high cost, and susceptibility of detection accuracy to environmental interference in conventional disease detection methods, we proposed ATD-Net, a lightweight network that detects plant leaf diseases through a texture enhancement mechanism. ATD-Net strikes a notable compromise between detection accuracy and model complexity by combining downsampling convolution (ADown), Gabor filters, and a dynamic boundary loss that adjusts the regression boxes.

6.3.1. Analysis of the Effectiveness of Model Design

The experimental findings in Table 1 demonstrate that the ADown module reduces the parameter count by 3.2% compared to the baseline, while maintaining a comparable mAP50 (86.65% vs. 86.89%). This decrease in model complexity is due to the cooperative design of average pooling and branch convolution, which lowers noise while preserving important semantic information. To further confirm that ADown maintains important texture characteristics, we used Grad-CAM to view the feature maps before and after ADown processing. The activated regions stay focused on lesion locations, which indicates that texture information is successfully maintained. This demonstrates that in lightweight object detection networks, well-designed lightweight downsampling operations may be preferable to conventional convolutions, particularly for deployment on edge devices with limited computing resources.

The Gabor texture enhancement module is intended to extract the fine-grained characteristics of leaf diseases. According to the ablation experiments (Table 1), adding this module alone (G) increases recall from 76.42% to 77.60% and mAP50 from 86.65% to 86.99%. More significantly, on a subset of early-stage disease samples (defined as lesions with an area smaller than 5% of the leaf), recall increases from 68.3% to 71.5%. This suggests that adding direction-sensitive Gabor filtering to the Transformer attention mechanism directly improves the model’s perception of complicated texture and morphological features, which compensates for YOLOv11’s weak prior knowledge in specific domains. The heatmaps in Figure 10 further support the dual benefits of global context modeling and local texture perception: ATD-Net exhibits more concentrated activation on lesion areas than the baseline models.
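The early-stage criterion can be expressed as a simple filter. This sketch assumes that a lesion area and a leaf area (in pixels) are available for every sample; how those areas are measured is not specified here, and the names `is_early_stage`, `lesion_area`, and `leaf_area` are hypothetical.

```python
def is_early_stage(lesion_area, leaf_area, threshold=0.05):
    """Return True when the lesion covers less than `threshold` of the leaf area."""
    if leaf_area <= 0:
        raise ValueError("leaf_area must be positive")
    return lesion_area / leaf_area < threshold


# Illustrative samples only (areas in pixels, not taken from the dataset).
samples = [
    {"id": "a", "lesion_area": 120.0, "leaf_area": 4000.0},  # 3% of the leaf
    {"id": "b", "lesion_area": 400.0, "leaf_area": 4000.0},  # 10% of the leaf
]
early = [s["id"] for s in samples
         if is_early_stage(s["lesion_area"], s["leaf_area"])]
print(early)
```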

The Dynamic Boundary Loss (DBL) module adaptively adjusts the probability distribution of bounding boxes by introducing a temperature coefficient and information entropy, enabling the model to capture localization uncertainty during training. Experimental results (Table 1) show that the inclusion of DBL alone (D) improves mAP75 from 75.14% to 75.21%, while mAP50–95 changes only marginally, from 70.75% to 70.70%. When combined with ADown and Gabor (A+G+D), mAP75 further increases to 76.11%, an improvement of 0.97 percentage points over the baseline. To quantify localization accuracy, we computed the average Euclidean distance between predicted and ground-truth bounding box corners. The DBL module reduces this distance from 4.2 pixels to 3.9 pixels on the test set, indicating more precise alignment with lesion boundaries. These results demonstrate that DBL contributes to the accurate separation and localization of overlapping and irregular lesions.
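The corner-distance measure can be sketched as follows. The sketch assumes that predicted and ground-truth boxes are already matched one-to-one and given as (x1, y1, x2, y2) in pixels; the function names and the example boxes are hypothetical.

```python
import math


def corners(box):
    """Return the four corners of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    return [(x1, y1), (x2, y1), (x2, y2), (x1, y2)]


def mean_corner_distance(pred_boxes, truth_boxes):
    """Average Euclidean distance between matching corners of paired boxes."""
    distances = [
        math.dist(p, t)
        for pred, truth in zip(pred_boxes, truth_boxes)
        for p, t in zip(corners(pred), corners(truth))
    ]
    return sum(distances) / len(distances)


# Illustrative boxes only: the prediction is shifted by (3, 4) pixels.
pred_boxes = [(0, 0, 10, 10)]
truth_boxes = [(3, 4, 13, 14)]
distance = mean_corner_distance(pred_boxes, truth_boxes)
print(distance)
```

Every corner in this example is displaced by the same (3, 4) offset, so the mean corner distance is 5 pixels.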

6.3.2. Performance Advantages in Complex Scenarios

According to the comparative experiments (Table 2 and Table 3), ATD-Net not only achieved state-of-the-art accuracy (99.29%) and mAP50 (99.69%) on the PlantVillage dataset with clean backgrounds, but was also highly competitive on the PlantDoc dataset with complex backgrounds and numerous interferences. Both its mAP50 (66.1%) and mAP50–95 (55.6%) outperformed YOLO counterparts of the same scale. This is primarily due to the texture enhancement method, which improves the model’s capacity to differentiate disease spots from noisy background interference such as weeds and dirt. Multi-module collaboration gives the model a lightweight structure, robust feature expression, and high localization accuracy, allowing it to work steadily in real-world settings with varying illumination, occlusion, and scale.

The visualization results in Figure 9 and Figure 10 provide evidence that the proposed method performs better. On complex backgrounds and early disease samples, ATD-Net’s heatmap response focuses on the actual lesion area, whereas the baseline model and the other comparison models are vulnerable to response dispersion, false activation, or missed detections.

6.3.3. Limitations

Despite the promising results achieved by ATD-Net, several limitations should be acknowledged. First, a notable performance gap exists between datasets with controlled backgrounds and those with complex real-world scenes. As shown in Table 2 and Table 3, ATD-Net achieves a mAP50 of 99.69% on the PlantVillage dataset (clean background), but this drops to 66.1% on the PlantDoc dataset (complex background with noise and occlusions). This indicates that the model’s texture enhancement mechanism, while effective in controlled settings, is still challenged by extreme background interference and low-quality field images.

Second, the ablation experiments and main performance evaluations are primarily conducted on a self-built dataset, which, although carefully annotated and diverse in disease categories, may limit the generalizability of the conclusions. The dataset is collected from specific regions (e.g., tea fields in Shanghai) and online resources and may not fully represent the global diversity of plant species, disease manifestations, and environmental conditions.

Third, while ATD-Net is designed with edge-device compatibility constraints (model size 5.2 MB, 6.5 GFLOPs), actual deployment on edge hardware such as the NVIDIA Jetson Xavier NX has not yet been performed. Therefore, metrics such as inference latency, power consumption, and real-time performance on embedded platforms remain unvalidated. Future work will involve systematic edge-device deployment and evaluation to further validate the model’s practical applicability in agricultural scenarios.

7. Conclusions

This paper proposes ATD-Net, an improved texture-aware detection network, to address the problems of fine-grained feature extraction, high computational cost, and bounding box localization accuracy in plant leaf disease detection. Three key enhancements are incorporated into the network. First, ADown is used to decrease model parameters and computational complexity while preserving the representational capacity for crucial texture features, as shown by the lower parameter count and steady mAP in the ablation tests (Table 1). Second, the combination of the Gabor filter and Transformers improves recall and mAP on early-stage disease samples by helping to extract and enhance edge, directional, and texture features in diseased areas (Table 1). Third, the proposed dynamic boundary loss function adaptively adjusts the probability distribution of bounding box regression, resulting in improved localization accuracy under strict IoU thresholds (mAP75 increased by 0.97% in the full model). Experimental results demonstrate that ATD-Net achieves a favorable balance between detection accuracy and model complexity.

In the future, we intend to improve the texture enhancement technique for plant leaf diseases and optimize the Gabor and DBL algorithms. We still need to gather more plant leaf disease data in order to test the algorithm, apply it to more categories, and improve the practical quality of ATD-Net by merging the method with features. Additionally, the color and shape characteristics of diseased leaves still need to be represented more completely and discriminatively.

Author Contributions

Conceptualization, Y.L. and X.Z.; methodology, Y.L. and X.Z.; software, Y.L. and X.Z.; validation, Y.L. and X.Z.; formal analysis, Y.L. and X.Z.; investigation, Y.L. and X.Z.; resources, Y.L. and X.Z.; data curation, Y.L. and X.Z.; writing—preparation of the original draft, Y.L. and X.Z.; writing—revision and editing, Y.L. and X.Z.; visualization, Y.L. and X.Z.; supervision, Y.L. and X.Z.; project management, Y.L. and X.Z.; securing funding, Y.L. and X.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data is available from the authors upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Bandara, A.Y.; Weerasooriya, D.K.; Bradley, C.A.; Allen, T.W.; Esker, P.D. Dissecting the economic impact of soybean diseases in the United States over two decades. PLoS ONE 2020, 15, e0231141.
2. Shi, N.N.; Du, Y.X.; Ruan, H.C.; Yang, X.; Dai, Y.L.; Gan, L.; Chen, F.R. First report of Colletotrichum fructicola causing anthracnose on Camellia sinensis in Guangdong Province, China. Plant Dis. 2018, 102, 241.
3. Easlon, H.M.; Bloom, A.J. Easy Leaf Area: Automated digital image analysis for rapid and accurate measurement of leaf area. Appl. Plant Sci. 2014, 2, 1400033.
4. Sahu, A.K.; Deep, H.; Vishnoi, U.; Saraswat, M. Ayurvedic Plant Leaf Detection Using HOG Feature Descriptor and SVM Classifier. In Proceedings of The International Conference on Recent Innovations in Computing, Jammu, India, 26–27 October 2023; Springer Nature Singapore: Singapore, 2023; pp. 357–370.
5. Rachmad, A.; Syarief, M.; Rifka, S.; Sonata, F.; Setiawan, W.; Rochman, E.M.S. Corn leaf disease classification using local binary patterns (LBP) feature extraction. J. Phys. Conf. Ser. 2022, 2406, 012020.
6. Ahmed, I.; Yadav, P.K. Plant disease detection using machine learning approaches. Expert Syst. 2023, 40, e13136.
7. Shrestha, G.; Das, M.; Dey, N. Plant disease detection using CNN. In Proceedings of the 2020 IEEE Applied Signal Processing Conference (ASPCON), Kolkata, India, 7–9 October 2020; IEEE: New York, NY, USA, 2020; pp. 109–113.
8. Yin, Z.B.; Liu, F.Y.; Geng, H.; Xi, Y.-J.; Zeng, D.-B.; Si, C.-J.; Shi, M.-D. A high-precision jujube disease spot detection based on SSD during the sorting process. PLoS ONE 2024, 19, e0296314.
9. Xie, M.; Wu, J.; Sun, J.; Xiao, L.; Liu, Z.; Yuan, R.; Duan, S.; Wang, L. MFFSNet: A Lightweight Multi-Scale Shuffle CNN Network for Wheat Disease Identification in Complex Contexts. Agronomy 2025, 15, 910.
10. Wang, J.; Zhang, C.; Yan, T.; Yang, J.; Lu, X.; Lu, G.; Huang, B. A cross-domain fruit classification method based on lightweight attention networks and unsupervised domain adaptation. Complex Intell. Syst. 2023, 9, 4227–4247.
11. Zhou, L.; Xiao, Q.; Taha, M.F.; Xu, C.; Zhang, C. Phenotypic analysis of diseased plant leaves using supervised and weakly supervised deep learning. Plant Phenomics 2023, 5, 0022.
12. Arora, A.; Gautam, V. Plant Leaves Disease Detection in Complex Background using Automatic Segmentation and Improved Multi-Scale Feature Fusion Network. Syst. Soft Comput. 2025, 8, 200434.
13. Badiger, M.; Mathew, J.A. Tomato plant leaf disease segmentation and multiclass disease detection using hybrid optimization enabled deep learning. J. Biotechnol. 2023, 374, 101–113.
14. Kumar, D.; Kukreja, V. Impact of image segmentation and feature sets in automated plant disease classification: A comprehensive review based on wheat plant images. Prog. Artif. Intell. 2025, 14, 451–504.
15. Pal, A.; Kumar, V. AgriDet: Plant Leaf Disease severity classification using agriculture detection framework. Eng. Appl. Artif. Intell. 2023, 119, 105754.
16. Wu, J.; Wen, C.; Chen, H.; Ma, Z.; Zhang, T.; Su, H.; Yang, C. DS-DETR: A model for tomato leaf disease segmentation and damage evaluation. Agronomy 2022, 12, 2023.
17. Bari, B.S.; Islam, M.N.; Rashid, M.; Hasan, J.; Razman, M.A.M.; Musa, R.M.; Ab Nasir, A.F.; Majeed, A.P.A. A real-time approach of diagnosing rice leaf disease using deep learning-based faster R-CNN framework. PeerJ Comput. Sci. 2021, 7, e432.
18. Sun, H.; Xu, H.; Liu, B.; He, D.; He, J.; Zhang, H.; Geng, N. MEAN-SSD: A novel real-time detector for apple leaf diseases using improved light-weight convolutional neural networks. Comput. Electron. Agric. 2021, 189, 106379.
19. Li, D.; Ahmed, F.; Wu, N.; Sethi, A.I. Yolo-JD: A Deep Learning Network for jute diseases and pests detection from images. Plants 2022, 11, 937.
20. He, L.; Zhou, Y.; Liu, L.; Cao, W.; Ma, J.-H. Research on object detection and recognition in remote sensing images based on YOLOv11. Sci. Rep. 2025, 15, 14032.
21. Tao, C.; Huang, Z.; Huang, Z.; Chen, M.; Liu, K.; Zhou, M. YOLOv11-STDD: Improved YOLOv11 Small Target Detection for Drones. In Proceedings of the 2025 7th International Conference on Internet of Things, Automation and Artificial Intelligence (IoTAAI), Guangzhou, China, 12–14 September 2025.
22. Xiao, R.; Wang, H.; Wang, L.; Yuan, H. C3Ghost and C3k2: Performance study of feature extraction module for small target detection in YOLOv11 remote sensing images. In Proceedings of the Second International Conference on Big Data, Computational Intelligence, and Applications (BDCIA 2024), Huanggang, China, 16 November 2024; SPIE: Bellingham, WA, USA, 2025; Volume 13550, pp. 464–470.
23. Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258.
24. Menghani, G. Efficient deep learning: A survey on making deep learning models smaller, faster, and better. ACM Comput. Surv. 2023, 55, 1–37.
25. Jian, Y.; Yu, J.; Wen, S.; Zhao, Q.; Hu, H.; Zhang, J. Optimization of C2PSA Module in YOLOv11 Architecture for Semiconductor Wafer Inspection. In Proceedings of the 2025 International Conference on Advanced Mechatronic Systems (ICAMechS), Xi’an, China, 19–22 September 2025; IEEE: New York, NY, USA, 2025; pp. 232–237.
26. Bei, Y.; Yang, F.; Wu, H. Automatic Labeling Method for State Detection Data Set of Guard Plate Based on Yolov11n Improvement. In Proceedings of the 2025 6th International Conference on Internet of Things, Artificial Intelligence and Mechanical Automation (IoTAIMA), Dalian, China, 22–24 August 2025; IEEE: New York, NY, USA, 2025; pp. 132–138.
27. Noyan, M.A. Uncovering bias in the PlantVillage dataset. arXiv 2022, arXiv:2206.04374.
28. Singh, D.; Jain, N.; Jain, P.; Kayal, P.; Kumawat, S.; Batra, N. PlantDoc: A dataset for visual plant disease detection. In Proceedings of the 7th ACM IKDD CoDS and 25th COMAD, Hyderabad, India, 5–7 January 2020; Association for Computing Machinery: New York, NY, USA, 2020; pp. 249–253.
29. Gu, W.; Gao, W.; Zou, Y.; Ma, S. ATW-YOLO: Reconstructing the downsampling process and attention mechanism of yolo network for rail foreign body detection. Signal Image Video Process. 2025, 19, 368.
30. Luan, S.; Chen, C.; Zhang, B.; Han, J.; Liu, J. Gabor convolutional networks. IEEE Trans. Image Process. 2018, 27, 4357–4366.
31. Fogel, I.; Sagi, D. Gabor filters as texture discriminator. Biol. Cybern. 1989, 61, 103–113.
32. Wang, Z.; Fu, S.; Fu, S.; Li, D.; Liu, D.; Yao, Y.; Yin, H. Hybrid gabor attention convolution and transformer interaction network with hierarchical monitoring mechanism for liver and tumor segmentation. Sci. Rep. 2025, 15, 8318.
33. Xu, Y.; Wei, Z.; Li, Z.; Wei, X.; Lu, Y. Dynamic weighting loss for decision boundary adjustment based on robust distance in adversarial training. In Proceedings of the 2025 IEEE International Conference on Multimedia and Expo (ICME), Nantes, France, 30 June–4 July 2025; IEEE: New York, NY, USA, 2025; pp. 1–6.
34. Fan, C.L. Evaluation model for crack detection with deep learning: Improved confusion matrix based on linear features. J. Constr. Eng. Manag. 2025, 151, 04024210.
35. Kaur, R.; Mittal, U.; Wadhawan, A.; Almogren, A.; Singla, J.; Bharany, S.; Hussen, S.; Rehman, A.U.; Al-Huqail, A.A. YOLO-LeafNet: A robust deep learning framework for multispecies plant disease detection with data augmentation. Sci. Rep. 2025, 15, 28513.
36. Wang, Q.; Qi, F.; Sun, M.; Qu, J.; Xue, J. Identification of tomato disease types and detection of infected areas based on deep convolutional neural networks and object detection techniques. Comput. Intell. Neurosci. 2019, 2019, 9142753.
37. Lee, Y.S.; Patil, M.P.; Kim, J.G.; Seo, Y.B.; Ahn, D.-H.; Kim, G.-D. Hyperparameter Optimization for Tomato Leaf Disease Recognition Based on YOLOv11m. Plants 2025, 14, 653.
38. Bayram, H.Y.; Bingol, H.; Alatas, B. Hybrid deep model for automated detection of tomato leaf diseases. Trait. Du Signal 2022, 39, 1781–1787.
39. Zhou, C.; Zhou, S.; Xing, J.; Song, J. Tomato leaf disease identification by restructured deep residual dense network. IEEE Access 2021, 9, 28822–28831.
40. Bellout, A.; Zarboubi, M.; Dliou, A.; Latif, R.; Saddik, A. Advanced YOLO models for real-time detection of tomato leaf diseases. Math. Model. Comput. 2024, 11, 1198–1210.
41. Song, Z.; Zhu, Y.; Wang, D.; Liu, H.; Jiang, L.; Duan, Y.; Zhang, Z.; Li, S.; Li, J. TCLeaf-Net: A transformer-convolution framework with global-local attention for robust in-field lesion-level plant leaf disease detection. arXiv 2025, arXiv:2512.12357.
42. Li, W.; Zhu, L.; Liu, J. PL-DINO: An improved transformer-based method for plant leaf disease detection. Agriculture 2024, 14, 691.
43. Miao, Y.; Meng, W.; Zhou, X. SerpensGate-YOLOv8: An enhanced YOLOv8 model for accurate plant disease detection. Front. Plant Sci. 2025, 15, 1514832.
44. Nguyen, P.T.; Huynh, D.C.; Ho, L.D.; Tran, H.A.; Dunnigan, M.W. Improving the YOLOv11 Model for Detecting Plant Diseases. J. Eng. Sci. Technol. 2025, 20, 1500–1515.
45. Yang, X.; Wang, H.; Zhou, Q.; Lu, L.; Zhang, L.; Sun, C.; Wu, G. A Lightweight and Efficient Plant Disease Detection Method Integrating Knowledge Distillation and Dual-Scale Weighted Convolutions. Algorithms 2025, 18, 433.

Figure 1. Samples from our dataset. (a) Tea-blister-blight. (b) Tea-mold. (c) Tea-rust. (d) Tea-brown-spot. (e) Tomato-leaf-mold. (f) Tomato-early-blight. (g) Tomato-gray-mold. (h) Tomato-anthracnose. (i) Tomato-late-blight.

Figure 2. Two distributions in our collected dataset.

Figure 3. ATD-Net Model Architecture.

Figure 4. Overall structure of ADown.

Figure 5. Gabor and Transformer fusion design scheme.

Figure 6. Pseudo-code for the forward propagation algorithm.

Figure 7. Gabor texture enhancement feature map.

Figure 8. Position coordinate diagram of the dynamic boundary loss function.

Figure 9. Comparison of algorithm prediction results. (a) Disease detection results in complex scenes. (b) Disease detection results on a clean background. (c) Early disease detection results.

Figure 10. Comparison of algorithm heatmaps. (a) Disease heatmap results in complex scenes. (b) Disease heatmap results on a clean background. (c) Early disease heatmap results.

Figure 11. ATD-Net prediction curves. (a) Precision. (b) Recall. (c) mAP50. (d) mAP50–95.

Table 1. Results of the ablation experiments. A: ADown; G: Gabor texture enhancement; D: dynamic boundary loss (DBL).

YOLOv11 +A +G +D	P	R	F1-Score	mAP50	mAP75	mAP50–95	GFLOPs	Parameters	FPS	Model Size (MB)
+ − − −	0.839	0.7642	0.7975	0.8665	0.7514	0.7075	6.3	2,584,882	257.49	5.2
+ + − −	0.833	0.7605	0.7972	0.8689	0.7517	0.7068	6.3	2,502,763	271.30	5.0
+ − + −	0.842	0.776	0.796	0.8699	0.7532	0.7073	6.5	2,610,234	283.52	5.3
+ − − +	0.837	0.766	0.789	0.8692	0.7521	0.7070	6.5	2,600,884	257.56	5.2
+ + + −	0.848	0.779	0.7978	0.8701	0.7554	0.7086	6.5	2,553,776	296.77	5.2
+ + − +	0.8502	0.7865	0.7991	0.8709	0.7572	0.7089	6.5	2,571,862	286.95	5.2
+ − + +	0.8579	0.7894	0.7999	0.8722	0.7592	0.7092	6.5	2,779,840	286.95	5.2
+ + + +	0.8596	0.7899	0.7991	0.8742	0.7611	0.7099	6.5	2,634,332	303.23	5.2
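
The gains implied by the ablation table can be checked with a few lines of arithmetic. The sketch below is illustrative only: it copies the first row (YOLOv11 baseline) and last row (all three modules enabled) from Table 1, and the helper name `relative_change` and the dictionary layout are assumptions made for this example, not part of the authors' code.

```python
# Values copied from Table 1: baseline YOLOv11 row and full ATD-Net row.
baseline = {"mAP50": 0.8665, "FPS": 257.49, "params": 2_584_882}
atd_net = {"mAP50": 0.8742, "FPS": 303.23, "params": 2_634_332}


def relative_change(new: float, old: float) -> float:
    """Relative change of `new` with respect to `old`, in percent."""
    return (new - old) / old * 100.0


# Speed: relative increase in frames per second.
fps_gain = relative_change(atd_net["FPS"], baseline["FPS"])

# Accuracy: absolute difference in mAP50, expressed in percentage points.
map50_gain_pp = (atd_net["mAP50"] - baseline["mAP50"]) * 100.0

# Size: relative growth in parameter count.
param_growth = relative_change(atd_net["params"], baseline["params"])

print(f"FPS gain:        {fps_gain:.2f} %")
print(f"mAP50 gain:      {map50_gain_pp:.2f} percentage points")
print(f"Parameter growth: {param_growth:.2f} %")
```

With these inputs the FPS gain comes out at roughly 17.8%, in line with the speed increase reported in the abstract, while the parameter count grows by under 2%, consistent with the model size staying at 5.2 MB.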

Table 2. Comparison with state-of-the-art methods on the PlantVillage dataset.

Models	P	R	F1-Score	mAP50	mAP75	GFLOPs	Parameters (M)
VGG-19 [35]	0.955	0.94	0.947	0.965	0.825	-	-
ResNet-50 [35]	0.965	0.95	0.957	0.97	0.845	-	-
YOLOv5 [35]	0.861	0.868	0.864	0.944	0.815	4.5	1.9
Faster R-CNN [36]	0.8964	-	-	-	-	-	-
YOLOv7 [37]	0.9021	0.8367	0.8682	0.9302	0.7567	-	-
YOLOv8m [37]	0.9856	0.9899	0.9878	0.9918	0.8744	-	-
YOLOv9m [37]	0.9861	0.9899	0.9880	0.9916	0.8763	-	-
YOLOv10m [37]	0.9857	0.9867	0.9862	0.9921	0.8721	9.6	2.885
YOLO-LeafNet [35]	0.985	0.98	0.982	0.99	0.94	-	-
NCA [38]	-	0.995	-	0.995	-	-	-
RRDN [39]	0.9500	-	-	-	-	-	-
ATD-Net (ours)	0.9929	0.9915	0.9920	0.9969	0.8799	6.5	2.588

Note: “-” indicates that the value was not reported in the original publication.

Table 3. Comparison with state-of-the-art methods on the PlantDoc dataset.

Models	P	R	F1-Score	mAP50	mAP75	GFLOPs	Parameters (M)
YOLOv5n [40]	-	-	-	0.568	0.280	4.5	1.9
RTDETR-L [41]	0.63	0.624	-	0.619	0.391	-	-
Faster R-CNN [42]	0.380	0.641	0.423	0.456	-	-	-
YOLOv8 [43]	-	0.642	-	0.6160	-	-	11.2
YOLOv10n [44]	0.382	0.557	-	0.486	-	8.3	2.705
YOLOv11n	0.536	0.534	0.512	0.567	0.441	6.3	2.587
PPD-YOLO [44]	0.533	0.671	-	0.659	-	85.3	23.882
WD-YOLO [45]	0.604	0.601	-	0.654	0.531	-	2.78
ATD-Net (ours)	0.522	0.669	0.508	0.661	0.556	6.5	2.588

Note: “-” indicates that the value was not reported in the original publication.

Share and Cite

MDPI and ACS Style

Li, Y.; Zhang, X. Plant-Leaf Disease Detection Based on Texture Enhancement Using ATD-Net. AgriEngineering 2026, 8, 160. https://doi.org/10.3390/agriengineering8050160
