Article

Accurate and Efficient Recognition of Mixed Diseases in Apple Leaves Using a Multi-Task Learning Approach

1 College of Information Science and Technology, Hebei Agricultural University, Baoding 071001, China
2 Hebei Key Laboratory of Agricultural Big Data, Hebei Agricultural University, Baoding 071001, China
3 School of Intelligent Engineering, Jinzhong College of Information, Jinzhong 071001, China
4 Hebei Green Valley Information Technology Co., Ltd., Shijiazhuang 056400, China
5 College of Plant Protection, Hebei Agricultural University, Baoding 071001, China
6 Hebei Key Laboratory of Photoelectric Information and Earth Detection Technology, Hebei GEO University, Shijiazhuang 052161, China
7 Hebei Digital Agriculture Industry Technology Research Institute, Handan 056400, China
8 Hebei Engineering Research Center for Agricultural Remote Sensing Applications, Hebei Agricultural University, Baoding 071001, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Agriculture 2026, 16(1), 71; https://doi.org/10.3390/agriculture16010071
Submission received: 18 November 2025 / Revised: 21 December 2025 / Accepted: 24 December 2025 / Published: 28 December 2025
(This article belongs to the Section Crop Protection, Diseases, Pests and Weeds)

Abstract

The increasing complexity of plant disease manifestations, especially in cases of multiple simultaneous infections, poses significant challenges to sustainable agriculture. To address this issue, we introduce the Apple Leaf Mixed Disease Recognition (ALMDR) model, a novel multi-task learning approach specifically designed for identifying and quantifying mixed disease infections in apple leaves. ALMDR comprises four key modules: a Group Feature Pyramid Network (GFPN) for multi-scale feature extraction, a Multi-Label Classification Head (MLCH) for disease type prediction, a Leaf Segmentation Head (LSH), and a Lesion Segmentation Head (LeSH) for precise delineation of leaf and lesion areas. The GFPN enhances the traditional Feature Pyramid Network (FPN) through differential sampling and grouping strategies, significantly improving the capture of fine-grained disease characteristics. The MLCH enables simultaneous classification of multiple diseases on a single leaf, effectively addressing the mixed infection problem. The segmentation heads (LSH and LeSH) work in tandem to accurately isolate leaf and lesion regions, facilitating detailed analysis of disease patterns. Experimental results on the Plant Pathology 2021-FGVC8 dataset demonstrate ALMDR's effectiveness, outperforming state-of-the-art methods across multiple tasks. Our model achieves high performance in multi-label classification (F1-score of 93.74%), detection and segmentation (mean Average Precision (mAP) of 51.32% and 45.50%, respectively), and disease severity estimation ($R^2 = 0.9757$). Additionally, the model maintains this accuracy while processing 6.25 frames per second, balancing performance with computational efficiency. ALMDR demonstrates potential for real-time disease management in apple orchards, with possible applications extending to other crops.

1. Introduction

Apples, valued for their nutritional and economic importance [1], are vital to global agriculture. However, they are vulnerable to various diseases, particularly those that affect leaves, significantly reducing yield and quality [2,3]. Common diseases like rust and scab often appear together, creating complex mixed-disease cases. In this study, 'mixed disease' is explicitly defined as the co-occurrence of symptoms from at least two different pathogen classes on the same leaf surface. This phenomenon complicates diagnosis and treatment [4], making accurate identification of both single and mixed infections crucial for effective disease management.
Traditional disease detection relies on manual inspection or laboratory testing [5]. Manual inspection is labor-intensive and expert-dependent [6]. Although laboratory testing offers high accuracy [7,8], molecular biology techniques [9] typically require specific primers for each target pathogen, making them inefficient and costly for identifying unpredictable mixed infections in large-scale orchards compared to visual recognition methods [10]. Recent advancements in computer vision and machine learning have revolutionized plant disease detection, offering automation, efficiency, and scalability. These technologies enable rapid processing of large-scale image data and accurate disease classification, substantially reducing manual labor [11,12,13,14]. However, many existing methods are optimized for single-disease scenarios and may not effectively handle multiple co-occurring diseases in real-world agricultural settings.
To address these limitations, we propose the Apple Leaf Mixed Disease Recognition (ALMDR) model for identifying multiple concurrent diseases in apple leaf images. ALMDR employs an instance segmentation approach that considers the distribution and scale variations of leaves and lesions in natural scenes, as well as the interrelationships among different diseases. The model's core component is the Group Feature Pyramid Network (GFPN), an enhanced version of the Feature Pyramid Network (FPN) [15]. Designed to overcome the scale discrepancy between large leaves and micro-lesions, GFPN structurally decouples global leaf morphology from fine-grained lesion details. This ensures that features at all scales are effectively captured and fused, providing a robust foundation for subsequent tasks.
Building upon these features, ALMDR incorporates three main modules for disease classification and segmentation. The Multi-Label Classification Head (MLCH) is introduced to accurately identify co-occurring disease types, moving beyond standard single-label prediction to handle complex mixed-infection scenarios. The Leaf Segmentation Head (LSH) and the Lesion Segmentation Head (LeSH) are based on the SOLO [16] framework and work in tandem to provide precise, pixel-level delineation of healthy and diseased areas, directly facilitating detailed symptom analysis. LSH focuses on delineating leaf regions using leaf-group features, while LeSH segments disease areas using lesion-group features. The collaboration among these modules is a key aspect of ALMDR: MLCH guides the learning process of LSH and LeSH by providing intermediate-layer features, enhancing segmentation accuracy. GFPN further facilitates communication between leaf and lesion groups, promoting the effective use of multi-scale features.
The main contributions of this paper can be summarized as follows:
(1)
We propose the ALMDR model for simultaneous multi-disease classification and segmentation, bridging the gap between single-disease detection and more realistic multi-disease scenarios in agricultural fields.
(2)
We introduce a multi-task learning architecture that integrates GFPN for multi-scale feature extraction, MLCH for disease type prediction, and segmentation heads (LSH and LeSH) for precise leaf and lesion delineation, significantly enhancing disease identification accuracy and lesion localization.
(3)
We demonstrate the superior performance and computational efficiency of ALMDR through extensive experiments on the Plant Pathology 2021-FGVC8 apple leaf disease dataset and a self-collected cucumber leaf disease dataset.

2. Related Work

Plant disease recognition methods can be categorized into classification-based, detection-based, and segmentation-based approaches. Below we review representative studies in each category.

2.1. Classification-Based Methods

Classification-based methods for plant disease recognition have advanced significantly by leveraging convolutional neural networks (CNNs) and transfer learning. For instance, adapted pre-trained models like ResNet50 have achieved high accuracy on public datasets [17], while novel architectures incorporating attention mechanisms have shown promising results [18,19]. Lightweight networks effectively meet real-time classification needs [20]. In apple leaf disease recognition, spatial and channel attention modules outperform classic models [21], with MobileNetV2 adaptations [22] and Siamese network structures [23] proving useful for common diseases and limited data scenarios. Hybrid architectures [24,25] and crop-specific modifications [12] further extend performance boundaries. Despite their accuracy and efficiency, classification methods lack localization mechanisms, limiting their ability to conduct detailed symptom analysis in real-world applications.

2.2. Detection-Based Methods

Detection-based methods for plant diseases have progressed considerably, employing architectures like YOLO [26] and Faster R-CNN [27] to pinpoint diseases in leaf images. Recent YOLO variants (YOLOv4–YOLOv8) report mAP scores ranging from 72.7% to 93.1% across various crops [13,28,29,30,31]. Faster R-CNN adaptations have also demonstrated potential, achieving mAP up to 96.31% on tomato leaf disease datasets [32], often integrating attention modules and feature pyramid networks to bolster feature extraction [33]. Ongoing research addresses unique agricultural challenges, including small object detection [34,35] and complex backgrounds [36,37].

2.3. Segmentation-Based Methods

Segmentation-based approaches provide pixel-level precision in disease identification. Models such as U-Net [38], Mask R-CNN [39], and DeepLab [40] generate detailed masks outlining diseased areas. U-Net variants with residual attention and atrous spatial pyramid pooling significantly improve segmentation accuracy [41,42], while incorporating Squeeze-and-Excitation modules and Swin Transformers further refines performance [43]. Improved U-Net++ with attention modules excels at detecting small lesions on apple leaves [44]. Similarly, DeepLabv3+ enhancements achieve high IoU scores for pear and tea diseases via multi-scale feature fusion [45], and two-stage DeepLabv3+ methods have proven effective in complex scenes [14,46]. Novel architectures such as ECA-SegFormer [47] and MFBP-UNet [48] continue pushing boundaries. In instance segmentation, Mask R-CNN-based frameworks have also advanced, boosting accuracy for tobacco [49], strawberry [50], and potato [51] diseases.
However, current methodologies predominantly operate under a single-label assumption, which proves inadequate for complex mixed-disease scenarios. Specifically, standard object detection algorithms frequently discard overlapping lesion proposals via Non-Maximum Suppression (NMS), thereby increasing the false negative rate for co-occurring infections. Moreover, the conventional approach of categorizing mixed infections as a singular, distinct class [4,52,53,54,55] neglects the intrinsic semantic independence and feature correlations among co-existing pathogens. To mitigate these issues, we propose a multi-task framework designed to decouple feature extraction, ensuring the distinct representation of individual disease signatures within mixed infection contexts.

3. Materials

3.1. Dataset Construction

This study utilized the Plant Pathology 2021-FGVC8 dataset from Kaggle [56], comprising 18,632 high-resolution images (4000 × 3000 and 4000 × 2672 pixels) captured in natural outdoor environments. Our dataset includes four primary apple leaf diseases: rust, scab, frog eye leaf spot (FELS), and powdery mildew (PM), along with healthy leaves. These diseases were chosen due to their prevalence and economic impact on apple production. Additionally, three categories of mixed diseases were considered: rust&scab (R&S), scab&frog eye leaf spot (S&FELS), and rust&frog eye leaf spot (R&FELS), as illustrated in Figure 1a.
To ensure label accuracy, we established a rigorous review protocol involving three plant pathology experts. They systematically re-evaluated images with ambiguous labels based on distinct symptomological criteria. This process resulted in the correction of approximately 150 images that were originally mislabeled (e.g., labeled as mixed diseases but presenting only single symptoms), thereby ensuring the reliability of the ground truth. The distribution of images across categories and sets is shown in Figure 1b.
To evaluate the applicability of our model across different crops, we constructed a dataset of cucumber leaf diseases. The dataset was collected on 13 July 2024, at Jiahe Farm in Baoding City, Hebei Province, and comprises 1000 cucumber leaf images, evenly balanced to support model generalization. Specifically, it contains 250 images for each of the four categories: Healthy, Powdery Mildew (PM), Physiological Disease (PD), and Mixed PM&PD. Detailed statistics regarding the sample counts and class distribution across training, validation, and testing sets are provided in Table 1. These images were captured under varying lighting conditions, featuring complex backgrounds and uneven lighting intensity, making the dataset challenging to process. Example images from the dataset are shown in Figure 1c, and the distribution of different disease classes and data splits is illustrated in Figure 1d.

3.2. Data Augmentation

To mitigate severe class imbalance within rare mixed-disease categories (e.g., R&S, R&FELS), we implemented a targeted lesion synthesis strategy. Lesion templates, accurately extracted via ground-truth segmentation masks to preserve morphological fidelity, were subjected to stochastic affine transformations, including scaling, rotation, and flipping, to broaden the feature space. To ensure seamless visual integration, the boundaries of these superimposed lesions underwent Gaussian blending to eliminate potential edge artifacts (Figure 2). To strictly prevent data leakage, the original dataset was partitioned prior to augmentation. Lesion templates were exclusively extracted from and inserted into the training set, ensuring no overlap of source information across splits. This approach specifically enriched under-represented classes, increasing the dataset from 18,632 to 19,814 images and rectifying distributional skew without introducing redundancy. The resulting curated dataset was stratified into training, validation, and testing subsets in an 8:1:1 ratio.
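The paste-and-blend step can be illustrated with a minimal grayscale sketch, in which a radial Gaussian weight map provides the feathered edge. This is an illustrative simplification (the actual pipeline operates on RGB images and applies the affine transforms before blending; the function names are ours):

```python
import math

def gaussian_feather(size, sigma):
    """Radial Gaussian weight mask: 1.0 at the centre, fading toward the edges."""
    c = (size - 1) / 2.0
    return [[math.exp(-((x - c) ** 2 + (y - c) ** 2) / (2 * sigma ** 2))
             for x in range(size)] for y in range(size)]

def paste_lesion(background, lesion, mask, top, left):
    """Alpha-blend a lesion patch into a background image.

    background, lesion: 2-D lists of grey values; mask: per-pixel blend
    weights in [0, 1]; (top, left): paste position in the background.
    """
    out = [row[:] for row in background]
    for dy, lesion_row in enumerate(lesion):
        for dx, val in enumerate(lesion_row):
            a = mask[dy][dx]
            y, x = top + dy, left + dx
            out[y][x] = (1 - a) * out[y][x] + a * val
    return out
```

With a feathered mask, the patch centre fully overwrites the background while the edges blend smoothly, which is what suppresses the edge artifacts described above.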

3.3. Data Annotation

Given the classification-oriented nature of the original dataset, which only provided image-level labels, our research objectives required a transition to instance segmentation with pixel-level annotations. We used the LabelMe tool to meticulously delineate the contours of the primary leaf and its associated disease spots in each image. To ensure annotation quality, a strict protocol was followed: initial annotations were performed by trained researchers and subsequently reviewed by three experienced plant pathology experts. Any discrepancies in disease categorization or boundary delineation were resolved through a majority consensus vote. This process generated JSON files containing detailed annotation information, which we then converted to the COCO format. The number of annotated instances for different disease categories in the Apple Leaf Disease Dataset and Cucumber Leaf Disease Dataset are presented in Table 1 and Table 2, respectively.
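The conversion of one LabelMe polygon to a COCO annotation can be sketched as follows. The field layout follows the standard COCO annotation schema (`segmentation`, `bbox`, `area`, `iscrowd`); the helper names and the shoelace area computation are ours:

```python
def polygon_area(points):
    """Shoelace formula for a simple polygon given as [(x, y), ...]."""
    n = len(points)
    s = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def labelme_shape_to_coco(shape, image_id, ann_id, category_ids):
    """Convert one LabelMe polygon shape dict to a COCO annotation dict."""
    pts = shape["points"]
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    x0, y0 = min(xs), min(ys)
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_ids[shape["label"]],
        # COCO stores each polygon as a flat [x1, y1, x2, y2, ...] list
        "segmentation": [[coord for p in pts for coord in p]],
        "bbox": [x0, y0, max(xs) - x0, max(ys) - y0],  # [x, y, w, h]
        "area": polygon_area(pts),
        "iscrowd": 0,
    }
```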

3.4. Methodology

3.4.1. ALMDR

The proposed apple leaf mixed disease recognition (ALMDR) model, illustrated in Figure 3, addresses the challenge of identifying multiple concurrent diseases on apple leaves. Central to its architecture is the group feature pyramid network (GFPN), which enhances multi-scale feature extraction by separating leaf and lesion information. The multi-label classification head (MLCH) leverages GFPN’s output to simultaneously identify various disease types, which is a key feature for addressing mixed disease scenarios. Two complementary modules, the leaf segmentation head (LSH) and lesion segmentation head (LeSH), perform anchor-free instance segmentation of leaves and disease regions respectively. ALMDR effectively recognizes and segments multiple apple leaf diseases by synergizing classification and segmentation tasks.

3.4.2. Group Feature Pyramid Network

In field-collected images of diseased apple leaves, the scale discrepancy between large leaf regions and small, scattered lesions poses significant challenges for accurate detection and segmentation. To address this issue, we propose the group feature pyramid network (GFPN), an improvement over existing FPN variants specifically designed to handle these scale differences. Figure 4 illustrates the evolution of FPN architectures leading to our GFPN design. The original FPN [15] (Figure 4a) introduces top-down information flow, enhancing low-level features with semantic information, but lacks sufficient multi-scale feature fusion. PANet [57] (Figure 4b) adds a bottom-up path for bidirectional information flow. Bi-FPN [58] (Figure 4c) strengthens multi-scale feature interaction through additional lateral connections but applies a uniform fusion strategy across all scales. To bridge the gap between large leaves and micro-lesions, we propose the GFPN (Figure 4d), which is based on the principle of feature decoupling. Unlike Bi-FPN (Figure 4c), whose uniform fusion across all scales incurs high computational cost, GFPN employs a more efficient topology to balance inference speed and multi-scale aggregation. The division into 'Leaf' and 'Lesion' groups is based on their distinct frequency characteristics: leaf extraction relies on low-frequency global features, while lesion segmentation requires high-frequency local textural details. Decoupling these features helps avoid semantic interference and gradient conflicts during backpropagation. GFPN processes leaf and lesion features separately, effectively capturing both global leaf structure and local disease characteristics for apple leaf disease recognition.
Let $F = \{F_i\}_{i=0}^{4}$ denote the set of feature maps output by each stage of the backbone network, where $F_i$ has a spatial size of $\frac{W}{2^{i+2}} \times \frac{H}{2^{i+2}}$ and a channel dimension of $128 \times 2^i$, with $W$ and $H$ being the width and height of the input image, respectively. To ensure consistent channel dimensions across all levels, a $1 \times 1$ convolution is applied to each $F_i$, resulting in a new set of feature maps $C = \{C_i\}_{i=0}^{4}$, where each $C_i$ has a channel dimension of 128. GFPN constructs a five-layer feature pyramid $P = \{P_i\}_{i=0}^{4}$, where $P_0$ corresponds to the highest resolution and $P_4$ to the lowest. The spatial and channel dimensions of $P_i$ are the same as those of $C_i$. The pyramid is divided into two groups: the leaf group $P_l = \{P_2, P_3, P_4\}$ and the lesion group $P_{le} = \{P_0, P_1, P_2\}$. Note that $P_l \subset P$ and $P_{le} \subset P$, and $P_2$ is a shared layer between the leaf and lesion groups, facilitating inter-group information transfer.
GFPN performs both intra-group and inter-group feature fusion to enhance the feature representation for each task. For the leaf group P l , a top-down upsampling approach is employed to integrate global information into larger scales:
$$P_i = \mathrm{UpSample}_{2\times}(C_{i+1}) + \mathrm{Conv}_{3\times3}(C_i), \quad i \in \{2, 3\}$$
where $\mathrm{UpSample}_{2\times}$ denotes the bilinear upsampling operation that doubles the spatial size, and $\mathrm{Conv}_{3\times3}$ represents the $3 \times 3$ convolution operation. This approach propagates high-level semantic information to enhance contextual understanding for accurate leaf segmentation. On the other hand, for the lesion group $P_{le}$, a bottom-up downsampling scheme is adopted to supplement fine-grained information into smaller scales:
$$P_j = \mathrm{DownSample}_{\frac{1}{2}\times}(C_{j-1}) + \mathrm{Conv}_{3\times3}(C_j), \quad j \in \{1, 2\}$$
where $\mathrm{DownSample}_{\frac{1}{2}\times}$ represents the bilinear downsampling operation that halves the spatial size. This scheme preserves and incorporates fine-grained details essential for precisely localizing and segmenting small lesion areas. Furthermore, the inter-group feature fusion occurs at the shared layer $P_2$, where the features from both the leaf group and the lesion group are added to facilitate information exchange.
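The grouping and fusion rules above can be illustrated with a minimal single-channel sketch, in which plain Python lists stand in for feature tensors, nearest-neighbour resampling replaces the bilinear operators, and identity maps replace the learned $3 \times 3$ convolutions. All function names are ours, and summing both groups' contributions at the shared level is one plausible reading of the inter-group fusion:

```python
def upsample2x(f):
    """Nearest-neighbour upsampling that doubles the spatial size of a 2-D map."""
    out = []
    for row in f:
        doubled = [v for v in row for _ in (0, 1)]
        out.append(doubled)
        out.append(doubled[:])
    return out

def downsample2x(f):
    """2x2 average pooling that halves the spatial size of a 2-D map."""
    h, w = len(f), len(f[0])
    return [[(f[y][x] + f[y][x + 1] + f[y + 1][x] + f[y + 1][x + 1]) / 4.0
             for x in range(0, w, 2)] for y in range(0, h, 2)]

def add_maps(a, b):
    """Element-wise addition of two equally sized 2-D maps."""
    return [[va + vb for va, vb in zip(ra, rb)] for ra, rb in zip(a, b)]

def gfpn_fuse(C):
    """Single-channel GFPN fusion over channel-unified maps C[0..4]
    (level 0 has the highest resolution). The leaf group {2, 3, 4} fuses
    top-down, the lesion group {0, 1, 2} fuses bottom-up, and level 2 is
    shared: both groups' contributions are summed there."""
    P = {0: C[0], 4: C[4]}
    P[3] = add_maps(upsample2x(C[4]), C[3])        # leaf group, top-down
    leaf_p2 = add_maps(upsample2x(C[3]), C[2])
    P[1] = add_maps(downsample2x(C[0]), C[1])      # lesion group, bottom-up
    lesion_p2 = add_maps(downsample2x(C[1]), C[2])
    P[2] = add_maps(leaf_p2, lesion_p2)            # shared-layer fusion
    leaf_group = {2: P[2], 3: P[3], 4: P[4]}
    lesion_group = {0: P[0], 1: P[1], 2: P[2]}
    return leaf_group, lesion_group
```

Running this on constant maps makes the topology visible: non-shared fused levels receive two contributions each, while the shared level $P_2$ accumulates both the top-down and bottom-up paths.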

3.4.3. Multi-Label Classification Head

The multi-label classification head (MLCH) addresses the challenge of identifying multiple co-occurring disease types on apple leaves. Unlike single-label classification, our approach enables simultaneous detection of multiple diseases, crucial for accurately representing complex real-world pathological patterns. This module processes input from the lesion group $P_{le}$ generated by the GFPN, leveraging its fine-grained details for precise disease classification.
Each input feature map $P_j \in P_{le}$ is first processed through a $1 \times 1$ convolution to produce $M_j$:
$$M_j = \mathrm{Conv}_{1\times1}(P_j)$$
where $M_j$ is an intermediate representation with a unified channel dimension of 128, which is also used for further segmentation mask learning (see Section 3.4.4 for details). Next, the MLCH applies a series of operations to $M_j$ to obtain the multi-label disease probability predictions:
$$p_j = \sigma\left(\mathrm{GSP}\left(\mathrm{Conv}_{3\times3}\left(\mathrm{AvgPool}_{2\times2}(M_j)\right)\right)\right)$$
where $j \in \{0, 1, 2\}$, $\mathrm{AvgPool}_{2\times2}$ denotes average pooling with a kernel size and stride of 2, $\mathrm{Conv}_{3\times3}$ represents a $3 \times 3$ convolution with $L$ output channels, $\mathrm{GSP}$ denotes global sum pooling, and $\sigma$ is the element-wise sigmoid activation function. The number of output channels $L$ is set to 4, corresponding to the four single disease types described in Section 3.1. We employ the sigmoid function $\sigma$ instead of softmax to facilitate multi-label classification, enabling independent computation of each disease probability. Consequently, the output $p_j \in \mathbb{R}^L$ represents the probability of each disease's presence without the constraint that probabilities sum to 1, which is typically imposed in single-label classification. To leverage information from different scales, we average the predictions from all levels: $\hat{y} = \frac{1}{3} \sum_{j=0}^{2} p_j$.
To train the MLCH, we use a binary cross-entropy (BCE) loss for each disease category. Let $y \in \{0, 1\}^L$ denote the ground-truth binary vector, where $y_i = 1$ indicates the presence of the $i$-th disease category and $y_i = 0$ otherwise. The multi-label classification loss is defined as:
$$\mathcal{L}_{\mathrm{MLCH}} = -\frac{1}{L} \sum_{i=1}^{L} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$$
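The per-level prediction, cross-level averaging, and BCE loss can be sketched in plain Python. Here per-level logit vectors stand in for the pooled convolutional features (the real head applies AvgPool, Conv, and global sum pooling before the sigmoid); the function names are illustrative:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mlch_predict(level_logits):
    """Average per-level sigmoid probabilities: y_hat = (1/J) * sum_j sigma(z_j).

    level_logits: one logit vector per pyramid level, each of length L
    (one entry per disease class).
    """
    L = len(level_logits[0])
    probs = [[sigmoid(z) for z in level] for level in level_logits]
    return [sum(p[c] for p in probs) / len(probs) for c in range(L)]

def bce_loss(y_true, y_hat, eps=1e-7):
    """Multi-label binary cross-entropy averaged over the L categories."""
    L = len(y_true)
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for y, p in zip(y_true, y_hat)) / L
```

Because each class goes through its own sigmoid, a leaf with both rust and scab can legitimately receive high probability for both labels at once, which softmax would forbid.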

3.4.4. Lesion Segmentation Head

The lesion segmentation head (LeSH) is designed to precisely segment lesion areas on apple leaves. We adapt the anchor-free paradigm, inspired by FCOS [59] and SOLO [16], to address mixed-disease pathology. Unlike traditional single-label approaches, LeSH incorporates a pixel-wise multi-label classification mechanism within its dual-branch structure. Taking the feature maps $P_{le}$ from the lesion group as input, LeSH (Figure 5) employs two parallel branches: a classification branch and a regression branch. Each branch consists of four $3 \times 3$ convolution layers that maintain the spatial dimensions of the input. The classification branch focuses on identifying lesion areas, while the regression branch is responsible for predicting bounding boxes, centerness scores, and segmentation masks.
In the classification branch (Cls), each feature map $P_j \in P_{le}$ is processed by the convolutional layers to obtain the output feature $Cls_j^{le}$. Subsequently, a $3 \times 3$ convolutional layer followed by a channel-wise Softmax normalization is applied, $Conf_j^{le} = \mathrm{Softmax}(\mathrm{Conv}_{3\times3}(Cls_j^{le}))$, generating a confidence tensor $Conf_j^{le}$ of size $W_j \times H_j \times (L+1)$, where $W_j$ and $H_j$ are the width and height of the feature map at level $j$, and each position in $Conf_j^{le}$ records the probability distribution over $L+1$ categories. Here, $L$ includes both disease types and a healthy class, with an additional background class.
In this branch, Focal Loss [60] is used to address class imbalance by focusing on hard-to-classify examples. For each feature map level $j$ ($j = 0, 1, 2$), the Focal Loss is calculated as follows:
$$\mathcal{L}_{\mathrm{Focal}}^{\mathrm{LeSH}} = -\frac{1}{N} \sum_{j=0}^{2} \sum_{c=0}^{L} Y_j[:,:,c] \cdot \left(1 - Conf_j^{le}[:,:,c]\right)^{\gamma} \cdot \log\left(Conf_j^{le}[:,:,c]\right)$$
where $N$ is the total number of positions across all feature maps, $N = \sum_{j=0}^{2} W_j \times H_j$; $Y_j[:,:,c]$ is the ground truth for class $c$, $Conf_j^{le}[:,:,c]$ is the predicted confidence, $L$ is the number of classes, and $\gamma$ (typically 2.0) is the focusing parameter.
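The per-position focal term can be sketched in plain Python over a flattened list of positions. This is an illustrative simplification of the level-wise tensor computation (names are ours):

```python
import math

def focal_loss(conf, y_onehot, gamma=2.0, eps=1e-7):
    """Focal loss averaged over a flattened list of positions.

    conf:     per-position softmax distributions over the classes
    y_onehot: matching one-hot ground-truth vectors
    gamma:    focusing parameter; gamma = 0 recovers plain cross-entropy
    """
    n = len(conf)
    total = 0.0
    for p_vec, y_vec in zip(conf, y_onehot):
        for p, y in zip(p_vec, y_vec):
            if y:
                # (1 - p)^gamma down-weights well-classified positions
                total -= (1.0 - p) ** gamma * math.log(p + eps)
    return total / n
```

The $(1 - p)^\gamma$ factor is what shifts the gradient budget toward hard examples: a confidently correct position (e.g., $p = 0.9$) contributes far less loss than an uncertain one at the same label.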
In the regression branch (Reg), each feature map $P_j \in P_{le}$ is processed by the convolutional layers to obtain the output feature $Reg_j^{le}$. This output is added to the intermediate feature $M_j$ from the MLCH, generated by Equation (3), resulting in $\widetilde{Reg}_j^{le}$:
$$\widetilde{Reg}_j^{le} = Reg_j^{le} + M_j$$
The resulting feature map $\widetilde{Reg}_j^{le}$ is used to perform the following three subtasks:
1. Bounding box regression. A $3 \times 3$ convolutional layer is applied to generate a bounding box feature map $Box_j^{le}$ of size $W_j \times H_j \times 4$. For each position $(x, y)$ on the feature map, the model predicts four offsets $(l, t, r, b)$, representing the distances from that position to the left, top, right, and bottom boundaries of the target bounding box. If the position is a positive sample (i.e., located inside a ground-truth bounding box), the GIoU loss [61] is computed between the predicted offsets and the ground-truth offsets:
$$\mathcal{L}_{\mathrm{Box}}^{\mathrm{LeSH}} = 1 - \frac{|B_p \cap B_g|}{|B_p \cup B_g|} + \frac{|C \setminus (B_p \cup B_g)|}{|C|}$$
where $B_p$ and $B_g$ denote the predicted and ground-truth bounding boxes, respectively, and $C$ represents the smallest enclosing box covering both $B_p$ and $B_g$.
2. Centerness regression. A $3 \times 3$ convolution followed by a sigmoid activation is applied to $\widetilde{Reg}_j^{le}$ defined in Equation (7), resulting in $Cen_j^{le} = \sigma(\mathrm{Conv}_{3\times3}(\widetilde{Reg}_j^{le}))$, an output of size $W_j \times H_j \times 1$ representing the centerness score at each position. The centerness loss is computed only for positive samples, i.e., positions located inside ground-truth bounding boxes, using the binary cross-entropy:
$$\mathcal{L}_{\mathrm{Cen}}^{\mathrm{LeSH}} = -\left[ Cen_{(x,y)}^{gt} \log\left(Cen_{(j,x,y)}^{le}\right) + \left(1 - Cen_{(x,y)}^{gt}\right) \log\left(1 - Cen_{(j,x,y)}^{le}\right) \right]$$
where $Cen_{(x,y)}^{gt}$ is the ground-truth centerness value at position $(x, y)$, and $Cen_{(j,x,y)}^{le}$ is the predicted centerness value at position $(x, y)$ on the $j$-th level feature map.
3. Mask regression. Following the decoupled SOLO strategy [16], $\widetilde{Reg}_j^{le}$ is processed through $3 \times 3$ convolutions, producing two tensors $Mask_j^X$ and $Mask_j^Y$ of size $W_j \times H_j \times S$. These tensors represent the horizontal and vertical axes of the mask, respectively, where $S$ is the number of grid cells along each axis, defining the resolution of the mask prediction. For an object located at grid position $(x, y)$, its mask can be represented as:
$$Mask_{(j,x,y)} = \sigma\left(Mask_j^X[:,:,x]\right) \odot \sigma\left(Mask_j^Y[:,:,y]\right)$$
where $\sigma$ denotes the sigmoid function, $\odot$ represents element-wise multiplication, and $Mask_j^X[:,:,x]$ and $Mask_j^Y[:,:,y]$ are the $x$-th and $y$-th channel maps of $Mask_j^X$ and $Mask_j^Y$, respectively. This decoupled approach reduces the output space from $W_j \times H_j \times S^2$ to $W_j \times H_j \times 2S$. The mask loss is composed of the binary cross-entropy and the Dice loss. The binary cross-entropy is defined as:
$$\mathcal{L}_{\mathrm{BCE}} = -\frac{1}{N} \sum_{j=0}^{2} \sum_{(x,y)\in\Omega_j} \left[ Mask_{(x,y)}^{gt} \log\left(Mask_{(j,x,y)}\right) + \left(1 - Mask_{(x,y)}^{gt}\right) \log\left(1 - Mask_{(j,x,y)}\right) \right]$$
where $\Omega_j$ is the set of all positions on the $j$-th level feature map. The Dice loss is defined as:
$$\mathcal{L}_{\mathrm{Dice}} = \sum_{j=0}^{2} \left[ 1 - \frac{2 \sum_{(x,y)\in\Omega_j} Mask_{(x,y)}^{gt} \cdot Mask_{(j,x,y)}}{\sum_{(x,y)\in\Omega_j} \left( Mask_{(x,y)}^{gt} + Mask_{(j,x,y)} \right)} \right]$$
The final mask loss is the sum of the binary cross-entropy and the Dice loss:
$$\mathcal{L}_{\mathrm{Mask}}^{\mathrm{LeSH}} = \mathcal{L}_{\mathrm{BCE}} + \mathcal{L}_{\mathrm{Dice}}$$
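The GIoU, decoupled mask, and Dice components can be made concrete with a pure-Python sketch. This is illustrative only: the box corner format `(x1, y1, x2, y2)` and the function names are our assumptions, and the 2-D maps passed to `decoupled_mask` are assumed to be already sigmoid-activated:

```python
def giou_loss(bp, bg):
    """GIoU loss, 1 - GIoU, for axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(bp[0], bg[0]), max(bp[1], bg[1])
    ix2, iy2 = min(bp[2], bg[2]), min(bp[3], bg[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (bp[2] - bp[0]) * (bp[3] - bp[1])
    area_g = (bg[2] - bg[0]) * (bg[3] - bg[1])
    union = area_p + area_g - inter
    # smallest enclosing box C
    cx1, cy1 = min(bp[0], bg[0]), min(bp[1], bg[1])
    cx2, cy2 = max(bp[2], bg[2]), max(bp[3], bg[3])
    c_area = (cx2 - cx1) * (cy2 - cy1)
    return 1.0 - inter / union + (c_area - union) / c_area

def decoupled_mask(mask_x, mask_y):
    """Element-wise product of the selected X- and Y-branch probability maps."""
    return [[a * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(mask_x, mask_y)]

def dice_loss(gt, pred, eps=1e-7):
    """Dice loss over one 2-D mask: 1 - 2|G.P| / (|G| + |P|)."""
    inter = sum(g * p for rg, rp in zip(gt, pred) for g, p in zip(rg, rp))
    total = sum(g + p for rg, rp in zip(gt, pred) for g, p in zip(rg, rp))
    return 1.0 - 2.0 * inter / (total + eps)
```

Note how the GIoU penalty term keeps the loss informative even for non-overlapping boxes (loss above 1), where plain IoU loss would saturate.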
In the LeSH model, the total loss function is defined as the sum of four component losses: the classification loss (Equation (6)), localization loss (Equation (8)), centerness loss (Equation (9)), and mask loss (Equation (13)):
$$\mathcal{L}_{\mathrm{LeSH}} = \mathcal{L}_{\mathrm{Focal}}^{\mathrm{LeSH}} + \mathcal{L}_{\mathrm{Box}}^{\mathrm{LeSH}} + \mathcal{L}_{\mathrm{Cen}}^{\mathrm{LeSH}} + \mathcal{L}_{\mathrm{Mask}}^{\mathrm{LeSH}}$$

3.4.5. Leaf Segmentation Head

The leaf segmentation head (LSH) is designed for instance segmentation of leaves. As illustrated in Figure 3, LSH adopts an architecture similar to LeSH. This head takes the leaf group $P_l$ generated by the GFPN as its input. The loss function for LSH, $\mathcal{L}_{\mathrm{LSH}}$, follows the same structure as $\mathcal{L}_{\mathrm{LeSH}}$ defined in Equation (14). Since the ALMDR model consists of three main components (MLCH, LeSH, and LSH), its total loss function is defined as the sum of the losses from its three heads:
$$\mathcal{L} = \mathcal{L}_{\mathrm{MLCH}} + \mathcal{L}_{\mathrm{LeSH}} + \mathcal{L}_{\mathrm{LSH}}$$

4. Results

4.1. Implementation Details

Experiments were conducted on a platform equipped with an Intel(R) Core(TM) i9-9900X CPU and two NVIDIA GeForce RTX 3090Ti GPUs. The models were trained with a batch size of 8, using backbones initialized with pre-trained ImageNet weights [62]. Training was performed over 200 epochs using stochastic gradient descent (SGD). The initial learning rate was set to $10^{-3}$ and reduced by a factor of ten at epochs 100 and 150. A weight decay of $5 \times 10^{-4}$ and a momentum of 0.9 were applied. The input images were resized to $512 \times 512$ pixels. To demonstrate the generalization capability of the proposed method across different crops, training, validation, and testing were carried out on both the Apple leaf disease dataset and the Cucumber leaf disease dataset.
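The stated schedule can be expressed as a small step-decay helper, equivalent in effect to PyTorch's `MultiStepLR` with milestones `[100, 150]` and `gamma=0.1` (the function name is ours):

```python
def lr_at(epoch, base_lr=1e-3, milestones=(100, 150), gamma=0.1):
    """Step decay: multiply the learning rate by gamma at each milestone
    epoch, and keep the reduced value for all later epochs."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr
```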

4.2. Evaluation Metrics

Given the multi-task nature of the ALMDR model, which encompasses disease detection, segmentation, and classification, a comprehensive set of evaluation metrics is employed to assess its performance across various aspects. These metrics are categorized into four main groups: disease classification, detection and segmentation, severity assessment, and model efficiency.
  • Classification task metrics: Disease classification in instance segmentation models typically relies on aggregating confidence scores from detected instances. The ALMDR model enhances this approach by effectively integrating outputs from three key modules for disease classification: LSH for precise leaf area delineation, LeSH for instance-level disease predictions, and MLCH for image-level multi-disease detection. Figure 6 provides a detailed comparison of confidence score computation methods between ALMDR and other models.
The classification process begins with applying non-maximum suppression (NMS) [63] to the LeSH and LSH outputs, eliminating redundant detections. If the LeSH output is empty, indicating no lesion detection, the sample is classified as a healthy leaf. For samples with lesions, a weighted fusion strategy combines LeSH and MLCH predictions:
$$Conf_C = 0.5 \times \frac{1}{n} \sum_{i=1}^{n} Conf_{i,C}^{\mathrm{LeSH}} + 0.5 \times Conf_C^{\mathrm{MLCH}}$$
where $n$ is the number of lesion instances detected for disease type $C$, $Conf_{i,C}^{\mathrm{LeSH}}$ is the confidence score of the $i$-th lesion instance for disease type $C$ from LeSH, and $Conf_C^{\mathrm{MLCH}}$ is the confidence score for disease type $C$ from MLCH. A disease type $C$ is considered present if $Conf_C \geq \theta$, where $\theta$ is a predefined threshold (typically set to 0.50 based on empirical studies). Here, LeSH contributes spatial localization, while MLCH incorporates global semantic context. The equal weighting scheme ($w = 0.5$) assumes that instance-level features from LeSH and image-level features from MLCH contribute equally to the final decision. To evaluate the robustness of the chosen threshold ($\theta = 0.50$), a sensitivity analysis was conducted on the validation set. Varying $\theta$ within the range $[0.4, 0.6]$ produced negligible fluctuations in the F1-score (<0.2%), confirming that the model's performance is stable and not overly sensitive to small threshold changes within this margin.
Based on the aforementioned classification methods, we evaluate our model’s performance using both multi-label and multi-class classification metrics. For multi-label classification, we employ Hamming Loss, One-Error, Zero-One Loss, and example-based precision, recall, and F1-score. These metrics collectively assess the model’s ability to handle multiple labels per instance and identify compound diseases. For multi-class classification, we use macro and micro averaged precision, recall, and F1-score to evaluate the model’s performance across all disease categories and its overall accuracy. To determine the statistical significance of performance differences between ALMDR and other models, we utilize the Wilcoxon signed-rank test [64]. The unit of analysis is defined as the per-class metric (e.g., F1-score), where the performance of ALMDR and a baseline model on the same disease category constitutes a paired sample, with a p-value threshold of 0.05 indicating a significant advantage for ALMDR.
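The example-based multi-label metrics listed above can be computed directly from binary label matrices. The sketch below assumes one common set of definitions (per-example F1 averaged over samples; several averaging variants exist in the literature) and is not tied to the paper's evaluation code.

```python
import numpy as np

def multilabel_metrics(y_true, y_pred):
    """Example-based multi-label metrics from 0/1 label matrices of
    shape (n_samples, n_labels)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    # Hamming Loss: fraction of individual label positions that are wrong.
    hamming = float(np.mean(y_true != y_pred))
    # Zero-One Loss: fraction of samples whose full label set is wrong.
    zero_one = float(np.mean(np.any(y_true != y_pred, axis=1)))
    inter = np.sum(y_true & y_pred, axis=1)
    prec = np.where(y_pred.sum(1) > 0, inter / np.maximum(y_pred.sum(1), 1), 0.0)
    rec = np.where(y_true.sum(1) > 0, inter / np.maximum(y_true.sum(1), 1), 0.0)
    f1 = np.where(prec + rec > 0,
                  2 * prec * rec / np.maximum(prec + rec, 1e-12), 0.0)
    return {"hamming": hamming, "zero_one": zero_one,
            "precision": float(prec.mean()),
            "recall": float(rec.mean()),
            "f1": float(f1.mean())}
```

With two samples, `[[1, 0, 1], [0, 1, 0]]` as ground truth and `[[1, 0, 0], [0, 1, 0]]` as predictions, this gives a Hamming Loss of 1/6, a Zero-One Loss of 0.5, and an example-based recall of 0.75.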
  • Detection and segmentation task metrics: For detection and segmentation tasks, we adopt evaluation metrics consistent with the COCO dataset [65], including threshold-based and object size-based metrics. This facilitates comparisons with other research in the computer vision field. We assess the model’s performance using average precision (AP) at different intersection over union (IoU) thresholds, ranging from AP 50 to AP 90 with a step size of 0.10. For example, AP 50 considers a prediction correct when the predicted bounding box has an IoU of at least 0.5 with the ground truth and the predicted category matches. We also compute the mean average precision (mAP) and mean average recall (mAR) across these thresholds to reflect the model’s overall performance. Moreover, we use AP S , AP M , and AP L to evaluate the model’s performance on small (area < 32 × 32 pixels), medium (area 32 × 32 to 96 × 96 pixels), and large (area > 96 × 96 pixels) objects, respectively. These metrics help assess the model’s ability to detect and segment objects across different scales.
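The IoU criterion underlying all of these AP metrics reduces to a short geometric computation. The helper below is a minimal sketch (boxes in `(x1, y1, x2, y2)` form, threshold list matching the AP 50 to AP 90 sweep described above); the full COCO matching procedure additionally ranks predictions by confidence and handles one-to-one assignment.

```python
def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

# Thresholds for the AP 50 ... AP 90 sweep (step size 0.10).
IOU_THRESHOLDS = [0.5, 0.6, 0.7, 0.8, 0.9]

def matches_at(pred_box, gt_box, threshold):
    """A prediction counts as correct at a given threshold when its IoU
    with the ground truth reaches that threshold (the category match is
    checked separately)."""
    return box_iou(pred_box, gt_box) >= threshold
```

For instance, boxes (0, 0, 2, 2) and (1, 1, 3, 3) overlap in a unit square and have an IoU of 1/7, so the pair matches at no threshold in the sweep.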
  • Disease severity estimation task metrics: The segmentation results obtained from our model enable us to assess the severity of plant diseases. This assessment follows the National Standard of the People’s Republic of China GB/T 17980.124-2004 for apple leaf diseases [66] and GB/T 17980.30-2000 for cucumber leaf diseases [67]. Although the apple standard is primarily designed for apple leaf spot diseases (Alternaria mali and Marssonina coronaria), it explicitly states its applicability to other apple leaf lesions, making it suitable for our diverse set of leaf diseases. Table 3 presents the standard’s severity level criteria, which quantify disease severity based on the proportion of lesion area to total leaf area. The table also includes the distribution of samples across these severity levels in our test set. We evaluate disease severity using two approaches: (1) linear regression to fit predicted disease proportions with ground truth, measured by the coefficient of determination R 2 [14], and (2) classification of severity levels, evaluated using F1-score. These methods assess our model’s performance in both continuous proportion prediction and discrete severity level classification.
  • Model efficiency metrics: To assess our ALMDR model’s efficiency, we use three metrics: FPS (frames per second), which measures real-time performance; FLOPs(G) (giga floating-point operations), which indicates computational complexity; and the parameter count (in millions), which reflects storage requirements. These metrics collectively evaluate the model’s practicality for deployment and optimization.
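Of the three efficiency metrics, FPS is the only one obtained by timing; FLOPs and parameter counts come from the model definition. A minimal timing harness might look like the sketch below, where the callable, the warmup count, and the timing protocol are all illustrative assumptions rather than the paper's benchmarking setup.

```python
import time

def measure_fps(infer_fn, inputs, warmup=5):
    """Throughput in frames per second for an inference callable.
    A few warmup calls are run first so that lazy initialization and
    cache effects do not distort the measurement."""
    for x in inputs[:warmup]:
        infer_fn(x)
    start = time.perf_counter()
    for x in inputs:
        infer_fn(x)
    elapsed = time.perf_counter() - start
    return len(inputs) / elapsed if elapsed > 0 else float("inf")
```

In practice the callable would wrap the full ALMDR forward pass at the evaluation resolution, so that the reported 6.25 FPS reflects end-to-end inference rather than a single module.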

4.3. Comparative Analysis of Multi-Label Disease Classification

In this section, we compare the performance of our proposed ALMDR model with state-of-the-art methods on three key tasks: multi-label disease classification, instance segmentation of leaves and lesions, and disease severity estimation. We evaluated against YOLACT [68], FCOS [59], SOLOv1 [16], SOLOv2 [69], YOLOv7 [70], YOLOv8 [71] and YOLOv9 [72], chosen for their relevance and strong performance in related tasks.
To ensure a strictly fair comparison and isolate architectural advantages, all baseline models (YOLACT, FCOS, SOLO variants, and YOLO series) were retrained from scratch using the identical experimental protocol described in Section 3.3. This included utilizing the same data partitions, input resolution ( 512 × 512 ), augmentation pipeline, and optimization strategy (SGD, 200 epochs). Consequently, performance differences can be directly attributed to architectural differences rather than discrepancies in training hyperparameters. Additionally, we consider the backbone architectures used in these models: YOLACT, FCOS, SOLOv1, and SOLOv2 utilize a ResNet-101 backbone, while the YOLO series employs CSPDarknet backbones with increasing depth and complexity. YOLOv7 uses a 52-layer custom CSPDarknet with enhanced cross-stage connections for improved gradient flow, and YOLOv8 advances to a 65-layer architecture, introducing depth-wise separable convolutions and lightweight attention mechanisms to boost performance. YOLOv9 further optimizes to a 73-layer RepCSPDarknet, incorporating re-parameterization techniques for more efficient inference.
Table 4 compares multi-label and multi-class classification results across different models on the Apple leaf disease dataset, with ALMDR consistently outperforming others across all metrics. In multi-label classification, ALMDR achieves the lowest Hamming Loss (7.94%), 0.33% lower than YOLOv9 (8.27%), indicating superior accuracy in label assignment. It also shows the lowest One-Error (8.46%) and Zero-One Loss (8.75%), demonstrating its ability to correctly predict both top-scoring and complete label sets. ALMDR’s high example-based precision (93.19%), recall (94.34%), and F1-score (93.74%) further underscore its balanced performance across individual instances. In multi-class classification, ALMDR maintains its leading position with the highest macro F1-score (93.66%), surpassing YOLOv9 by 0.57% and SOLOv2 by 9.62%, and the highest micro F1-score (93.03%), which is 0.95% higher than YOLOv9 and 6.25% higher than SOLOv2. Moreover, the consistently low p-values ( p < 0.05 ) across both multi-label and multi-class classifications demonstrate that ALMDR’s performance improvements are statistically significant.
The classification results on the Cucumber leaf disease dataset are presented in Table 5. Compared to the results in Table 4, the overall performance of the model has improved, which can be attributed to the differences in data distribution between the datasets. Notably, the ALMDR method continues to demonstrate outstanding performance. Specifically, the ALMDR method outperforms the second-best method (YOLOv9) by 0.52% and 0.94% in terms of macro F1-score and micro F1-score, respectively. This further highlights the model’s robustness and adaptability across different crops.
To further illustrate the model’s performance, Figure 7 provides a visual representation of our proposed model’s disease classification capabilities on the Apple leaf disease dataset. The radar chart in Figure 7a shows our model consistently outperforming existing architectures across all categories, with gains that persist from simple single-disease cases to complex mixed-disease scenarios. In the healthy category, both our model and YOLOv9 achieve near-perfect F1-scores (approaching 1.0), significantly reducing false positives and establishing a crucial baseline for reliable disease detection. Most models, including ours, show lower average performance in detecting combined diseases compared to single diseases. Moreover, the confusion matrix in Figure 7b offers additional insights. Most misclassifications occur between related categories, such as a single disease being misclassified as a combination including that disease. A similar trend is observed on the cucumber leaf disease dataset, as illustrated in Figure 7c,d. The reduced performance on mixed diseases (e.g., R&S) stems from high feature entanglement. Unlike single infections with distinct visual signatures, co-occurring pathogens often generate overlapping symptoms that create ambiguous decision boundaries in the feature space, leading to lower confidence scores compared to single-label instances. This pattern, observed in both the Apple and Cucumber leaf disease datasets, indicates that the model has effectively learned fundamental disease characteristics but still has room for improvement in distinguishing overlapping or similar symptoms.

4.4. Evaluation of Instance Segmentation of Leaves and Lesion Regions

Table 6 demonstrates that our proposed ALMDR model outperforms existing architectures across all detection and segmentation metrics. On the Apple leaf disease dataset, ALMDR achieves 51.32% mAP and 58.71% mAR in detection, surpassing the next best model, YOLOv9, by 1.21% and 1.93%, respectively. For segmentation, ALMDR shows similar superiority with 45.50% mAP and 48.10% mAR, exceeding YOLOv9 by 1.19% and 1.39%, respectively. This consistent improvement in both precision and recall indicates ALMDR’s enhanced capability in comprehensively capturing objects while maintaining a low false positive rate. A similar trend is observed on the Cucumber leaf disease dataset. For detection, ALMDR achieves 55.55% mAP and 62.61% mAR, while in segmentation, it achieves 50.96% mAP and 51.89% mAR. These results further demonstrate the consistent improvement of ALMDR across different datasets, validating its robustness and adaptability in diverse agricultural scenarios. From a computational efficiency perspective, ALMDR achieves the highest FPS (6.25) while maintaining relatively low computational complexity (74.78 G FLOPs) and a small parameter count (45.27 M), which is crucial for practical applications.
Moreover, Figure 8 further illustrates ALMDR’s superior performance across all IoU thresholds and object sizes. All models show improved performance as IoU thresholds decrease and object sizes grow from small (S) to large (L), with ALMDR exhibiting the most significant enhancement.

4.5. Evaluation of Disease Severity Estimation Performance

To assess the severity estimation performance of different models, we conducted an evaluation based on the disease severity criteria outlined in Table 3. This analysis includes linear regression and F1-score calculations for various disease levels, as shown in Figure 9. The regression analysis (Figure 9a–g) compares model-predicted disease proportions with ground truth annotations on the Apple leaf disease dataset. ALMDR demonstrates superior performance across all severity levels, achieving the highest coefficient of determination ( R 2 = 0.9757). Notably, all models exhibit a trend of decreasing accuracy as disease severity increases, particularly evident in SOLOv1 and SOLOv2. This suggests reduced robustness when dealing with high-coverage diseases. In contrast, ALMDR maintains relatively consistent performance across severity levels, demonstrating its strength in handling diverse disease manifestations. The F1-score analysis (Figure 9h) further confirms ALMDR’s superiority. It consistently outperforms other models across all severity levels, with the gap widening at higher severities.
A similar trend is observed on the Cucumber leaf disease dataset (Figure 10a–g). ALMDR achieves comparable performance, maintaining the highest R 2 ( R 2 = 0.9852) value and consistent superiority across severity levels. While other models also exhibit decreasing accuracy at higher severities, ALMDR demonstrates greater robustness, further validating its adaptability and effectiveness across different datasets.

4.6. Ablation Study

We conducted ablation studies to evaluate the effectiveness of our proposed GFPN architecture and MLCH mechanism. We compared GFPN + MLCH (the proposed model) with various feature pyramid network variants: FPN [15] (baseline), PANet [57], Bi-FPN [58], and GFPN without MLCH.
The results presented in Table 7 clearly demonstrate the superiority of our proposed GFPN + MLCH model across all performance metrics. It significantly outperforms the baseline (FPN), with 4.33% and 1.13% increases in detection and segmentation mAP, respectively. Performance improves from FPN to PANet to Bi-FPN, reflecting the benefits of advanced feature fusion. However, GFPN outperforms all these variants even without MLCH, demonstrating the effectiveness of our group-based approach. Integrating MLCH with GFPN yields further performance gains: detection mAP rises by 0.43%, while segmentation mAP improves by 0.39%. These enhancements underscore MLCH’s crucial role in refining feature representations. Importantly, GFPN + MLCH maintains computational efficiency. Despite slight increases in FLOPs and parameters compared to GFPN without MLCH, it offers a favorable performance-efficiency trade-off, maintaining a high frame rate of 6.25 FPS.

5. Discussion

5.1. Cross-Model Applicability of GFPN and MLCH Modules

To evaluate the general applicability of our proposed GFPN and MLCH modules, we integrated them into YOLOv7, YOLOv8, and YOLOv9 models. Specifically, GFPN replaced the original Bi-FPN-based structures or their variants in these algorithms, enhancing the feature fusion process. MLCH was added as an additional branch to the original architecture, enabling multi-label classification. As shown in Table 8, incorporating GFPN and MLCH consistently improves detection and segmentation performance across all models. For example, adding both GFPN and MLCH to YOLOv9 increases detection mAP by 0.78% (from 50.11% to 50.89%) and segmentation mAP by 0.69% (from 44.31% to 45.00%), with minimal impact on FPS and slight reductions in FLOPs and parameter counts.

5.2. Visualization of Leaf and Lesion Segmentation Results Across Different Models

Plant disease detection and segmentation in natural environments pose significant challenges. These challenges arise from the complex interplay of environmental factors and diverse pathological manifestations. Traditional approaches relying on laboratory conditions or simplified backgrounds [73] fail to capture the full complexity of disease expression in real-world settings. Variable lighting, leaf overlap, and environmental stressors significantly impact detection accuracy in field conditions [28]. Our proposed ALMDR model addresses these challenges by fully considering the scale differences between leaves and lesions, as well as the co-occurrence patterns of mixed diseases. To analyze the performance of different segmentation methods under various environmental conditions, we conducted a comprehensive visual analysis.
We examined leaf segmentation performance across five challenging scenarios, categorized into two groups: environmental factors (overlapping leaves, complex illumination) and leaf-specific characteristics (leaf curling, serrated edges, blurred edges).
As shown in Figure 11a, ALMDR demonstrates superior leaf segmentation performance across these challenging scenarios. It consistently outperforms other models in delineating overlapping leaves, maintaining accuracy under complex illumination, and precisely capturing intricate leaf morphologies. For disease segmentation, we analyzed an additional set of five challenging scenarios: irregular shaped spots, low-contrast spots, marginal spots, small spots, and dense spots. Figure 11b demonstrates ALMDR’s superior performance across all conditions.
As visually corroborated in Figure 11a, ALMDR demonstrates superior leaf segmentation, particularly in delineating overlapping boundaries. This qualitative observation aligns with the quantitative results in Table 6, where ALMDR achieves a segmentation mAP of 45.50%, surpassing YOLOv9 by 1.19%. Specifically, the model’s ability to handle serrated edges (Figure 11a) directly contributes to the higher detection accuracy for small objects ( A P S ), validating that the visual improvements translate into statistically significant metric gains. Similarly, in Figure 11b, the precise segmentation of dense spots explains the reduced False Negative Rate observed in our confusion matrix analysis.
While YOLOv9 performed well overall, it fell short of ALMDR in precision, especially with small and dense spots. SOLOv1 and SOLOv2 struggled with complex lesion scenarios, particularly with irregular shaped and marginal spots. These results highlight ALMDR’s effectiveness in handling the complexities of both leaf and disease segmentation in natural agricultural environments.

6. Conclusions

This study introduces the apple leaf mixed disease recognition (ALMDR) framework, addressing the challenge of recognizing multiple co-occurring diseases on apple leaves in natural environments. By unifying multi-label classification and instance segmentation, ALMDR demonstrates superior performance in detecting and segmenting complex disease patterns. The multi-label classification head (MLCH) enhances disease classification using global image information, while the group feature pyramid network (GFPN) captures both global leaf structures and local lesion details. Experimental results confirm ALMDR’s robustness across various challenging scenarios, highlighting its potential for precision agriculture. However, the framework’s current limitations include its specificity to apple leaves and a limited range of disease types, which may affect its generalizability. Future work will prioritize three specific directions to address current limitations. First, to overcome the identified challenge of feature entanglement in mixed diseases (e.g., R&S), we plan to integrate channel-wise attention mechanisms into the LeSH module to better discriminate overlapping symptoms. Second, building on the successful transfer from apple to cucumber datasets, we aim to develop a domain-adaptive version of ALMDR to reduce annotation costs for novel crops. Finally, to further enhance the practical utility indicated by our 6.25 FPS speed, we will explore model pruning techniques to deploy ALMDR on edge devices for in-field real-time monitoring.

Author Contributions

Conceptualization, B.L. (Bo Liu); data curation, P.L. and N.G.; formal analysis, P.L. and N.G.; funding acquisition, B.L. (Bo Li) and L.L.; investigation, P.L. and N.G.; methodology, B.L. (Bo Liu), Z.Z. and L.L.; project administration, Z.Z. and B.L. (Bo Liu); resources, B.L. (Bo Liu); software, P.L. and N.G.; supervision, L.M. and B.L. (Bo Liu); validation, M.L. and L.L.; visualization, P.L., Z.Z. and N.G.; writing—original draft preparation, P.L. and N.G.; writing—review and editing, L.M. and B.L. (Bo Liu). All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the S&T Program of Hebei [252N7401D], the National Modern Apple Technology Industry System [CARS-28], and the Innovation Support Project for Graduate Students of Hebei [CXZZBS2024080].

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The Plant Pathology 2021-FGVC8 dataset used in this study is publicly available on Kaggle at Plant Pathology 2021-FGVC8 Kaggle Competition. The severity criteria for apple leaf disease follow the National Standard of the People’s Republic of China GB/T 17980.1-2000 Pesticide—Guidelines for the Field Efficacy Trials (II)—Part 124: Fungicides Against Alternaria Leaf Spot of Apple. The severity criteria for cucumber leaf disease follow the National Standard of the People’s Republic of China GB/T 17980.30-2000 Pesticide—Guidelines for the Field Efficacy Trials (I)—Fungicides Against Cucumber Powdery Mildew.

Acknowledgments

We thank the editor and reviewers for their helpful suggestions to improve the quality of this manuscript.

Conflicts of Interest

Author L.L. is employed by the company Hebei Green Valley Information Technology Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The company had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Boyer, J.; Liu, R.H. Apple phytochemicals and their health benefits. Nutr. J. 2004, 3, 5. [Google Scholar] [CrossRef] [PubMed]
  2. Yang, Q.; Duan, S.; Wang, L. Efficient identification of apple leaf diseases in the wild using convolutional neural networks. Agronomy 2022, 12, 2784. [Google Scholar] [CrossRef]
  3. Lv, X.; Zhang, X.; Gao, H.; He, T.; Lv, Z.; Zhangzhong, L. When crops meet machine vision: A review and development framework for a low-cost nondestructive online monitoring technology in agricultural production. Agric. Commun. 2024, 2, 100029. [Google Scholar] [CrossRef]
  4. Feng, W.; Song, Q.; Sun, G.; Zhang, X. Lightweight Isotropic Convolutional Neural Network for Plant Disease Identification. Agronomy 2023, 13, 1849. [Google Scholar] [CrossRef]
  5. Nikith, B.; Keerthan, N.; Praneeth, M.; Amrita, T. Leaf disease detection and classification. Procedia Comput. Sci. 2023, 218, 291–300. [Google Scholar] [CrossRef]
  6. Khan, R.U.; Khan, K.; Albattah, W.; Qamar, A.M. Image-based detection of plant diseases: From classical machine learning to deep learning journey. Wirel. Commun. Mob. Comput. 2021, 2021, 5541859. [Google Scholar] [CrossRef]
  7. Hardham, A.R. Confocal microscopy in plant–pathogen interactions. In Plant Fungal Pathogens: Methods and Protocols; Humana Press: Totowa, NJ, USA, 2012; pp. 295–309. [Google Scholar]
  8. Buja, I.; Sabella, E.; Monteduro, A.G.; Chiriacò, M.S.; De Bellis, L.; Luvisi, A.; Maruccio, G. Advances in plant disease detection and monitoring: From traditional assays to in-field diagnostics. Sensors 2021, 21, 2129. [Google Scholar] [CrossRef]
  9. Schena, L.; Duncan, J.; Cooke, D. Development and application of a PCR-based ‘molecular tool box’ for the identification of Phytophthora species damaging forests and natural ecosystems. Plant Pathol. 2008, 57, 64–75. [Google Scholar] [CrossRef]
  10. Mahlein, A.K. Plant disease detection by imaging sensors–parallels and specific demands for precision agriculture and plant phenotyping. Plant Dis. 2016, 100, 241–251. [Google Scholar] [CrossRef] [PubMed]
  11. Ferentinos, K.P. Deep learning models for plant disease detection and diagnosis. Comput. Electron. Agric. 2018, 145, 311–318. [Google Scholar] [CrossRef]
  12. Suseno, J.R.K.; Azhar, Y.; Minarno, A.E. The Implementation of Pretrained VGG16 Model for Rice Leaf Disease Classification using Image Segmentation. Kinet. Game Technol. Inf. Syst. Comput. Netw. Comput. Electron. Control 2023, 8, 499–506. [Google Scholar] [CrossRef]
  13. Zhao, Y.; Lin, C.; Wu, N.; Xu, X. APEIOU Integration for Enhanced YOLOV7: Achieving Efficient Plant Disease Detection. Agriculture 2024, 14, 820. [Google Scholar] [CrossRef]
  14. Liu, W.; Chen, Y.; Lu, Z.; Lu, X.; Wu, Z.; Zheng, Z.; Suo, Y.; Lan, C.; Yuan, X. StripeRust-Pocket: A Mobile-Based Deep Learning Application for Efficient Disease Severity Assessment of Wheat Stripe Rust. Plant Phenomics 2024, 2024, 0201. [Google Scholar] [CrossRef]
  15. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 22–25 July 2017; pp. 2117–2125. [Google Scholar]
  16. Wang, X.; Kong, T.; Shen, C.; Jiang, Y.; Li, L. Solo: Segmenting objects by locations. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XVIII 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 649–665. [Google Scholar]
  17. Zhang, X.; Li, H.; Sun, S.; Zhang, W.; Shi, F.; Zhang, R.; Liu, Q. Classification and Identification of Apple Leaf Diseases and Insect Pests Based on Improved ResNet-50 Model. Horticulturae 2023, 9, 1046. [Google Scholar] [CrossRef]
  18. Lin, J.; Zhang, X.; Qin, Y.; Yang, S.; Wen, X.; Cernava, T.; Migheli, Q.; Chen, X. Local and global feature-aware dual-branch networks for plant disease recognition. Plant Phenomics 2024, 6, 0208. [Google Scholar] [CrossRef] [PubMed]
  19. Prashanthi, B.; Krishna, A.P.; Rao, C.M. LEViT-Leaf Disease identification and classification using an enhanced Vision transformers (ViT) model. Multimed. Tools Appl. 2025, 84, 23313–23344. [Google Scholar] [CrossRef]
  20. Zeng, W.; Li, H.; Hu, G.; Liang, D. Lightweight dense-scale network (LDSNet) for corn leaf disease identification. Comput. Electron. Agric. 2022, 197, 106943. [Google Scholar] [CrossRef]
  21. Cheng, H.; Li, H. Identification of apple leaf disease via novel attention mechanism based convolutional neural network. Front. Plant Sci. 2023, 14, 1274231. [Google Scholar] [CrossRef] [PubMed]
  22. Liu, S.; Bai, H.; Li, F.; Wang, D.; Zheng, Y.; Jiang, Q.; Sun, F. An apple leaf disease identification model for safeguarding apple food safety. Food Sci. Technol. 2023, 43, e104322. [Google Scholar] [CrossRef]
  23. Zhang, S.; Wang, D.; Yu, C. Apple leaf disease recognition method based on Siamese dilated Inception network with less training samples. Comput. Electron. Agric. 2023, 213, 108188. [Google Scholar] [CrossRef]
  24. Karthik, R.; Alfred, J.J.; Kennedy, J.J. Inception-based global context attention network for the classification of coffee leaf diseases. Ecol. Inform. 2023, 77, 102213. [Google Scholar] [CrossRef]
  25. Huang, X.; Xu, D.; Chen, Y.; Zhang, Q.; Feng, P.; Ma, Y.; Dong, Q.; Yu, F. EConv-ViT: A strongly generalized apple leaf disease classification model based on the fusion of ConvNeXt and Transformer. Inf. Process. Agric. 2025, 12, 466–477. [Google Scholar] [CrossRef]
  26. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
  27. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
  28. Li, Y.; Wang, J.; Wu, H.; Yu, Y.; Sun, H.; Zhang, H. Detection of powdery mildew on strawberry leaves based on DAC-YOLOv4 model. Comput. Electron. Agric. 2022, 202, 107418. [Google Scholar] [CrossRef]
  29. Rajamohanan, R.; Latha, B.C. An Optimized YOLO v5 Model for Tomato Leaf Disease Classification with Field Dataset. Eng. Technol. Appl. Sci. Res. 2023, 13, 12033–12038. [Google Scholar] [CrossRef]
  30. Iren, E. Comparison of yolov5 and yolov6 models for plant leaf disease detection. Eng. Technol. Appl. Sci. Res. 2024, 14, 13714–13719. [Google Scholar] [CrossRef]
  31. Sun, H.; Nicholaus, I.T.; Fu, R.; Kang, D.K. YOLO-FMDI: A Lightweight YOLOv8 Focusing on a Multi-Scale Feature Diffusion Interaction Neck for Tomato Pest and Disease Detection. Electronics 2024, 13, 2974. [Google Scholar] [CrossRef]
  32. Rehana, H.; Ibrahim, M.; Ali, M.H. Plant disease detection using region-based convolutional neural network. arXiv 2023, arXiv:2303.09063. [Google Scholar] [CrossRef]
  33. Kang, R.; Huang, J.; Zhou, X.; Ren, N.; Sun, S. Toward Real Scenery: A Lightweight Tomato Growth Inspection Algorithm for Leaf Disease Detection and Fruit Counting. Plant Phenomics 2024, 6, 0174. [Google Scholar] [CrossRef]
  34. Wang, X.; Liu, J. Detection of small targets in cucumber disease images through global information perception and feature fusion. Front. Sustain. Food Syst. 2024, 8, 1366387. [Google Scholar] [CrossRef]
  35. Lee, Y.S.; Patil, M.P.; Kim, J.G.; Seo, Y.B.; Ahn, D.H.; Kim, G.D. Hyperparameter Optimization for Tomato Leaf Disease Recognition Based on YOLOv11m. Plants 2025, 14, 653. [Google Scholar] [CrossRef]
  36. Zhang, Y.; Zhou, G.; Chen, A.; He, M.; Li, J.; Hu, Y. A precise apple leaf diseases detection using BCTNet under unconstrained environments. Comput. Electron. Agric. 2023, 212, 108132. [Google Scholar] [CrossRef]
  37. Wang, S.; Xu, D.; Liang, H.; Bai, Y.; Li, X.; Zhou, J.; Su, C.; Wei, W. Advances in deep learning applications for plant disease and pest detection: A review. Remote Sens. 2025, 17, 698. [Google Scholar] [CrossRef]
  38. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III 18. Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241. [Google Scholar]
  39. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
  40. Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848. [Google Scholar] [CrossRef]
  41. Abinaya, S.; Kumar, K.U.; Alphonse, A.S. Cascading Autoencoder with Attention Residual U-Net for Multi-Class Plant Leaf Disease Segmentation and Classification. IEEE Access 2023, 11, 98153–98170. [Google Scholar] [CrossRef]
  42. Deng, Y.; Xi, H.; Zhou, G.; Chen, A.; Wang, Y.; Li, L.; Hu, Y. An effective image-based tomato leaf disease segmentation method using MC-UNet. Plant Phenomics 2023, 5, 0049. [Google Scholar] [CrossRef] [PubMed]
  43. Yang, Y.; Wang, C.; Zhao, Q.; Li, G.; Zang, H. Se-swin unet for image segmentation of major maize foliar diseases. Eng. Agríc. 2024, 44, e20230097. [Google Scholar] [CrossRef]
Figure 1. Apple and cucumber leaf disease dataset (FELS: frog-eye leaf spot; R: rust; S: scab; PM: powdery mildew; PD: physiological disease).
Figure 2. Comparison of original and augmented apple leaf images with single and mixed disease insertions.
Figure 3. Architecture of the apple leaf mixed disease recognition (ALMDR) framework, illustrating three task-specific heads for leaf segmentation, lesion segmentation, and multi-label disease classification.
Figure 4. Feature pyramid network (FPN) and its variants.
Figure 5. Lesion segmentation head (LeSH) architecture with classification and regression branches.
Figure 6. Comparison of classification confidence score computation in ALMDR and other models.
Figure 7. Performance comparison of disease classification models.
Figure 8. Comparison of detection and segmentation precision results with different models across varying IoU thresholds and object sizes on the apple leaf disease dataset.
Figure 9. Comparison of severity level estimation between different models on the apple leaf disease dataset.
Figure 10. Comparison of severity level estimation between different models on the cucumber leaf disease dataset.
Figure 11. Comparison of leaf and lesion segmentation results across different models under challenging conditions.
Table 1. Distribution of images and instances across disease categories in the train, validation, and test sets of the cucumber leaf disease dataset (PM: powdery mildew; PD: physiological disease).

| Split | Health (images) | PM (images) | PD (images) | PM&PD (images) | Total images | PM (instances) | PD (instances) | Total instances |
|---|---|---|---|---|---|---|---|---|
| Train | 200 | 200 | 200 | 200 | 800 | 8540 | 884 | 9424 |
| Val | 25 | 25 | 25 | 25 | 100 | 1098 | 81 | 1179 |
| Test | 25 | 25 | 25 | 25 | 100 | 1109 | 78 | 1187 |
Table 2. Distribution of images and instances across disease categories in the train, validation, and test sets of the apple leaf disease dataset (FELS: frog-eye leaf spot; R: rust; S: scab; PM: powdery mildew).

| Split | Health | Rust | Scab | FELS | PM | R&S | R&FELS | S&FELS | Total images | Rust (inst.) | Scab (inst.) | FELS (inst.) | PM (inst.) | Total instances |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Train (initial) | 3647 | 1460 | 3801 | 2497 | 845 | 540 | 84 | 124 | 12,998 | 16,030 | 37,915 | 17,525 | 902 | 71,470 |
| Train (augmented) | 0 | 0 | 0 | 0 | 0 | 824 | 1396 | 1344 | 3564 | 1020 | 6392 | 2467 | 0 | 9879 |
| Train (total) | 3647 | 1460 | 3801 | 2497 | 845 | 1364 | 1480 | 1468 | 16,562 | 17,050 | 44,307 | 19,992 | 902 | 81,349 |
| Val | 456 | 183 | 475 | 312 | 106 | 68 | 10 | 16 | 1626 | 1626 | 2804 | 799 | 88 | 5229 |
| Test | 456 | 183 | 475 | 312 | 106 | 68 | 10 | 16 | 1626 | 1626 | 2789 | 748 | 92 | 5163 |
Table 3. Criteria for disease severity levels and sample distribution in the test set (r: the ratio of affected area to total leaf area).

| Severity level | Apple: ratio r | Apple: sample count | Cucumber: ratio r | Cucumber: sample count |
|---|---|---|---|---|
| Level 0 | r = 0% | 456 | r = 0% | 25 |
| Level 1 | 0% < r ≤ 10% | 284 | 0% < r ≤ 5% | 7 |
| Level 3 | 10% < r ≤ 25% | 319 | 5% < r ≤ 10% | 6 |
| Level 5 | 25% < r ≤ 40% | 201 | 10% < r ≤ 20% | 22 |
| Level 7 | 40% < r ≤ 65% | 178 | 20% < r ≤ 40% | 19 |
| Level 9 | r > 65% | 188 | r > 40% | 21 |
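The threshold scheme in Table 3 amounts to a simple lookup from the lesion-to-leaf area ratio r to a discrete severity level. A minimal sketch of that mapping follows; the function and constant names are ours for illustration, not taken from the paper's code:

```python
# Severity thresholds from Table 3, as (upper bound of r, assigned level) pairs.
APPLE = [(0.10, 1), (0.25, 3), (0.40, 5), (0.65, 7)]
CUCUMBER = [(0.05, 1), (0.10, 3), (0.20, 5), (0.40, 7)]

def severity_level(r, thresholds=APPLE):
    """Map the lesion-to-leaf area ratio r (in [0, 1]) to a severity level."""
    if r <= 0.0:
        return 0                      # Level 0: no visible lesions
    for upper, level in thresholds:
        if r <= upper:
            return level
    return 9                          # ratio beyond the last bound

def area_ratio(lesion_pixels, leaf_pixels):
    """r = segmented lesion area / segmented leaf area (pixel counts)."""
    return lesion_pixels / leaf_pixels if leaf_pixels else 0.0
```

For example, an apple leaf with 30% of its area covered by lesions falls in the 25% < r ≤ 40% band and is assigned Level 5.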
Table 4. Comparison of multi-label and multi-class classification results across different models on the apple leaf disease dataset. (↓: lower is better; ↑: higher is better. P / R / F1 are precision, recall, and F1-score in %; p-values compare each model with ALMDR.)

| Model | Hamming Loss (%) ↓ | One-Error (%) ↓ | Zero-One Loss (%) ↓ | Example-Based P / R / F1 ↑ | p-Value | Macro P / R / F1 ↑ | Micro P / R / F1 ↑ | p-Value |
|---|---|---|---|---|---|---|---|---|
| YOLACT | 8.70 | 9.60 | 9.73 | 88.70 / 88.32 / 89.83 | 2.26 × 10⁻⁴ | 86.79 / 86.16 / 87.36 | 87.76 / 87.85 / 87.80 | 3.02 × 10⁻⁴ |
| FCOS | 8.41 | 8.24 | 9.29 | 89.93 / 89.94 / 90.05 | 3.45 × 10⁻⁴ | 88.96 / 88.90 / 89.32 | 89.57 / 88.73 / 89.15 | 4.37 × 10⁻⁴ |
| SOLOv1 | 10.65 | 11.47 | 11.78 | 85.82 / 85.91 / 87.48 | 1.00 × 10⁻⁴ | 83.78 / 82.95 / 82.60 | 84.18 / 84.93 / 84.55 | 2.54 × 10⁻⁴ |
| SOLOv2 | 9.11 | 8.46 | 10.49 | 86.38 / 87.61 / 88.07 | 1.68 × 10⁻⁴ | 84.55 / 84.47 / 84.04 | 86.74 / 86.83 / 86.78 | 2.84 × 10⁻⁴ |
| YOLOv7 | 8.39 | 8.54 | 9.02 | 90.02 / 91.04 / 91.03 | 4.23 × 10⁻⁴ | 90.33 / 89.43 / 90.17 | 90.52 / 90.05 / 90.28 | 5.18 × 10⁻⁴ |
| YOLOv8 | 8.30 | 8.49 | 8.98 | 91.31 / 92.88 / 92.29 | 5.20 × 10⁻⁴ | 91.29 / 90.87 / 91.64 | 91.18 / 91.15 / 91.16 | 6.26 × 10⁻⁴ |
| YOLOv9 | 8.27 | 8.47 | 8.89 | 92.41 / 92.96 / 92.91 | 5.64 × 10⁻⁴ | 92.54 / 91.12 / 93.09 | 92.44 / 91.73 / 92.08 | 6.45 × 10⁻⁴ |
| ALMDR | 7.94 | 8.46 | 8.75 | 93.19 / 94.34 / 93.74 | – | 94.40 / 92.49 / 93.66 | 93.72 / 92.35 / 93.03 | – |
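Tables 4 and 5 use standard multi-label metrics: Hamming loss counts the fraction of individual label slots predicted incorrectly across all samples, while example-based precision/recall/F1 score each image against its true label set and average over the test set. A minimal pure-Python sketch of two of these metrics (label vectors as 0/1 lists; function names are ours):

```python
def hamming_loss(y_true, y_pred):
    """Fraction of label slots, over all samples, predicted incorrectly."""
    n, n_labels = len(y_true), len(y_true[0])
    wrong = sum(t != p
                for row_t, row_p in zip(y_true, y_pred)
                for t, p in zip(row_t, row_p))
    return wrong / (n * n_labels)

def example_based_f1(y_true, y_pred):
    """Mean per-example F1 between predicted and true label sets."""
    scores = []
    for t, p in zip(y_true, y_pred):
        inter = sum(a and b for a, b in zip(t, p))   # labels in both sets
        denom = sum(t) + sum(p)
        # Convention: an empty prediction vs. an empty truth is a perfect match.
        scores.append(2 * inter / denom if denom else 1.0)
    return sum(scores) / len(scores)
```

With two images and labels (rust, scab, FELS), a prediction that drops one true label on the first image gives a Hamming loss of 1/6 and an example-based F1 of (2/3 + 1)/2.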
Table 5. Comparison of multi-label and multi-class classification results across different models on the cucumber leaf disease dataset. (↓: lower is better; ↑: higher is better. P / R / F1 are precision, recall, and F1-score in %; p-values compare each model with ALMDR.)

| Model | Hamming Loss (%) ↓ | One-Error (%) ↓ | Zero-One Loss (%) ↓ | Example-Based P / R / F1 ↑ | p-Value | Macro P / R / F1 ↑ | Micro P / R / F1 ↑ | p-Value |
|---|---|---|---|---|---|---|---|---|
| YOLACT | 8.05 | 8.07 | 8.63 | 89.28 / 88.64 / 90.35 | 2.08 × 10⁻⁴ | 87.31 / 86.51 / 88.14 | 89.39 / 88.52 / 88.95 | 3.72 × 10⁻⁴ |
| FCOS | 8.06 | 7.94 | 8.49 | 90.18 / 90.59 / 90.98 | 2.10 × 10⁻⁴ | 89.25 / 89.61 / 89.65 | 90.15 / 89.18 / 89.66 | 3.07 × 10⁻⁴ |
| SOLOv1 | 9.60 | 9.32 | 9.21 | 86.62 / 86.13 / 87.86 | 7.03 × 10⁻⁵ | 84.58 / 83.20 / 83.32 | 87.06 / 85.54 / 86.29 | 8.40 × 10⁻⁵ |
| SOLOv2 | 8.26 | 8.96 | 9.54 | 87.08 / 87.89 / 88.59 | 1.03 × 10⁻⁴ | 85.10 / 85.26 / 84.36 | 88.52 / 87.27 / 87.89 | 1.40 × 10⁻⁴ |
| YOLOv7 | 8.00 | 7.77 | 8.07 | 90.42 / 91.67 / 91.34 | 3.25 × 10⁻⁴ | 90.55 / 90.26 / 90.54 | 91.00 / 90.74 / 90.87 | 4.38 × 10⁻⁴ |
| YOLOv8 | 7.94 | 7.69 | 7.78 | 92.09 / 93.15 / 92.84 | 3.96 × 10⁻⁴ | 91.60 / 91.61 / 91.90 | 91.68 / 91.56 / 91.62 | 4.89 × 10⁻⁴ |
| YOLOv9 | 7.87 | 7.44 | 7.99 | 92.77 / 93.43 / 93.60 | 4.03 × 10⁻⁴ | 92.81 / 91.97 / 93.48 | 92.90 / 92.26 / 92.58 | 5.76 × 10⁻⁴ |
| ALMDR | 7.59 | 7.31 | 7.90 | 93.40 / 95.16 / 94.08 | – | 94.63 / 93.37 / 94.00 | 94.29 / 92.77 / 93.52 | – |
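The p-values in Tables 4 and 5 come from paired Wilcoxon signed-rank tests between each baseline and ALMDR. Given per-image scores for two models on the same test images, the test can be run with SciPy; the score arrays below are made-up illustrative data, not figures from the paper:

```python
from scipy.stats import wilcoxon

# Hypothetical per-image F1 scores for a baseline model on ten test images.
baseline = [0.88, 0.90, 0.87, 0.91, 0.89, 0.92, 0.86, 0.90, 0.88, 0.91]
# Hypothetical improvements on the same images (distinct, all positive).
deltas = [0.010, 0.012, 0.014, 0.016, 0.018, 0.020, 0.022, 0.024, 0.026, 0.028]
almdr = [b + d for b, d in zip(baseline, deltas)]

# Two-sided paired test: are the per-image differences centred at zero?
result = wilcoxon(almdr, baseline)
print(f"p-value = {result.pvalue:.4g}")
```

With all ten differences positive, the test rejects the no-difference hypothesis at the usual 0.05 level, mirroring how the tables report significance for every baseline-vs-ALMDR comparison.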
Table 6. Comparison of detection and segmentation performance with different models. (↑: higher is better; ↓: lower is better; "–": task not supported by the model.)

| Model | Apple det. mAP / mAR (%) ↑ | Apple seg. mAP / mAR (%) ↑ | Cucumber det. mAP / mAR (%) ↑ | Cucumber seg. mAP / mAR (%) ↑ | FPS ↑ | FLOPs (G) ↓ | Parameters (M) ↓ |
|---|---|---|---|---|---|---|---|
| YOLACT | 44.88 / 52.22 | 40.94 / 44.94 | 48.43 / 55.78 | 45.29 / 47.84 | 6.20 | 67.05 | 49.61 |
| FCOS | 46.19 / 52.08 | – | 50.18 / 55.80 | – | 6.14 | 70.81 | 50.96 |
| SOLOv1 | – | 30.91 / 40.18 | – | 35.70 / 42.90 | 5.86 | 125.80 | 55.07 |
| SOLOv2 | – | 35.05 / 41.57 | – | 39.73 / 45.03 | 5.83 | 125.91 | 65.36 |
| YOLOv7 | 46.83 / 54.67 | 41.66 / 44.28 | 51.95 / 58.02 | 46.45 / 47.51 | 6.02 | 230.91 | 73.36 |
| YOLOv8 | 48.50 / 55.87 | 42.46 / 45.07 | 53.49 / 59.44 | 48.03 / 48.96 | 6.08 | 200.06 | 47.75 |
| YOLOv9 | 50.11 / 56.78 | 44.31 / 46.71 | 54.46 / 60.57 | 49.99 / 49.83 | 6.23 | 145.50 | 27.40 |
| ALMDR | 51.32 / 58.71 | 45.50 / 48.10 | 55.55 / 62.61 | 50.96 / 51.89 | 6.25 | 74.78 | 45.27 |
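The mAP and mAR figures above are averaged over a range of IoU thresholds (COCO-style evaluation), and the underlying overlap measure for segmentation is mask IoU. A minimal sketch follows, representing masks as sets of pixel coordinates for brevity; real implementations operate on binary arrays, and the function names are ours:

```python
def mask_iou(mask_a, mask_b):
    """Intersection-over-union of two binary masks given as pixel-coordinate sets."""
    a, b = set(mask_a), set(mask_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def is_match(pred, gt, threshold=0.5):
    """A predicted mask counts as a true positive at a given IoU threshold
    only if its overlap with a ground-truth mask reaches that threshold."""
    return mask_iou(pred, gt) >= threshold
```

Sweeping `threshold` from 0.50 to 0.95 and averaging the resulting precisions is what produces a single mAP number per model.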
Table 7. Ablation results for the effectiveness of GFPN and MLCH compared to other FPN variants. (✓: MLCH enabled; –: MLCH disabled.)

| Model | MLCH | Det. mAP (%) ↑ | Det. mAR (%) ↑ | Seg. mAP (%) ↑ | Seg. mAR (%) ↑ | FPS ↑ | FLOPs (G) ↓ | Parameters (M) ↓ |
|---|---|---|---|---|---|---|---|---|
| FPN | – | 46.99 | 53.59 | 44.37 | 44.14 | 6.24 | 77.67 | 46.32 |
| PANet | – | 47.08 | 55.36 | 45.09 | 45.36 | 6.17 | 78.38 | 46.46 |
| Bi-FPN | – | 49.53 | 57.43 | 45.06 | 46.37 | 5.96 | 80.80 | 46.57 |
| GFPN | – | 50.89 | 58.05 | 45.11 | 47.55 | 6.26 | 73.78 | 45.25 |
| GFPN | ✓ | 51.32 | 58.71 | 45.50 | 48.10 | 6.25 | 74.78 | 45.27 |
Table 8. Performance improvements of integrating GFPN and MLCH into YOLO variants. (✓: module enabled; deltas in parentheses are relative to each baseline.)

| Model | GFPN | MLCH | Det. mAP (%) ↑ | Det. mAR (%) ↑ | Seg. mAP (%) ↑ | Seg. mAR (%) ↑ | FPS ↑ | FLOPs (G) ↓ | Parameters (M) ↓ |
|---|---|---|---|---|---|---|---|---|---|
| YOLOv7 | – | – | 46.83 | 54.67 | 41.66 | 44.28 | 6.02 | 230.91 | 73.36 |
| YOLOv7 | ✓ | – | 47.35 (↑0.52) | 55.26 (↑0.59) | 41.95 (↑0.29) | 44.66 (↑0.38) | 6.03 (↑0.01) | 229.35 (↓1.56) | 73.25 (↓0.11) |
| YOLOv7 | – | ✓ | 47.00 (↑0.17) | 54.88 (↑0.21) | 41.77 (↑0.11) | 44.60 (↑0.32) | 6.02 (0.00) | 231.59 (↑0.68) | 73.41 (↑0.05) |
| YOLOv7 | ✓ | ✓ | 47.70 (↑0.87) | 55.64 (↑0.97) | 42.36 (↑0.70) | 45.18 (↑0.90) | 6.02 (0.00) | 230.03 (↓0.88) | 73.30 (↓0.06) |
| YOLOv8 | – | – | 48.50 | 55.87 | 42.46 | 45.07 | 6.08 | 200.06 | 47.75 |
| YOLOv8 | ✓ | – | 48.85 (↑0.35) | 56.24 (↑0.37) | 42.93 (↑0.47) | 45.26 (↑0.19) | 6.09 (↑0.01) | 198.51 (↓1.55) | 47.64 (↓0.11) |
| YOLOv8 | – | ✓ | 48.78 (↑0.28) | 56.02 (↑0.15) | 42.86 (↑0.40) | 45.17 (↑0.10) | 6.08 (0.00) | 200.74 (↑0.68) | 47.80 (↑0.05) |
| YOLOv8 | ✓ | ✓ | 49.33 (↑0.83) | 56.58 (↑0.71) | 43.07 (↑0.61) | 45.47 (↑0.40) | 6.08 (0.00) | 199.19 (↓0.87) | 47.69 (↓0.06) |
| YOLOv9 | – | – | 50.11 | 56.78 | 44.31 | 46.71 | 6.23 | 145.50 | 27.40 |
| YOLOv9 | ✓ | – | 50.44 (↑0.33) | 57.20 (↑0.42) | 44.46 (↑0.15) | 46.93 (↑0.22) | 6.24 (↑0.01) | 143.97 (↓1.53) | 27.29 (↓0.11) |
| YOLOv9 | – | ✓ | 50.35 (↑0.24) | 56.98 (↑0.20) | 44.39 (↑0.08) | 46.91 (↑0.20) | 6.23 (0.00) | 146.18 (↑0.68) | 27.45 (↑0.05) |
| YOLOv9 | ✓ | ✓ | 50.89 (↑0.78) | 57.50 (↑0.72) | 45.00 (↑0.69) | 47.29 (↑0.58) | 6.23 (0.00) | 144.64 (↓0.86) | 27.34 (↓0.06) |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

Luan, P.; Guo, N.; Li, L.; Li, B.; Zhao, Z.; Ma, L.; Liu, B. Accurate and Efficient Recognition of Mixed Diseases in Apple Leaves Using a Multi-Task Learning Approach. Agriculture 2026, 16, 71. https://doi.org/10.3390/agriculture16010071