Article

Optimization of Deep Learning Model Based on Attention-Guided PCA Dimensionality Reduction

1 School of Rail Transportation, Shandong Jiaotong University, Jinan 250357, China
2 School of Computer Engineering, Chengdu Technological University, Chengdu 611730, China
3 Institute of Automation, Chinese Academy of Sciences, Beijing 100000, China
* Author to whom correspondence should be addressed.
Horticulturae 2025, 11(11), 1346; https://doi.org/10.3390/horticulturae11111346
Submission received: 29 September 2025 / Revised: 29 October 2025 / Accepted: 7 November 2025 / Published: 9 November 2025
(This article belongs to the Section Plant Pathology and Disease Management (PPDM))

Abstract

Plant diseases severely affect agricultural production, reducing crop yields and causing economic losses. Accurate identification of crop diseases is therefore essential for the development of intelligent agriculture. With the help of image recognition methods, diseases can be prevented and controlled precisely, which significantly reduces pesticide use and ultimately improves crop yield and quality. This study therefore proposes a method that combines Attention-Guided PCA (AG-PCA) dimensionality reduction with a spatial attention mechanism, verified on the ResNet model. The AG-PCA module dynamically selects principal component features based on attention weights, preserving key disease features during dimensionality reduction. At the same time, a spatial attention mechanism is embedded in the residual blocks to enhance the representation of disease regions and suppress background interference. On the AppleLeaf9 dataset, containing 10,211 images in 9 disease categories, the model achieved an accuracy of 93.69%, significantly outperforming the baseline methods. Experimental results indicate that the model performs stably in complex backgrounds and fine-grained classification tasks and demonstrates strong generalization ability, showing promising application potential.

1. Introduction

Agriculture occupies an important position in the world economy: it is central to ensuring the global food supply, raising residents’ incomes, and providing employment, and it makes important contributions to economic and social development [1]. As a major economic crop, apples are seriously threatened by plant diseases in both yield and quality, and the resulting economic losses each year are considerable [2]. At present, common methods for identifying plant leaf diseases include manual identification [3], chemical analysis [4], and image recognition. Among them, image recognition methods based on deep learning have gradually attracted more attention due to advantages such as high speed and low cost [5,6,7].
With the development of computer vision technology, apple leaf disease recognition has made significant progress, but still faces challenges such as limited sample size, complex background interference, difficulty in fine segmentation, and insufficient detection and generalization ability on mobile devices. To address the above challenges, researchers have explored them from different perspectives: Ali et al. proposed the AppleLeafNet model, which uses a lightweight 37-layer convolutional network, achieving a classification accuracy of 98.6% on public datasets [8]. Bi et al. combined the improved HRNet with the DRL-watershed algorithm to achieve pixel-level segmentation of disease severity, effectively improving recognition accuracy under complex backgrounds [9]. Gao et al. conducted rapid detection based on a lightweight YOLOv8 model, achieving a balance between speed and accuracy on mobile devices [10]. Ahmed et al. proposed the MCFFA-Net model, which improves classification performance through contextual feature fusion and attention mechanisms, achieving an accuracy of 90.86% [11]. Sundhar et al. combined the Graph Attention Network (GAT) with the Graph Convolutional Network (GCN), enhancing superpixel features of apple leaves and achieving an F1-score of 0.9818 [12]. Overall, deep learning has formed a relatively mature technical framework in the field of apple disease recognition, and multiple models generally achieve recognition accuracy above 90% on laboratory datasets. However, since most studies still rely mainly on standardized laboratory samples, further exploration is needed to achieve effective application in complex real-world scenarios [13].
Scholars have conducted relevant research on the problem of leaf disease recognition under complex backgrounds. Zhao et al. reviewed strategies such as Convolutional Neural Networks (CNNs), YOLO architectures, and lightweight model optimization, and analyzed key technologies for improving recognition performance under complex backgrounds [14]. Zhou et al. adopted the Region Proposal and Progressive Learning (PRP-Net) method, achieving an average accuracy of 98.26% in vegetable leaf disease recognition under complex backgrounds [15]. Ashurov et al. proposed an improved deep Convolutional Neural Network (CNN) method, combining depthwise separable convolution, Squeeze-and-Excitation (SE) modules, and improved residual skip connections, effectively enhancing the model’s feature extraction ability under complex backgrounds [16]. Studies have found that emphasizing key features of plant leaf diseases through feature enhancement can significantly improve model recognition accuracy. For example, a multimodal collaborative method proposed in [17] combines deep transfer learning, Canny edge detection, chromatic intensity analysis, and customized data augmentation to effectively highlight key features; the LBPAttNet model proposed in [18] integrates a lightweight coordinate attention mechanism with LBP local features, achieving accuracies of 92.78% and 98.13%, respectively; the method in [19] uses SIFT feature extraction combined with FCM clustering and LSTM classification, improving recognition accuracy to 96%, highlighting the importance of feature extraction.
Existing studies have shown that commonly used key features include edge features [20], texture features [21], morphological features [22], pixel features, and frequency domain features. However, these features often rely on specific algorithm modules for enhancement, and therefore usually can only optimize one type of feature at a time, making it difficult to comprehensively optimize multiple key features simultaneously. Ref. [23] proposed using the PCA method to reduce the dimensionality of 8192-dimensional deep high-level features extracted by VGG16, retaining key discriminative principal components, and combining the BBBC optimization algorithm to eliminate redundant features, thereby reducing computation time and providing efficient input for the ANN classifier, ultimately improving classification accuracy. Inspired by this, we consider whether the principal component selection characteristic of PCA can be used to extract multiple key features simultaneously under complex backgrounds, in order to further improve model performance.
Therefore, this paper designs an Attention-Guided PCA (AG-PCA) module and introduces it into the disease recognition model. AG-PCA is used to replace the traditional MaxPooling, retaining effective features as much as possible while performing dimensionality reduction, and embedding a spatial attention mechanism within the residual blocks. The combination of the two significantly enhances the model’s discriminative ability and classification accuracy. The main contributions of this paper are as follows:
  • Proposed the AG-PCA feature optimization module: Quantifies the discriminative value of features through attention weights and dynamically adjusts the selection priority of PCA principal components, retaining key relevant features while performing dimensionality reduction.
  • Embedded a spatial attention mechanism in residual blocks: Enhances the feature representation of disease regions and suppresses background redundancy, providing high-quality input for the AG-PCA module and achieving collaborative optimization.
  • Conducted experiments on the AppleLeaf9 dataset: The dataset contains 10,211 images covering 9 categories of apple leaf diseases; results show that the improved model achieves a recognition accuracy of 93.69%, significantly outperforming the baseline model.
  • Validated the core role of the AG-PCA module through ablation experiments: Demonstrates that its combination with the spatial attention mechanism brings more significant performance gains.

2. Materials and Methods

2.1. Experimental Data Preparation

For model performance evaluation, the AppleLeaf9 dataset, which provides a realistic representation of apple leaf disease scenarios, was employed. The dataset contains a total of 10,211 images across nine categories (eight apple leaf disease types and healthy leaves), with the detailed sample distribution summarized in Table 1. Images were captured using smartphones and DSLR cameras under varying illumination conditions, complex backgrounds, and different disease development stages, thereby reflecting the diversity and complexity of apple leaf diseases in real-world settings. To ensure experimental rigor, a stratified sampling strategy was applied to split the dataset into a training set (8168 images) and a validation set (2043 images), which were used for feature learning, parameter optimization, and performance monitoring [24].

2.2. Model Establishment

The ResNet18-AttentionPCA model proposed in this study adopts a four-stage progressive architecture, and its overall structure is shown in Figure 1. This model integrates the attention mechanism with principal component analysis (PCA), achieving a complete processing flow from raw image input to final classification prediction. Specifically, the implementation of each stage and the interaction between core modules are described as follows.
Steps 1–2: Data Input and Preprocessing. The original leaf images are first used as the input of the model. Before entering the network, data augmentation is performed by random cropping (from 256 × 256 to 224 × 224), horizontal flipping, ±20° rotation, color adjustment, and random grayscaling, in order to increase sample diversity and effectively improve the generalization ability of the model.
Step 3: Multi-stage Feature Extraction. The feature extraction module employs a four-stage progressive architecture, following a cyclic logic of spatial compression, channel expansion, and attention-guided residual enhancement. Spatial features are extracted through convolution and PCA-based downsampling modules, while semantic enhancement is achieved via residual blocks embedded with spatial attention. The residual block structure ensures that gradients can effectively propagate through deep layers, avoiding vanishing or exploding gradients. Simultaneously, the attention-guided loss allows gradients to backpropagate directly through the attention maps, enabling the model to automatically strengthen responses to critical lesion regions.
In each stage, PCA is computed per batch: it is performed only on the current batch’s feature maps and weighted according to the attention weights. PCA is thus involved in forward feature compression and also receives gradient information during backpropagation, achieving tight integration with the attention mechanism. Stage 1 uses 2 principal components, while Stages 2 to 4 use 1 principal component. With this design, the model retains important feature information while reducing spatial dimensions and ensures smooth gradient backpropagation.
Given a preprocessed input feature map of size 112 × 112 × 64, the network progressively compresses spatial dimensions via attention-guided PCA downsampling: Stage 1 downsamples to 56 × 56 and projects onto 2 principal components; Stages 2–4 further reduce the resolution to 28 × 28, 14 × 14, and 7 × 7, respectively, while maintaining the channel number. In each stage, residual blocks and 1 × 1 convolutions gradually expand the channel dimension from 64 to 128, 256, and finally 512, with embedded spatial attention to reinforce salient structures and key classification features. This design enables the network to progressively capture shallow edges and textures, mid-level leaf vein features, and high-level shapes and lesion information, integrating deep semantic features to achieve high classification accuracy while maintaining computational efficiency, ultimately producing a 7 × 7 × 512 deep semantic feature map. Compared with the baseline ResNet18 model, this design incurs a modest additional computational overhead, as shown in Table 2, but significantly enhances feature extraction capability, ultimately improving classification accuracy by 3.51%.
Step 4: Classification Decision. In the classification stage, the 7 × 7 × 512 deep semantic feature map produced by the fourth stage is first converted into a 1 × 1 × 512 feature vector through adaptive average pooling, which compresses the spatial dimensions while retaining global semantic information. Then, the feature vector is mapped to the class space through a linear layer, and class probabilities are calculated using the Softmax function. The class with the highest probability is selected as the final prediction.
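The classification stage described above can be sketched in a few lines of PyTorch. This is a minimal illustration of Step 4 only (pooling, linear mapping, Softmax), assuming the 512-channel, 7 × 7 feature map and 9 classes stated in the text; it is not the authors' code.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of the classification head: adaptive average pooling
# collapses the 7x7 spatial grid, a linear layer maps the 512-d vector to
# the 9 classes, and Softmax yields class probabilities.
head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),   # 7x7x512 -> 1x1x512
    nn.Flatten(),              # -> 512-d feature vector
    nn.Linear(512, 9),         # -> 9 class logits
)

x = torch.randn(4, 512, 7, 7)          # a batch of deep feature maps
probs = torch.softmax(head(x), dim=1)  # class probabilities per image
pred = probs.argmax(dim=1)             # index of the most probable class
```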
Feature visualization shows that after multi-stage feature extraction, the features of the original images gradually cluster into clearly separable classes, validating the model’s effectiveness in feature extraction and discrimination. By combining the attention mechanism with PCA-based dimensionality reduction, this architecture maintains classification accuracy while reducing computational complexity, providing an efficient solution for disease classification tasks on high-resolution images.

2.3. Attention-Guided PCA Downsampling Module

In deep convolutional networks, traditional max pooling or average pooling tends to lose critical information during dimensionality reduction, especially in fine-grained or high-dimensional tasks where the discriminative ability of local features may degrade [25]. To address this, we propose the Attention-Guided Principal Component Analysis downsampling module (AG-PCA), which combines the linear dimensionality reduction property of PCA with the task-adaptive feature selection capability of attention mechanisms, achieving efficient feature compression while preserving task-relevant statistical information (as shown in Figure 2).
In AG-PCA, the input feature map $X \in \mathbb{R}^{C \times H \times W}$ is first divided into non-overlapping $2 \times 2$ spatial blocks. Each block, containing adjacent pixels, is flattened into a 4-dimensional vector $x_i \in \mathbb{R}^4$, $i = 1, 2, \ldots, N$, forming a set of local features that capture local statistical structure (Figure 2a). Each block vector is then centered as
$$\hat{x}_i = x_i - \bar{x}, \qquad \bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i,$$
which removes the global bias and lets PCA focus on local variations. The unbiased covariance matrix of the centered blocks is computed as
$$C = \frac{1}{N-1} \sum_{i=1}^{N} \hat{x}_i \hat{x}_i^{\top},$$
where $C \in \mathbb{R}^{4 \times 4}$ describes the statistical dependencies among local pixels (Figure 2b). Eigenvalue decomposition is performed to obtain eigenvectors $V = [v_1, v_2, v_3, v_4]$ and corresponding eigenvalues $\lambda = [\lambda_1, \lambda_2, \lambda_3, \lambda_4]$ (Figure 2c), where the eigenvalues represent the variance explained by each principal component and the eigenvectors provide the projection directions.
To incorporate task relevance, AG-PCA employs a two-layer fully connected network to generate attention weights
$$w = \mathrm{Softmax}\big(W_2 \cdot \mathrm{ReLU}(W_1 \lambda)\big), \qquad w \in \mathbb{R}^4,$$
and applies these weights to the eigenvalues to form a weighted covariance matrix
$$\tilde{\lambda}_j = \lambda_j \cdot w_j, \qquad \tilde{C} = V \,\mathrm{diag}(\tilde{\lambda})\, V^{\top},$$
allowing spectral-domain attention to dynamically adjust the contributions of the principal components, emphasizing task-relevant features while suppressing redundant information (Figure 2d).
During the principal component projection stage, the top-$k$ weighted eigenvectors are selected to form the projection matrix $V_k = [v_1, \ldots, v_k]$, and the centered block vectors are projected into this subspace:
$$y_i = \hat{x}_i V_k,$$
achieving feature compression while retaining the local statistical principal components (Figure 2e). The projected blocks are then aggregated along the spatial and channel dimensions to generate the downsampled feature map $Y \in \mathbb{R}^{C \times (H/2) \times (W/2)}$, reducing spatial dimensionality while preserving task-relevant statistical characteristics (Figure 2f).
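The pipeline above (block extraction, centering, covariance, eigendecomposition, attention reweighting, projection) can be sketched as follows. This is an illustrative reading of the module, not the authors' implementation: the learned two-layer attention network is replaced here by a softmax over the eigenvalues as a stand-in, and the per-channel/per-batch details are simplified.

```python
import torch

def ag_pca_downsample(x, k=1):
    """Hypothetical sketch of AG-PCA downsampling.

    x: feature map of shape (C, H, W) with even H and W. Each non-overlapping
    2x2 block is flattened to a 4-d vector; PCA is computed over all blocks,
    the eigenvalues are reweighted by (stand-in) attention, and the blocks
    are projected onto the top-k weighted components.
    """
    C, H, W = x.shape
    # (C, H/2, 2, W/2, 2) -> (N, 4): one 4-d vector per 2x2 block
    blocks = (x.reshape(C, H // 2, 2, W // 2, 2)
               .permute(0, 1, 3, 2, 4)
               .reshape(-1, 4))
    centered = blocks - blocks.mean(dim=0)               # remove global bias
    cov = centered.T @ centered / (blocks.shape[0] - 1)  # 4x4 covariance
    lam, V = torch.linalg.eigh(cov)                      # ascending eigenvalues
    lam, V = lam.flip(0), V.flip(1)                      # sort descending
    # Stand-in for the learned attention network over eigenvalues:
    w = torch.softmax(lam, dim=0)
    order = torch.argsort(lam * w, descending=True)      # weighted selection
    Vk = V[:, order[:k]]                                 # top-k projection matrix
    y = centered @ Vk                                    # project block vectors
    if k == 1:
        return y.reshape(C, H // 2, W // 2)
    return y.reshape(C, H // 2, W // 2, k)

x = torch.randn(64, 56, 56)
y = ag_pca_downsample(x, k=1)   # Stage-2-style reduction: 56x56 -> 28x28
```

With `k=1` each 2 × 2 block collapses to a single projected value, which is what halves the spatial resolution while keeping the direction of highest (attention-weighted) variance.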

Theoretical Analysis and Comparison

The optimization objective of standard PCA is to maximize the sample variance
$$\max_{V_k^{\top} V_k = I} \mathrm{Tr}\big(V_k^{\top} C V_k\big),$$
whose solution is given by the eigenvalue decomposition $C = V \,\mathrm{diag}(\lambda)\, V^{\top}$. Standard PCA assumes that directions with higher variance are more important, but in practice the highest-variance directions are not necessarily the most discriminative for classification or recognition tasks.
AG-PCA introduces learnable attention weights $w$ to reweight the eigenvalues:
$$\tilde{C} = V \,\mathrm{diag}(w \odot \lambda)\, V^{\top}, \qquad \max_{V_k^{\top} V_k = I} \mathrm{Tr}\big(V_k^{\top} \tilde{C} V_k\big),$$
enabling task-adaptive principal component selection. The attention in AG-PCA operates in the spectral domain, measuring the importance of each principal component via weighted eigenvalues independently of spatial or channel distributions. During backpropagation, the attention weights are trained jointly with the PCA projection matrix, coupling feature selection with linear projection, which allows stable statistical capture and interpretability in high-dimensional data.
Compared with existing PCA-attention hybrids or dynamic feature selection methods, AG-PCA differs in that it jointly optimizes feature selection and linear projection during gradient training, balancing statistical stability and task relevance in downsampling.
Finally, the downsampling operator can be expressed as
$$Y = f_{\mathrm{AG\text{-}PCA}}(X) = X\, V \,\mathrm{diag}(w \odot \lambda),$$
achieving a balance between information preservation and computational efficiency in deep feature compression, while providing task-adaptive feature selection capability.

2.4. Feature Enhancement Mechanisms

To enhance the model’s generalization and discriminative capabilities, this study introduces a series of feature enhancement mechanisms, implemented as follows:

2.4.1. Alpha Dropout for Regularization

During training, Alpha Dropout (with a dropout rate of 0.1) is applied to randomly deactivate certain neurons. This operation effectively prevents overfitting while preserving the mean and variance of activations, maintaining the self-normalizing property of subsequent layers, and significantly enhancing the model’s generalization ability [26].
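PyTorch provides this regularizer directly as `nn.AlphaDropout`; a minimal sketch with the dropout rate of 0.1 stated above:

```python
import torch
import torch.nn as nn

drop = nn.AlphaDropout(p=0.1)  # dropout rate 0.1, as used in the paper

x = torch.randn(1000, 512)
drop.train()
y = drop(x)   # randomly "deactivates" units while approximately preserving
              # the mean and variance of the activations (self-normalizing)
drop.eval()
z = drop(x)   # identity at inference time
```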

2.4.2. Local Feature Refinement with Batch Normalization

A 3 × 3 convolutional layer is used to capture local spatial correlations and optimize feature representations. Batch normalization is applied immediately after the convolution to standardize the activations of each mini-batch:
$$\hat{z}_i = \frac{z_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}\,\gamma + \beta,$$
where $\mu_B$ and $\sigma_B^2$ denote the mean and variance of the mini-batch $B$, $\gamma$ and $\beta$ are learnable scaling and shifting parameters, and $\epsilon$ is a small constant (typically $10^{-5}$) to avoid division by zero. This mechanism stabilizes training by reducing internal covariate shift, improving feature consistency while accelerating convergence.
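A minimal sketch of this refinement step, assuming 64 channels (the channel width is illustrative):

```python
import torch
import torch.nn as nn

# 3x3 convolution for local spatial correlations, followed by batch
# normalization (eps = 1e-5, learnable gamma/beta initialized to 1 and 0).
refine = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(64, eps=1e-5),
)

x = torch.randn(8, 64, 56, 56)
refine.train()
y = refine(x)  # in train mode each channel of y is standardized to
               # (approximately) zero mean and unit variance per mini-batch
```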

2.4.3. Spatial Attention Mechanism

To dynamically focus on discriminative regions, this study designs a spatial attention module. First, the input feature map is pooled along the channel dimension to generate a spatial descriptor F R 1 × H × W , capturing global spatial information. Then, the feature map is processed through a 7 × 7 convolution layer (padding = 3) to fuse multi-scale spatial information. A Sigmoid activation function is applied to the fused features to generate spatial attention weights M s R 1 × H × W , with values ranging from 0 to 1, where higher weights indicate more important regions. Finally, the input feature map is multiplied element-wise with M s , enhancing task-relevant spatial regions while suppressing irrelevant background noise.
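The steps above can be sketched as a small module. One assumption is made explicit in a comment: the paper says only that the feature map is pooled along the channel dimension to 1 × H × W, so average pooling is used here as the pooling choice.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Sketch of the spatial attention module described above.

    Channel pooling is assumed to be average pooling (the text does not
    specify which pooling produces the 1 x H x W descriptor).
    """
    def __init__(self):
        super().__init__()
        # 7x7 convolution with padding=3 to fuse multi-scale spatial context
        self.conv = nn.Conv2d(1, 1, kernel_size=7, padding=3)

    def forward(self, x):
        f = x.mean(dim=1, keepdim=True)    # spatial descriptor F: (B, 1, H, W)
        m = torch.sigmoid(self.conv(f))    # attention weights M_s in (0, 1)
        return x * m                       # reweight task-relevant regions

att = SpatialAttention()
x = torch.randn(2, 64, 28, 28)
y = att(x)   # same shape as x; low-weight regions are suppressed
```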

2.4.4. Self-Normalizing Nonlinear Transformation

The Scaled Exponential Linear Unit (SELU) is adopted as the activation function:
$$\mathrm{SELU}(x) = \lambda \begin{cases} x, & x > 0, \\ \alpha(e^{x} - 1), & x \le 0, \end{cases}$$
where $\lambda \approx 1.0507$ and $\alpha \approx 1.6733$ are pre-defined constants. This function inherently maintains the stability of mean and variance across network layers, mitigating the vanishing gradient problem without explicit normalization, thereby simplifying network design [27].
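In PyTorch this activation is available as `nn.SELU`, with both constants built in:

```python
import torch
import torch.nn as nn

selu = nn.SELU()  # lambda ~= 1.0507 and alpha ~= 1.6733 are fixed constants

x = torch.tensor([-2.0, 0.0, 3.0])
y = selu(x)
# Positive inputs are scaled by lambda; negative inputs saturate toward
# -lambda * alpha ~= -1.758, which keeps activation statistics stable.
```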

2.4.5. Residual Connection for Feature Fusion

Enhanced features are integrated with the main network features through skip connections:
$$F_{\mathrm{out}} = F_{\mathrm{main}} + F_{\mathrm{enhanced}},$$
where $F_{\mathrm{main}}$ represents the original feature map from the main network and $F_{\mathrm{enhanced}}$ represents the optimized features. This fusion strategy preserves low-level original information while enhancing high-level discriminative features, facilitating gradient propagation in deep networks and preventing performance degradation in deep architectures.
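As an illustration, the components of Sections 2.4.1, 2.4.2, 2.4.4, and 2.4.5 can be composed into one residual unit. The ordering inside the enhancement branch is an assumption (the paper lists the components but not their exact sequence), and the spatial attention of Section 2.4.3 is applied separately in the residual blocks.

```python
import torch
import torch.nn as nn

class EnhancedResidualBlock(nn.Module):
    """Hypothetical composition of the enhancement pipeline:
    F_out = F_main + F_enhanced."""
    def __init__(self, channels):
        super().__init__()
        self.branch = nn.Sequential(
            nn.AlphaDropout(p=0.1),                                   # 2.4.1
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),  # 2.4.2
            nn.BatchNorm2d(channels),
            nn.SELU(),                                                # 2.4.4
        )

    def forward(self, x):
        return x + self.branch(x)   # 2.4.5 skip connection

block = EnhancedResidualBlock(64)
x = torch.randn(2, 64, 14, 14)
y = block(x)   # same shape; low-level information preserved via the skip path
```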

3. Experimental Preparation Phase

All experiments were conducted in a Python 3.7 environment and implemented based on the PyTorch 2.5.0 framework. The platform was equipped with an NVIDIA GeForce RTX 4060 Ti GPU (16 GB memory), and a fixed random seed (seed = 42) was used to ensure reproducibility of the results. Before the experiments, GPU memory allocation was tested by loading a 16 × 3 × 224 × 224 tensor to verify the compatibility of the hardware environment [28].

3.1. Dataset Construction and Data Augmentation

Dataset Details

The AppleLeaf9 apple leaf disease dataset used in this experiment constructs image–label mappings by parsing the folder hierarchy [29]. First, the class labels are determined based on the names of subfolders under the root directory, including Alternaria leaf spot, Brown spot, Frogeye leaf spot, Grey spot, Health, Mosaic, Powdery mildew, Rust, and Scab. A corresponding “folder name–class index” mapping table is then created, with the specific mapping as follows:
{"Alternaria leaf spot": 0, "Brown spot": 1, "Frogeye leaf spot": 2, "Grey spot": 3, "Health": 4, "Mosaic": 5, "Powdery mildew": 6, "Rust": 7, "Scab": 8}
Next, all subfolders were traversed to extract valid image file paths, supporting formats including .png, .jpg, .jpeg, .bmp, and .gif. After filtering out missing or empty files, a total of 10,211 valid samples were obtained, covering the nine apple leaf disease classes mentioned above [30].
To ensure fair model evaluation and avoid bias caused by imbalanced sample distribution, a stratified sampling strategy was used to split the dataset into training and validation sets at an 8:2 ratio, maintaining class distributions consistent with the original dataset. Specifically, the training set contains 8168 samples, and the validation set contains 2043 samples [31,32]. The class mapping has been saved in the file “apple_class_indices.json,” providing a consistent standard for subsequent visualization and interpretation of experimental results.
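A minimal sketch of the class-index mapping, its JSON export, and a per-class 8:2 split. The `stratified_split` helper is an illustrative stand-in for the stratified sampling strategy (real code might instead use scikit-learn's `train_test_split` with `stratify=labels`).

```python
import json

# The nine folder names, mapped to class indices exactly as in the paper.
class_names = ["Alternaria leaf spot", "Brown spot", "Frogeye leaf spot",
               "Grey spot", "Health", "Mosaic", "Powdery mildew", "Rust", "Scab"]
class_indices = {name: i for i, name in enumerate(class_names)}

# Persist the mapping for later visualization, as described above.
with open("apple_class_indices.json", "w") as f:
    json.dump(class_indices, f, indent=2)

def stratified_split(labels, val_ratio=0.2):
    """Split sample indices per class at an (1 - val_ratio):val_ratio ratio,
    keeping the class distribution of the validation set consistent with
    the full dataset."""
    train_idx, val_idx = [], []
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    for idxs in by_class.values():
        n_val = round(len(idxs) * val_ratio)
        val_idx.extend(idxs[:n_val])
        train_idx.extend(idxs[n_val:])
    return train_idx, val_idx
```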

3.2. Training Configurations and Parameter Settings

3.2.1. Model and Optimizer

In the experimental setup, we constructed an improved ResNet18-AttentionPCA model based on the ResNet18 framework by introducing an attention-guided PCA downsampling module and a spatial attention mechanism. Considering the class scale of the AppleLeaf9 dataset, the number of neurons in the output layer was set to 9 [33].
During training, the stochastic gradient descent (SGD) optimizer was used with the following configuration: a learning rate of 0.005 for the backbone network and 0.05 for the fully connected layer to accelerate the convergence of the classification head; momentum of 0.9 to speed up training and reduce gradient oscillations; and weight decay of 5 × 10 4 for L2 regularization to prevent overfitting.
The learning rate schedule adopted the ReduceLROnPlateau strategy: if the validation accuracy does not improve for three consecutive epochs, the learning rate is multiplied by 0.3. The minimum learning rates were set to 1 × 10 6 for the backbone network and 1 × 10 5 for the fully connected layer, allowing dynamic adjustment to optimize the training process.
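The optimizer and scheduler settings of Sections 3.2.1 can be sketched as below. The model here is a toy stand-in with a "backbone" part and an "fc" head (the real model is ResNet18-AttentionPCA, configured with the same two parameter groups); the `patience=3` reading of "three consecutive epochs" is an assumption.

```python
import torch
import torch.nn as nn

# Stand-in model with a backbone and a classification head.
model = nn.Sequential()
model.add_module("backbone", nn.Conv2d(3, 64, kernel_size=3))
model.add_module("fc", nn.Linear(64, 9))

optimizer = torch.optim.SGD(
    [{"params": model.backbone.parameters(), "lr": 0.005},  # backbone lr
     {"params": model.fc.parameters(), "lr": 0.05}],        # 10x lr for the head
    momentum=0.9,           # reduces gradient oscillations
    weight_decay=5e-4,      # L2 regularization
)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max",  # monitor validation accuracy
    factor=0.3, patience=3, # multiply lr by 0.3 after stagnation
    min_lr=[1e-6, 1e-5],    # per-group floors: backbone, then fc
)
# Each epoch: scheduler.step(val_accuracy)
```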

3.2.2. Training Process Control

The training batch size was set to 64 to balance GPU memory usage and parameter update efficiency, suitable for the NVIDIA GeForce RTX 4060 Ti platform. The number of training epochs was set to 100 to ensure sufficient convergence, with the best validation accuracy of 93.69% achieved at the 32nd epoch. CrossEntropyLoss was used as the loss function to accommodate the multi-class classification task. After each epoch, the accuracy and loss on the validation set were evaluated, and only the model weights with the highest validation accuracy were saved (save path: ./apple_disease_results/apple_resnet18_attention_pca_best_model.pth).

4. Experiments and Results

4.1. Data Augmentation

The AppleLeaf9 dataset used in this study contains nine apple leaf disease categories, with sample sizes ranging from 238 to 3787 per category. The dataset was split into a training set (8168 samples) and a validation set (2043 samples) using stratified sampling with an 8:2 ratio, simulating disease classification scenarios under complex backgrounds. To enhance model robustness, multi-dimensional data augmentation was applied during training (as illustrated in Figure 3), including resizing images to 224 × 224 pixels, random horizontal flipping (probability 0.5), random rotation within ± 20 , brightness/contrast/saturation/hue adjustments, and random grayscale conversion (probability 0.1). During validation, only fixed preprocessing (resizing and normalization) was performed to ensure consistent evaluation results.

4.2. Training and Testing Experiments

The changes in loss and accuracy during model training are shown in Figure 4, where subplot (a) presents the training and validation loss, and subplot (b) presents the training and validation accuracy.
In the loss curves, the training loss continuously decreased from 1.6188 in the first epoch to 0.2344 in the 100th epoch. Although the validation loss fluctuated, it generally decreased and stabilized, finally converging at 0.1998. The trends of the training and validation curves are similar and closely aligned, indicating no obvious overfitting.
In the accuracy curves, the training accuracy steadily increased from 38.16% to 91.66%, while the validation accuracy reached a peak of 93.69% at the 32nd epoch and first surpassed 90% at the 27th epoch. Overall, the training and validation accuracies remained consistent, demonstrating that the model has fast convergence and strong generalization capability.

4.3. Cross-Dataset Validation

To evaluate the generalization capability of the ResNet18-AttentionPCA model, a cross-dataset validation scheme was designed. The model was first trained on the AppleLeaf9 dataset and subsequently tested on the publicly available PlantVillage dataset, which contains 39 categories of crop diseases and healthy samples. During testing, all model parameters were kept fixed, and the same normalization preprocessing as in the training phase was applied to the test set without any fine-tuning operations.
Experimental results showed that the proposed model achieved an overall accuracy of 99.54% and a loss value of 0.161 on the PlantVillage dataset (as shown in Figure 5, where subfigure (a) represents validation loss and subfigure (b) denotes validation accuracy), with a macro-averaged F1-score of 99.44%. Among the apple leaf categories that were included in the training set, the F1-scores ranged from 0.9950 to 1.0000. For the remaining 35 unseen crop–disease categories, except for a few morphologically similar cases, all F1-scores exceeded 0.98.
These results indicate that the features learned through the joint optimization of attention mechanisms and PCA possess strong cross-domain adaptability and robustness. Although a small number of confusions still occurred, the overall performance demonstrates excellent generalization potential, providing valuable insights for multi-crop disease diagnosis research.

4.4. Comprehensive Evaluation of Classification Performance

To comprehensively evaluate the model’s classification performance, this study employed the confusion matrix, ROC curves, and precision–recall (PR) curves to analyze the validation set from multiple perspectives.
Figure 6 shows the confusion matrix for the validation set, presented as a heatmap illustrating the correspondence between true classes (rows) and predicted classes (columns), with color intensity representing sample counts. The results indicate that the larger classes, Scab and Frogeye leaf spot, have over 95% of samples correctly classified along the diagonal, demonstrating high classification accuracy. The small-sample class Grey spot (238 samples) achieved an accuracy of 89.70%, still maintaining relatively high performance. Notably, Alternaria leaf spot and Brown spot exhibited approximately 5.30% cross-misclassification, mainly due to the similarity in lesion shapes, indicating that the model still has room for improvement in capturing subtle feature differences.
The ROC curves and corresponding AUC values (Figure 7) further quantify the model’s class discrimination ability. All nine classes achieved AUC values above 0.960, with Scab and Frogeye leaf spot exceeding 0.980, indicating strong and stable discriminative performance for classes with sufficient samples. Even for the smallest class Grey spot, the AUC reached 0.962, demonstrating that the model maintains good robustness under data imbalance conditions.
Precision–recall (PR) curves and corresponding average precision (AP) values (Figure 8) were used to assess the model’s performance in high-sensitivity scenarios. Within the recall range above 90%, the precision for all classes remained above 85%, with the Health class achieving 92.3% precision, meeting the high-sensitivity detection requirements for early disease identification in real-world cultivation. The overall mean average precision (mAP) was 0.947, further confirming the model’s ability to accurately identify positive samples under class-imbalanced conditions.

4.5. Statistical Analysis

To validate the effectiveness of the proposed ResNet18-AttentionPCA model in apple leaf disease recognition, we conducted repeated experiments and statistical significance analysis comparing it with the baseline ResNet18 model. Experiments were repeated five times under the same data split and random seed conditions, recording the validation accuracy in each run. The results are shown in Table 3.
It can be seen that ResNet18-AttentionPCA outperforms the baseline in every experiment, with small fluctuations across runs. The means ± standard deviations are 90.22 ± 0.16 % for ResNet18 and 93.68 ± 0.09 % for ResNet18-AttentionPCA, indicating that the performance improvement of the proposed model on the validation set is stable and reliable.
To quantitatively assess the significance of the improvement, paired t-tests and one-way ANOVA were conducted on the repeated experimental data. The paired t-test yields
$$t(4) = 13.43, \qquad p = 1.78 \times 10^{-4},$$
and the one-way ANOVA yields
$$F = 230.40, \qquad p = 3.51 \times 10^{-7}.$$
These statistical results indicate that the performance improvement of ResNet18-AttentionPCA compared to the baseline is significant, with small data variability, further demonstrating the effectiveness and stability of the model in apple leaf disease recognition tasks.

4.6. Ablation Experiments

To validate the effectiveness of each module, ablation experiments were designed, and the results are shown in Table 4. In each experiment, key components of the model, such as the attention-guided PCA downsampling module and the spatial attention mechanism, were selectively removed or replaced to evaluate their impact on classification performance and quantify the contribution of each module to the overall model.
The ablation results show that the contributions of different modules to model performance vary. The baseline ResNet18 achieved 90.18% accuracy on the validation set. Introducing PCA downsampling increased the accuracy to 91.25%, while adding the spatial attention mechanism reached 90.72%. When both the attention-guided PCA downsampling and spatial attention were combined in the ResNet18-AttentionPCA model, the validation accuracy reached the highest value of 93.69%. These results indicate that the synergy between the two modules is the key factor for performance improvement, effectively enhancing feature discriminability.

4.7. Comparative Experiment and Analysis

Performance was further validated through comparative experiments with classical and state-of-the-art (SOTA) models. The selected classical models for fruit leaf disease classification include INAR-SSD (SSD with Inception modules and rainbow concatenation) [34], ShuffleNet [35], and ResNet50 [36]; SOTA models include ConvNeXt [36], Swin Transformer [37], MGA-YOLO [31], and EfficientNet-V2 [37]. To ensure experimental rigor, all these models were reproduced for comparison.
As shown in Table 5, on the apple leaf disease recognition dataset, the proposed ResNet18-AttentionPCA model achieved the highest accuracy of 93.69%, significantly outperforming classical models such as ShuffleNet (84.36%) and ResNet50 (86.38%), and slightly surpassing advanced models such as Swin Transformer (91.06%) and EfficientNet-V2 (92.47%). The model demonstrated stable performance on complex background samples in the dataset, highlighting its excellent classification capability and robustness.
The experimental results indicate that the synergistic design of the attention-guided PCA downsampling module and the spatial attention mechanism can effectively overcome the limitations of traditional pooling methods in static feature aggregation. This allows the model to more accurately retain key information relevant to disease recognition during feature extraction, while suppressing background noise and irrelevant features. Such multi-level feature optimization enables the ResNet18-AttentionPCA model to achieve a high accuracy of 93.69% in apple leaf disease recognition, maintaining robust classification performance even under complex backgrounds and class imbalance. Furthermore, the model demonstrates fast convergence and strong generalization ability, making it adaptable to image inputs under varying acquisition conditions. This provides an efficient and scalable technical solution for intelligent classification of plant disease images, with broad practical applications including early disease detection, precision agriculture management, and crop health monitoring.

5. Analysis and Discussion

5.1. Comparing RGB Features

To enhance the model’s capability in analyzing image information, the RGB channels of apple leaf images were optimized and visualized, as shown in Figure 9. The figure consists of four subplots: Figure 9a shows the original RGB image as a reference; Figure 9b displays the red channel feature map, which clearly captures lesion edges and texture information; Figure 9c shows the green channel feature map, which is sensitive to grayscale differences between healthy and diseased regions; Figure 9d presents the blue channel feature map, which, although exhibiting lower contrast, can effectively delineate the boundaries of specific diseases.
We further performed Grad-CAM saliency analysis on the RGB channels for three representative disease categories, as summarized in Table 6. For Grey spot, the reddish-brown lesion edges and gray-brown sporulation layer are most detectable in the red channel, indicating that the model primarily relies on the red channel for texture feature extraction. For Mosaic, the yellow-green mottled boundaries are more prominent in the green channel. For Brown spot, the lesion-to-healthy tissue boundaries are clearly delineated, relying on a combination of the green and blue channels for recognition. These results demonstrate the complementary nature of the RGB channels, confirming that optimizing individual channels can enhance the model’s feature representation for complex diseases and provide guidance for the design of the feature extraction module.
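Summary statistics of the kind reported in Table 6 can be obtained by averaging per-channel Grad-CAM maps. The sketch below assumes the saliency maps have already been computed for each input channel and scaled to [0, 1]; the function name and interface are illustrative, not the authors' actual evaluation code.

```python
import numpy as np

def channel_activation_means(cams):
    """Mean Grad-CAM activation per RGB channel.

    cams: (3, H, W) array of Grad-CAM maps computed separately for the
          R, G, and B input channels, each already scaled to [0, 1].
    Returns a dict of per-channel means, analogous to one row of Table 6.
    """
    names = ("red", "green", "blue")
    return {n: float(cams[i].mean()) for i, n in enumerate(names)}
```

A class whose largest mean sits in a given channel (e.g. red for Grey spot) is one the model chiefly recognizes through that channel.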

5.2. Attention Analysis of Disease Regions in Complex Backgrounds

To validate the advantage of the attention-guided PCA downsampling module in feature extraction, we conducted a visualization analysis of its feature responses (Figure 10). The figure shows four typical disease samples with three types of feature representations: Figure 10a shows the original image, used to mark key disease regions; Figure 10b shows the heatmap generated by MaxPooling, where features are dispersed and the distinction between disease and background is low; Figure 10c shows the heatmap generated by PCA downsampling, where warm areas are clearly concentrated on disease regions and background responses are weak, indicating that the module can accurately focus on task-relevant features while suppressing irrelevant regions.
It can be seen that PCA downsampling dynamically allocates feature weights, preserving key disease information and suppressing redundant background features, thereby enhancing lesion discriminability and the model’s robustness in complex backgrounds.

5.3. Discussion

Channel and spatial attention play a critical role in deep feature extraction. In the task of apple leaf disease recognition, lesion regions are often difficult to distinguish under complex backgrounds, and conventional downsampling may lead to the loss of key features. To address this, we propose a combination of attention-guided PCA downsampling (AG-PCA) and spatial attention mechanisms, enabling the model to dynamically select channel features relevant to lesion areas and enhance the spatial response of lesions, thereby improving recognition performance under complex conditions.
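The channel-selection step of AG-PCA can be sketched as an attention-weighted PCA over spatial feature vectors. The numpy sketch below is a simplified, illustrative reading of the mechanism described above (normalization, attention-weighted eigendecomposition, principal component projection); it is not the authors' implementation, and the function signature is hypothetical.

```python
import numpy as np

def ag_pca_downsample(x, attn, k=None):
    """Attention-guided PCA downsampling (illustrative sketch).

    x:    (N, C) matrix of N spatial feature vectors (e.g. flattened patches)
    attn: (N,) non-negative attention weights over the spatial positions
    k:    number of principal components to keep (defaults to all C)
    """
    w = attn / attn.sum()                      # normalize attention weights
    mu = (w[:, None] * x).sum(axis=0)          # attention-weighted mean
    xc = x - mu
    cov = (w[:, None] * xc).T @ xc             # attention-weighted covariance (C, C)
    vals, vecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]         # top-k principal directions
    return xc @ vecs[:, order]                 # project onto those components
```

Because the covariance is weighted by attention, the retained components are biased toward directions that matter at highly attended (lesion) positions rather than the background.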
To further strengthen the guidance of this mechanism on feature extraction, a spatial attention module is embedded within the residual blocks, allowing channel-wise dynamic selection and spatial feature focusing to act synergistically within the network. Training, testing, ablation, and comparative experiments on the AppleLeaf9 dataset demonstrate that this framework can stably extract lesion features under varying illumination, leaf occlusion, and multi-angle acquisition scenarios, significantly improving the capture of subtle lesions. It was also observed that when multiple diseases are present on a leaf or leaf textures are complex, attention distribution may shift, leading to unstable responses for some critical features. Additionally, AG-PCA introduces extra computational and memory overhead during feature weighting and ranking, which may pose limitations for real-time deployment on mobile devices.
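The spatial attention module embedded in the residual blocks can be illustrated with a channel-pooled gating sketch. For simplicity, the learned 7 × 7 convolution that usually combines the pooled maps is replaced here by a scalar weight and bias, so this is a conceptual sketch under that stated simplification rather than the actual module.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spatial_attention(feat, w=1.0, b=0.0):
    """Simplified spatial attention over a feature map.

    feat: (C, H, W) feature tensor. Channel-wise average- and max-pooled
    maps are combined into a single (H, W) gate; the learned convolution
    of the full module is replaced by scalar `w` and bias `b`.
    """
    avg_map = feat.mean(axis=0)                # (H, W) average over channels
    max_map = feat.max(axis=0)                 # (H, W) max over channels
    gate = sigmoid(w * (avg_map + max_map) + b)
    return feat * gate[None, :, :]             # reweight every channel spatially
```

Positions with strong responses across channels (typically lesion pixels) receive gates near 1, while weakly responding background positions are attenuated.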
From an application perspective, this framework is flexible and can be combined with lightweight networks such as MobileNet and ShuffleNet. On mobile devices equipped with the Snapdragon 8 Elite processor, the ResNet18-AttentionPCA model achieves approximately 500 inferences per second on single 224 × 224 images, meeting the real-time disease recognition requirements of drones, field robots, and other mobile platforms. By integrating hyperspectral or infrared multimodal sensing with edge computing platforms, it holds the potential to enable efficient and stable disease monitoring in smart agriculture.

6. Conclusions

This study focuses on the recognition of apple leaf diseases under complex backgrounds. To enhance the model’s discriminative capability for lesion regions, we proposed an Attention-Guided PCA (AG-PCA) downsampling module and incorporated a spatial attention mechanism within residual blocks, forming a synergistic feature extraction framework. By dynamically selecting key channel features through AG-PCA and highlighting lesion responses via spatial attention, the model demonstrates stable and accurate recognition under varying illumination, leaf occlusion, and multi-angle imaging conditions.
Experimental results on the AppleLeaf9 dataset show that the proposed framework achieves a classification accuracy of 93.69%, significantly outperforming the baseline model. Ablation studies further confirm the contribution of each module: introducing either AG-PCA or the spatial attention mechanism alone leads to performance improvements, while their combination yields an accuracy gain of 3.51 percentage points, demonstrating the complementary advantages of "dynamic channel selection–spatial feature focusing." This indicates that the integration of AG-PCA and spatial attention effectively enhances lesion feature responses while suppressing background interference.
In summary, this study combines dynamic feature selection with spatial attention to achieve efficient extraction of disease features under complex environments. Future work will focus on optimizing model lightweight design and inference speed for deployment on mobile devices and drones, as well as exploring multimodal feature fusion and few-shot class handling to further improve the model’s generalization and practical applicability in agricultural scenarios.

Author Contributions

Conceptualization, K.X. and Z.L.; methodology, Z.L. and K.X.; software, Z.L.; validation, Z.L. and F.Z.; investigation, J.Y. and K.X.; data curation, J.Y. and F.Z.; writing—original draft preparation, K.X. and X.L.; writing—review and editing, Z.L. and X.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by two funding sources: (1) the Natural Science Foundation of China under Grant No. U24A20277; (2) the Shandong Provincial Key Research and Development Program (Innovation Capability Enhancement Project for Science and Technology-Based Small and Medium-sized Enterprises) under Grant No. 2024TSGC0932. Neither supporting source had involvement in the study design; collection, analysis, and interpretation of data; writing of the report; or the decision to submit the report for publication.

Data Availability Statement

Data is available upon request due to privacy.

Conflicts of Interest

All authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Alston, J.M.; Pardey, P.G. Agriculture in the Global Economy. J. Econ. Perspect. 2014, 28, 121–146.
2. Li, L.; Zhang, S.; Wang, B. Plant Disease Detection and Classification by Deep Learning—A Review. IEEE Access 2021, 9, 56683–56698.
3. Dawod, R.G.; Dobre, C. Upper and Lower Leaf Side Detection with Machine Learning Methods. Sensors 2022, 22, 2696.
4. Ji, X.; Xue, J.; Shi, J.; Wang, W.; Zhang, X.; Wang, Z.; Lu, W.; Liu, J.; Fu, Y.V.; Xu, N. Noninvasive Raman Spectroscopy for the Detection of Rice Bacterial Leaf Blight and Bacterial Leaf Streak. Talanta 2025, 282, 126962.
5. Jiang, X.; Wang, J.; Xie, K.; Cui, C.; Du, A.; Shi, X.; Yang, W.; Zhai, R. PlantCaFo: An Efficient Few-Shot Plant Disease Recognition Method Based on Foundation Models. Plant Phenomics 2025, 7, 100024.
6. Peng, D.; Li, W.; Zhao, H.; Zhou, G.; Cai, C. Recognition of Tomato Leaf Diseases Based on DIMPCNET. Agronomy 2023, 13, 1812.
7. Krishna, M.S.; Machado, P.; Otuka, R.I.; Yahaya, S.W.; Neves dos Santos, F.; Ihianle, I.K. Plant Leaf Disease Detection Using Deep Learning: A Multi-Dataset Approach. J 2025, 8, 4.
8. Ali, M.U.; Khalid, M.; Farrash, M.; Lahza, H.F.M.; Zafar, A.; Kim, S.-H. AppleLeafNet: A Lightweight and Efficient Deep Learning Framework for Diagnosing Apple Leaf Diseases. Front. Plant Sci. 2024, 15, 1502314.
9. Bi, Z.; Ma, F.; Guan, J.; Wu, J.; Li, J.; Li, F.; Li, Y.; Liu, Z. Apple Leaf Disease Severity Grading Based on Deep Learning and the DRL-Watershed Algorithm. Sci. Rep. 2025, 15, 30071.
10. Gao, L.; Zhao, X.; Yue, X.; Yue, Y.; Wang, X.; Wu, H.; Zhang, X. A Lightweight YOLOv8 Model for Apple Leaf Disease Detection. Appl. Sci. 2024, 14, 6710.
11. Ahmed, M.R.; Ashrafi, A.F.; Ahmed, R.U.; Ahmed, T. MCFFA-Net: Multi-Contextual Feature Fusion and Attention Guided Network for Apple Foliar Disease Classification. In Proceedings of the 25th International Conference on Computer and Information Technology (ICCIT), Dhaka, Bangladesh, 17 December 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 757–762.
12. Sundhar, S.; Sharma, R.; Maheshwari, P.; Kumar, S.R.; Kumar, T.S. Enhancing Leaf Disease Classification Using GAT-GCN Hybrid Model. arXiv 2025, arXiv:2504.04764.
13. Doutoum, A.S.; Tugrul, B. A Systematic Review of Deep Learning Techniques for Apple Leaf Diseases Classification and Detection. PeerJ Comput. Sci. 2025, 11, e2655.
14. Zhao, J.; Xu, L.; Ma, Z.; Li, J.; Wang, X.; Liu, Y.; Du, X. A Review of Plant Leaf Disease Identification by Deep Learning Algorithms. Front. Plant Sci. 2025, 16, 1637241.
15. Zhou, J.; Li, J.; Wang, C.; Wu, H.; Zhao, C.; Wang, Q. A Vegetable Disease Recognition Model for Complex Background Based on Region Proposal and Progressive Learning. Comput. Electron. Agric. 2021, 184, 106101.
16. Ashurov, A.Y.; Al-Gaashani, M.S.A.M.; Samee, N.A.; Alkanhel, R.; Atteia, G.; Abdallah, H.A.; Muthanna, M.S.A. Enhancing Plant Disease Detection through Deep Learning: A Depthwise CNN with Squeeze and Excitation Integration and Residual Skip Connections. Front. Plant Sci. 2025, 15, 1505857.
17. Ametefe, D.S.; Sarnin, S.S.; Ali, D.M.; Caliskan, A.; Caliskan, I.T.; Aliu, A.A.; John, D. Enhancing Leaf Disease Detection Accuracy through Synergistic Integration of Deep Transfer Learning and Multimodal Techniques. Inf. Process. Agric. 2024, 12, 279–299.
18. Wu, P.; Liu, J.; Jiang, M.; Zhang, L.; Ding, S.; Zhang, K. Tea Leaf Disease Recognition Using Attention Convolutional Neural Network and Handcrafted Features. Crop Prot. 2025, 190, 107118.
19. Umamageswari, A.; Deepa, S.; Raja, K. An Enhanced Approach for Leaf Disease Identification and Classification Using Deep Learning Techniques. Meas. Sens. 2022, 24, 100568.
20. Wang, Z.; Cui, W.; Huang, C.; Zhou, Y.; Zhao, Z.; Yue, Y.; Dong, X.; Lv, C. Framework for Apple Phenotype Feature Extraction Using Instance Segmentation and Edge Attention Mechanism. Agriculture 2025, 15, 305.
21. Dubey, S.R.; Jalal, A. Detection and Classification of Apple Fruit Diseases Using Complete Local Binary Patterns. In Proceedings of the 3rd International Conference on Computer and Communication Technology (ICCCT), Allahabad, India, 23–25 November 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 346–351.
22. González, E.; Sutton, T.B.; Correll, J.C. Clarification of the Etiology of Glomerella Leaf Spot and Bitter Rot of Apple Caused by Colletotrichum spp. Based on Morphology and Genetic, Molecular, and Pathogenicity Tests. Phytopathology 2006, 96, 982–992.
23. Sharma, R.; Singh, A. VGG16 Feature Selection Using PCA-Big Bang Big Algorithm. J. Intell. Fuzzy Syst. 2023, 45, 1437–1451.
24. Yang, Q.; Duan, S.; Wang, L. Efficient Identification of Apple Leaf Diseases in the Wild Using Convolutional Neural Networks. Agronomy 2022, 12, 2784.
25. Chen, X.; Cai, C.; He, X.; Mei, D. Hybrid Deep Learning Model for Vegetable Price Forecasting Based on Principal Component Analysis and Attention Mechanism. Phys. Scr. 2024, 99, 125017.
26. Fang, T.; Zhang, J.; Qi, D.; Gao, M. BLSENet: A Novel Lightweight Bilinear Convolutional Neural Network Based on Attention Mechanism and Feature Fusion Strategy for Apple Leaf Disease Classification. J. Food Qual. 2024, 2024, 5561625.
27. Zhang, C.; Zhang, S.; Yang, J.; Shi, Y.; Chen, J. Apple Leaf Disease Identification Using Genetic Algorithm and Correlation Based Feature Selection Method. Int. J. Agric. Biol. Eng. 2017, 10, 74–83.
28. Jin, D.; Yin, H.; Gu, Y.H. Shuffle-PG: Lightweight Feature Extraction Model for Retrieving Images of Plant Diseases and Pests with Deep Metric Learning. Alex. Eng. J. 2025, 113, 138–149.
29. Liu, J.; Wang, X. Plant Diseases and Pests Detection Based on Deep Learning: A Review. Plant Methods 2021, 17, 22.
30. Singh, S.; Gupta, S.; Tanta, A.; Gupta, R. Extraction of Multiple Diseases in Apple Leaf Using Machine Learning. Int. J. Image Graph. 2021, online.
31. Zhu, R.; Zou, H.; Li, Z.; Ni, R. Apple-Net: A Model Based on Improved YOLOv5 to Detect the Apple Leaf Diseases. Plants 2023, 12, 169.
32. Singh, V.; Chug, A.; Singh, A.P. Classification of Beans Leaf Diseases Using Fine Tuned CNN Model. Procedia Comput. Sci. 2023, 218, 348–356.
33. Jiang, X.; Ding, R.; Hu, H.; Wang, T.; Liu, X.; Shi, J. Image Classification of Leaf Diseases Based on Transfer Learning. In Proceedings of the 7th International Conference on Computer and Communications (ICCC), Chengdu, China, 10–13 December 2021; pp. 821–825.
34. Jiang, P.; Chen, Y.; Liu, B.; He, D.; Liang, C. Real-Time Detection of Apple Leaf Diseases Using Deep Learning Approach Based on Improved Convolutional Neural Networks. IEEE Access 2019, 7, 59069–59080.
35. Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 6848–6856.
36. Ding, J.; Zhang, C.; Cheng, X.; Yue, Y.; Fan, G.; Wu, Y.; Zhang, Y. Method for Classifying Apple Leaf Diseases Based on Dual Attention and Multi-Scale Feature Extraction. Agriculture 2023, 13, 940.
37. Sun, Y.; Ning, L.; Zhao, B.; Yan, J. Tomato Leaf Disease Classification by Combining EfficientNetv2 and a Swin Transformer. Appl. Sci. 2024, 14, 7472.
Figure 1. Overview of the proposed method.
Figure 2. The attention-guided PCA downsampling module performs feature patching, normalization, and attention-weighted eigen decomposition, followed by principal component projection, enabling adaptive downsampling while minimizing information loss.
Figure 3. Data Augmentation Strategy.
Figure 4. (a,b) Training and validation loss and accuracy curves.
Figure 5. (a,b) Validation loss and accuracy curves.
Figure 6. Confusion Matrix of the Validation Set.
Figure 7. Multi-class ROC curves and AUC values.
Figure 8. Multi-class Precision-Recall curves and AP values.
Figure 9. RGB channel feature visualization of apple leaf images (subplots (a–d)).
Figure 10. Comparison of feature heatmaps between PCA downsampling and MaxPooling: (a) original image; (b) MaxPooling heatmap; (c) PCA downsampling heatmap.
Table 1. Dataset preparation used for classification.

| Class Label | Disease Type | Number of Samples |
|---|---|---|
| 0 | Alternaria leaf spot | 292 |
| 1 | Brown spot | 288 |
| 2 | Frogeye leaf spot | 2227 |
| 3 | Grey spot | 238 |
| 4 | Health | 362 |
| 5 | Mosaic | 260 |
| 6 | Powdery mildew | 829 |
| 7 | Rust | 1928 |
| 8 | Scab | 3787 |
Table 2. Computational overhead and parameter comparison between ResNet18-AttentionPCA and baseline ResNet18 (RTX 4060 Ti, 16 GB VRAM).

| Model Architecture | FLOPs (G) | Latency (ms) | Memory Usage (GB) |
|---|---|---|---|
| ResNet18 (Baseline) | 1.8 | 1.433 | 2.348 |
| ResNet18-AttentionPCA | 2.9 | 4.600 | 2.348 |
Table 3. Validation accuracy (%) over repeated experiments.

| Experiment | ResNet18 (Baseline) | ResNet18-AttentionPCA |
|---|---|---|
| 1 | 90.02 | 93.58 |
| 2 | 90.45 | 93.72 |
| 3 | 90.11 | 93.81 |
| 4 | 90.31 | 93.64 |
| 5 | 90.21 | 93.67 |
Table 4. Ablation experiment results.

| Model Variant | Accuracy (%) | Improvement (%) |
|---|---|---|
| ResNet18 baseline | 90.18 | – |
| ResNet18 + PCA downsampling | 91.25 | 1.07 |
| ResNet18 + spatial attention | 90.72 | 0.54 |
| ResNet18-AttentionPCA | 93.69 | 3.51 |
Table 5. Accuracy comparison with classical models.

| Model | Accuracy (%) |
|---|---|
| INAR-SSD (SSD with inception module and rainbow concatenation) [34] | 78.80 |
| ShuffleNet [35] | 84.36 |
| ConvNeXt [36] | 86.70 |
| ResNet50 [36] | 86.38 |
| Swin Transformer [37] | 91.06 |
| MGA-YOLO [31] | 91.25 |
| EfficientNet-V2 [37] | 92.47 |
| ResNet18-AttentionPCA | 93.69 |
Table 6. Average Grad-CAM Activation Values of Apple Leaf Diseases in RGB Channels.

| Leaf Disease | Red Channel | Green Channel | Blue Channel |
|---|---|---|---|
| Grey spot | 0.249 | 0.219 | 0.210 |
| Mosaic | 0.203 | 0.282 | 0.201 |
| Brown spot | 0.256 | 0.290 | 0.290 |

Share and Cite

MDPI and ACS Style

Xu, K.; Yu, J.; Zhu, F.; Li, Z.; Li, X. Optimization of Deep Learning Model Based on Attention-Guided PCA Dimensionality Reduction. Horticulturae 2025, 11, 1346. https://doi.org/10.3390/horticulturae11111346
