1. Introduction
Apples are among the most widely cultivated and consumed fruits worldwide. Global production reached 87 million metric tons in 2019 [1]. Shifts in consumer preferences toward higher quality and variety have driven greater demand for imported fruit, particularly apples. In China, the value of apple imports increased from USD 181.38 million in 1997 to USD 688.5 million in 2022 [2].
In parallel, problems associated with fungal infection have become more prominent [3]. During transport, apples are frequently affected by fungal diseases, leading to quality degradation and, in severe cases, spoilage. These outcomes cause substantial economic losses and undermine market reputation, reducing consumers’ willingness to purchase imported apples. Quarantine fungal pathogens are of particular concern because they threaten not only fruit quality but also agricultural biosecurity in importing countries [4].
Detection of quarantine pathogens in imported apples has primarily relied on traditional biological assays such as polymerase chain reaction (PCR) and DNA sequencing [5]. These techniques are destructive to samples, labor-intensive, and time-consuming, rendering them unsuitable for rapid inspection at ports of entry. Infections caused by Cytospora mali, Monilinia fructicola, and Botryosphaeria dothidea typically result in fruit decay that begins as surface lesions. As infection advances, lesions darken and expand, and the fruit progressively softens. Because symptom trajectories are highly similar across these pathogens, and because quarantine regulations often restrict available sample sizes, the development of accurate and generalizable identification algorithms is severely constrained.
Hyperspectral imaging is a technique that integrates spatial and spectral information [6] and offers clear advantages over conventional RGB imaging. Reflectance imaging spectroscopy captures hundreds of narrow bands across the visible and near-infrared regions, each providing distinct spectral cues. This rich spectral content enables discrimination among samples with very similar phenotypes, particularly when sample availability is limited. In recent years, hyperspectral imaging has been widely adopted in agriculture [7], food science [8], and remote sensing [9]. With continuing advances in sensors and data-processing algorithms, hyperspectral imaging has become a powerful tool, improving the precision and efficiency of crop phenotyping, plant disease detection, and quality assessment of agricultural products.
Recent research efforts have increasingly focused on near-range hyperspectral imaging technology for analyzing crop phenotypic characteristics [10]. One such study developed a peach damage–detection method based on gray-level co-occurrence matrix (GLCM) features: texture descriptors extracted with GLCM were used to train Random Forest (RF) and Extreme Gradient Boosting (XGBoost) models for prediction. However, that work evaluated only classical machine-learning approaches and did not compare against convolutional neural networks (CNNs) or other deep-learning methods.
With rapid advancements in hyperspectral imaging and artificial intelligence (AI), the fusion of these two technologies provides novel opportunities for plant disease and pest detection. Hyperspectral imaging leverages variations in the reflection, absorption, and scattering properties of different materials to capture comprehensive spectral and spatial information. Meanwhile, CNNs have significantly advanced the field of image recognition. A 2D/3D hybrid CNN incorporating a Feature Pyramid Network (FPN) and hybrid attention was developed for classifying hyperspectral images [11]. Another study proposed an improved Vision Transformer (ViT) for crop disease recognition that leverages an AFNO-based Transformer to pre-extract global features, markedly improving leaf-disease identification in complex natural environments [12]. The integration of hyperspectral imaging with deep-learning methods allows advanced image-recognition algorithms to fully exploit hyperspectral data, extracting highly valuable features. This combination substantially expands the dimensionality and richness of data available for computer-vision tasks compared with traditional RGB imagery, enhancing classification accuracy and processing efficiency in practical scenarios.
Although hyperspectral imaging has been shown to extract phenotypic information effectively and CNNs have demonstrated reliable performance in disease and pest image classification, CNN-based classification of hyperspectral images of apple diseases remains underexplored. To address the slow throughput and heavy labor demands encountered in quarantine inspection under real import and export conditions, an integrated hyperspectral–CNN approach was developed. A field-deployable automated device for hyperspectral detection of apple diseases was designed for use at inspection sites. The system improves detection efficiency and reduces operational costs.
In this study, apple quarantine diseases were examined by capturing hyperspectral images of apples infected by Cytospora mali, Monilinia fructicola, and Botryosphaeria dothidea. The hyperspectral camera captured wavelengths from 400 to 1000 nm, and the structure of the resulting image cube is shown in Figure 1. Changes in spectral curves were analyzed across early, middle, and late stages of infection. Dimensionality reduction with Competitive Adaptive Reweighted Sampling (CARS) was evaluated and reduced the original 448 bands to 76 informative bands. Hyperspectral imaging was then combined with convolutional neural network–based recognition, and a model named HSC-ResNet was proposed for hyperspectral classification. The architecture was adapted to accommodate the large spectral input and incorporates channel and spatial attention mechanisms to emphasize salient features. With this design, high efficiency and accuracy were achieved for detecting quarantine diseases in apples.
2. Materials and Methods
2.1. Sample Collection and Preparation
‘Gala’ apples (Malus domestica ‘Gala’) imported from New Zealand were used as experimental material. Prior to sampling, all fruit passed import quarantine to verify freedom from regulated pests and diseases in accordance with the International Standards for Phytosanitary Measures (ISPM) [13]. Fruit with mechanical injury, visible infection, or other surface defects were excluded to meet experimental quality criteria. The remaining apples were divided into five groups: fruit inoculated with Botryosphaeria dothidea, Monilinia fructicola, or Cytospora mali, and two control groups comprising healthy fruit and mechanically injured fruit.
2.2. Pathogen Identification by PCR Technique
Fungal pathogens were inoculated into apple samples using a mycelium-block method [14]. Mycelial blocks (5 mm diameter) were excised from the actively growing colony margin on culture plates with a sterile biopsy punch. The inoculation site on each fruit was surface-disinfected with 70% ethanol. A 5 mm core of peel and underlying flesh was then removed with the same punch, and the mycelial block was placed into the cavity with the mycelium oriented inward. Inoculated apples were sealed in plastic containers and incubated at 22 °C under humid conditions. Uninoculated apples handled in parallel served as negative controls.
PCR assays were conducted on infected apples to confirm pathogen identity and rule out contamination by unintended fungi [15]. Pathogen DNA was rapidly extracted directly from infected apple tissues using the alkaline lysis method [16]: infected tissue was placed into a 2-mL centrifuge tube, to which 50 µL of 0.5 mol L−1 NaOH was added. After thorough shaking and centrifugation, 10 µL of the resulting supernatant was transferred into 90 µL of 100 mmol L−1 Tris buffer (pH 8.0), providing a crude DNA extract suitable for PCR amplification. The extracted DNA was either used immediately for PCR amplification or stored at −20 °C until use.
Specific primer pairs were used to identify Botryosphaeria dothidea, Monilinia fructicola, and Cytospora mali. Primer sequences and expected PCR product sizes are presented in Table 1. Each PCR reaction had a total volume of 50 µL, containing 1× Mg2+-free PCR buffer, 1.5 mmol L−1 MgCl2, 0.2 mmol L−1 of each dNTP, 0.2 µmol L−1 each of forward and reverse primers, 1.25 U of Taq DNA polymerase, and 2 µL of template DNA; the volume was adjusted to 50 µL with ddH2O. PCR amplification conditions consisted of initial denaturation at 94 °C for 3 min, followed by 30 cycles of denaturation at 94 °C for 30 s, annealing at 55 °C for 30 s, and extension at 72 °C for 90 s. A final extension step was carried out at 72 °C for 10 min before termination.
Following PCR amplification, products were resolved on a 1.5% agarose gel, visualized with a gel imaging system, and their quality assessed, as shown in Figure 2. Additionally, specific primers targeting the internal transcribed spacer (ITS) regions of fungal DNA were used for PCR amplification. The obtained sequences were compared with available sequences in the NCBI database to confirm the accuracy of the pathogen identification.
2.3. Hyperspectral Imaging System
A hyperspectral camera, a Specim FX10 (SPECIM, Finland), was mounted directly above a conveyor belt. Uniform illumination was provided to maintain consistent lighting across apple surfaces, minimizing shadows and specular reflections that could degrade image quality. The mechanical structure is shown in Figure 3. As apples traveled along the belt, the camera recorded spectral data continuously at a high frame rate across 400–1000 nm. Detailed specifications of the hyperspectral camera are listed in Table 2.
2.4. Image Preprocessing
2.4.1. Average Spectral Extraction
Obtaining stable and representative spectral characteristics is crucial for accurate modeling and disease identification from hyperspectral imagery. However, practical spectral analysis often encounters challenges from background interference, shadows, and non-target regions [17]. Extracting spectra without addressing these factors reduces robustness and degrades accuracy. All hyperspectral images were first calibrated using black and white references [18]. Annotation masks were then applied to exclude background, and mean reflectance spectra were computed from apple regions for subsequent analysis. The procedure is illustrated in Figure 4.
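The calibration and masked averaging steps described above can be sketched as follows; the function name, array shapes, and toy values are illustrative, not the exact pipeline used here.

```python
import numpy as np

def mean_spectrum(raw, white, dark, mask):
    """Calibrate a hyperspectral cube to reflectance and average over a mask.

    raw   : (H, W, B) raw intensity cube
    white : (H, W, B) white-reference cube
    dark  : (H, W, B) dark (black)-reference cube
    mask  : (H, W) boolean array, True on apple pixels
    """
    # Standard black/white calibration: R = (raw - dark) / (white - dark)
    reflectance = (raw - dark) / np.clip(white - dark, 1e-6, None)
    # Average only over annotated apple pixels, one value per band
    return reflectance[mask].mean(axis=0)

# Toy example: 4x4 image, 5 bands, apple occupies a 2x2 patch
H, W, B = 4, 4, 5
raw = np.full((H, W, B), 0.5)
white = np.ones((H, W, B))
dark = np.zeros((H, W, B))
mask = np.zeros((H, W), dtype=bool)
mask[1:3, 1:3] = True
spectrum = mean_spectrum(raw, white, dark, mask)
print(spectrum.shape)  # (5,)
```

Averaging after masking, rather than before, ensures background and shadow pixels never enter the spectral profile.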
Average spectra were computed as the mean reflectance of apple samples at each wavelength, yielding representative spectral profiles. Variability in individual spectra can arise from sample condition, viewing angle, and illumination. Averaging mitigates this variability, smooths individual differences, and reduces the influence of outliers. The resulting profiles emphasize overall spectral characteristics and support more robust downstream analysis. Features derived from inter-class spectral differences in the averaged profiles help suppress noise and improve the generalization of classification, regression, and predictive models. Representative images of early, middle, and late infection stages for the three pathogens, together with a comparison of their average spectral curves, are presented in Figure 5.
The hyperspectral characteristics of apples infected with Cytospora mali, Monilinia fructicola, and Botryosphaeria dothidea were examined across early, middle, and late stages of infection. Distinct visual trajectories were observed. Cytospora mali initially presented as localized epidermal necrosis that extended deeper into the fruit over time, culminating in visible mold layers. Monilinia fructicola showed mild symptoms at first but later developed large necrotic regions characterized by tissue drying and darkening. Botryosphaeria dothidea began with localized pigment deposition and progressed to extensive tissue liquefaction and structural collapse.
Spectrally, the three diseases displayed similar overall reflectance patterns. Reflectance in the 400–500 nm (blue) region remained low, consistent with absorption by chlorophyll and carotenoids. Within the visible range (400–700 nm), higher reflectance was observed at early stages, indicating limited pigment degradation. In the near-infrared range (700–1000 nm), reflectance declined as infection advanced, reflecting water loss and structural damage in the tissue.
Stage-specific differences were also apparent: Cytospora mali showed the largest spectral changes in the 700–900 nm range, Monilinia fructicola displayed maximal variation across 600–900 nm, and Botryosphaeria dothidea exhibited distinct changes primarily at 500–600 nm and 800–900 nm. Because simple metrics could not reliably quantify and distinguish the pathogens’ spectral signatures, AI-based feature selection was subsequently employed to identify key hyperspectral bands, thereby enabling effective dimensionality reduction.
2.4.2. Dimensionality Reduction
Previous analyses indicated that not every wavelength contributes meaningfully to classification. The high dimensionality of hyperspectral data complicates downstream analysis and modeling. Reducing the number of bands removes redundancy, decreases data volume, and lowers computational demand while retaining the most informative features for classification or regression, thereby improving accuracy and stability.
To address this challenge, Competitive Adaptive Reweighted Sampling (CARS) was applied to select an informative subset of spectral bands and discard noise [19,20]. This procedure reduces dimensionality, often maintains or improves model accuracy, and markedly shortens computation time while increasing robustness. CARS is widely used in spectral analysis and chemometrics and is particularly effective for high-dimensional datasets such as hyperspectral imagery.
In competitive screening of high-dimensional features such as spectral bands, the process begins by calculating regression coefficients for each feature using a model such as partial least squares (PLS) [21]. The absolute values of these coefficients are then normalized to obtain feature weights:

w_i = |b_i| / Σ_j |b_j|

Here, b_i denotes the regression coefficient of the i-th feature estimated by the PLS model, and {b_1, …, b_p} denotes the set of regression coefficients for all p features obtained from the PLS model. These coefficients quantify the relative influence of each feature on the response variable, and their absolute values are normalized to yield the weights w_i. A larger w_i indicates a greater contribution of that feature to the model.
Subsequently, to shrink the feature set step by step, CARS applies an exponential-decay function in the i-th iteration to determine the retention ratio for that round and, from this, calculates how many features to keep:

r_i = a · e^(−k·i)

where a and k are constants chosen so that all features are retained in the first iteration and only a small final subset remains in the last; the number of features kept in round i is then p · r_i.
Building on this, a competitive mechanism—implemented through ranking or random sampling—retains the top features with the highest weights and removes those with lower weights. After multiple iterations, the subset that delivers the best performance on the validation set is chosen, achieving dimensionality reduction while improving model accuracy.
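The weighting, exponential-decay retention, and competitive-elimination steps above can be sketched as follows. For brevity, ordinary least squares stands in for the PLS model, deterministic ranking replaces random sampling, and a simple hold-out split replaces cross-validation, so this is an illustrative reduction of CARS rather than the implementation used in this study.

```python
import numpy as np

def cars_select(X, y, n_iter=30):
    """Simplified CARS band selection (OLS stands in for PLS)."""
    n, p = X.shape
    ntr = int(0.7 * n)  # hold-out split used in place of cross-validation
    Xtr, ytr, Xva, yva = X[:ntr], y[:ntr], X[ntr:], y[ntr:]
    # Exponential-decay retention ratio r_i = a * exp(-k * i), chosen so that
    # all p bands are kept in round 1 and only 2 remain in the final round.
    a = (p / 2) ** (1 / (n_iter - 1))
    k = np.log(p / 2) / (n_iter - 1)
    retained = np.arange(p)
    best_subset, best_err = retained, np.inf
    for i in range(1, n_iter + 1):
        # Regression coefficients for the currently retained bands
        b, *_ = np.linalg.lstsq(Xtr[:, retained], ytr, rcond=None)
        w = np.abs(b) / np.abs(b).sum()          # normalized weights
        n_keep = max(2, int(round(p * a * np.exp(-k * i))))
        n_keep = min(n_keep, retained.size)
        # Competitive step: keep the bands with the largest weights
        retained = retained[np.argsort(w)[::-1][:n_keep]]
        # Score the surviving subset on the held-out samples
        bk, *_ = np.linalg.lstsq(Xtr[:, retained], ytr, rcond=None)
        err = np.mean((Xva[:, retained] @ bk - yva) ** 2)
        if err < best_err:
            best_err, best_subset = err, retained.copy()
    return np.sort(best_subset)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = 5.0 * X[:, 3] + 0.1 * rng.normal(size=100)  # band 3 carries the signal
sel = cars_select(X, y)
print(len(sel))
```

Because band 3 dominates the regression weights, it survives every elimination round, which is exactly the behavior CARS relies on to isolate informative wavelengths.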
Applying CARS reduced the number of bands from 448 to 76. The reduction decreases data volume and computational load while preserving informative wavelengths, thereby accelerating training and recognition. Nevertheless, approaches based solely on traditional machine learning show limited performance on hyperspectral datasets [22]. In subsequent experiments, PLS using the 76-band subset achieved 61.89% accuracy, whereas the best single-band model reached 41.02%, indicating that CARS alone yields limited accuracy.
To enhance classification accuracy, convolutional neural networks for hyperspectral image classification were further investigated using the selected bands.
2.5. Deep Learning Models
Rapid advances in artificial intelligence have produced numerous convolutional neural network architectures [23], each with unique strengths. In this study, five representative models—ShuffleNet [24], MobileNet [25], EfficientNet [26], ViT [27], and ResNet [28]—were trained on the hyperspectral dataset. These architectures differ in structural design, computational efficiency, generalization capability, and scalability. ShuffleNet and MobileNet support lightweight deployment on mobile devices. EfficientNet improves performance through compound scaling. ViT introduces Transformer-based vision modeling beyond purely convolutional designs. ResNet remains a foundational deep CNN architecture. Because these models are widely used and each offers distinct strengths, they were selected for comparison on the apple-disease hyperspectral dataset.
We compared recognition accuracy and parameter counts across the networks. ResNet achieved the highest accuracy on hyperspectral images at 87.78%, while ShuffleNet used the fewest parameters at 5.42 million. Detailed results are shown in Table 3. However, despite their effectiveness in conventional image tasks, these standard models face challenges in handling the high-dimensional spectral data typical of hyperspectral imagery.
Among the compared architectures, ResNet’s residual topology—identity shortcuts with layer-wise feature reuse—improves optimization stability and gradient flow, alleviates degradation, and enables deeper, more robust representations on high-dimensional, redundant, and noisy hyperspectral data. Its bottleneck units employ 1 × 1 convolutions for cross-channel linear combination and 3 × 3 convolutions to model local spatial–spectral couplings, thereby enhancing channel interaction and effective receptive field while controlling parameters and computation. By contrast, the depthwise separable convolutions and channel grouping in ShuffleNet and MobileNet substantially weaken cross-channel modeling, making subtle inter-band differences harder to capture; ViT is more sensitive to data scale and regularization and is not advantageous at the present dataset size; EfficientNet benefits from compound scaling, yet its MBConv blocks remain dominated by per-channel operations with limited cross-channel coupling. These architectural distinctions are consistent with the empirical results: on this hyperspectral dataset, ResNet achieved the highest accuracy (87.78%), and its residual, modular design readily accommodates hyperspectral-specific operators such as spectral attention.
Given ResNet’s superior performance on the apple disease dataset, it was adopted as the baseline for further optimization, yielding HSC-ResNet, a variant specifically adapted for hyperspectral image classification.
2.5.1. HSC-ResNet Model
HSC-ResNet was designed specifically for high-throughput hyperspectral image classification. Standard CNNs are designed for RGB images with only three spectral channels, whereas the images in this study contain 448 bands. Such high-dimensional input can overwhelm standard architectures, so ResNet50 was restructured to handle hyperspectral data more effectively.
The revised network retains the residual learning strategy of ResNet50, which helps mitigate gradient vanishing and performance degradation in deep architectures. The input channel dimension is expanded from 3 to 76, corresponding to the informative bands selected from the original 448 by CARS; using this reduced band set rather than the full cube accelerates both training and inference. Channel and spatial attention are incorporated through the Convolutional Block Attention Module (CBAM), enabling the network to emphasize the most informative spectral channels and spatial regions.
The input tensor, shaped (batch, 76, 224, 224), first passes through a convolution–batch normalization–ReLU sequence that extracts low-level spatial–spectral features and reduces the spatial resolution to 112 × 112. Max pooling further reduces the feature map to 56 × 56, lowering computational cost and mitigating overfitting. Feature abstraction proceeds through four residual stages, producing a tensor of size (batch, 2048, 7, 7). After each residual block, the CBAM module reinforces the channels and spatial areas most relevant for classification. Adaptive average pooling then condenses each feature map to a single value, generating a (batch, 2048, 1, 1) tensor. Finally, a fully connected layer maps these high-level descriptors to the target classes, completing the classification. The detailed model architecture is shown in Figure 6.
In total, the architecture includes fifty convolutional layers, excluding pooling operations and the final dense layer. With its adapted channel configuration, integrated attention mechanisms, and preserved residual connections, HSC-ResNet is well suited to the high dimensionality of hyperspectral imagery while remaining computationally efficient.
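The shape flow described above can be illustrated with a minimal PyTorch sketch of the widened stem and the classification head; layer sizes follow the text, but the variable names are illustrative, and the four residual stages and CBAM blocks are omitted for brevity (so the head here sees 64 channels rather than 2048).

```python
import torch
import torch.nn as nn

# Stem adapted for 76 CARS-selected bands instead of 3 RGB channels
stem = nn.Sequential(
    nn.Conv2d(76, 64, kernel_size=7, stride=2, padding=3, bias=False),  # 224 -> 112
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),                   # 112 -> 56
)
head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),   # condense each feature map to one value
    nn.Flatten(),
    nn.Linear(64, 5),          # five classes: three pathogens, healthy, injured
)

x = torch.randn(1, 76, 224, 224)
feats = stem(x)                 # (1, 64, 56, 56)
logits = head(feats)            # (1, 5)
print(feats.shape, logits.shape)
```

In the full model the residual stages between `stem` and `head` would expand the channel count to 2048 and reduce the spatial size to 7 × 7, as described in the text.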
2.5.2. CBAM Attention Mechanism
Hyperspectral images are inherently high dimensional: representing many wavelengths requires a large number of channels, which makes it difficult for a network to capture target features effectively. To mitigate this limitation, an attention module was adopted as a key optimization. The Convolutional Block Attention Module (CBAM), proposed in 2018, integrates channel and spatial attention [29]. Because hyperspectral data contain far more spectral bands than standard RGB images, identifying informative features is more challenging. In CBAM, the channel-attention branch assigns greater weight to the most discriminative spectral channels while suppressing redundant ones, and the spatial-attention branch emphasizes the most relevant regions within each feature map. Together, these mechanisms guide the network toward the spectral channels and spatial locations most critical for classification, thereby enhancing discriminative capacity.
The channel attention branch estimates the importance of each channel in the input feature map and assigns a learned weight. It applies both global max pooling and global average pooling, followed by a shared multi-layer perceptron (MLP) to capture channel importance.
The spatial attention branch then evaluates the significance of each spatial position in the feature map, building on the output of channel attention. It performs average pooling and max pooling along the channel dimension to produce two spatial maps, concatenates them, and processes the result with a 7 × 7 convolution to integrate the information. A Sigmoid activation generates the final spatial attention map.
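The two branches described above can be sketched as a compact PyTorch module; the class name, reduction ratio, and tensor sizes are illustrative rather than the exact configuration used in HSC-ResNet.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Minimal CBAM sketch: channel attention followed by spatial attention."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        # Shared MLP applied to both avg- and max-pooled channel descriptors
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # 7x7 conv fuses the two channel-pooled spatial maps into one
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel attention: reweight informative spectral channels
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial attention: reweight informative locations
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

x = torch.randn(2, 64, 56, 56)
out = CBAM(64)(x)
print(out.shape)
```

Because both attention maps are produced by Sigmoid, each acts as a soft gate in (0, 1) that scales, rather than replaces, the incoming features.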
2.6. Experimental Setup
2.6.1. Hardware and Software Setup
To ensure a fair comparison, all experiments were run under identical hardware and software conditions. All models were trained in PyTorch 2.3.0. The complete hardware and software specifications are provided in Table 4.
To accelerate training, all hyperspectral images underwent uniform black and white calibration, followed by data augmentation. A total of 1628 images were used, randomly divided into training, validation, and test sets in a 6:2:2 ratio. The training set was used for parameter learning, the validation set for monitoring generalization and convergence during training, and the test set—kept untouched until training was complete—served for the final evaluation on unseen samples.
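The 6:2:2 split of the 1628 images can be sketched as follows; the function name and seed are illustrative.

```python
import random

def split_indices(n, ratios=(0.6, 0.2, 0.2), seed=42):
    """Shuffle sample indices and split them 6:2:2 into train/val/test."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train, val, test = split_indices(1628)
print(len(train), len(val), len(test))  # 976 325 327
```

Keeping the three index lists disjoint guarantees that the test set remains untouched until the final evaluation.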
2.6.2. Training Configuration
Tests indicated that the following hyperparameters yielded stable and reliable performance. The network was trained for 100 epochs, balancing accuracy with computational efficiency. After each epoch, accuracy and loss were measured on the validation set to monitor convergence. The batch size was set to 32 to maximize GPU utilization. A learning rate scheduler reduced the rate by a factor of 0.8 every 10,000 steps, and twelve worker threads were used to preprocess and load data in parallel. Training began with an initial learning rate of 1 × 10−4 and employed the AdamW optimizer with cosine annealing for automatic adjustment [30]. Detailed hyperparameter settings are listed in Table 5.
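The AdamW plus cosine-annealing pairing can be sketched with PyTorch's built-in scheduler; the single dummy parameter stands in for the model weights, and epoch-level stepping is an assumption for illustration.

```python
import torch

# AdamW with the initial learning rate of 1e-4, annealed over 100 epochs
params = [torch.nn.Parameter(torch.zeros(1))]
opt = torch.optim.AdamW(params, lr=1e-4)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=100)

lrs = []
for epoch in range(100):
    # ... forward pass, loss, backward pass, and opt.step() would go here ...
    opt.step()
    sched.step()
    lrs.append(opt.param_groups[0]["lr"])

print(lrs[0], lrs[-1])  # the rate decays along a cosine curve toward zero
```

Cosine annealing keeps the rate near its initial value early in training and shrinks it smoothly toward the end, which is what stabilizes the final epochs.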
2.6.3. Evaluation Metrics
Model performance on the hyperspectral classification task was evaluated using a confusion matrix [31] and overall accuracy. In the confusion matrix, rows denote the true classes and columns denote the predicted classes. Correct predictions appear on the main diagonal, whereas off-diagonal entries indicate misclassifications and provide a class-wise view of separability. Overall accuracy, computed from the confusion matrix, represents the proportion of correct predictions across all classes and offers a concise summary of model effectiveness.
Table 6 presents the standard form of a confusion matrix for binary classification. Rows give the true class of each sample—the top row for true positives, the bottom row for true negatives—whereas columns list the model’s predictions, with the left column indicating a positive prediction and the right column a negative one. Within this layout, TP (true positive) counts samples that are actually positive and predicted as positive; FN (false negative) counts those that are positive but predicted as negative; FP (false positive) counts samples that are negative yet predicted as positive; and TN (true negative) counts those that are negative and predicted as negative. Together, these four values provide the basis for calculating overall accuracy [32].
Accuracy reflects the overall correctness of a model’s predictions—the proportion of all samples it classifies correctly: Accuracy = (TP + TN) / (TP + TN + FP + FN).
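Overall accuracy follows directly from the confusion matrix as the diagonal sum over the total; a minimal sketch with a hypothetical 2 × 2 matrix:

```python
import numpy as np

def overall_accuracy(cm):
    """Overall accuracy = sum of the diagonal / sum of all entries."""
    cm = np.asarray(cm)
    return np.trace(cm) / cm.sum()

# Toy 2x2 confusion matrix: rows = true class, columns = predicted class
#            pred +  pred -
cm = [[40,  10],   # true +  -> TP = 40, FN = 10
      [ 5,  45]]   # true -  -> FP = 5,  TN = 45
print(overall_accuracy(cm))  # 0.85
```

The same function applies unchanged to the five-class matrix used in this study, since the diagonal-over-total definition is class-count agnostic.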
3. Results
3.1. Training Analysis
The HSC-ResNet model was trained and evaluated multiple times to verify its effectiveness in classifying hyperspectral images of apple diseases. Across these runs, it achieved an accuracy of 95.51%, demonstrating strong practical potential. The training convergence curves are shown in Figure 7. On the five-class task, the network exhibited robust feature extraction and discrimination. Among the evaluated models, HSC-ResNet attained the highest accuracy, suggesting that its architecture and integrated attention mechanisms enable more comprehensive use of hyperspectral information. Compared with related studies, a fruit recognition system based on a 3D CNN reported an accuracy of 90.47% [33], and a CNN trained on principal component images achieved 93.33% [34]. In comparison, the optimized HSC-ResNet reached 95.51% accuracy on the present dataset, indicating stronger practical utility.
The hyperspectral dataset used in this study contains 1628 apple images divided into five categories: Cytospora mali, Monilinia fructicola, Botryosphaeria dothidea, healthy fruit, and mechanically injured fruit. These classes reflect real inspection scenarios. Cytospora mali, Monilinia fructicola, and Botryosphaeria dothidea are common quarantine concerns in imported apples. They spread readily and can cause substantial economic loss. Healthy samples give the model a clear reference for non-diseased tissue. Mechanically injured samples are harder to distinguish because their spectra can partly resemble disease signals.
Classification performance across these five categories therefore offers a direct indication of the model’s suitability for practical industrial applications. To examine the network’s focus during inference, we visualized HSC-ResNet’s attention maps [35]. The heatmaps reveal that the model concentrates on regions surrounding lesions, an observation that supports further interpretability and refinement. Examples of attention heatmaps are shown in Figure 8. This visual evidence also highlights the benefit of the CBAM module: by combining channel and spatial attention, it directs the network to emphasize disease regions while suppressing background noise.
Visualizing the confusion matrix highlights where the classifier still encounters difficulties. Most disease categories are recognized with high precision, with Monilinia fructicola and Botryosphaeria dothidea showing particularly strong performance. However, accuracy drops for the “injury” class, where mechanically damaged fruit is sometimes misclassified as healthy. This overlap likely results from the spectral similarity between damaged and intact tissue, and shadows or background reflections can further obscure the distinction. Detailed confusion matrices for each class are shown in Figure 9.
Comparing the predicted labels with ground truth shows that most misclassifications occur in the Cytospora mali and injury classes. A closer look reveals two distinct patterns. First, several Cytospora mali samples are labelled as Botryosphaeria dothidea because lesions of the former share visual traits with the latter at certain developmental stages, misleading the network. Early Cytospora mali infections are also problematic: the lesion covers only a small fraction of the fruit surface, so the model often mistakes the sample for healthy fruit. Second, some injury samples are classified as healthy because the physical bruises are minor, making their spectral signatures resemble intact tissue and thus harder to detect. A comparison between misclassified and correctly classified samples is shown in Figure 10.
3.2. Ablation Study
To quantify the contribution of each design choice, ablation experiments were conducted on three factors: expansion of input channels, insertion of the CBAM attention module, and optimization of key training hyperparameters. Summary statistics are provided in Table 7.
The baseline, Model A, is a standard ResNet50 that accepts only three input channels. Hyperspectral bands are compressed into these three channels prior to training, resulting in an accuracy of 87.78%. While sufficient for tasks that do not require high precision, this approach discards most spectral information and fails to leverage the full potential of hyperspectral data.
In Model B, the input layer is expanded from 3 to 76 channels, enabling the network to utilize richer spectral detail. Accuracy increases to 92.13%, a gain of 4.35 percentage points over the baseline. However, this wider input prevents the direct use of pre-trained ImageNet weights, requiring training from scratch and making convergence more challenging.
Model C builds on Model B by incorporating a CBAM module after each residual block. Channel attention is especially beneficial with 76 spectral bands, as it directs the network toward the most informative wavelengths, while spatial attention highlights relevant regions within each feature map. This combination boosts accuracy to 95.11%, an additional gain of 2.98 percentage points, providing strong evidence of CBAM’s value.
Model D (HSC-ResNet) retained the expanded input and CBAM while refining the training strategy. Use of the AdamW optimizer with a cosine-annealing schedule reduced overfitting and stabilized gradients, producing the smoothest convergence among all configurations. Final accuracy reached 95.51%. The training convergence curves of these models are compared in Figure 11.
4. Discussion
This study began by extracting average spectral curves from hyperspectral images to examine spectral changes at early, middle, and late infection stages for each apple disease category. Specific hyperspectral wavelengths correspond to distinct biological traits—for instance, bands near 970 nm are closely linked to water content [36], while the 500–700 nm range reflects pigment variations associated with disease symptoms. On this basis, the temporal evolution of spectral signatures was tracked throughout infection. Subtle inter-disease differences, however, were not readily distinguished by visual inspection of averaged spectra alone, which motivated the use of machine-learning-based spectral feature selection followed by deep-learning-based classification.
For spectral data processing, the CARS algorithm was employed to reduce dimensionality and redundancy, yielding 76 informative bands from the original 448. Although hyperspectral imagery is information rich, not every wavelength contributes equally to classification. Bands with minimal interclass variance can slow training, hinder convergence, and increase inference time. A natural question is whether one or a few highly informative wavelengths would suffice. In experiments conducted here, methods using only a small number of bands typically achieved accuracies below 60%. By contrast, the 76-band subset selected by CARS markedly outperformed single- or few-band inputs, underscoring the value of optimal band selection. In practice, identifying such a subset can enable more compact sensors, reduce acquisition requirements, and lower transmission and computation costs, thereby improving speed and operational efficiency.
Exclusive reliance on spectral feature selection has limitations. By reducing each hyperspectral image to a single spectral curve (wavelength on the x-axis, reflectance on the y-axis), CARS discards spatial context, which constitutes a substantial loss for hyperspectral analysis. To exploit both spectral and spatial cues, convolutional neural network (CNN) architectures capable of jointly modeling multiband spectral detail and spatial structure were therefore adopted.
Multiple established deep learning architectures were evaluated, including ShuffleNet, MobileNet, EfficientNet, ViT, and ResNet. ResNet produced the strongest performance for apple disease classification and was adopted as the baseline. The architecture was then extended to form HSC-ResNet, with the input expanded from three RGB channels to 76 hyperspectral bands to make fuller use of spectral and spatial information. Incorporation of the CBAM module further improved performance by directing channel attention to informative wavelengths and spatial attention to lesion-relevant regions. Hyperparameters were optimized to ensure stable convergence with high-dimensional input. The resulting HSC-ResNet achieved 95.51% accuracy on a challenging five-class dataset (Cytospora mali, Monilinia fructicola, Botryosphaeria dothidea, healthy samples, and mechanically injured fruit), meeting practical detection requirements.
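The two architectural changes described above (a 76-band input stem in place of the three-channel RGB stem, and CBAM attention) can be sketched in PyTorch. This is an illustrative reconstruction rather than the authors' implementation; the layer sizes, reduction ratio, and CBAM placement are assumptions:

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention, then spatial."""

    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        # Channel attention: shared MLP over average- and max-pooled descriptors
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        # Spatial attention: 7x7 conv over channel-wise average and max maps
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        ca = torch.sigmoid(self.mlp(x.mean((2, 3), keepdim=True))
                           + self.mlp(x.amax((2, 3), keepdim=True)))
        x = x * ca  # reweight informative wavelengths
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        return x * sa  # emphasize lesion-relevant regions

# Widened ResNet-style stem: 76 hyperspectral bands instead of 3 RGB channels.
# RGB-pretrained conv1 weights no longer fit, so this layer trains from scratch.
stem = nn.Conv2d(76, 64, kernel_size=7, stride=2, padding=3, bias=False)

x = torch.randn(2, 76, 224, 224)   # batch of two 76-band image patches
features = CBAM(64)(stem(x))
print(features.shape)              # torch.Size([2, 64, 112, 112])
```

The attention module preserves the feature-map shape, so it can be dropped between existing residual stages without altering the rest of the network.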
Despite these promising outcomes, limitations remain. The dataset comprised only 1628 images, which is small by deep-learning standards, and expanding the input channels precluded the use of RGB-pretrained weights and made convergence more difficult. Together, these factors elevate the risk of overfitting and call for stronger regularization and more effective optimization strategies. Moreover, only a single cultivar was included: Gala apples imported from New Zealand. To better reflect the complexity of real import–export inspections, future studies should expand to larger and more diverse datasets.
Furthermore, the hyperspectral camera captured continuous spectra at 448 wavelengths spanning 400 to 1000 nm, yet only the 76 selected bands were ultimately used by the model. Future work could optimize acquisition hardware to capture just these wavelengths at the source, greatly increasing imaging speed and detection efficiency.
5. Conclusions
This study addressed the urgent need for early detection and accurate identification of key quarantine apple diseases—Cytospora mali, Monilinia fructicola, and Botryosphaeria dothidea—that pose serious challenges in the apple import–export trade. By integrating hyperspectral imaging, which captures the optical reflection properties of both apple peel and sub-surface tissues, with advanced deep learning techniques, we enabled efficient feature extraction and accurate disease classification. This approach overcomes key limitations of traditional manual inspection and chemical testing, such as subjectivity, slow turnaround, and sample destruction.
For band selection and optimization, a comprehensive framework integrating spectral preprocessing, optimal wavelength selection, and neural network optimization was established to achieve robust plant-disease identification. Initial analyses of inter-disease spectral differences highlighted the need to balance accuracy with computational efficiency. Although hyperspectral imaging offers hundreds of informative bands, the resulting dimensionality imposes substantial computational cost. Comparative experiments showed that selecting 76 informative wavelengths from the original 448 reduced dimensionality, computational load, and training time without compromising accuracy.
For the design of the hyperspectral image recognition model, several established deep learning architectures were first evaluated. An optimized variant tailored for hyperspectral classification, HSC-ResNet, was then developed. The input channel dimensionality was expanded to accommodate hyperspectral data, the CBAM attention mechanism was integrated to emphasize informative wavelengths and lesion regions, and key hyperparameters were tuned to ensure stable convergence. On a five-class apple disease dataset, HSC-ResNet achieved 95.51% accuracy, satisfying the precision and reliability requirements of quarantine inspection.
In summary, the combination of hyperspectral imaging and deep learning presented here provides a practical pathway for rapid, non-destructive, and highly accurate identification of major quarantine apple diseases. The methodology has strong potential to improve inspection efficiency, reduce economic losses, protect local agricultural ecosystems, and help ensure the quality and marketability of imported apples.
Author Contributions
Conceptualization, H.Z., N.Y. and X.Q.; methodology, H.Z., N.Y. and X.Q.; software, N.Y. and P.W.; validation, H.Z., N.Y. and X.Q.; formal analysis, H.Z. and N.Y.; investigation, N.Y. and X.Q.; resources, X.Q., L.Y. and B.J.; data curation, N.Y., P.W. and B.J.; writing—original draft preparation, H.Z., N.Y. and P.W.; writing—review and editing, H.Z., H.X. and X.Q.; visualization, H.Z. and N.Y.; supervision, H.Z., J.G. and X.Q.; project administration, H.Z., L.Y. and X.Q.; funding acquisition, J.G., H.X. and L.Y. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the National Key Research and Development Program of China (2021YFD1400100 & 2021YFD1400102) and the Agricultural Science and Technology Innovation Program (CAAS-ZDRW202505).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The raw/processed data required to reproduce the above findings cannot be shared at this time, as the data also form part of an ongoing study.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Liang, X.F.; Zhang, R.; Gleason, M.L.; Sun, G.Y. Sustainable Apple Disease Management in China: Challenges and Future Directions for a Transforming Industry. Plant Dis. 2022, 106, 786–799. [Google Scholar] [CrossRef]
- Ke, B.; Zhang, Q.Q. Assessing the Efficiency and Potential of China's Apple Exports. Complexity 2024, 2024, 7816792. [Google Scholar] [CrossRef]
- Duan, J.Y.; Ma, H.T.; Wang, W.X.; Li, Y.L.; Shi, X.Y.; Chen, X.W.; Kageyama, K.; Yan, Y.P.; Li, M.Z. A rapid quarantine approach for the pathogenic and invasive Phytophthora species associated with imported fruits in China. Pest Manag. Sci. 2024, 80, 6130–6141. [Google Scholar] [CrossRef]
- Rodoni, B. The role of plant biosecurity in preventing and controlling emerging plant virus disease epidemics. Virus Res. 2009, 141, 150–157. [Google Scholar] [CrossRef]
- Wang, J.R.; Guo, L.Y.; Xiao, C.L.; Zhu, X.Q. Detection and Identification of Six Monilinia spp. Causing Brown Rot Using TaqMan Real-Time PCR from Pure Cultures and Infected Apple Fruit. Plant Dis. 2018, 102, 1527–1533. [Google Scholar] [CrossRef]
- Diao, Z.H.; Guo, P.L.; Zhang, B.H.; Yan, J.N.; He, Z.D.; Zhao, S.N.; Zhao, C.J.; Zhang, J.C. Spatial-spectral attention-enhanced Res-3D-OctConv for corn and weed identification utilizing hyperspectral imaging and deep learning. Comput. Electron. Agric. 2023, 212, 108092. [Google Scholar] [CrossRef]
- Wang, C.Y.; Liu, B.H.; Liu, L.P.; Zhu, Y.J.; Hou, J.L.; Liu, P.; Li, X. A review of deep learning used in the hyperspectral image analysis for agriculture. Artif. Intell. Rev. 2021, 54, 5205–5253. [Google Scholar] [CrossRef]
- Kang, Z.L.; Zhao, Y.C.; Chen, L.; Guo, Y.J.; Mu, Q.S.; Wang, S.Y. Advances in Machine Learning and Hyperspectral Imaging in the Food Supply Chain. Food Eng. Rev. 2022, 14, 596–616. [Google Scholar] [CrossRef] [PubMed]
- Skoneczny, H.; Kubiak, K.; Spiralski, M.; Kotlarz, J. Fire Blight Disease Detection for Apple Trees: Hyperspectral Analysis of Healthy, Infected and Dry Leaves. Remote Sens. 2020, 12, 2101. [Google Scholar] [CrossRef]
- Li, B.; Zou, J.P.; Yin, H.; Liu, Y.D.; Zhang, F.; Ou-yang, A. Detection of the storage time of light bruises in yellow peaches based on spectrum and texture features of hyperspectral image. J. Chemom. 2023, 37, e3516. [Google Scholar] [CrossRef]
- Jia, Y.; Shi, Y.; Luo, J.; Sun, H. Y-Net: Identification of Typical Diseases of Corn Leaves Using a 3D-2D Hybrid CNN Model Combined with a Hyperspectral Image Band Selection Module. Sensors 2023, 23, 1494. [Google Scholar] [CrossRef] [PubMed]
- Hu, B.; Jiang, W.Q.; Zeng, J.; Cheng, C.; He, L.C. FOTCA: Hybrid transformer-CNN architecture using AFNO for accurate plant leaf disease image recognition. Front. Plant Sci. 2023, 14, 1231903. [Google Scholar] [CrossRef] [PubMed]
- Eschen, R.; Rigaux, L.; Sukovata, L.; Vettraino, A.M.; Marzano, M.; Grégoire, J.C. Phytosanitary inspection of woody plants for planting at European Union entry points: A practical enquiry. Biol. Invasions 2015, 17, 2403–2413. [Google Scholar] [CrossRef]
- Wu, J.J.; Li, Z.P.; Zhang, Y.; Liu, Z.Y. Occurrence of Stem Rot on Acacia mangium Caused by Amauroderma subresinosum. Plant Dis. 2024, 108, 1401. [Google Scholar] [CrossRef]
- Sasaki, S.; Yamagishi, N.; Yoshikawa, N. Efficient virus-induced gene silencing in apple, pear and Japanese pear using Apple latent spherical virus vectors. Plant Methods 2011, 7, 15. [Google Scholar] [CrossRef] [PubMed]
- Pagliano, G.; Galletti, P.; Samorì, C.; Zaghini, A.; Torri, C. Recovery of Polyhydroxyalkanoates From Single and Mixed Microbial Cultures: A Review. Front. Bioeng. Biotechnol. 2021, 9, 624021. [Google Scholar] [CrossRef]
- Chen, Y.; Wang, X.; Zhang, X.; Xu, X.; Huang, X.; Wang, D.; Amin, A. Spectral-based estimation of chlorophyll content and determination of background interference mechanisms in low-coverage rice. Comput. Electron. Agric. 2024, 226, 109442. [Google Scholar] [CrossRef]
- Li, Q.; García-Martín, J.F.; Ding, F.; Tu, K.; Lan, W.; Tang, C.; Liu, X.; Pan, L. Quantitative Prediction and Kinetic Modelling for the Thermal Inactivation of Brochothrix thermosphacta in Beef Using Hyperspectral Imaging. Foods 2025, 14, 2778. [Google Scholar] [CrossRef]
- Xing, Z.; Du, C.W.; Shen, Y.Z.; Ma, F.; Zhou, J.M. A method combining FTIR-ATR and Raman spectroscopy to determine soil organic matter: Improvement of prediction accuracy using competitive adaptive reweighted sampling (CARS). Comput. Electron. Agric. 2021, 191, 106549. [Google Scholar] [CrossRef]
- Liu, Q.; Zhou, J.; Wu, Z.; Ma, D.; Ma, Y.; Fan, S.; Yan, L. Classification Prediction of Jujube Variety Based on Hyperspectral Imaging: A Comparative Study of Intelligent Optimization Algorithms. Foods 2025, 14, 2527. [Google Scholar] [CrossRef]
- Hansen, P.M.; Schjoerring, J.K. Reflectance measurement of canopy biomass and nitrogen status in wheat crops using normalized difference vegetation indices and partial least squares regression. Remote Sens. Environ. 2003, 86, 542–553. [Google Scholar] [CrossRef]
- He, H.; Li, Z.; Qin, Q.; Yu, Y.; Guo, Y.; Cai, S.; Li, Z. Spectroscopic and Imaging Technologies Combined with Machine Learning for Intelligent Perception of Pesticide Residues in Fruits and Vegetables. Foods 2025, 14, 2679. [Google Scholar] [CrossRef] [PubMed]
- Lun, Z.; Wu, X.; Dong, J.; Wu, B. Deep Learning-Enhanced Spectroscopic Technologies for Food Quality Assessment: Convergence and Emerging Frontiers. Foods 2025, 14, 2350. [Google Scholar] [CrossRef] [PubMed]
- Yang, R.; Lu, X.Y.; Huang, J.; Zhou, J.; Jiao, J.; Liu, Y.F.; Liu, F.; Su, B.F.; Gu, P.W. A Multi-Source Data Fusion Decision-Making Method for Disease and Pest Detection of Grape Foliage Based on ShuffleNet V2. Remote Sens. 2021, 13, 5102. [Google Scholar] [CrossRef]
- Li, X.; Ye, H.; Qiu, S. Cloud Contaminated Multispectral Remote Sensing Image Enhancement Algorithm Based on MobileNet. Remote Sens. 2022, 14, 4815. [Google Scholar] [CrossRef]
- Lu, Z.; Huang, S.; Zhang, X.; Shi, Y.; Yang, W.; Zhu, L.; Huang, C. Intelligent identification on cotton verticillium wilt based on spectral and image feature fusion. Plant Methods 2023, 19, 75. [Google Scholar] [CrossRef]
- Yan, H.P.; Zhang, E.R.; Wang, J.; Leng, C.C.; Basu, A.; Peng, J.Y. Hybrid Conv-ViT Network for Hyperspectral Image Classification. IEEE Geosci. Remote Sens. Lett. 2023, 20, 1–5. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Woo, S.H.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the 15th European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
- Zhou, P.; Xie, X.Y.; Lin, Z.C.; Yan, S.C. Towards Understanding Convergence and Generalization of AdamW. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 6486–6493. [Google Scholar] [CrossRef]
- Tharwat, A. Classification assessment methods. Appl. Comput. Inform. 2021, 17, 168–192. [Google Scholar] [CrossRef]
- Javanmardi, S.; Ashtiani, S.H.M. AI-driven deep learning framework for shelf life prediction of edible mushrooms. Postharvest Biol. Technol. 2025, 222, 113396. [Google Scholar] [CrossRef]
- Pourdarbani, R.; Sabzi, S.; Dehghankar, M.; Rohban, M.H.; Arribas, J.I. Examination of Lemon Bruising Using Different CNN-Based Classifiers and Local Spectral-Spatial Hyperspectral Imaging. Algorithms 2023, 16, 113. [Google Scholar] [CrossRef]
- Wang, B.; Lili, L.I. Detection of defects of cerasus humilis fruits based on hyperspectral imaging and convolutional neural networks. Inmateh Agric. Eng. 2023, 71, 103–114. [Google Scholar] [CrossRef]
- Li, F.; Zhou, H.; Wang, Z.; Wu, X. ADDCNN: An Attention-Based Deep Dilated Convolutional Neural Network for Seismic Facies Analysis with Interpretable Spatial–Spectral Maps. IEEE Trans. Geosci. Remote Sens. 2021, 59, 1733–1744. [Google Scholar] [CrossRef]
- Zununjan, Z.; Turghan, M.A.; Sattar, M.; Kasim, N.; Emin, B.; Abliz, A. Combining the fractional order derivative and machine learning for leaf water content estimation of spring wheat using hyper-spectral indices. Plant Methods 2024, 20, 97. [Google Scholar] [CrossRef] [PubMed]
Figure 1.
Stereo view of hyperspectral data. The image’s lines, samples, and bands form a cube-like three-dimensional structure.
Figure 2.
Representative images of inoculated apple samples, together with PCR products visualized using a gel imaging system.
Figure 3.
Mechanical structure.
Figure 4.
Process of Extracting Average Spectrum from Apples.
Figure 5.
Example images of apples at early, middle, and late infection stages, together with the corresponding mean spectral curves derived from spectra extracted across all samples.
Figure 6.
HSC-ResNet Model Architecture Diagram.
Figure 7.
Training convergence curves of the HSC-ResNet model.
Figure 8.
Attention Heatmap Comparison.
Figure 9.
Confusion Matrix. In the figure, the class labels are C.M for Cytospora mali, M.F for Monilinia fructicola, and B.D for Botryosphaeria dothidea.
Figure 10.
Comparison of misclassified and correctly classified samples.
Figure 11.
Convergence curve comparison of the ablation study.
Table 1.
Primer sequence list.
| Name | Primer Sequence | Size of Product (bp) |
|---|---|---|
| Botryosphaeria dothidea | ITS1: 5′-TCCGTAGGTGAACCTGCGG-3′; ITS4: 5′-TCCTCCGCTTATTGATATGC-3′ | 580 |
| Monilinia fructicola | ITS1: 5′-TATGCTCGCCAGAGGATAATT-3′; ITS4: 5′-TGGGTTTTGGCAGAAGCACACT-3′ | 356 |
| Cytospora mali | ITS1: 5′-TCCGTAGGTGAACCTGCG-3′; ITS4: 5′-TCCTCCGCTTATTGATAT-3′ | 560 |
Table 2.
Hyperspectral Camera Specifications.
| Main Technical Specifications | Parameters |
|---|---|
| Spectral Range | 400–1000 nm |
| Spectral Resolution per Pixel | 2.7 nm |
| Number of Spectral Bands | 448 |
| Spectral Resolution | 5.5 nm |
| Number of Spatial Pixels | 1024 px |
| Camera Frame Rate | 320 FPS |
| Field of View | 38° |
| Lens F-number | F/1.7 |
| Data Interface | GigE Vision, CameraLink |
| Connector | Industrial Ethernet or CameraLink 26-pin, 0.5 MDR |
| Dimensions (L × W × H) | 150 × 71 × 85 mm |
Table 3.
Comparison of Convolutional Neural Network Models on the Hyperspectral Dataset.
| Model | Accuracy | Parameters |
|---|---|---|
| ShuffleNet | 83.31% | 5.42 M |
| MobileNet | 87.23% | 16.20 M |
| EfficientNet | 86.72% | 78.00 M |
| ViT | 81.35% | 210.00 M |
| ResNet | 87.78% | 97.26 M |
Table 4.
Hardware and Software Configuration Information.
| Item | Parameter | Specification/Version |
|---|---|---|
| Hardware | CPU | Intel(R) Xeon(R) Gold 5317 CPU @ 3.00 GHz |
| Hardware | GPU | NVIDIA RTX 6000 Ada Generation, 48 GB |
| Hardware | RAM | 256 GB |
| Software | Python | 3.12 |
| Software | PyTorch | 2.3.0 |
| Software | CUDA | 12.1 |
| Software | cuDNN | 8.9.1 |
Table 5.
Model Hyperparameters.
| Parameter | Value |
|---|---|
| Number of Workers | 12 |
| Batch Size | 32 |
| Epochs | 100 |
| Learning Rate | 1 × 10−4 |
| Optimizer | AdamW |
| LR Scheduler | CosineAnnealing |
Table 6.
Confusion Matrix Method.
| Actual Class/Predicted Class | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | TP | FN |
| Actual Negative | FP | TN |
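The four cells of this matrix yield the standard evaluation metrics directly; the sketch below is a minimal illustration with made-up counts (they are not results from the study):

```python
def metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Accuracy, precision, recall, and F1 from a binary confusion matrix."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)          # of predicted positives, how many are real
    recall = tp / (tp + fn)             # of real positives, how many are found
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Illustrative counts, treating one disease class as "positive"
m = metrics(tp=40, fn=10, fp=5, tn=45)
print(round(m["accuracy"], 4))  # 0.85
```

For the five-class problem studied here, each class is scored in turn as the positive class against the rest and the per-class results are aggregated.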
Table 7.
Ablation Study Table.
| Model | Baseline | Expanded Channels | CBAM | Optimization | Accuracy |
|---|---|---|---|---|---|
| A | ✔ | | | | 87.78% |
| B | ✔ | ✔ | | | 92.13% |
| C | ✔ | ✔ | ✔ | | 95.11% |
| D | ✔ | ✔ | ✔ | ✔ | 95.51% |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).