Article

Detection of External Defects in Seed Potatoes Using Spectral–Spatial Fusion of Hyperspectral Images and Deep Learning

1
College of Mechanical and Electrical Engineering, Inner Mongolia Agricultural University, Hohhot 010018, China
2
Inner Mongolia Autonomous Region Engineering Research Center for Intelligent Equipment in Forage and Feed Production, Hohhot 010018, China
*
Author to whom correspondence should be addressed.
Agriculture 2026, 16(1), 77; https://doi.org/10.3390/agriculture16010077
Submission received: 6 November 2025 / Revised: 22 December 2025 / Accepted: 27 December 2025 / Published: 29 December 2025
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)

Abstract

To improve the accuracy of detecting external defects in seed potatoes and address the reliance of current hyperspectral imaging methods on single-dimensional data, this study proposes a multi-dimensional spectral–spatial information fusion approach via concatenation based on a one-dimensional convolutional neural network (1DCNN) within the framework of deep learning. Hyperspectral three-dimensional data were acquired for normal seed potatoes and for samples presenting six types of external defects—decay, mechanical damage, wormhole, common scab, black scurf, and frostbite—across a wavelength range of 935–1721 nm. From the hyperspectral images, one-dimensional spectral data and two-dimensional spatial data were extracted. The one-dimensional spectral data were preprocessed using six methods: Savitzky–Golay smoothing (SG), standard normal variate (SNV), multiplicative scatter correction (MSC), first derivative (FD), second derivative (SD), and orthogonal signal correction (OSC). Feature wavelengths were subsequently selected through the successive projections algorithm (SPA) and competitive adaptive reweighted sampling (CARS), serving as inputs for traditional machine learning models. Two-dimensional spatial data were first subjected to dimensionality reduction via principal component analysis (PCA). Texture features were then extracted from each principal component using the gray-level co-occurrence matrix (GLCM). Following normalization, all spatial texture data were fused with the preprocessed spectral data to form the inputs for the deep learning models Basic1DCNN and Stacked1DCNN. The results demonstrate that the fusion data with the Stacked1DCNN model yielded the best performance in identifying normal seed potatoes and six types of external defects. The overall accuracy, precision, recall, F1 score, and mean average precision reached 98.77%, 98.77%, 98.93%, 98.73%, and 99.66%, respectively, outperforming traditional machine learning approaches. 
Compared with the Stacked1DCNN model trained using spectral data alone, these metrics improved by 2.81%, 2.78%, 3.20%, 3.01%, and 1.11%. This study offers theoretical and technical insights into the development of automated sorting and non-destructive detection systems for seed potatoes.

1. Introduction

As the fourth largest food crop in China, potato serves both as a staple and a vegetable, and is valued for its remarkable adaptability and significant economic potential [1]. High-quality seed potatoes can substantially enhance both the yield and quality of potatoes, boost market competitiveness, and deliver greater economic benefits to growers [2]. Seed potatoes are vulnerable to pathogen infections and environmental stresses throughout cultivation, harvesting, transportation and storage, often resulting in various external defects that severely compromise planting efficiency and quality standards. Prompt identification and removal of defective seed potatoes are essential for preserving planting efficiency. In practice, defect screening largely depends on manual visual inspection, which is labor-intensive, subjective, and inefficient [3]. Consequently, the development of automated, non-destructive detection technologies is vital for improving the efficiency of external defect detection in seed potatoes.
In potato production, high-quality ware potatoes are usually used as retained seed stock [4]. As such, methodologies for inspecting the quality of ware potatoes offer valuable guidance for this study. Currently, potato defect detection technologies mainly include machine vision-based object detection, traditional near-infrared spectroscopy, and hyperspectral imaging. Xu et al. [5] proposed a lightweight DATW-YOLOv8 model for potato defect detection by optimizing network modules and the detection head and introducing the Wise-EIoU loss, resulting in improved feature extraction and detection accuracy. Wang et al. [6] applied deep transfer learning to potato surface defect detection by fine-tuning SSD Inception V2, RFCN ResNet101, and Faster R-CNN ResNet101, achieving enhanced detection performance. Imanian et al. [7] combined Vis, NIR, and SWIR spectroscopy with intelligent algorithms to identify internal potato defects, selecting optimal wavelengths and achieving effective classification performance. Guo et al. [8] used Vis/NIR spectroscopy combined with an improved ResNet model to detect potato black heart disease, achieving an accuracy of 0.971. Although machine vision-based object detection has made some progress in identifying external defects of potatoes, it mainly relies on RGB image features such as color, shape, and texture, making it difficult to reflect internal chemical information. Furthermore, detection accuracy is susceptible to lighting conditions and surface reflections, and the algorithms are relatively complex with longer response times [9,10]. In contrast, traditional near-infrared spectroscopy enables non-destructive acquisition of the internal chemical properties of potatoes, making it suitable for detecting internal defects such as black heart disease. However, its spectral data lack spatial distribution information, limiting the ability to detect external defects and surface textures.
Hyperspectral technology integrated with machine learning has become well-established in agricultural product inspection and has demonstrated strong performance in detecting external defects [11]. Hyperspectral imaging technology has the advantage of simultaneously acquiring spectral and spatial information of a target, representing its chemical composition and surface morphology, respectively [12]. Zhao et al. [13] combined hyperspectral spatial information with PCA and image processing, using a Bayesian classifier and BP neural network to detect potato defects. Al Riza et al. [14] used multispectral images with preprocessing and pseudo-color transformation to enhance contrast among defects, normal skin, and soil, effectively reducing soil interference. Ji et al. [15] used linear discriminant analysis to reduce spectral dimensionality and combined a multi-class SVM with K-means clustering to identify six potato defect types, achieving 90% test accuracy. Zhang et al. [16] built a CNN model to detect the incubation period of potato dry rot, significantly improving accuracy compared with four traditional machine learning methods. Zhao et al. [17] applied SG smoothing and standard normal variate preprocessing with K-nearest neighbor to detect normal, green-skin, and scab potatoes, achieving accuracies of 93%, 93%, and 83%, respectively. These studies confirm the feasibility of using hyperspectral technology to detect external defects in potatoes. However, existing methods specifically for seed potato defect detection mainly rely on single-dimensional information and fail to fully exploit the features of hyperspectral data, which still poses certain limitations in identifying subtle or morphologically similar defects.
At present, research on integrating spectral and spatial multidimensional information to enhance accuracy in the detection of external potato defects remains limited. Jin et al. [18] proposed a detection method based on fused information, which combines diffusion map manifold learning with extreme learning machine. Compared with traditional machine learning, deep learning can automatically extract key features from hyperspectral data through its multilayer structure, reducing reliance on manual feature selection and more effectively exploiting high-dimensional deep information. Among them, the one-dimensional convolutional neural network (1DCNN) is suitable for processing sequential data and has good application potential in hyperspectral analysis [19].
This study proposes a method for detecting external defects in seed potatoes based on spectral–spatial fusion of hyperspectral images and deep learning. This method acquires three-dimensional hyperspectral data of seed potatoes, separately extracts spectral data and spatial texture data, and fuses them after individual processing. Further, traditional machine learning models are compared with deep learning models to select the optimal one, achieving efficient and non-destructive detection of six types of defects—decay, mechanical damage, wormholes, common scab, black scurf, and frostbite—providing a theoretical basis and technical support for automated seed potato sorting systems.

2. Materials and Methods

2.1. Experiment Materials

The experiment utilized Zihuabai seed potatoes from Ulanqab, Inner Mongolia, China. After manual sorting, the potatoes were categorized into six groups: normal, decay, mechanical damage, wormholes, common scab, and black scurf seed potatoes. Frostbitten seed potato samples were prepared using temperature control. First, normal samples with intact epidermis and no defects were selected. These samples were then placed in a −18 °C environment for continuous freezing for 48 h. Finally, the samples were transferred to a constant 20 °C environment for thawing until typical frost damage characteristics appeared, including black-brown spots, tissue softening, water-soaked exudation, and epidermal wrinkling [20]. Figure 1 displays typical morphological characteristics of normal seed potatoes and six categories of defects. Samples were cleaned before data collection to remove surface soil and impurities.

2.2. Data Collection

The experiment employed a hyperspectral imaging system manufactured by SPECIM, Spectral Imaging Ltd. (Oulu, Finland) to collect data on seed potatoes, as shown in Figure 2. The system primarily comprised a hyperspectral imaging camera (FX17, SPECIM, Spectral Imaging Ltd., Oulu, Finland), a lens, four 150 W halogen lamps arranged at a 45° illumination angle toward the sample center, and an electrically controlled displacement platform. The FX17 camera operated in the spectral range of 935–1721 nm with 224 spectral bands, a spectral resolution of 8 nm, a slit width of 30 μm, and a spatial resolution of 640 pixels (along the scan line).
Before hyperspectral data acquisition, a white PTFE pad was used for lens calibration. The wavelengths of 1266 nm, 1322 nm, and 1520 nm were selected as the red, green, and blue channels, respectively, to generate pseudo-color images that better represent the real appearance of the samples. After calibration of the scanning system, the hyperspectral camera was set with a frame rate of 50 Hz and an exposure time of 3.9 ms. The electrically controlled displacement platform was adjusted to a speed of 15.8 mm/s, with the imaging area defined between 70 mm and 370 mm from the platform edge. To eliminate the influence of camera dark current and ensure uniform illumination for high-quality hyperspectral image acquisition [21], a black–white correction method was applied [22]. The black reference image was acquired with the camera shutter closed, while the white reference image was obtained by imaging the white PTFE pad. The calculation formula is shown in Equation (1):
R_c = (R − R_b) / (R_w − R_b)
where R_c is the hyperspectral image after black–white correction, R is the raw hyperspectral image, R_b is the black reference image, and R_w is the white reference image.
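As an illustration, Equation (1) applies pixel-wise and band-wise to the whole data cube. The following NumPy sketch is a minimal, hedged example (the cube dimensions and the epsilon guard against division by zero are illustrative assumptions, not part of the original method):

```python
import numpy as np

def black_white_correction(raw, black, white):
    """Radiometric calibration per Equation (1): R_c = (R - R_b) / (R_w - R_b).

    raw, black, white: hyperspectral cubes of shape (rows, cols, bands).
    A small epsilon guards against division by zero in dark bands.
    """
    eps = 1e-8
    return (raw - black) / (white - black + eps)

# Toy cube: 4 x 4 pixels, 5 bands
raw = np.full((4, 4, 5), 0.6)
black = np.full((4, 4, 5), 0.1)
white = np.full((4, 4, 5), 1.1)
corrected = black_white_correction(raw, black, white)
print(corrected[0, 0, 0])  # ≈ 0.5
```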
During the data-acquisition process, the seed potato samples were neatly arranged within the imaging area, with four samples placed on the platform for each trial. Both the front and back sides of each sample were captured once. To ensure measurement accuracy, the samples were positioned directly on the platform surface rather than in glass Petri dishes, thereby avoiding reflection artifacts.

2.3. Hyperspectral 3D Data Structures and Data Information Representation

The acquired hyperspectral data are three-dimensional data cubes in the form of “rows × columns × bands,” consisting of 224 two-dimensional images stacked in order of increasing wavelength. Each spectral band corresponds to one two-dimensional image, and the reflectance values of an individual pixel across all wavelengths constitute a one-dimensional spectral curve, as illustrated in Figure 3.
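The cube structure described above can be sketched with NumPy indexing (the array sizes here are arbitrary stand-ins; only the 224-band axis reflects the actual system):

```python
import numpy as np

# A hyperspectral cube stored as rows x cols x bands (224 bands here).
cube = np.random.rand(100, 120, 224)

band_image = cube[:, :, 50]       # one 2-D grayscale image at band index 50
pixel_spectrum = cube[40, 60, :]  # 1-D spectral curve of pixel (40, 60)

assert band_image.shape == (100, 120)
assert pixel_spectrum.shape == (224,)
```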

2.3.1. One-Dimensional Spectral Data Extraction Based on ROI

In the experimental stage, the hyperspectral 3D data of seed potatoes were imported into ENVI 5.6. Based on pseudo-color images, regions of interest (ROIs) for different types of defects were manually annotated using the ROI tool, and the average spectral values of all pixels within each ROI were extracted to represent the spectral characteristics of that region [23]. While this manual annotation is crucial for establishing precise ground truth, it is recognized that the process inherently involves a degree of subjectivity. To minimize potential class imbalance, a balanced sampling strategy was adopted, and a comparable number of ROIs was extracted for each defect category. The labels and the number of ROIs extracted for each type are shown in Table 1. The dataset was divided into training and validation sets at a 3:1 ratio using the Kennard–Stone (KS) algorithm [24], which selects the most representative samples of each type based on distance metrics, thereby ensuring that both sets encompass a comprehensive range of class features.
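The Kennard–Stone split mentioned above can be sketched as follows; this is a minimal distance-based implementation assuming Euclidean distance on the spectra (the sample counts are illustrative, not the study's dataset size):

```python
import numpy as np

def kennard_stone(X, n_train):
    """Kennard-Stone sample selection (sketch).

    Picks the two most distant samples first, then repeatedly adds the
    candidate whose nearest already-selected neighbour is farthest away.
    Returns indices for the training set; the remainder forms validation.
    """
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_train:
        # minimum distance from each remaining sample to the selected set
        d_min = dist[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(d_min))]
        selected.append(pick)
        remaining.remove(pick)
    return selected

X = np.random.rand(40, 224)        # 40 spectra, 224 bands (toy data)
train_idx = kennard_stone(X, 30)   # 3:1 train/validation split
assert len(train_idx) == 30
```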
Spectral data are highly sensitive to changes in illumination, instrument noise, and scattering effects, which can lead to spectral curve distortion or baseline drift [25]. To improve the quality of the spectral data, six preprocessing methods were applied in this study, including Savitzky–Golay smoothing (SG), standard normal variate (SNV), multiplicative scattering correction (MSC), first derivative (FD), second derivative (SD), and orthogonal signal correction (OSC), and their performances were compared.
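Two of the six preprocessing methods can be sketched briefly; SNV is a row-wise standardization, and SG smoothing is available in SciPy (the window length and polynomial order below are illustrative assumptions, as the paper does not state them):

```python
import numpy as np
from scipy.signal import savgol_filter

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum row-wise."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std

def sg_smooth(spectra, window=11, polyorder=2):
    """Savitzky-Golay smoothing along the wavelength axis."""
    return savgol_filter(spectra, window_length=window,
                         polyorder=polyorder, axis=1)

spectra = np.random.rand(10, 224)   # toy ROI-mean spectra
out = sg_smooth(snv(spectra))
assert out.shape == (10, 224)
```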

2.3.2. Two-Dimensional Spatial Texture Data Extraction Based on PCA and GLCM

The single-band two-dimensional images of hyperspectral data contain abundant texture information, reflecting the spatial distribution characteristics of image gray levels. To reduce data redundancy and improve the efficiency of texture feature extraction, this study applied principal component analysis (PCA) [26] to the 224-band hyperspectral images for effective dimensionality reduction and noise suppression. This process generated a small number of principal component images that collectively capture the majority of the data’s variance, from which subsequent texture feature extraction based on the gray-level co-occurrence matrix (GLCM) was performed.
GLCM extracts texture features that describe the spatial relationships of gray levels by counting the occurrence frequency of pixel gray-level pairs at specific directions and distances [27]. In this study, GLCMs were constructed at angles of 0°, 45°, 90°, and 135° with a pixel distance of 1, and six texture features were selected for extraction through experiments: entropy, homogeneity, energy, correlation, information measure correlation (IMC), and maximal correlation coefficient (MCC). Based on the principal component images of seed potatoes segmented using Otsu thresholding [28], GLCMs were constructed and texture features were extracted within the selected ROI regions.

2.4. Traditional Machine Learning Methods

2.4.1. Characteristic Wavelength Selection

To address the challenges of high dimensionality and redundancy in spectral data, competitive adaptive reweighted sampling (CARS) and the successive projections algorithm (SPA) were applied to the preprocessed spectra to identify representative feature wavelengths, which were subsequently used as input variables for traditional machine learning modeling. CARS combines Monte Carlo sampling with partial least squares regression, employing adaptive weighting to screen wavelengths with larger regression coefficients and selecting the optimal feature subset based on the minimum RMSECV [29]. SPA reduces multicollinearity and improves model performance by progressively selecting wavelength features with high information content and low mutual redundancy [30].
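The projection step at the core of SPA can be sketched in NumPy; this is a simplified, hedged implementation (fixed starting wavelength, no multi-start evaluation, CARS omitted), not the exact algorithm configuration used in the study:

```python
import numpy as np

def spa(X, n_select, start=0):
    """Successive projections algorithm (minimal sketch).

    Starting from one wavelength (column of X), repeatedly projects the
    remaining columns onto the orthogonal complement of the last selected
    one and picks the column with the largest residual norm, so the chosen
    wavelengths carry low mutual collinearity.
    """
    Xp = X.astype(float).copy()
    selected = [start]
    for _ in range(n_select - 1):
        v = Xp[:, selected[-1]].copy()
        # subtract each column's projection onto the last selected column
        proj = v[:, None] * (Xp.T @ v)[None, :] / (v @ v + 1e-12)
        Xp = Xp - proj
        Xp[:, selected] = 0.0           # exclude already-selected columns
        selected.append(int(np.argmax(np.linalg.norm(Xp, axis=0))))
    return selected

X = np.random.rand(50, 224)   # toy spectra: 50 samples x 224 wavelengths
bands = spa(X, 10)            # indices of 10 low-redundancy wavelengths
```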

2.4.2. Modeling Methods

Based on the selected feature wavelengths, traditional machine learning classification algorithms—including support vector machine (SVM), extreme learning machine (ELM), partial least squares discriminant analysis (PLS-DA), random forest (RF), and k-nearest neighbors (KNN)—were constructed to build recognition models for different regions of seed potatoes. All traditional machine learning models were trained and validated using the same dataset partition and evaluation metrics as the deep learning models to ensure a fair comparison.
SVM is a supervised learning model that performs classification by constructing an optimal separating hyperplane. A one-vs-one (OvO) encoding strategy was adopted for multi-class classification tasks. Key hyperparameters include the penalty parameter C, kernel type, and the kernel coefficient γ. To determine the optimal combination of these hyperparameters, a grid search strategy with 4-fold cross-validation was employed, and the model was automatically refit using the best parameters found.
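The grid search with 4-fold cross-validation described above can be sketched with scikit-learn; the synthetic data and the specific grid values below are illustrative assumptions, not the study's actual search space:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Stand-in data: rows = ROI spectra at selected wavelengths, 7 classes.
X, y = make_classification(n_samples=280, n_features=20, n_informative=10,
                           n_classes=7, random_state=0)

param_grid = {"C": [0.1, 1, 10, 100],
              "kernel": ["rbf", "linear"],
              "gamma": ["scale", 0.01, 0.1]}

# SVC applies a one-vs-one strategy internally for multi-class problems.
search = GridSearchCV(SVC(decision_function_shape="ovo"), param_grid,
                      cv=4, refit=True)   # refit on the best parameters
search.fit(X, y)
print(search.best_params_)
```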
ELM is a single-hidden-layer feedforward neural network that conducts training by randomly initializing the hidden layer parameters and efficiently computing the output layer weights. Key hyperparameters include the number of hidden neurons, and the choice of hidden layer activation function.
PLS-DA combines partial least squares regression with discriminant analysis to classify samples by extracting latent variables that capture the covariance between predictors and class labels. A key hyperparameter, the number of latent variables, was optimized using a validation curve across a defined range of components with 4-fold cross-validation.
RF is an ensemble learning method that builds multiple decision trees based on randomly selected features and determines the final prediction by aggregating their voting results. Key hyperparameters include the number of trees, maximum depth of each tree, minimum number of samples required to split an internal node, and the minimum number of samples at a leaf node. The maximum number of features considered for splitting each node was set to the square root of the total features.
KNN is an instance-based supervised learning algorithm that classifies new samples by measuring the distance between samples and using the majority voting results of the K nearest neighbors. Key hyperparameters include the number of neighbors K, the weighting scheme of neighbors, and the distance metric.

2.5. Deep Learning Methods

2.5.1. Spectral–Spatial Data Fusion

Spectral–spatial data fusion integrates complementary spectral and spatial image information to achieve a more comprehensive representation of hyperspectral data. Spectral information reflects the chemical composition and molecular structure of materials, while spatial information describes surface morphology, texture, and their spatial distribution [31]. For agricultural products with complex defect types, relying on spectral or spatial information alone is often insufficient, particularly when different defects exhibit similar spectral responses or surface characteristics, leading to increased classification ambiguity.
According to the stage of information integration, spectral–spatial fusion strategies can generally be categorized into data-level fusion, feature-level fusion, and decision-level fusion [32]. Feature-level fusion relies on separate extraction of spectral and spatial features and is thus susceptible to subjective choices and prior assumptions. Decision-level fusion combines information only at the classification output stage, limiting its ability to explicitly model the intrinsic relationships between spectral and spatial texture information. In contrast, data-level fusion directly integrates multi-dimensional raw data at the model input stage, preserving data integrity and providing more favorable conditions for deep learning models to automatically explore latent cross-dimensional correlations.
Based on the above considerations, a data-level fusion strategy is adopted in this study. Although this approach may appear as a simple concatenation of spectral and spatial information in form, the core complexity lies in the subsequent feature learning rather than the fusion operation itself, thereby avoiding premature information loss caused by manual feature selection. By jointly mapping preprocessed spectral data and normalized spatial data into a unified input space, the deep learning network can adaptively learn nonlinear relationships between spectral and spatial features. Since the fused data are represented as one-dimensional sequences, 1DCNNs are employed to effectively capture local dependencies and nonlinear feature representations within the high-dimensional spectral–spatial sequences.
Considering that the model is capable of handling high-dimensional data, no feature wavelength selection was performed on the spectral data. The normalization operation [33], which aligns the scale of image texture data with that of spectral data, is shown in Equation (2):
T_norm = R_min + (T_raw − T_min) / (T_max − T_min) · (R_max − R_min)
where T_norm is the normalized spatial image texture data value, T_raw is the original texture data value, T_min and T_max are the minimum and maximum values of the texture data, and R_min and R_max are the minimum and maximum values of the preprocessed spectral data.
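Equation (2) and the subsequent concatenation can be sketched in NumPy; the 224 spectral values and 72 texture parameters per sample are taken from the text, while the sample count and value ranges are illustrative:

```python
import numpy as np

def rescale_texture(texture, spectra):
    """Equation (2): map texture values into the value range of the
    preprocessed spectra so both modalities share a common scale."""
    t_min, t_max = texture.min(), texture.max()
    r_min, r_max = spectra.min(), spectra.max()
    return r_min + (texture - t_min) / (t_max - t_min) * (r_max - r_min)

spectra = np.random.rand(100, 224) * 0.9 + 0.05   # reflectance-like values
texture = np.random.rand(100, 72) * 40.0          # raw GLCM statistics

# Data-level fusion: concatenate along the feature axis -> 296-long sequences.
fused = np.hstack([spectra, rescale_texture(texture, spectra)])
assert fused.shape == (100, 296)
```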

2.5.2. Design of CNN Architectures

Leveraging the PyTorch 2.3.0 framework, this study introduces two cascaded progressive 1DCNN architectures: the Basic1DCNN and the enhanced Stacked1DCNN. Figure 4 illustrates the network structures using the fused data as input. The red plus sign indicates data-level fusion, while the red arrows emphasize the structural improvements of Stacked1DCNN relative to Basic1DCNN. In the network diagram, different colors represent the feature tensors of each layer: blue corresponds to the outputs of convolutional layers, green to max pooling layer outputs, purple to flattened features, orange to post-Dropout results, and red to the outputs of fully connected layers. Both networks take one-dimensional sequential data as input and add a channel dimension along the second axis using the unsqueeze function, forming a two-dimensional tensor suitable for the model input. The one-dimensional convolution layers (Conv1d) are configured with a kernel size of 3, stride of 1, and padding of 1, while the max pooling layers (MaxPool1d) use a window size and stride of 2.
Basic1DCNN consists of three successive Conv1d and MaxPool1d layers. The first Conv1d layer uses 16 kernels to extract initial features, followed by pooling for downsampling. The second and third Conv1d layers expand the number of channels to 32 and 64, gradually compressing the feature dimensions to one-eighth of the original input. The flattened feature vector is then fed into a fully connected layer with 128 neurons, alongside a Dropout layer with a rate of 0.1 to prevent overfitting. Finally, it passes through a second fully connected layer and a Softmax function to predict the probability distribution over the seven seed potato regions.
As Basic1DCNN is relatively shallow, it is limited in its ability to capture fine-grained features [34]. To overcome this, a deeper Stacked1DCNN is introduced to strengthen feature extraction capabilities beyond the Basic1DCNN [35]. The network follows a modular design, with each module containing two Conv1d layers. The first module expands the number of channels from 1 to 32, with subsequent modules increasing them to 64 and 128. Within each module, ReLU activations are applied, followed by pooling and Dropout layers. After hierarchical feature extraction through the three modules, the features are flattened and passed through fully connected layers, yielding the probability distribution over the seven regions.
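Under the layer settings stated above (kernel 3, stride 1, padding 1, pooling window 2), the Stacked1DCNN could be sketched in PyTorch as follows. The 296-length fused input (224 spectral values plus 72 texture parameters) and the fully connected sizes are inferred from the text; treat this as a sketch, not the authors' exact implementation:

```python
import torch
import torch.nn as nn

class Stacked1DCNN(nn.Module):
    """Sketch: three modules of two Conv1d layers each (1->32, 32->64,
    64->128 channels), ReLU, max pooling (2), and Dropout per module,
    followed by fully connected layers over the flattened features."""

    def __init__(self, input_len=296, n_classes=7, dropout=0.1):
        super().__init__()

        def block(c_in, c_out):
            return nn.Sequential(
                nn.Conv1d(c_in, c_out, kernel_size=3, stride=1, padding=1),
                nn.ReLU(),
                nn.Conv1d(c_out, c_out, kernel_size=3, stride=1, padding=1),
                nn.ReLU(),
                nn.MaxPool1d(2),
                nn.Dropout(dropout),
            )

        self.features = nn.Sequential(block(1, 32), block(32, 64), block(64, 128))
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * (input_len // 8), 128),  # length halves 3 times
            nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, x):          # x: (batch, sequence_length)
        x = x.unsqueeze(1)         # add the channel dimension
        return self.classifier(self.features(x))

model = Stacked1DCNN()
logits = model(torch.randn(4, 296))
assert logits.shape == (4, 7)
```

During training, CrossEntropyLoss takes these raw logits directly; Softmax is applied only when class probabilities are needed for reporting.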

2.5.3. Training Strategy for CNN

During the training of the CNN, hyperparameters and optimization strategies were appropriately configured: the training and validation sets were loaded using a DataLoader with a batch size of 32 [36], and the training set was randomly shuffled to enhance generalization. The loss function was the commonly used cross-entropy (CrossEntropyLoss) for multi-class tasks, and the Adam optimizer was selected with an initial learning rate of 0.0001. Combined with the learning rate scheduler ReduceLROnPlateau, the learning rate was halved when the validation loss did not decrease for 10 consecutive epochs, with a minimum learning rate set to 0.000001. The training spanned 300 epochs. During the training phase, forward and backward propagation [37] was employed to compute the loss and update the network parameters, while in the validation phase, the validation loss and performance metrics were calculated. An early stopping strategy was implemented by saving the model weights only when the overall accuracy on the validation set reached a new historical best, ensuring that the selected model demonstrated optimal generalization performance and prevented overfitting. In this study, spectral data, spatial texture data, and their fused representations were fed into the CNN model to identify seven types of surface regions in seed potatoes, with the comparative results summarized below. Details of the model training platform configuration are provided in Table 2.
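The optimizer, scheduler, and best-weight saving described above can be sketched as follows; the tiny linear model and random tensors are stand-ins for the actual Stacked1DCNN and DataLoaders, and only a few epochs are run:

```python
import torch
import torch.nn as nn

# Stand-ins for the real model and data loaders (illustrative only).
model = nn.Sequential(nn.Linear(296, 7))
X = torch.randn(64, 296)
y = torch.randint(0, 7, (64,))

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, factor=0.5, patience=10, min_lr=1e-6)

best_acc, best_state = 0.0, None
for epoch in range(5):                      # 300 epochs in the actual study
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(X), y)
    loss.backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        val_loss = criterion(model(X), y)
        acc = (model(X).argmax(1) == y).float().mean().item()
    scheduler.step(val_loss)                # halve LR when loss plateaus
    if acc > best_acc:                      # keep only the best weights
        best_acc = acc
        best_state = {k: v.clone() for k, v in model.state_dict().items()}
```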

2.6. Statistical Analysis and Model Evaluation

The detailed hardware and software configurations for model training are listed in Table 2. In addition to the deep learning framework, the Scikit-learn library (version 1.6.1) in Python was utilized for the statistical analysis and implementation of traditional machine learning algorithms. To comprehensively evaluate the recognition performance of the model for seed potato external defects, traditional machine learning models were evaluated using overall accuracy (OA), precision (Pr), recall (Re), average precision (AP), F1-score (F1), and training time, while deep learning models further incorporated mean average precision (mAP) and number of parameters. OA represents the proportion of correctly predicted samples among all samples; Pr represents the proportion of true positives among the samples predicted as positive; Re represents the proportion of actual positive samples that were correctly predicted as positive; F1 represents the harmonic mean of Pr and Re, as defined in Equation (3). The Pr, Re, and F1 for each region type were averaged with equal weights. AP represents the area under the precision–recall curve for each class, quantifying the model’s detection capability for that class; mAP is the mean of all APs, providing a comprehensive evaluation of the model’s overall performance. Training time indicates computational efficiency, while the number of parameters reflects the model’s complexity and resource requirements.
F1 = 2 · Pr · Re / (Pr + Re)
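These metrics with equal-weight (macro) averaging over the region types can be computed with scikit-learn; the toy labels below are illustrative:

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score)

y_true = [0, 1, 2, 2, 1, 0, 3, 3]
y_pred = [0, 1, 2, 1, 1, 0, 3, 2]

oa = accuracy_score(y_true, y_pred)                     # 6/8 = 0.75
pr = precision_score(y_true, y_pred, average="macro")   # equal-weight mean
re = recall_score(y_true, y_pred, average="macro")
f1 = f1_score(y_true, y_pred, average="macro")
```

For mAP, `sklearn.metrics.average_precision_score` would additionally require the per-class prediction scores rather than hard labels.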

3. Results and Analysis

3.1. Spectral Analysis of Different Defect Regions in Seed Potatoes

The near-infrared spectral region primarily reflects the overtone and combination vibration characteristics of hydrogen-bearing groups within molecules [38]. Different regions of seed potatoes exhibit distinct spectral characteristics due to variations in their chemical composition, and these characteristics are closely related to the vibration modes of specific functional groups. A deeper absorption trough signifies a higher content of the corresponding functional group, characterized by stronger stretching vibrations and greater light absorption. Conversely, a higher reflection peak indicates that the light is predominantly reflected rather than absorbed, suggesting weaker vibrations of the associated functional group and a lower concentration. Figure 5 shows the original overall spectral curves and the average spectral curves for each type of region.
The following observations can be drawn from the figure: (1) Distinct absorption troughs appear near 950 nm, 1180 nm, and 1450 nm in the spectral curves, corresponding to stretching vibrations of O-H triple frequency (moisture), C-H triple frequency (starch), and O-H double frequency (moisture), respectively [35]. The reflection peak near 1070 nm is associated with a weaker N-H triple frequency (protein) vibration, leading to increased reflectance [39]. (2) The spectral curves for decay and common scab areas are generally higher. This is due to microbial and pathogenic degradation of proteins, resulting in reduced protein content [40]. Cell wall degradation in rotten areas releases bound water and inhibits water evaporation, increasing free water content and leading to higher moisture content compared with other diseased areas [41]. (3) The spectral curve of the frostbitten area resembles that of the normal area but exhibits lower overall reflectance. This occurs because frostbite causes cellular dehydration and tissue softening [42], leading to greater light absorption rather than reflection. (4) Spectral curves of wormholes and mechanically damaged areas exhibit similarities, as surface physical damage creates holes or cracks that expose tissue to oxidation, leading to comparable chemical composition changes. Within the 935–1140 nm range, normal areas show higher reflectance than wormholes and mechanically damaged areas, while the opposite holds true in the 1140–1721 nm range. This primarily stems from spectral differences caused by protein release, starch degradation, and moisture loss [15]. (5) The spectral curve of the black scurf disease area exhibits a relatively flat peak near 1180 nm, with an overall trend similar to that of the common scab area but lower reflectance. This may be related to surface structural changes caused by fungal infection in both cases, though they differ in surface texture [43].
The sclerotia of black scurf disease form black, hard particles on the surface of seed potatoes, rich in pigments and complex organic compounds [44]. These particles absorb more light, thereby reducing reflectance.

3.2. Preprocessing Method Selection and Comparative Analysis

PLS-DA was applied to the spectral data to select the optimal preprocessing method; the results are shown in Table 3, with OSC achieving the best classification performance. After OSC preprocessing, the OA, Pr, Re, and F1 on the validation set were 89.39%, 92.04%, 84.71%, and 86.61%, respectively, all showing improvement compared with the raw data. OSC employs orthogonal signal correction theory to eliminate systematic variations unrelated to classification while preserving critical spectral information [45]. In contrast, SNV and MSC primarily eliminate scattering effects and baseline drift but may weaken some useful spectral features. FD and SD can enhance spectral details through derivative computation but may also amplify high-frequency noise, reducing data stability.

3.3. Dimensionality Reduction Based on PCA and Texture Feature Extraction

The metadata file of the hyperspectral image was first imported, and the mean value of each spectral band was computed to convert the original hyperspectral data into a grayscale representation. Subsequently, Otsu's thresholding method was applied to generate a mask that removed the background and retained only the potato region. The three-dimensional hyperspectral data were then flattened into a two-dimensional matrix (number of pixels × number of bands), and PCA was performed to extract the first four principal component images (PC1–PC4). As shown in Figure 6, the contribution rates of the first four principal components were 90.63%, 7.89%, 1.19%, and 0.12%, respectively. The cumulative contribution of the first three PCs exceeded 99%, indicating that they preserve the majority of the variance in the original hyperspectral data. Considering this high information content and the practical need for computational manageability in texture analysis, subsequent texture feature extraction was performed on these first three principal component images.
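A minimal sketch of this masking-and-decomposition pipeline is given below, using a synthetic data cube (a bright disc standing in for the potato) and a hand-rolled Otsu threshold; the metadata import and the actual cube dimensions are omitted as implementation details specific to the study.

```python
import numpy as np
from sklearn.decomposition import PCA

def otsu_threshold(gray, bins=256):
    """Otsu's method: pick the threshold maximizing between-class variance."""
    hist, edges = np.histogram(gray, bins=bins)
    p = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(p)                       # class-0 cumulative probability
    mu = np.cumsum(p * centers)             # cumulative mean
    mu_t = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * w0 - mu) ** 2 / (w0 * (1 - w0))
    return centers[np.nanargmax(sigma_b)]

# Synthetic cube: 64 x 64 pixels x 224 bands, bright "potato" disc on dark background
rng = np.random.default_rng(1)
cube = rng.normal(0.05, 0.01, (64, 64, 224))
yy, xx = np.mgrid[:64, :64]
disc = (yy - 32) ** 2 + (xx - 32) ** 2 < 20 ** 2
cube[disc] += 0.5

gray = cube.mean(axis=2)                    # band-mean grayscale image
mask = gray > otsu_threshold(gray)          # foreground (potato) mask
pixels = cube[mask]                         # flatten: (n_foreground_pixels, n_bands)

pca = PCA(n_components=4)
scores = pca.fit_transform(pixels)
print(pca.explained_variance_ratio_)        # contribution rates of PC1-PC4
```

Reshaping each column of `scores` back onto the masked pixel positions yields the PC1–PC4 score images used for texture analysis.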
When establishing the ROIs, regions of the different types were selected, and corresponding Extensible Markup Language (.xml) files were generated to record their pixel coordinates. Based on these coordinates, the GLCM was calculated to extract six texture features in four directions at a spatial distance of 1, yielding 24 parameters per principal component image. Combined across the three principal component images, 72 texture parameters were extracted for each ROI. This extraction procedure was consistent with the method described in Section 2.3.2. Finally, all extracted data were normalized and stored as structured files for subsequent data fusion and modeling analyses.
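The GLCM bookkeeping described above (6 features × 4 directions × 3 PC images = 72 parameters per ROI) can be sketched as follows. The six statistics chosen here (contrast, dissimilarity, homogeneity, energy, correlation, entropy) are a common GLCM set and an assumption on our part, since the section does not name the exact six features used.

```python
import numpy as np

def cooccurrence(q, levels, dy, dx):
    """Symmetric, normalized GLCM for one offset (distance 1, direction (dy, dx))."""
    h, w = q.shape
    r0a, r0b = (0, dy) if dy >= 0 else (-dy, 0)
    c0a, c0b = (0, dx) if dx >= 0 else (-dx, 0)
    nh, nw = h - abs(dy), w - abs(dx)
    a = q[r0a:r0a + nh, c0a:c0a + nw].ravel()
    b = q[r0b:r0b + nh, c0b:c0b + nw].ravel()
    glcm = np.zeros((levels, levels))
    np.add.at(glcm, (a, b), 1)      # count co-occurring gray-level pairs
    glcm += glcm.T                  # make the GLCM symmetric
    return glcm / glcm.sum()        # normalize to a joint probability

def glcm_features(p):
    """Six common GLCM statistics (an illustrative choice of features)."""
    i, j = np.indices(p.shape)
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    si = np.sqrt(((i - mu_i) ** 2 * p).sum())
    sj = np.sqrt(((j - mu_j) ** 2 * p).sum())
    return {
        "contrast": ((i - j) ** 2 * p).sum(),
        "dissimilarity": (np.abs(i - j) * p).sum(),
        "homogeneity": (p / (1 + (i - j) ** 2)).sum(),
        "energy": np.sqrt((p ** 2).sum()),
        "correlation": (((i - mu_i) * (j - mu_j) * p).sum()) / (si * sj),
        "entropy": -(p[p > 0] * np.log2(p[p > 0])).sum(),
    }

# Three stand-in principal component images, four directions, distance 1
rng = np.random.default_rng(2)
pc_images = [rng.random((32, 32)) for _ in range(3)]
directions = [(0, 1), (1, 0), (1, 1), (1, -1)]   # 0, 90, 45, 135 degrees
vector = []
for img in pc_images:
    q = np.floor(img / img.max() * 15).astype(int)   # quantize to 16 gray levels
    for dy, dx in directions:
        vector.extend(glcm_features(cooccurrence(q, 16, dy, dx)).values())
print(len(vector))   # 6 features x 4 directions x 3 PC images = 72
```

In practice, a library routine such as scikit-image's `graycomatrix`/`graycoprops` would typically replace the hand-rolled functions; the explicit version is shown to make the 72-parameter count transparent.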
To compare the differences between spectral data and spatial texture data in the feature space, PCA was applied separately to each data type for dimensionality reduction, followed by 3D and 2D visualization. As illustrated in Figure 7, the two data types are distinguished by different colors. The results indicate that the distributions of spectral and spatial data in the principal component space are clearly separated. In particular, the PC1/PC2 plane exhibits minimal overlap, demonstrating that the first two principal components effectively capture the primary differences between the two feature types. This highlights the complementary nature of spectral and spatial features in information representation, providing a theoretical basis for data fusion.

3.4. Performance Comparison of Traditional Machine Learning Models

After OSC preprocessing, the spectral data were subjected to feature wavelength selection using the SPA and CARS. SPA aims to minimize the root mean square error of cross-validation (RMSECV) while balancing the number of variables and model simplicity. When the RMSECV decreased to 1.16, a total of 37 feature wavelengths were selected, as shown in Figure 8a.
The variations of the parameters during the CARS process are illustrated in Figure 8c–e. The lowest RMSECV was achieved at the 22nd sampling iteration, resulting in the selection of 30 feature wavelengths. The regression coefficient paths indicated that the selected wavelengths exhibited stable coefficients, demonstrating good robustness. The detailed feature selection results of CARS are presented in Figure 8b.
The wavelengths selected by both methods were mainly concentrated in the ranges of 950–1100 nm and 1300–1700 nm, which are to some extent associated with the characteristic vibrations of hydrogen-containing groups in substances such as moisture, starch, and protein.
Based on the feature wavelength subsets selected by SPA and CARS, five traditional machine learning classification models—SVM, ELM, PLS-DA, RF, and KNN—were constructed. As shown in Table 4, the CARS–ELM model achieved the best overall performance, with the highest OA of 94.44%, Pr of 94.62%, Re of 91.68%, and F1 of 92.66%. After feature wavelength selection, the reduced input dimensionality led to shorter training times for all models compared with the original data. The hyperparameter settings for each model are shown in Table 5.
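For readers wishing to reproduce this style of comparison, a minimal scikit-learn sketch is shown below on synthetic 7-class data (normal plus six defect types). The data, the hyperparameters, and the subset of models are placeholders, not the settings of Table 5; ELM and PLS-DA are omitted because they have no scikit-learn implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, f1_score

# Synthetic stand-in for the CARS-selected wavelengths: 30 features per sample
X, y = make_classification(n_samples=700, n_features=30, n_informative=20,
                           n_classes=7, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.25,
                                          stratify=y, random_state=0)

models = {
    "SVM": SVC(kernel="rbf", C=10, gamma="scale"),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_va)
    print(f"{name}: OA={accuracy_score(y_va, pred):.4f}, "
          f"macro-F1={f1_score(y_va, pred, average='macro'):.4f}")
```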
The ELM model initializes the weights between the input and hidden layers randomly and computes the output weights in a single step, making it sensitive to noisy wavelengths. The CARS algorithm reduces data redundancy and enhances the linear mapping capability of ELM, thereby improving the effectiveness of hidden-layer nodes. In contrast, the wavelengths retained by SPA may contain nonlinear noise, which can adversely affect model performance.
The overall performance of SVM, PLS-DA, RF, and KNN was relatively weaker, which can be attributed to limitations in feature adaptability and model mechanisms. PLS-DA relies on linear mapping for classification and struggles to capture the strong nonlinear relationships inherent in spectral data, resulting in limited discriminative ability. Without feature wavelength selection, its OA and F1 were 89.39% and 86.61%, respectively, and showed little improvement after applying SPA or CARS, indicating that linear models are insensitive to redundant or nonlinear information and fail to fully exploit spectral variability. RF, an ensemble of multiple decision trees, is sensitive to high-dimensional and highly correlated features. Whether using the full spectrum or selected feature wavelengths, its OA remained around 81–83% and F1 around 69–74%, suggesting that redundant wavelengths and correlated bands constrain tree-splitting effectiveness and reduce discriminative power. Moreover, excessive feature numbers increase model complexity, making certain nodes more susceptible to noise-induced misclassification. KNN classifies samples based on distance metrics, but in high-dimensional spectral space, data sparsity and noisy wavelengths blur class boundaries. Although SPA-based feature wavelength selection slightly improved its performance (OA = 88.64%, F1 = 84.88%) by reducing redundancy and enhancing local discrimination, its overall performance remained inferior to that of ELM and SVM. The performance of SVM depends on the kernel type and the hyperparameters C and γ. With CARS-selected feature wavelengths, its OA decreased to 87.12%, lower than that with full-spectrum input (91.67%), likely because CARS over-optimized certain bands and failed to preserve nonlinear characteristics.
In contrast, ELM, by leveraging randomly initialized hidden-layer weights and non-linear activation functions, is inherently capable of modeling complex nonlinear relationships within spectral data [46]. This characteristic, which allows for robust feature transformation and effective classification, likely contributed to its superior performance, achieving the highest OA and F1 among the traditional machine learning models tested.
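The single-step training described above can be made concrete with a minimal NumPy implementation; this follows the standard ELM formulation [46] (random fixed hidden weights, tanh activation, pseudo-inverse output solution), and the hidden-layer size and toy data are illustrative assumptions rather than this study's configuration.

```python
import numpy as np

class ELM:
    """Minimal extreme learning machine sketch: random fixed hidden weights,
    tanh activation, output weights solved in one step via the pseudo-inverse."""

    def __init__(self, n_hidden=100, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        n_classes = int(y.max()) + 1
        # Randomly initialized input-to-hidden weights, never trained
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden)) / np.sqrt(X.shape[1])
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)     # nonlinear hidden-layer output
        T = np.eye(n_classes)[y]             # one-hot target matrix
        self.beta = np.linalg.pinv(H) @ T    # single-step least-squares solution
        return self

    def predict(self, X):
        H = np.tanh(X @ self.W + self.b)
        return (H @ self.beta).argmax(axis=1)

# Toy 3-class problem on 30 "wavelength" features
rng = np.random.default_rng(1)
y = np.repeat(np.arange(3), 100)
X = rng.normal(size=(300, 30)) + y[:, None]   # class-dependent mean shift
elm = ELM(n_hidden=80).fit(X, y)
print((elm.predict(X) == y).mean())           # training accuracy
```

Because only `beta` is solved, training cost is dominated by one pseudo-inverse, which is why ELM trains much faster than iteratively optimized networks; the flip side, as noted above, is its sensitivity to noisy input wavelengths.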

3.5. Performance Comparison of Deep Learning Models

Based on the data-level fusion strategy and the 1DCNN, the modeling results demonstrated that the synergistic effect of multi-dimensional information effectively enhanced the detection performance of external defects in seed potatoes, as shown in Table 6.
In the comparison using single spectral data, single spatial data, and fused spectral–spatial data as inputs, the fused dataset exhibited the best performance. Compared with the single spectral input, the Stacked1DCNN model with fused input achieved improvements of 2.81%, 2.78%, 3.20%, 3.01%, and 1.11% across the first five evaluation metrics, and the Basic1DCNN model showed corresponding increases of 3.13%, 2.88%, 3.25%, 3.22%, and 1.86%. As illustrated in Figure 9a, the validation set accuracy curve under fused input is generally higher than that under single input. Under the spectral input condition, the OA of the two models reached 94.10% and 96.07%, significantly higher than that obtained with spatial input alone (69.04% and 75.18%). Combined with the spectral analysis, these results indicate that hyperspectral detection of external defects in seed potatoes primarily depends on their spectral response characteristics. Although image data provide some information complementary to the spectral data, relying on them alone is insufficient for accurate identification.
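The data-level fusion underlying these comparisons is simply a per-sample concatenation of the spectral vector with the normalized texture vector. A minimal sketch, assuming the preprocessed spectra keep all 224 bands and are joined with the 72 GLCM texture parameters (the exact dimensions are assumptions for illustration):

```python
import numpy as np

def minmax_normalize(x, axis=0):
    """Scale each column to [0, 1] so spectral and texture features are comparable."""
    lo = x.min(axis=axis, keepdims=True)
    hi = x.max(axis=axis, keepdims=True)
    return (x - lo) / np.where(hi - lo == 0, 1, hi - lo)

rng = np.random.default_rng(3)
n_samples = 8
spectral = rng.random((n_samples, 224))   # preprocessed 1D spectra (224 bands)
texture = rng.random((n_samples, 72))     # 72 GLCM texture parameters per ROI

fused = np.concatenate([spectral, minmax_normalize(texture)], axis=1)
print(fused.shape)                         # one fused 1D vector per sample
# fused[:, None, :] would add the channel axis expected by a 1D CNN input layer
```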
The Basic1DCNN model employs a single convolutional layer with a relatively small receptive field and introduces a dropout layer only at the output layer. In contrast, the Stacked1DCNN model is constructed from stacked dual-layer convolutional modules with twice as many channels as Basic1DCNN, thereby enhancing feature resolution and encoding capacity. A dropout layer is added at the end of each module to suppress overfitting and improve generalization. As illustrated in Figure 9b, the loss value of Stacked1DCNN decreases more rapidly, indicating a stronger learning capability. Both models show no upward trend in validation loss and eventually converge. The consistently high performance metrics on the held-out validation set, presented in Table 6, further confirm their generalization capability and the absence of significant overfitting. Under the fused input, Stacked1DCNN exhibited systematic improvements over Basic1DCNN, with OA, Pr, Re, F1, and mAP increased by 1.77%, 1.78%, 2.06%, 1.85%, and 0.62%, respectively.
The validation confusion matrices of Basic1DCNN and Stacked1DCNN under different input types are shown in Figure 10. Overall, the Stacked1DCNN model exhibits higher recognition accuracy than Basic1DCNN. The fused input achieves the best performance across all categories, particularly by significantly reducing misclassifications between the easily confused categories of mechanical damage and wormholes. For example, in the Basic1DCNN, a total of 17 misclassifications occurred between the mechanical damage and wormhole categories under spectral input, which decreased to 9 under fused input. In the Stacked1DCNN, the corresponding misclassifications were 14 with spectral input and were significantly reduced to 3 with fused input.
In practical applications, the risk of missed detections (false negatives) is generally more critical than that of false alarms (false positives). As shown in Figure 10, for the Basic1DCNN, 13 defective samples were misclassified as normal under spatial input, 1 under spectral input, and none under fused input. Similarly, for the Stacked1DCNN model, 8 defective samples were misclassified as normal under spatial input, whereas no missed detections occurred under either spectral or fused input. Furthermore, as shown in Figure 11, the P–R curves of Stacked1DCNN demonstrate that the fused input achieves the best precision–recall performance, maintaining high precision at high recall levels and effectively balancing false negatives and false positives. Judged by the area enclosed by the curves, the fused input exhibits the largest P–R curve area, followed by the spectral input, while the spatial input shows the smallest. These findings demonstrate that the spectral–spatial fusion strategy effectively reduces the missed detection rate of defects, enhancing the robustness of defect detection and significantly improving the stability and reliability of the model.
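The missed-detection and false-alarm counts discussed here can be read directly off a confusion matrix. The sketch below uses made-up counts, not the values of Figure 10, to show where each quantity lives in the matrix:

```python
import numpy as np

# Illustrative 7-class confusion matrix: row = true class, column = predicted.
# Class 0 is "normal"; classes 1-6 are the six defect types (counts are made up).
cm = np.array([
    [50,  0,  0,  0,  0,  0,  0],
    [ 2, 48,  0,  0,  0,  0,  0],
    [ 1,  0, 47,  2,  0,  0,  0],
    [ 0,  0,  3, 46,  0,  0,  1],
    [ 0,  0,  0,  0, 50,  0,  0],
    [ 0,  0,  0,  0,  1, 49,  0],
    [ 1,  0,  0,  0,  0,  0, 49],
])

# Missed detections: defective samples (true class > 0) predicted as normal (column 0)
missed = cm[1:, 0].sum()
# False alarms: normal samples (row 0) predicted as any defect (columns > 0)
false_alarms = cm[0, 1:].sum()
# Confusions between mechanical damage (class 2) and wormhole (class 3), both directions
damage_wormhole = cm[2, 3] + cm[3, 2]
print(missed, false_alarms, damage_wormhole)
```

The class-index assignments here (normal = 0, mechanical damage = 2, wormhole = 3) are hypothetical labels for the sketch only.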
Beyond performance, computational efficiency metrics such as training time and parameter count are also critical. As shown in Table 6, compared with Basic1DCNN, Stacked1DCNN requires more parameters and longer training times across all input types. This architectural difference directly affects model performance: the higher parameter count of Stacked1DCNN enables it to learn more complex features, thereby improving accuracy. However, when the input information is limited, as with spatial-only data, the benefit of the additional parameters diminishes.
For Stacked1DCNN, the additional computational cost is justified, as it achieves higher OA and mAP in precise defect detection, which is particularly important for agricultural applications. As shown in Table 4, deep learning models inherently require longer training times than traditional machine learning models because they automatically learn hierarchical features and possess stronger feature extraction and generalization capabilities. In complex tasks, deep learning models clearly outperform traditional methods, demonstrating that the extra computational investment is reasonable for achieving higher performance.
Ultimately, the Stacked1DCNN model using spectral–spatial fusion data provides the best balance for this application. Its high performance ensures robust defect detection, and although the computational demand is relatively higher, it remains manageable, with the resulting accuracy outweighing the computational cost.

3.6. Comparison with Recent Deep Learning-Based Methods

To further evaluate the proposed method in a broader research context, this section compares the experimental results of the present study with those reported in recent deep learning-based studies on defect detection in potatoes and related agricultural products. Table 7 summarizes the performance of different models across defect detection tasks involving potatoes and other agricultural products. The results indicate that, even with an increased number of defect categories, spectral–spatial fused data combined with the Stacked1DCNN model achieved higher detection accuracy than the other models, demonstrating its accuracy advantage in seed potato defect detection. With further optimization and extension, the proposed approach also has the potential to serve as a reference for the practical production inspection of similar agricultural seeds.
Despite differences in data acquisition conditions and defect types, such a comparative analysis remains helpful for clarifying the position of the proposed spectral–spatial fusion method within current deep learning research. This study focuses on investigating the effectiveness of spectral–spatial data fusion under a controllable and interpretable 1D-CNN framework. Considering the characteristics of the one-dimensional spectral–spatial input data and practical application requirements, Basic1DCNN and Stacked1DCNN were selected as representative shallow and deep 1D convolutional networks, respectively.

4. Discussion

This study adopts a data-level fusion strategy that integrates spectral responses related to chemical composition with spatial texture information extracted from images, thereby effectively enhancing defect recognition for seed potatoes. For example, frostbitten regions exhibit spectral reflectance characteristics similar to those of normal samples, whereas their surface wrinkles and spots present distinct texture patterns. As shown in Table 6 and Figure 10, spectral–spatial fused data achieved improved performance metrics and more favorable confusion matrices, providing quantitative evidence of the critical role of texture features. Notably, a misclassification by the Basic1DCNN model, in which one normal sample was incorrectly identified as frostbite using spectral data alone, was corrected when fused data were used, directly demonstrating how spatial information helps resolve spectrally ambiguous defects.
The proposed Stacked1DCNN model with fused data input achieved an OA of 98.77%, surpassing the traditional machine learning method CARS-ELM, with the other evaluation metrics also showing superior performance. This advantage stems from deep learning's ability to automatically extract complex nonlinear features from full-spectrum hyperspectral data through multiple convolutional layers, processing the fused spectral and spatial information to capture their deep correlations. Moreover, deep learning does not require manual feature design, enabling adaptive discovery of key patterns, strong robustness to noise and redundant information, and efficient training on large-scale datasets, thus allowing accurate recognition of complex defect types. In contrast, traditional machine learning relies on feature wavelength selection, which, while reducing redundancy, may overlook potential nonlinear relationships, such as the effects of starch and protein degradation on spectral reflectance.
Although the proposed method demonstrates encouraging performance, it is worth noting that the dataset used in this study is relatively modest in size, primarily due to the cost and complexity of hyperspectral data acquisition and the need for careful manual annotation. As a result, the sample diversity may be somewhat constrained, which could influence the model’s generalization ability under more diverse real-world conditions. In addition, all experiments were conducted in a controlled laboratory environment. Future practical deployment would therefore benefit from further validation under varying operational conditions, such as changes in illumination, the presence of dust, and temperature fluctuations, which may affect hyperspectral data acquisition and model robustness. Future work will therefore focus on several aspects: (1) expanding the dataset by including more samples, different potato varieties, and commercial virus-free seed potatoes to further validate the robustness and applicability of the proposed approach under broader sample conditions; (2) conducting validation experiments in real production environments to assess the stability and reliability of hyperspectral imaging–based detection under different practical scenarios; and (3) incorporating attention mechanisms and other advanced deep learning techniques to enhance the model’s focus on key discriminative features and further improve overall robustness and detection performance. These efforts are expected to lay a more solid foundation for the practical application of hyperspectral imaging technology in seed potato defect detection and promote the continuous optimization and high-quality development of the potato industry.

5. Conclusions

This study focused on seed potatoes of the Zihuabai potato variety. Using a hyperspectral imaging system, one-dimensional spectral data within the 935–1721 nm range and two-dimensional image spatial data comprising 224 spectral bands were collected for both normal seed potatoes and six types of defect regions. By integrating multiple data processing methods, both traditional machine learning and deep learning classification models were constructed to achieve accurate identification of external defects in seed potatoes. The main conclusions are as follows:
Six preprocessing methods were compared on the original spectral data, and the OSC method was determined to be the most effective based on PLS-DA modeling; it was therefore adopted for subsequent analysis. On this basis, the SPA and CARS algorithms were employed to extract 37 and 30 feature wavelengths, respectively. For the image spatial data, PCA was applied to extract the first three principal components, from which 24 texture parameters were derived per component image using the GLCM, resulting in a total of 72 texture features.
Using the preprocessed full-band spectra and the selected feature wavelengths as inputs, five traditional machine learning models were developed to classify normal and six defective regions. The results indicated that the CARS–ELM combination achieved the best performance, with an overall accuracy of 94.44%, precision of 94.62%, recall of 91.68%, and F1-score of 92.66%.
Using preprocessed spectral data, normalized spatial texture data, and their fusion data as inputs, two deep learning models were constructed: Basic1DCNN and an improved Stacked1DCNN. The results demonstrated that the Stacked1DCNN model with fused data input achieved the best overall performance, reaching an overall accuracy of 98.77%, precision of 98.77%, recall of 98.93%, F1-score of 98.73%, and mAP of 99.66%, outperforming the CARS–ELM traditional machine learning model.
In conclusion, the fusion of hyperspectral spectral and spatial data combined with Stacked1DCNN deep learning modeling enables highly effective detection of external defects in seed potatoes. This approach provides valuable insights for the development of automatic sorting and nondestructive detection equipment for seed potatoes.

Author Contributions

Conceptualization, X.C. and M.H.; methodology, X.C. and M.H.; software, X.C. and M.H.; validation, X.C., M.H., J.S., Y.S., J.W. and H.Z.; formal analysis, J.S.; investigation, Y.S.; resources, M.H.; data curation, X.C. and J.W.; writing—original draft preparation, X.C. and M.H.; writing—review and editing, X.C. and M.H.; visualization, X.C. and H.Z.; supervision, M.H. and J.S.; project administration, M.H.; funding acquisition, M.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (62366041).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author; however, because the data are subject to a confidentiality agreement until the first author graduates, their disclosure will be considered after graduation.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Huang, J.; Wang, X.; Wu, H.; Liu, S.; Yang, X.; Liu, W. Detecting potato seed bud eye using lightweight convolutional neural network (CNN). Trans. Chin. Soc. Agric. Eng. 2023, 39, 172–182. [Google Scholar]
  2. Hintze, A.; Holden, Z.J.; Pavek, M.J. Economic Performance of Potato Crops Dependent on Variety Specific Seed Piece Weight and Uniformity Within a Seed Lot. Am. J. Potato Res. 2025, 102, 152–165. [Google Scholar] [CrossRef]
  3. Li, X.L.; Wang, F.Y.; Guo, Y.; Liu, Y.; Lv, H.Z.; Zeng, F.; Lv, C. Improved YOLO v5s-based detection method for external defects in potato. Front. Plant Sci. 2025, 16, 1527508. [Google Scholar] [CrossRef] [PubMed]
  4. Song, S. Design of Dynamic Screening System for Potato Seed and Research on External Defect Detection. Master’s Dissertation, Inner Mongolia Agricultural University, Hohhot, China, 2024. [Google Scholar]
  5. Xu, Y.; Liu, S.; Wang, X.; Wu, H.; Huang, J.; Wang, H.; Wang, Y. Lightweight online detection method for potato surface defects based on the improved YOLOv8n model. Trans. Chin. Soc. Agric. Eng. 2025, 41, 135–144. [Google Scholar]
  6. Wang, C.; Xiao, Z. Potato Surface Defect Detection Based on Deep Transfer Learning. Agriculture 2021, 11, 863. [Google Scholar] [CrossRef]
  7. Imanian, K.; Pourdarbani, R.; Sabzi, S.; García-Mateos, G.; Arribas, J.I.; Molina-Martínez, J.M. Identification of Internal Defects in Potato Using Spectroscopy and Computational Intelligence Based on Majority Voting Techniques. Foods 2021, 10, 982. [Google Scholar] [CrossRef] [PubMed]
  8. Guo, Y.; Zhang, L.; He, Y.; Lv, C.; Liu, Y.; Song, H.; Lv, H.; Du, Z. Online inspection of blackheart in potatoes using visible-near infrared spectroscopy and interpretable spectrogram-based modified ResNet modeling. Front. Plant Sci. 2024, 15, 1403713. [Google Scholar] [CrossRef]
  9. Fu, L.; Feng, Y.; Wu, J.; Liu, Z.; Gao, F.; Majeed, Y.; Al-Mallahi, A.; Zhang, Q.; Li, R.; Cui, Y. Fast and accurate detection of kiwifruit in orchard using improved YOLOv3-tiny model. Precis. Agric. 2021, 22, 754–776. [Google Scholar] [CrossRef]
  10. Dalal, M.; Mittal, P. A Systematic Review of Deep Learning-Based Object Detection in Agriculture: Methods, Challenges, and Future Directions. Comput. Mater. Contin. 2025, 84, 57–91. [Google Scholar] [CrossRef]
  11. Ram, B.; Oduor, P.; Igathinathane, C.; Howatt, K.; Sun, X. A systematic review of hyperspectral imaging in precision agriculture: Analysis of its current state and future prospects. Comput. Electron. Agric. 2024, 222, 109037. [Google Scholar] [CrossRef]
  12. Zhao, M.; Song, R.; Wang, X.; Fan, K.; Chen, J.; Gu, Y. Striping noise removal method in meat detection based on hyperspectral imaging. Trans. Chin. Soc. Agric. Eng. 2022, 38, 271–280. [Google Scholar]
  13. Zhao, M.; Liu, Z.; Zou, X.; Wu, L.; Zhang, F.; Long, J. Detection of defects on potatoes by hyperspectral imaging technology. Laser J. 2016, 37, 20–24. [Google Scholar] [CrossRef]
  14. Al Riza, D.F.; Widodo, S.; Yamamoto, K.; Ninomiya, K.; Suzuki, T.; Ogawa, Y.; Kondo, N. External defects and severity level evaluation of potato using single and multispectral imaging in near infrared region. Inf. Process. Agric. 2024, 11, 80–90. [Google Scholar] [CrossRef]
  15. Ji, Y.; Sun, L.; Li, Y.; Li, J.; Liu, S.; Xie, X.; Xu, Y. Non-destructive classification of defective potatoes based on hyperspectral imaging and support vector machine. Infrared Phys. Technol. 2019, 99, 71–79. [Google Scholar] [CrossRef]
  16. Zhang, F.; Wang, W.; Wang, C.; Zhou, J.; Pan, Y.; Sun, J. Study on hyperspectral detection of potato dry rot in gley stage based on convolutional neural network. Spectrosc. Spectral Anal. 2024, 44, 480–489. [Google Scholar]
  17. Zhao, P.; Wang, X.; Zhao, Q.; Xu, Q.; Sun, Y.; Ning, X. Non-Destructive Detection of External Defects in Potatoes Using Hyperspectral Imaging and Machine Learning. Agriculture 2025, 15, 573. [Google Scholar] [CrossRef]
  18. Jin, R.; Li, X.; Yan, Y.; Xu, M.; Ku, J.; Xu, S.; Hu, X. Detection method of multi-target recognition of potato based on fusion of hyperspectral imaging and spectral information. Trans. Chin. Soc. Agric. Eng. 2015, 31, 258–263. [Google Scholar]
  19. Yang, Q.; Yin, L.; Hu, X.; Wang, L. Detection of drug residues in bean sprouts by hyperspectral imaging combined with 1DCNN with channel attention mechanism. Microchem. J. 2024, 206, 111497. [Google Scholar] [CrossRef]
  20. Zou, Z.; Wu, X.; Chen, Y.; Bie, Y.; Wang, L.; Lin, P. Investigation of hyperspectral imaging technology for detecting frozen and mechanical damaged potatoes. Spectrosc. Spectral Anal. 2019, 39, 3571–3578. [Google Scholar]
  21. Abdollahi Moghaddam, M.R.; Rafe, A.; Taghizade, M. Kinetics of color and physical attributes of cookie during deep-fat frying by image processing techniques. J. Food Process. Preserv. 2014, 39, 91–99. [Google Scholar] [CrossRef]
  22. Hao, M.; Zhang, M.; Tian, H.; Sun, J. Research on Silage Corn Forage Quality Grading Based on Hyperspectroscopy. Agriculture 2024, 14, 1484. [Google Scholar] [CrossRef]
  23. Wang, F.; Li, Q.; Deng, W.; Wang, C.; Han, L. Detection of Anthocyanins in Potatoes Using Micro-Hyperspectral Images Based on Convolutional Neural Networks. Foods 2024, 13, 2096. [Google Scholar] [CrossRef]
  24. Babatunde, H.A.; McDougal, O.M.; Andersen, T. Automated Spectral Preprocessing via Bayesian Optimization for Chemometric Analysis of Milk Constituents. Foods 2025, 14, 2996. [Google Scholar] [CrossRef]
  25. Du, W.; Guo, P.; Liu, X. Detection of delinted cotton seed vigor based on hyperspectral wavelet features. Trans. Chin. Soc. Agric. Eng. 2024, 40, 174–186. [Google Scholar]
  26. Diao, Z.; Guo, P.; Zhang, B.; Yan, J.; He, Z.; Zhao, S.; Zhao, C.; Zhang, J. Spatial-spectral attention-enhanced Res-3D-OctConv for corn and weed identification utilizing hyperspectral imaging and deep learning. Comput. Electron. Agric. 2023, 212, 108092. [Google Scholar] [CrossRef]
  27. Liu, L.; Kuang, G. Overview of image textural feature extraction methods. J. Image Graph. 2009, 14, 622–635. [Google Scholar]
  28. Zhang, Y.; Liu, M.; Gong, J.; Lan, Y. Apple recognition based on two-level segmentation and region-marked gradient Hough circle transform. Trans. Chin. Soc. Agric. Eng. 2022, 38, 110–121. [Google Scholar]
  29. Zhang, M.; Hao, M.; Tian, H.; Li, P.; Zhao, K.; Ren, X. Nondestructive detection of the pH value of silage maize feeds based on hyperspectral images. Trans. Chin. Soc. Agric. Eng. 2023, 39, 239–247. [Google Scholar]
  30. Park, M.-S.; Faqeerzada, M.A.; Jang, S.H.; Kim, H.; Lee, H.; Kim, G.; Cho, Y.-S.; Hwang, W.-H.; Kim, M.S.; Baek, I.; et al. Detection of Abiotic Stress in Potato and Sweet Potato Plants Using Hyperspectral Imaging and Machine Learning. Plants 2025, 14, 3049. [Google Scholar] [CrossRef] [PubMed]
  31. Weng, S.; Tang, P.; Yuan, H.; Guo, B.; Yu, S.; Huang, L.; Xu, C. Hyperspectral imaging for accurate determination of rice variety using a deep learning network with multi-feature fusion. Spectrochim. Acta A 2020, 234, 118237. [Google Scholar] [CrossRef]
  32. Zhang, S.; Meng, X.; Liu, Q.; Yang, G.; Sun, W. Feature-Decision Level Collaborative Fusion Network for Hyperspectral and LiDAR Classification. Remote Sens. 2023, 15, 4148. [Google Scholar] [CrossRef]
  33. Song, Y.; Zhao, L.; Ning, J.; Dai, Q.; Cheng, F. Evaluation of tea similarity based on deep metric learning. Trans. Chin. Soc. Agric. Eng. 2023, 39, 260–269. [Google Scholar]
  34. Li, J.; Zhang, Y.; Zhou, B.; Weng, H.; Zhou, B.; Ye, D.; Qu, F. Inversing soil salinity in cotton fields using spectroscopy sample augmentation and moisture correction. Trans. Chin. Soc. Agric. Eng. 2024, 40, 171–181. [Google Scholar]
  35. Çelik, O.I.; Gazioğlu, C. Leveraging deep learning for coastal monitoring: A VGG16-based approach to spectral and textural classification of coastal areas with sentinel-2A data. Appl. Ocean. Res. 2024, 151, 104163. [Google Scholar] [CrossRef]
  36. Yang, S.; Feng, Q.; Zhang, J.; Wang, G.; Zhang, P.; Yan, H. Nondestructive Classification of Defects in Potatoes Based on Lightweight Convolutional Neural Network. Food Sci. 2021, 42, 284–289. [Google Scholar]
  37. Chen, Q.; Tang, B.; Long, Z.; Miao, J.; Huang, Z.; Dai, R.; Shi, S.; Zhao, M.; Zhong, N. Water quality classification using convolution neural network based on UV-Vis spectroscopy. Spectrosc. Spectral Anal. 2023, 43, 731–736. [Google Scholar]
  38. Chen, P.; Yang, J.; Chu, X.; Li, J.; Xu, Y.; Liu, D. Research and Application Progress of Near Infrared Spectroscopy Analytical Technology in China in the Past Five Years. Chin. J. Anal. Chem. 2024, 52, 1213–1224. [Google Scholar] [CrossRef]
  39. Yan, Y.; Chen, B. Molecular Vibration and Near-Infrared Spectroscopy. In Near Infrared Spectroscopy: Principles, Technologies and Applications; Yi, S., Ed.; China Light Industry Press: Beijing, China, 2013; pp. 21–28. [Google Scholar]
  40. Amit, S.K.; Uddin, M.M.; Rahman, R.; Islam, S.M.R.; Khan, M.S. A review on mechanisms and commercial aspects of food preservation and processing. Agric. Food Secur. 2017, 6, 51. [Google Scholar] [CrossRef]
  41. Ren, Y.; Sun, P.; Wang, X.; Zhu, Z. Degradation of cell wall polysaccharides and change of related enzyme activities with fruit softening in Annona squamosa during storage. Postharvest Biol. Technol. 2020, 166, 111203. [Google Scholar] [CrossRef]
  42. Stegner, M.; Buchner, O.; Schäfernolte, T.; Holzinger, A.; Neuner, G. Responses to Ice Formation and Reasons of Frost Injury in Potato Leaves. Crops 2022, 2, 378–389. [Google Scholar] [CrossRef]
  43. Zhang, N.; Yang, G.; Zhao, C.; Zhang, J.; Yang, X.; Pan, Y.; Huang, W.; Xu, B.; Li, M.; Zhu, X.; et al. Progress and prospects of hyperspectral remote sensing technology for crop diseases and pests. Natl. Remote Sens. Bull. 2021, 25, 403–422. [Google Scholar] [CrossRef]
  44. Xia, S.; Niu, Z.; Li, Q.; Zhang, L.; Sheng, W. Research Progress on Rhizoctonia solani and Integrated Control of Potato Black Spot Disease. Jiangsu Agric. Sci. 2022, 50, 28–33. [Google Scholar] [CrossRef]
  45. Chen, R.; Wang, Y.; Zhang, J.; Ding, Q.; Jia, K.; Xu, X. Hyperspectral inversion of soil salinity after correcting moisture effect in Yinchuan Plain using orthogonal signals. Trans. Chin. Soc. Agric. Eng. 2023, 39, 122–130. [Google Scholar]
  46. Wang, J.; Lu, S.; Wang, S.; Zhang, Y. A review on extreme learning machine. Multimed. Tools Appl. 2022, 81, 41611–41660. [Google Scholar] [CrossRef]
  47. Yang, Y.; Liu, Z.; Huang, M.; Zhu, Q.; Zhao, X. Automatic detection of multi-type defects on potatoes using multispectral imaging combined with a deep learning model. J. Food Eng. 2023, 336, 111213. [Google Scholar] [CrossRef]
  48. Hu, Y.; Ma, B.; Wang, H.; Zhang, Y.; Li, Y.; Yu, G. Detecting different pesticide residues on Hami melon surface using hyperspectral imaging combined with 1D-CNN and information fusion. Front. Plant Sci. 2023, 14, 1105601. [Google Scholar] [CrossRef] [PubMed]
  49. Li, X.; Han, Y.; Pan, Y.; Lv, H.; Wang, F.; Lv, C. Detection on Potato Black Heart Disease by Near Infrared Spectroscopy Based on Improved ResNet. Trans. Chin. Soc. Agric. Mach. 2024, 55, 470–479. [Google Scholar] [CrossRef]
  50. Deb, N.; Rahman, T. An efficient VGG16-based deep learning model for automated potato pest detection. Smart Agric. Technol. 2025, 12, 101409. [Google Scholar] [CrossRef]
  51. Liu, X.; Chen, M.; Zhao, Z.; Gao, W.; Zhao, X. Identification of the Origin of Ziziphi Spinosae Semen Based on Near-infrared Spectroscopy and One-dimensional Convolutional Neural Network. Sci. Technol. Food Ind. 2025, 46, 319–329. [Google Scholar] [CrossRef]
  52. Jiang, Y.; Shang, J.; Cai, Y.; Liu, S.; Liao, Z.; Pang, J.; He, Y.; Wei, X. The Fusion of Focused Spectral and Image Texture Features: A New Exploration of the Nondestructive Detection of Degeneration Degree in Pleurotus geesteranus. Agriculture 2025, 15, 1546. [Google Scholar] [CrossRef]
Figure 1. Seed potato samples: (a) normal; (b) decay; (c) mechanical damage; (d) wormhole; (e) common scab; (f) black scurf; (g) frostbite.
Figure 2. Hyperspectral imaging system.
Figure 3. Hyperspectral 3D data.
Figure 4. Schematic diagram of spectral and spatial data extraction and the architectures of the two 1DCNN models.
Figure 5. Spectral curves for various regions: (a) original overall spectral curve; (b) average spectral curve.
Figure 6. Principal component images of hyperspectral data of seed potatoes.
Figure 7. PCA 3D and 2D visualization of spectral and spatial data.
Figure 8. Feature wavelength selection by SPA and CARS and analysis of CARS parameter variations. (a) SPA-selected feature wavelengths; (b) CARS-selected feature wavelengths; (c) trend of the number of sampled variables during the CARS process; (d) variation of RMSECV with the number of sampling iterations in the CARS process; (e) trend of regression coefficients during the CARS process.
Figure 9. Performance comparison of Basic1DCNN and Stacked1DCNN models under different input conditions. (a) Validation accuracy curves; (b) training and validation loss curves of Basic1DCNN and Stacked1DCNN (solid lines represent the training set; dashed lines represent the validation set).
Figure 10. Comparison of confusion matrices of Basic1DCNN and Stacked1DCNN with different input data. (a) Basic1DCNN with spectral data; (b) Basic1DCNN with spatial data; (c) Basic1DCNN with fused spectral–spatial data; (d) Stacked1DCNN with spectral data; (e) Stacked1DCNN with spatial data; (f) Stacked1DCNN with fused spectral–spatial data. The numbers 1–7 represent the following classes: 1 = normal; 2 = decay; 3 = mechanical damage; 4 = wormhole; 5 = common scab; 6 = black scurf; 7 = frostbite.
Figure 11. Precision–recall curves of the Stacked1DCNN model under different input types. (a) Spectral input; (b) spatial input; (c) fused spectral–spatial input.
Table 1. Statistical analysis of seed potato sample dataset.
| Type | Label | Dataset Size | Training Size | Validation Size |
|---|---|---|---|---|
| Normal | 1 | 252 | 189 | 63 |
| Decay | 2 | 219 | 164 | 55 |
| Mechanical damage | 3 | 224 | 168 | 56 |
| Wormhole | 4 | 218 | 163 | 55 |
| Common scab | 5 | 247 | 185 | 62 |
| Black scurf | 6 | 226 | 169 | 57 |
| Frostbite | 7 | 233 | 174 | 59 |
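The per-class counts above correspond to an approximately 3:1 split between training and validation samples. A minimal pure-Python sketch (an illustrative helper, not code from the paper) that checks the per-class sums and the overall split ratio:

```python
# Per-class sample counts from Table 1: (total, training, validation).
counts = {
    "normal": (252, 189, 63),
    "decay": (219, 164, 55),
    "mechanical damage": (224, 168, 56),
    "wormhole": (218, 163, 55),
    "common scab": (247, 185, 62),
    "black scurf": (226, 169, 57),
    "frostbite": (233, 174, 59),
}

def split_summary(counts):
    """Verify that training + validation equals the class total,
    and report overall sizes and the training fraction."""
    total = train = val = 0
    for name, (n, tr, va) in counts.items():
        assert tr + va == n, f"{name}: split sizes do not sum to the total"
        total, train, val = total + n, train + tr, val + va
    return total, train, val, train / total

total, train, val, ratio = split_summary(counts)
print(total, train, val, round(ratio, 3))  # prints: 1619 1212 407 0.749
```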
Table 2. Model training platform configuration.
| Component | Specification |
|---|---|
| Operating System | Windows 11 Pro |
| Processor (CPU) | 13th Gen Intel Core i7-13700KF @ 3.40 GHz |
| Graphics Card (GPU) | NVIDIA GeForce RTX 4080 (16 GB) |
| Programming Language | Python 3.10.15 |
| Deep Learning Framework | PyTorch 2.3.0 |
| Compute Framework | CUDA 11.8 |
Table 3. PLS-DA detection results using different preprocessing methods.
| Method | OA (%) | Pr (%) | Re (%) | F1 (%) |
|---|---|---|---|---|
| None | 87.88 | 90.09 | 83.63 | 85.33 |
| SG | 88.13 | 90.69 | 84.05 | 85.73 |
| SNV | 86.87 | 87.03 | 81.64 | 82.99 |
| MSC | 86.36 | 85.33 | 80.36 | 80.41 |
| OSC | 89.39 | 92.04 | 84.71 | 86.61 |
| FD | 87.37 | 87.59 | 81.81 | 83.01 |
| SD | 85.14 | 84.37 | 80.93 | 81.42 |
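Two of the preprocessing methods compared above have compact closed forms: SNV centres and scales each spectrum individually, while MSC linearly regresses each spectrum against a reference (typically the mean spectrum) and removes the fitted offset and gain. The sketch below gives textbook NumPy formulations of both on synthetic data; it is not the authors' code, and the remaining methods (SG, FD, SD, OSC) are omitted.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum individually."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std

def msc(spectra, reference=None):
    """Multiplicative scatter correction: fit x ~ a + b * ref per spectrum,
    then return (x - a) / b, aligning all spectra to the reference."""
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(spectra, dtype=float)
    for i, x in enumerate(spectra):
        b, a = np.polyfit(ref, x, deg=1)  # slope, intercept
        corrected[i] = (x - a) / b
    return corrected

# Toy demo: three noisy copies of one base spectrum with scatter gain/offset.
rng = np.random.default_rng(0)
base = np.sin(np.linspace(0, 3, 256))
raw = np.stack([0.8 * base + 0.1, 1.2 * base - 0.2, base]) \
    + rng.normal(0, 1e-3, (3, 256))
print(np.allclose(msc(raw), raw.mean(axis=0), atol=0.02))
```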
Table 4. Comparison of the classification performance of each machine learning model under different feature wavelength selection methods.
| Model | Feature Wavelength Selection Method | OA (%) | Pr (%) | Re (%) | F1 (%) | Training Time (s) |
|---|---|---|---|---|---|---|
| SVM | None | 91.67 | 91.38 | 87.60 | 88.49 | 6.89 |
| SVM | SPA | 92.93 | 92.31 | 89.25 | 89.66 | 4.27 |
| SVM | CARS | 87.12 | 89.18 | 82.14 | 83.13 | 5.55 |
| ELM | None | 94.19 | 93.03 | 91.26 | 91.80 | 2.25 |
| ELM | SPA | 91.41 | 90.06 | 87.46 | 88.34 | 1.85 |
| ELM | CARS | 94.44 | 94.62 | 91.68 | 92.66 | 1.37 |
| PLS-DA | None | 89.39 | 92.04 | 84.71 | 86.61 | 8.50 |
| PLS-DA | SPA | 87.63 | 91.16 | 82.23 | 84.11 | 3.21 |
| PLS-DA | CARS | 89.90 | 92.59 | 85.41 | 87.33 | 3.49 |
| RF | None | 82.58 | 84.95 | 75.41 | 73.79 | 2.74 |
| RF | SPA | 81.06 | 71.65 | 73.31 | 69.50 | 0.92 |
| RF | CARS | 82.58 | 84.98 | 75.64 | 74.09 | 0.89 |
| KNN | None | 86.87 | 86.60 | 81.85 | 83.10 | 4.09 |
| KNN | SPA | 88.64 | 87.67 | 84.02 | 84.88 | 3.36 |
| KNN | CARS | 87.12 | 87.16 | 82.36 | 83.60 | 2.51 |
Table 5. Optimized hyperparameter settings of traditional machine learning models.
| Method | Hyperparameters |
|---|---|
| SVM | C = 10; γ = 1; kernel type: RBF |
| ELM | Number of hidden neurons: 38; activation function: ReLU |
| PLS-DA | Number of latent variables: 9 |
| RF | Number of trees: 100; maximum depth of each tree: 5; minimum split samples: 2; minimum samples per leaf: 1 |
| KNN | Number of neighbors: 24; weighting scheme of neighbors: uniform; distance metric: Euclidean distance |
Table 6. Performance comparison of different 1DCNN models on multiple input data types.
| Model | Input Data | OA (%) | Pr (%) | Re (%) | F1 (%) | mAP (%) | Training Time (s) | Parameters (K) |
|---|---|---|---|---|---|---|---|---|
| Basic1DCNN | Spectral data | 94.10 | 94.33 | 93.88 | 93.91 | 97.24 | 32.34 | 238.25 |
| Basic1DCNN | Spatial data | 69.04 | 70.44 | 68.52 | 67.31 | 73.50 | 32.71 | 82.60 |
| Basic1DCNN | Spectral–spatial fusion data | 97.05 | 97.04 | 96.93 | 96.94 | 99.05 | 32.39 | 311.98 |
| Stacked1DCNN | Spectral data | 96.07 | 96.10 | 95.86 | 95.85 | 98.57 | 43.54 | 1015.34 |
| Stacked1DCNN | Spatial data | 75.18 | 76.07 | 74.88 | 74.82 | 80.71 | 41.06 | 392.74 |
| Stacked1DCNN | Spectral–spatial fusion data | 98.77 | 98.77 | 98.93 | 98.73 | 99.66 | 42.37 | 1310.25 |
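The OA, Pr, Re, and F1 columns above follow their standard confusion-matrix definitions, with precision, recall, and F1 macro-averaged over the seven classes. A minimal pure-Python sketch on a hypothetical 3-class confusion matrix (toy numbers, not the paper's results; mAP, the area under the precision–recall curves of Figure 11, is omitted):

```python
def macro_metrics(cm):
    """Overall accuracy and macro-averaged precision/recall/F1
    from a confusion matrix cm[true][pred]."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    oa = sum(cm[i][i] for i in range(n)) / total
    pr = re = f1 = 0.0
    for i in range(n):
        tp = cm[i][i]
        p = tp / max(sum(cm[r][i] for r in range(n)), 1)  # column sum: predicted as i
        r = tp / max(sum(cm[i]), 1)                       # row sum: actually i
        pr += p / n
        re += r / n
        f1 += (2 * p * r / (p + r) if p + r else 0.0) / n
    return oa, pr, re, f1

# Toy 3-class confusion matrix (rows: true class, columns: predicted class).
cm = [[50, 0, 0],
      [0, 48, 2],
      [0, 4, 46]]
oa, pr, re, f1 = macro_metrics(cm)
print(round(oa, 2), round(re, 2))  # prints: 0.96 0.96
```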
Table 7. Comparison with recent deep-learning-based methods for detection in potatoes and other agricultural products.
| Reference | Data Type | Model | No. of Classes | Performance |
|---|---|---|---|---|
| Yang et al., 2023 [47] | MSI spectra | MDDNet | 6 | mAP = 90.26% |
| Hu et al., 2023 [48] | HSI spectra | 1DCNN | 5 | OA = 94.00% |
| Guo et al., 2024 [8] | Vis/NIR spectra | Improved ResNet | 2 | OA = 97.1% |
| Li et al., 2024 [49] | NIR spectra | ResNet | 2 | OA = 97.65% |
| Deb et al., 2025 [50] | RGB image | VGG16 | 5 | OA = 96.3% |
| Liu et al., 2025 [51] | HSI spectra | 1DCNN | 5 | OA = 91.11% |
| Jiang et al., 2025 [52] | HSI spectral–spatial fusion | LBP-CNN | 4 | OA = 95.6% |
| This study | HSI spectral–spatial fusion | Stacked1DCNN | 7 | OA = 98.77% |

Hao, M.; Cao, X.; Sun, J.; Sun, Y.; Wang, J.; Zhang, H. Detection of External Defects in Seed Potatoes Using Spectral–Spatial Fusion of Hyperspectral Images and Deep Learning. Agriculture 2026, 16, 77. https://doi.org/10.3390/agriculture16010077
