Non-Destructive Detection of External Defects in Potatoes Using Hyperspectral Imaging and Machine Learning

Zhao, Ping; Wang, Xiaojian; Zhao, Qing; Xu, Qingbing; Sun, Yiru; Ning, Xiaofeng

doi:10.3390/agriculture15060573

by Ping Zhao *, Xiaojian Wang, Qing Zhao, Qingbing Xu, Yiru Sun and Xiaofeng Ning

College of Engineering, Shenyang Agricultural University, Shenyang 110866, China

* Author to whom correspondence should be addressed.

Agriculture 2025, 15(6), 573; https://doi.org/10.3390/agriculture15060573

Submission received: 28 January 2025 / Revised: 26 February 2025 / Accepted: 6 March 2025 / Published: 7 March 2025

(This article belongs to the Special Issue Agricultural Products Processing and Quality Detection)

Abstract

For potato external defect detection, ordinary spectral technology has limitations in detail detection and processing accuracy, while the machine vision method has the limitation of a long feedback time. To realize accurate and rapid external defect detection for red-skin potatoes, a non-destructive detection method using hyperspectral imaging and a machine learning model was explored in this study. Firstly, Savitzky–Golay (SG) smoothing, standard normal variate transformation (SNV), multiplicative scatter correction (MSC), the normalization algorithm, and combinations of these algorithms with SG were used to preprocess the hyperspectral data. Then, principal component regression (PCR), support vector machine (SVM), partial least squares regression (PLSR), and least squares support vector machine (LSSVM) algorithms were used to establish quantitative models to find the most suitable preprocessing algorithm. The successive projections algorithm (SPA) was used to obtain various characteristic wavelengths. Finally, the qualitative models were established to detect the external defects of potatoes using the machine learning algorithms of backpropagation neural network (BPNN), k-nearest neighbors (KNN), classification and regression tree (CART), and linear discriminant analysis (LDA). The experimental results showed that the SG–SNV fusion hyperspectral data preprocessing algorithm and the KNN machine learning model were the most suitable for the detection of external defects in red-skin potatoes. Moreover, multiple external defects can be detected without multiple models. For healthy potatoes, black/green-skin potatoes, and scab/mechanical-damage/broken-skin potatoes, the detection accuracy was 93%, 93%, and 83%, respectively, which basically meets the production requirements. However, enhancing the prediction accuracy of the scab/mechanical-damage/broken-skin potatoes is still a challenge. The results also demonstrated the feasibility of using hyperspectral imaging technology and machine learning technology to detect potato external defects and provided new insights for potato external defect detection.

Keywords:

hyperspectral imaging technique; machine learning; external defect detection; red-skin potato

1. Introduction

Potatoes have several advantages, including cold tolerance, drought tolerance, barren tolerance, strong adaptability, high and stable yield, and comprehensive nutrition; therefore, they are cultivated worldwide. Potatoes have been identified as the fourth major staple crop after rice, corn, and wheat, as well as the main food crop in more than 75% of national-level poverty-stricken counties [1]. In 2015, China proposed the strategy for promoting potatoes as a staple food. This strategy is an important step toward ensuring food security and alleviating poverty. Identifying and sorting defective potatoes are indispensable steps for their deep processing and storage. Defective potatoes must be sorted out before storing in the cellar; otherwise, large-scale decay will occur, which will lead to economic losses and waste of resources. With the development and demand for intelligent sorting, it is crucial to accurately and quickly identify defective potatoes. Therefore, exploring rapid, efficient, and feasible methods for detecting defective potatoes is of great significance.

At present, the detection methods for potato external defects mainly include near-infrared spectroscopy and visual detection technology. Dimas Firmanda Al Riza et al. [2] successfully detected scab disease and mechanical damage in potatoes by using single and multi-spectral imaging techniques in the near-infrared region and enhancing the contrast between defective and normal areas through image preprocessing and false color conversion techniques. Al-Riza [3] adopted a dual CCD camera system combined with principal component analysis (PCA) and a principal component false-color image segmentation strategy to identify the external defects of potatoes, with a segmentation accuracy of 64%. Yu Yang et al. [4] proposed a multi-type defect detection network (MDDNet) based on multi-spectral imaging (MSI) and an improved YOLOv3-tiny model, which was used to identify 428 samples of potatoes as either defect-free or defective (sprouting, common scab, worm eye, dry rot, and bruising), with an average accuracy of 90.26%. Li Xiaoyu [5] established potato scab disease detection models using a support vector machine detection method based on machine vision technology and near-infrared spectroscopy technology, and the detection rates for potatoes in the test set were 89.17% and 91.67%, respectively. These research results have some significance for further research, but there are some limitations in practical application. Ordinary spectral technology provides only spectral information and no image information. This limitation results in lower collection efficiency and identification accuracy [6]; the technique is applicable to local measurement but is difficult to apply to complex real scenes with multiple external defects in potatoes. Visual detection technology is more easily affected by illumination conditions and usually requires a large amount of sample data for training and experimentation, which results in complex algorithms and a long feedback time. Thus, accuracy is not easy to guarantee in actual production, and it is difficult to achieve real-time detection.

Additionally, most existing studies have focused on yellow-skin potatoes. The external defects of red-skin potatoes (especially black and green skin) are significantly different in optical properties. Therefore, it is a challenge to detect these defects using existing detection methods.

In recent years, because hyperspectral imaging technology combines the advantages of spectral analysis and image detection, a number of studies have applied it to the non-destructive assessment of fruit and vegetable quality. Yuanyuan Shao [7] used hyperspectral imaging technology combined with PLS-DA and LDA models to classify healthy, frostbitten, and diseased sweet potatoes. Through characteristic wavelength extraction and optimization, the SPA-LDA model achieved 99.52% accuracy on the prediction set. Mohammad Akbar Faqeerzada et al. [8] developed a polarizing hyperspectral imaging (HSI) system, which combined PLS-DA, the successive projections algorithm (SPA), and the improved watershed segmentation algorithm (IWSA) to achieve rapid detection of pepper powder fruit defects, with a multi-defect detection accuracy of 91.3%. Tanjima Akter [9] used the HSI system in the range of 400–2500 nm, combined with the PLS-DA model, to achieve defect detection of apples and pears in visible near-infrared (VNIR) and short-wave-infrared (SWIR) bands, with classification accuracies of 97.5% and 100%, respectively. K. S. Shanthini et al. [10] used Vis-NIR hyperspectral imaging technology combined with support vector machine (SVM) and linear discriminant analysis (LDA) to achieve early detection and classification of strawberry bruises with an accuracy of 99.99%. In addition, Keresztes [11], Wang Hailong [12], Kaitlin M. Gold [13], and others also used hyperspectral technology to carry out research in the field of fruit and vegetable detection.

In hyperspectral imaging, machine learning models significantly improve the accuracy of distinguishing objects through automatic characteristic extraction, nonlinear modeling, data enhancement, ensemble learning, and transfer learning [14]. Compared to traditional methods, machine learning can automatically capture complex spectral and spatial characteristics, process high-dimensional data, optimize pre-processing steps, and improve classification performance by integrating multiple models and leveraging pre-trained models for more accurate and efficient target identification and classification [15].

Hyperspectral imaging technology, which combines traditional digital imaging technology with spectral analysis technology, has the advantages of high resolution and multiple bands and has higher sensitivity and accuracy in detecting potato external defects [16]. Image information of surface defects can be obtained through hyperspectral imaging [17], especially in distinguishing defect types of potatoes with different colors.

Hyperspectral imaging has been used to analyze the most relevant compounds, diseases, and stress factors in potatoes [18]. This study used hyperspectral imaging technology combined with machine learning models to detect five different external defects of potatoes: scab, black skin, broken skin, green skin, and mechanical damage. This study aims to explore an efficient and feasible method for detecting external defects in potatoes. The work presented here provides a foundation for the development of intelligent defect detection systems.

2. Materials and Methods

2.1. Experimental Samples and Their Characteristics

The experimental material, “Qingshu No. 9”, was taken from a potato planting base located in Jianping Town, Jianping County, Liaoning Province, China (119.73° E, 41.91° N). The external defects of potatoes have a significant impact on their quality and storage and require rapid and accurate detection and sorting to separate them from healthy potatoes. Six types of potatoes were selected, including healthy potatoes and potatoes with five external defects: scab disease, black skin, broken skin, green skin, and mechanical damage. Each of the six types of potatoes had 30 samples collected, totaling 180. Potato samples are shown in Figure 1.

2.2. Main Instrument and Equipment

Hyperspectral images were obtained using a hyperspectral imaging system that mainly consisted of a hyperspectral imager, camera, optical fiber halogen lights, black case, precise displacement controller, and data processor (PDP). The black case (120 × 50 × 140 cm) ensured that the sample was not affected by external light during collection. The spectral wavelength range of the spectrometer was 400 nm to 1100 nm, the number of spectral bands was 472, the spectral resolution (30 µm slit) was 2.8 nm, the spectrometer adopted transmission grating splitting, the dispersion was 97.5 nm/mm, the spatial resolution spot diameter was less than 9 µm, and the stable output halogen light source was 21 V/200 W. In addition, a double-branch linear light guide pipeline was used. The acquisition mode of the hyperspectral spectrometer was push-broom (line scan), and the hyperspectral image was acquired by moving the displacement platform under a fixed light source. HSI Analyzer 1.0 software was used for spectrum acquisition.

Before acquiring hyperspectral image data, the hyperspectral imaging system was preheated for 30 min. In addition, to ensure the stability of the sampling system and the quality of the obtained images, the intensity of the light source, the exposure time of the hyperspectral camera, and the object distance were adjusted to ensure a sharp image. The speed of the transmission device was also set to avoid distortion of the spatial resolution of the image. To avoid the impact of parameters such as sample moisture content on the results, the sealed samples were collected for half a day. Hyperspectral images were corrected with black-and-white negatives and a sample blackboard, and after several fine-tunings, the intensity of the light source was set to 1. The camera exposure time was set to 29 ms, the transmission device X-Speed was set to 1.32 mm·s⁻¹, and the displacement platform moving speed was set to 0.9 mm·s⁻¹. Potato sample images were then collected, and hyperspectral images were obtained by moving the displacement platform and collecting data from the hyperspectral imager.

Hyperspectral images collected by the hyperspectral image acquisition system often contain large amounts of noise. Therefore, black-and-white plate correction must be performed on the original hyperspectral images collected to eliminate part of the noise and ensure that the information from the corrected image was closer to that from the sample [19]. In the collection of potato samples, the calibration image W was first acquired under full white (an image with spectral reflectance of 100% was obtained by scanning a standard white correction plate), and then the calibration image B was acquired under full black (an image with spectral reflectance of 0% in a full dark environment with a lens cover). Finally, hyperspectral images of the potato samples were collected. The original spectral image I was obtained. The black-and-white plate correction of the spectrum was completed according to Formula (1), and the absolute spectral image I acquired was transformed into the relative spectral image R (i.e., the calibrated spectral image) [20].

R = \frac{I - B}{W - B} \times DN

(1)

where W is the calibrated image of the whiteboard, B is the calibrated image of the blackboard, I is the original hyperspectral image, R is the calibrated hyperspectral image, and DN is the highest brightness value (1700).
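As an illustration of Formula (1), the following minimal sketch applies the black-and-white correction element-wise with NumPy. The function name `calibrate`, the toy array values, and the handling of pixels where W equals B are assumptions made for this example; only the formula and the DN value of 1700 come from the text.

```python
import numpy as np

def calibrate(raw, white, dark, dn=1700.0):
    """Apply Formula (1), R = (I - B) / (W - B) * DN, element-wise."""
    raw = np.asarray(raw, dtype=float)
    white = np.asarray(white, dtype=float)
    dark = np.asarray(dark, dtype=float)
    denom = white - dark
    # Assumption: pixels where the white and dark readings coincide are
    # marked as NaN instead of raising a division-by-zero warning.
    denom = np.where(denom == 0, np.nan, denom)
    return (raw - dark) / denom * dn

# Toy cube with one pixel and three bands (values are illustrative only).
raw = np.array([[[600.0, 1100.0, 100.0]]])
white = np.full_like(raw, 1100.0)
dark = np.full_like(raw, 100.0)
calibrated = calibrate(raw, white, dark)
print(calibrated)
```

A raw reading equal to the white reference maps to DN, and a reading equal to the dark reference maps to 0.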

2.3. Hyperspectral Image Correction and Data Extraction

Hyperspectral data were extracted using ENVI 5.6 in this study. Before extracting hyperspectral data, the region of interest (ROI) of the sample hyperspectral image had to be determined. The original potato image was divided into grayscale images at different wavelengths. Then, a grayscale image with defect characteristics and a grayscale image without defect characteristics were selected, and the ROI was calculated by image subtraction, in which the difference between the images with and without defect characteristics was computed at the pixel level. Taking the potato black skin defect in Figure 2 as an example, the images were decomposed and subtracted and the ROI was extracted; the white regions indicate the areas that contain defects.
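The pixel-level subtraction used to locate the ROI might be sketched as follows. This is not the ENVI workflow used in the study: the function name `defect_mask`, the use of the absolute difference, and the `threshold` parameter are assumptions introduced only to show how a binary (white/black) defect map could be derived from two band images.

```python
import numpy as np

def defect_mask(band_with_defect, band_without_defect, threshold):
    """Return a boolean mask that is True where the two band images differ strongly."""
    a = np.asarray(band_with_defect, dtype=float)
    b = np.asarray(band_without_defect, dtype=float)
    # Pixel-level difference between the two grayscale band images.
    diff = np.abs(a - b)
    # Assumption: a fixed threshold turns the difference image into a mask.
    return diff > threshold

# Toy 2 x 2 band images (values are illustrative only).
band_a = np.array([[10, 10], [10, 60]])
band_b = np.array([[12, 9], [11, 10]])
mask = defect_mask(band_a, band_b, threshold=20)
print(mask)
```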

In this experiment, 30 samples were taken for each type of potato (healthy and each defect type), and 5 sampling points were collected on each sample, totaling 150 sampling points per type, i.e., 150 repeated tests. The sampling area of each sampling point has an important impact on the quality of the hyperspectral data; too large an area may bring too much noise, whereas too small an area will cause statistical bias. Through repeated experiments, this study finally determined the sampling area of 25 pixels × 25 pixels (the red square in Figure 3) for each sampling point to ensure the reliability of statistical analysis. The potato hyperspectral data extracted within the ROI are shown in Figure 3. Then, by calculating the average value of all pixels, the average hyperspectral data for each sample were obtained [21]. The hyperspectral data were saved in ASCII format, and the noisy bands at the beginning and end of the hyperspectral data were eliminated, leaving 288 bands (550 nm to 920 nm). In the initial phase and near the end of the scanning process of the hyperspectrometer, significant stray light and noise interference were inevitably generated due to the influence of the test equipment (such as thermal noise and current noise). This interference led to the degradation of hyperspectral data quality, which was evident in the instability of spectral curves. Therefore, to ensure the reliability of the analysis results, the band interval data with relatively stable spectral curves were selected for further analysis.
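The averaging over a 25 × 25 pixel sampling window and the removal of the noisy end bands could look roughly like the sketch below. The function names, the synthetic cube, and the evenly spaced wavelength grid are assumptions for illustration; the real instrument grid is not given in the text, so the number of bands kept here will not necessarily equal 288.

```python
import numpy as np

def roi_mean_spectrum(cube, row, col, size=25):
    """Average the spectra of all pixels in a size x size window.

    cube has shape (rows, cols, bands); (row, col) is the top-left corner.
    """
    window = cube[row:row + size, col:col + size, :]
    if window.shape[0] != size or window.shape[1] != size:
        raise ValueError("sampling window falls outside the image")
    return window.reshape(-1, cube.shape[2]).mean(axis=0)

def trim_bands(spectrum, wavelengths, lo=550.0, hi=920.0):
    """Keep only the bands whose wavelength lies within [lo, hi] nm."""
    keep = (wavelengths >= lo) & (wavelengths <= hi)
    return spectrum[..., keep], wavelengths[keep]

# Synthetic cube: every pixel has the same spectrum 0, 1, ..., 471.
cube = np.ones((40, 40, 472)) * np.arange(472)
# Assumption: an evenly spaced grid over the stated 400-1100 nm range.
wavelengths = np.linspace(400.0, 1100.0, 472)

mean_spec = roi_mean_spectrum(cube, row=5, col=5)
trimmed_spec, trimmed_wl = trim_bands(mean_spec, wavelengths)
print(mean_spec.shape, trimmed_spec.shape)
```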

2.4. Data Analysis Method

In order to detect the external defects in potatoes, this study first preprocessed the original hyperspectral data using a variety of algorithms. Next, by comparing the results of different preprocessing algorithms combined with four quantitative models, the optimal preprocessing algorithm for hyperspectral data was determined. Then, the characteristic bands were extracted from the hyperspectral data obtained by a better preprocessing algorithm, and the qualitative model of defect detection was established by using the hyperspectral data of the characteristic bands. Finally, a universal method for the detection of external defects in potatoes was obtained through experiments.

2.4.1. Hyperspectral Data Preprocessing

Before establishing a hyperspectral data model, the hyperspectral data must be preprocessed [22]. Hyperspectral data preprocessing is indispensable in the field of spectroscopy and essential for extracting useful information from complex spectral signals. In the study of a hyperspectral system for detecting external defects in potatoes, SG [23] was used to eliminate random noise in hyperspectral data, and MSC was used to correct baseline drift caused by sample surface inhomogeneity or particle scattering. SNV was used to eliminate spectral intensity differences caused by changes in light intensity or uneven sample surfaces, while the normalization algorithm [24] was used to eliminate spectral intensity differences caused by instrument response or environmental changes, making the data more suitable for comparison and analysis [25]. These methods are not only suitable for traditional spectroscopy research but also have broad application prospects in remote sensing, medical spectroscopy, and other fields [26]. When the particle distribution is not uniform or the particle size difference is large, the propagation path and scattering intensity of light in the sample change, resulting in spectral reflectance fluctuations, especially in the short-wave region. This scattering effect can mask the true spectral characteristics of the sample, increasing noise and uncertainty. Such processing was performed to remove noise, background interference, and other interference factors from hyperspectral data to improve the signal-to-noise ratio, accuracy, quality, and availability. The “availability” here refers to the higher quality of the pre-processed hyperspectral data, which is more suitable for subsequent modeling and analysis, thus improving the reliability and consistency of the experimental results.
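To make the preprocessing steps concrete, the sketch below implements SG smoothing, SNV, and MSC with NumPy and SciPy and chains SG with SNV. It is not the Unscrambler implementation used in the study. The polynomial order of 2, the use of the mean spectrum as the MSC reference, the sample standard deviation in SNV, and the synthetic spectra are assumptions; the window length of 11 follows the kernel size reported in Section 3.1.

```python
import numpy as np
from scipy.signal import savgol_filter

def sg_smooth(spectra, window=11, polyorder=2):
    """Savitzky-Golay smoothing along the wavelength axis (rows are samples)."""
    return savgol_filter(spectra, window_length=window, polyorder=polyorder, axis=1)

def snv(spectra):
    """Standard normal variate: center and scale each spectrum individually."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum."""
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(spectra, dtype=float)
    for i, s in enumerate(spectra):
        # Fit s = slope * ref + intercept, then remove both effects.
        slope, intercept = np.polyfit(ref, s, 1)
        corrected[i] = (s - intercept) / slope
    return corrected

def sg_snv(spectra, window=11, polyorder=2):
    """SG smoothing followed by SNV, the combination favored in this study."""
    return snv(sg_smooth(spectra, window, polyorder))

# Synthetic spectra: one base curve with different gains and offsets,
# mimicking multiplicative and additive scatter effects.
base = np.sin(np.linspace(0.0, 3.0, 288)) + 2.0
spectra = np.vstack([a * base + b for a, b in [(1.0, 0.0), (1.5, 0.2), (0.7, -0.1)]])
processed = sg_snv(spectra)
msc_corrected = msc(spectra)
print(processed.shape)
```

Because the three synthetic spectra differ only by gain and offset, both SG–SNV and MSC collapse them onto a single curve, which is the scatter-removal behavior described above.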

2.4.2. Quantitative Model Establishment

In order to determine the optimal hyperspectral data preprocessing algorithm, SVM, PLSR, PCR, and LSSVM data prediction models were used to study the quantitative model of potato external defect detection. By comparing the reliability of hyperspectral data prediction after different preprocessing methods under four quantitative prediction models, this study aimed to find the optimal model and preprocessing algorithm for potato defect detection.

Dataset splitting

Before establishing a quantitative model for the raw hyperspectral data, the sample set needed to be divided into training and prediction sets in proper proportions. When analyzing the sample dataset, selecting suitable and effective samples for chemical modeling not only improves the accuracy of the model but also provides a more convenient method for the subsequent maintenance and updating of the model. Commonly used sample set partitioning methods include random sampling (RS), conventional selection (CS) [27], Kennard–Stone (KS) [28], and sample set partitioning by X–Y joint distance (SPXY) [29]. In this study, the KS method was used, which measures the difference between samples by their Euclidean distance. The KS algorithm not only reduces the calculation cost and accelerates the learning speed but also avoids overfitting. In the first step of the KS algorithm, all samples were treated as candidates for the training set, the Euclidean distances between all pairs of samples were calculated, and the two samples with the largest Euclidean distance were placed in the training set. In the second step, for each remaining sample, the distances to the already selected samples were calculated and the shortest of these distances was recorded; the sample with the longest of these shortest distances was then added to the training set. In the third step, the second step was repeated until the number of samples selected was equal to the number determined in advance. The Euclidean distance represents the true distance between two points in n-dimensional space or the natural length of a vector. The formula used was as follows:

d_{x}(p, q) = \sqrt{\sum_{j = 1}^{N} [x_{p}(j) - x_{q}(j)]^{2}}, \quad p, q \in [1, N]

(2)

where x_p and x_q represent two different samples, and N represents the number of spectral wave points of the samples [30].

In this study, 180 sets of hyperspectral data were extracted from 180 potatoes. Then, the KS algorithm was used to divide 180 sets of hyperspectral data into a training set and a verification set with a ratio of 7:3, namely, 126 sets were selected for the training set and 54 sets were selected for the verification set. To ensure the prediction effect of the test model, the cross-validation method was used to assess and refine the model effect.
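The Kennard–Stone selection described above can be sketched in a few lines of NumPy. The function names and the toy data are assumptions for illustration; the max-min selection rule and the 126/54 split of 180 spectra follow the text.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance between every pair of rows, as in Formula (2)."""
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.sqrt(np.maximum(d2, 0.0))  # clip tiny negatives from rounding

def kennard_stone(X, n_train):
    """Return (train_indices, remaining_indices) chosen by Kennard-Stone."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and the number of samples")
    dist = pairwise_distances(X)
    # Step 1: start with the two most distant samples.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(n) if k not in selected]
    # Steps 2-3: repeatedly add the sample whose nearest selected
    # neighbor is farthest away (largest of the minimum distances).
    while len(selected) < n_train:
        min_d = dist[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(min_d))]
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining

# Toy one-dimensional example.
X_toy = np.array([[0.0], [1.0], [2.0], [10.0], [5.0]])
train_idx, test_idx = kennard_stone(X_toy, n_train=3)
print(train_idx, test_idx)

# Random stand-in for 180 spectra with 288 bands, split 7:3.
rng = np.random.default_rng(0)
X_demo = rng.random((180, 288))
train_demo, test_demo = kennard_stone(X_demo, n_train=126)
```

In the toy example the two extreme points are chosen first, followed by the point that lies farthest from both of them.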

Methods for establishing quantitative models

SVM, PLSR, PCR, and LSSVM were used to establish quantitative models [31]. These models were chosen because they are well suited for the regression task, guaranteeing the accuracy and reliability of the model through cross-validation and parameter optimization. PCA [32] is a statistical analysis method for data reduction and characteristic selection that aims to retain as much information as possible by reducing the dimensions of the data. The data after dimensionality reduction retain the information of the original variables, and there is no correlation between the variables. PCA achieves dimensionality reduction by projecting the original data into a new coordinate system to maximize the variance of the data in the new coordinate system.
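As a rough illustration of how PCA feeds into PCR, the sketch below projects centered spectra onto the leading principal components and fits a least-squares regression on the resulting scores. The function names, the number of components, and the synthetic data (including the response `y`) are hypothetical; the study itself built its models in Unscrambler.

```python
import numpy as np

def pcr_fit(X, y, n_components=2):
    """Fit principal component regression; returns the model as a tuple."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    # Rows of Vt are the principal axes, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T
    scores = Xc @ V
    coef, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    return x_mean, y_mean, V, coef

def pcr_predict(model, X):
    """Predict the response for new spectra with a fitted PCR model."""
    x_mean, y_mean, V, coef = model
    return (X - x_mean) @ V @ coef + y_mean

# Synthetic data: 50 spectra with 20 bands driven by two latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))
loadings = rng.normal(size=(2, 20))
X = latent @ loadings
y = latent @ np.array([1.0, -2.0]) + 0.5

model = pcr_fit(X, y, n_components=2)
y_hat = pcr_predict(model, X)
print(np.abs(y_hat - y).max())
```

Because the synthetic spectra have exactly two underlying factors, two principal components are enough to reproduce the response; real spectra would need the number of components chosen by cross-validation.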

Metrics to evaluate model performance

The coefficient of determination (R²), root mean square error of calibration (RMSEC), root mean square error of prediction (RMSEP), and relative prediction deviation (RPD) were used to evaluate the performance of the constructed models. The evaluation indices are given by Formulas (3)–(6). The closer R² is to 1, the higher the accuracy. The smaller the RMSEC and RMSEP values, the better the prediction ability of the model. The larger the RPD value, the better the prediction ability of the model.

R^{2} = 1 - \frac{\sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i = 1}^{n} (y_{i} - \bar{y})^{2}}

(3)

where y_i is the actual value, \hat{y}_i is the predicted value, and \bar{y} is the mean of the actual values.

\mathrm{RMSEC} = \sqrt{\frac{\sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}}{n}}

(4)

where y_i is the actual value, \hat{y}_i is the predicted value, and n is the number of samples in the calibration set.

\mathrm{RMSEP} = \sqrt{\frac{\sum_{i = 1}^{m} (y_{i} - \hat{y}_{i})^{2}}{m}}

(5)

where y_i is the actual value, \hat{y}_i is the predicted value, and m is the number of samples in the prediction set.

\mathrm{RPD} = \frac{\mathrm{SD}}{\mathrm{RMSEP}}

(6)

where SD is the standard deviation of the actual values in the prediction set, and RMSEP is the root mean square error of prediction.
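Formulas (3)–(6) translate directly into a few NumPy functions, sketched below. The function names and the toy values are illustrative; the use of the sample standard deviation (`ddof=1`) for SD is an assumption, since the text does not say which estimator was used. The same `rmse` function gives RMSEC on the calibration set and RMSEP on the prediction set.

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination, Formula (3)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def rmse(y, y_hat):
    """Root mean square error, Formulas (4) and (5)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def rpd(y, y_hat):
    """Relative prediction deviation, Formula (6): SD / RMSEP."""
    sd = float(np.std(np.asarray(y, dtype=float), ddof=1))  # assumption: sample SD
    return sd / rmse(y, y_hat)

# Toy prediction set (values are illustrative only).
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 1.9, 3.2, 3.8, 5.0]
r2_value = r_squared(y_true, y_pred)
rmsep_value = rmse(y_true, y_pred)
rpd_value = rpd(y_true, y_pred)
print(r2_value, rmsep_value, rpd_value)
```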

2.4.3. Qualitative Model Establishment

In order to improve the efficiency of potato external defect detection and the feasibility of actual production, this study used the hyperspectral data of the optimal preprocessing algorithm proven by a quantitative model to carry out defect-detection research based on machine learning. However, the more spectral bands used for machine learning, the more complex and less efficient the model. Therefore, the characteristic wavelength was first extracted, and only the hyperspectral data of the characteristic wavelength were used to establish the qualitative model of defect detection.

Extraction of characteristic wavelengths

In this study, the successive projections algorithm (SPA) was used to extract the characteristic wavelengths of the hyperspectral data. SPA typically selects characteristic wavelengths by projecting each wavelength vector onto the other wavelengths, comparing the sizes of the projection vectors, taking the wavelength with the largest projection vector as the next selected wavelength, and determining the final set of characteristic wavelengths according to the calibration model.
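The projection step of SPA can be sketched as follows. This covers only the chaining of projections from a given starting band; choosing the starting band and the number of wavelengths with a calibration model, as the text describes, is omitted. The function name `spa_select` and the toy matrix are assumptions for illustration.

```python
import numpy as np

def spa_select(X, n_select, start=0):
    """Select band indices by successive orthogonal projections.

    X has shape (samples, bands). Starting from band `start`, each step
    projects all band vectors onto the subspace orthogonal to the last
    selected band and picks the band with the largest remaining norm.
    """
    P = np.asarray(X, dtype=float).copy()
    selected = [start]
    for _ in range(n_select - 1):
        v = P[:, selected[-1]]
        # Remove from every column its component along v.
        P = P - np.outer(v, v @ P) / (v @ v)
        norms = np.linalg.norm(P, axis=0)
        norms[selected] = -1.0  # never pick a band twice
        selected.append(int(np.argmax(norms)))
    return selected

# Toy matrix: band 1 is collinear with band 0, band 2 is independent,
# and band 3 overlaps strongly with band 0.
X_toy = np.array([
    [1.0, 2.0, 0.0, 1.0],
    [0.0, 0.0, 3.0, 0.0],
    [0.0, 0.0, 0.0, 0.5],
])
chosen = spa_select(X_toy, n_select=3, start=0)
print(chosen)
```

The collinear band 1 is never selected because nothing of it remains after projecting out band 0, which is how SPA keeps redundancy low.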

Methods for establishing qualitative models

BPNN, CART, KNN, and LDA were used to establish qualitative models to detect the external defects of potatoes [33]. These models were chosen because they are effective for classification tasks, wherein the goal is to assign data points to specific classes based on spectral characteristics. The model was trained using the training set, and the parameters were adjusted to improve the performance of the model. The use of characteristic selection and dimensionality reduction techniques can enhance computational efficiency and the generalization ability of the model, ensuring that it can accurately classify new, unseen data.
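Of the four classifiers, KNN is the simplest to write out. The sketch below is a plain NumPy implementation with majority voting; the value k = 3, the two-dimensional toy features, and the short class labels are assumptions and do not reflect the settings used in the study.

```python
from collections import Counter

import numpy as np

def knn_predict(X_train, y_train, X_new, k=3):
    """Classify each row of X_new by majority vote among its k nearest neighbors."""
    X_train = np.asarray(X_train, dtype=float)
    predictions = []
    for x in np.asarray(X_new, dtype=float):
        distances = np.linalg.norm(X_train - x, axis=1)
        nearest = np.argsort(distances, kind="stable")[:k]
        votes = Counter(y_train[int(i)] for i in nearest)
        predictions.append(votes.most_common(1)[0][0])
    return predictions

# Toy training data: three well-separated clusters standing in for the
# three potato categories (feature values are illustrative only).
X_train = [[0, 0], [0, 1], [1, 0],
           [5, 5], [5, 6], [6, 5],
           [10, 0], [10, 1], [11, 0]]
y_train = ["healthy"] * 3 + ["black/green"] * 3 + ["scab/damage/broken"] * 3
X_new = [[0.2, 0.2], [5.5, 5.5], [10.5, 0.5]]

predicted = knn_predict(X_train, y_train, X_new, k=3)
print(predicted)
```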

2.4.4. Confusion Matrices

Confusion matrices [34] are a tool for evaluating the performance of classification models in machine learning, especially in supervised learning. They are mainly used to describe the relationship between the prediction results of the classification model and the actual sample class. The confusion matrix, in the form of a matrix, shows the number of samples predicted correctly and the number of samples predicted incorrectly in each category. Each column of the confusion matrix represents the prediction category, and the total of each column represents the total number of data predicted for that category; each row represents the true category to which the data belong, and the total number of data instances in each row represents the number of instances in that category.
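A confusion matrix with the layout described above (rows for true categories, columns for predicted categories) can be built as in the sketch below, which also derives the per-class accuracy from each row. The function name, the abbreviated labels, and the toy predictions are illustrative assumptions.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, labels):
    """Count samples per (true class, predicted class); rows are true classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = np.zeros((len(labels), len(labels)), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[index[t], index[p]] += 1
    return matrix

# H = healthy, BG = black/green skin, SMB = scab/mechanical damage/broken skin.
labels = ["H", "BG", "SMB"]
y_true = ["H", "H", "H", "BG", "BG", "SMB", "SMB", "SMB"]
y_pred = ["H", "H", "BG", "BG", "BG", "SMB", "H", "SMB"]

cm = confusion_matrix(y_true, y_pred, labels)
# Per-class accuracy: correct predictions divided by the row total.
per_class_accuracy = cm.diagonal() / cm.sum(axis=1)
print(cm)
print(per_class_accuracy)
```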

3. Results

3.1. Spectral Preprocessing

The hyperspectral data of potatoes were extracted using ENVI 5.6 and imported into Excel for editing. The full-band hyperspectral curve of potatoes was plotted in MATLAB. In the preview spectrogram, significant noise was observed at 400–549 nm and 921–1100 nm, so the 550–920 nm band was selected for analysis and processing. The original average spectral curve in this range is shown in Figure 4. The hyperspectral data of the six types of potatoes were grouped into three categories based on spectral curve characteristics: healthy potatoes, black/green-skin potatoes, and scab/mechanical-damage/broken-skin potatoes, which aligns with the actual production requirements for sorting potatoes based on external defects.

Unscrambler X10.4 software was used for preprocessing in this experiment. The SG, MSC, SNV, and normalization algorithms were used. Because the hyperspectral data contained a large amount of noise that the SG algorithm can smooth, and a comparison with other combined preprocessing methods showed that combining the SG algorithm with each of the other three preprocessing methods had the best effect, the MSC, SNV, and normalization algorithms were each combined with the SG algorithm for data preprocessing. The kernel size used by the SG algorithm in this study was 11. In each combination, the SG algorithm was first used to perform preliminary smoothing of the data, and the smoothed results were then further processed by the second algorithm to complete the preprocessing. The hyperspectral curves of the raw potato hyperspectral data processed by the SG, MSC, SNV, and normalization algorithms are shown in Figures 5–7.

3.2. Quantitative Models

According to the method in Section 2.4.2, quantitative models of healthy, black/green-skin, and scab/mechanical-damage/broken-skin potatoes were established using Unscrambler X10.4 software.

The optimal quantitative spectral prediction model for healthy potatoes is presented in Table 1. By comparing four types of spectral quantitative models for healthy potatoes, it was found that the hyperspectral data processed by SG–SNV combined with the PCR algorithm had the best effect, with an R² value of 0.9499, RMSEC of 0.0184, RMSEP of 0.0154, and RPD of 14.4893. The results showed that SG–SNV preprocessing combined with the PCR algorithm offers significant advantages in the spectral analysis of healthy potatoes.

The specific values of the hyperspectral quantitative model for black/green-skin potatoes are listed in Table 2. After comparing the SVM, PLSR, PCR, and LSSVM algorithms, the SG–SNV preprocessing model combined with the SVM algorithm had the best performance, with an R² value of 0.9942, RMSEC of 0.0071, RMSEP of 0.0193, and RPD of 13.2256. The R² of this model was close to 1, and the RPD value was high, indicating that its prediction ability was significantly better than that of the other methods. Compared with the healthy potato model, the black/green-skin potato model had a higher R² but a slightly larger RMSEP, which may be due to the more complex spectral characteristics of black/green-skin potatoes, yet it still showed excellent predictive performance overall.

The optimal spectral quantitative model for the scab/mechanical-damage/broken-skin potatoes is shown in Table 3. In the spectral quantitative prediction of scab/mechanical-damage/broken-skin potatoes, the model of SG–SNV preprocessing combined with the LSSVM algorithm had the best performance, with an R² value as high as 0.9970, an RMSEC value of 0.0936, RMSEP of 0.0317, and RPD of 18.3182. The R² of this model was close to 1, the RMSEC and RMSEP values were extremely low, and the RPD value was significantly higher than those of other models, indicating its strong prediction ability for complex defects. Compared with the healthy potato and black/green-skin potato models, the RPD value of this model was significantly improved, indicating that it is better at interpreting hyperspectral data when dealing with complex defects.

Compared to previous studies, the R², RMSEC, RMSEP, and RPD values of these models were significantly optimized, especially when dealing with complex defects, showing higher prediction accuracy and stability. Future research can further optimize the preprocessing methods and algorithms to improve the generalization ability and applicability of the model.

In the quantitative analysis model that used hyperspectral techniques to detect external defects in potatoes, PCA selected the key components by variance maximization and eigenvalue ranking, preferentially retaining the components with the largest variance to capture the most information. The actual measurement index was mainly spectral reflectance. PCA reduced the initial 288 bands to two principal components. The training set (blue dots) was used for model training, and the prediction set (red dots) was used for verification. A higher degree of overlap between the red and blue dots and closer proximity to the baseline indicate more accurate model prediction. The closer the data points are to the baseline, the greater the variance explained by the model, which indicates that the model is more effective for detecting defects. The scatterplots of variance for PCA of the three types of potatoes are shown in Figure 8.

The prediction results for the three types of potatoes and the PCA results of the different prediction models show that, for all three types, the PCR, SVM, and LSSVM prediction methods achieved the best prediction results on data preprocessed by the SG–SNV algorithm. Therefore, the hyperspectral data processed by the SG–SNV preprocessing algorithm can be used for subsequent defect detection research.
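A minimal sketch of a combined SG–SNV step is given here, assuming that SG smoothing is applied first and SNV second (the order is inferred from the name and is an assumption). The smoothing uses the classic 5-point quadratic Savitzky–Golay weights (−3, 12, 17, 12, −3)/35; the window size used in the study is not stated in this section, and the edge padding and the toy spectrum are illustrative choices.

```python
import math

# Classic 5-point quadratic Savitzky-Golay smoothing weights.
SG5 = [-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35]

def sg_smooth(spectrum, weights=SG5):
    """Smooth one spectrum; the ends are padded by repeating the edge values."""
    half = len(weights) // 2
    padded = [spectrum[0]] * half + list(spectrum) + [spectrum[-1]] * half
    return [
        sum(w * padded[i + k] for k, w in enumerate(weights))
        for i in range(len(spectrum))
    ]

def snv(spectrum):
    """Standard normal variate: centre a spectrum and scale it by its own standard deviation."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in spectrum) / (n - 1))
    return [(v - mean) / sd for v in spectrum]

def sg_snv(spectrum):
    """Assumed order: SG smoothing followed by SNV."""
    return snv(sg_smooth(spectrum))

# Made-up reflectance values for one spectrum.
raw = [0.42, 0.45, 0.44, 0.50, 0.55, 0.53, 0.60, 0.62]
processed = sg_snv(raw)
print(processed)
```

SNV is applied per spectrum rather than per band, so each processed spectrum has zero mean and unit standard deviation regardless of the scatter level of the original measurement.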

3.3. Qualitative Models

This study used the hyperspectral data preprocessed by the SG–SNV algorithm, shown above to be suitable for data prediction, to conduct defect detection research based on machine learning. Characteristic hyperspectral data were extracted using MATLAB R2020b. Characteristic hyperspectral data refers to the characteristic pattern of light absorbed, emitted, or scattered by a substance over a specific wavelength range. Combined with a qualitative model, the characteristic hyperspectral data can be used to identify potato external defects. The hyperspectral data used in this study had 288 wavelength points and contained a large number of spectral variables, including redundant and collinear variables, which needed to be reduced to obtain the most effective spectral information. The SPA algorithm selected the combination of variables with the least redundant information and collinearity. During the projection of the multi-dimensional hyperspectral data, a new spectral distribution matrix was generated, and the characteristic wavelengths of the three types of potatoes were selected by excluding redundant and collinear variables. The selected characteristic wavelengths are shown in Figure 9. In Figure 9, each small rectangle marks a selected characteristic wavelength: the horizontal axis gives the wavelength, and the vertical axis gives the variable index value. The variable index value indicates the position of each spectral wavelength in the spectral distribution matrix; it reflects only the mathematical projection relationship and has no direct physical significance.
The characteristic wavelengths of hyperspectral data of three types of potatoes after SG–SNV preprocessing were 550, 610, 667, 768, 781, 790, 803, 861, 865, 867, 882, and 883 nm for healthy potatoes; 565, 597, 661, 684, 723, 772, 798, 854, 857, 859, 862, 887, and 891 nm for black/green-skin potatoes; and 550, 668, 701, 753, 831, 840, 857, 859, 864, 867, 868, 872, 877, and 880 nm for scab/mechanical-damage/broken-skin potatoes.
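The projection step of SPA can be sketched as follows. This is a simplified illustration of the projection phase only: it greedily picks the column (wavelength) whose component orthogonal to the already selected columns is largest, and it omits the validation phase that a full SPA typically uses to choose the starting column and the number of wavelengths. NumPy, the function name `spa_select`, the mean-centring choice, and the toy matrix are all assumptions for illustration.

```python
import numpy as np

def spa_select(X, n_select, start=0):
    """Projection phase of SPA: greedily pick columns least collinear with those already chosen."""
    R = np.asarray(X, dtype=float).copy()
    R -= R.mean(axis=0)  # mean-centre each wavelength column (an implementation choice)
    selected = [start]
    for _ in range(n_select - 1):
        v = R[:, selected[-1]].copy()
        # Remove from every column its component along the last selected column.
        R -= np.outer(v, v @ R) / (v @ v)
        norms = np.linalg.norm(R, axis=0)
        norms[selected] = -1.0  # never re-select a chosen column
        selected.append(int(np.argmax(norms)))
    return selected

rng = np.random.default_rng(1)
a, b, c = rng.normal(size=(3, 30))
# Toy matrix: column 1 is collinear with column 0, column 3 with column 2.
X = np.column_stack([a, 2 * a, b, 3 * b, c])
sel = spa_select(X, 3, start=0)
print(sel)
```

In the toy matrix, a column that is an exact multiple of an already selected column is projected to zero and is therefore never chosen, which mirrors the goal of minimizing redundancy and collinearity among the selected wavelengths.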

This study distinguished the external defects of different potatoes by establishing qualitative models based on characteristic wavelengths extracted from hyperspectral data. The results indicate that the qualitative models based on different algorithms achieved good detection performance on both the training and test sets, providing reliable technical support for the classification of potato external defects.

In the healthy potato model, the CART algorithm showed high generalization ability, and the detection rate on the test set reached 100%, which indicates that the model can accurately identify healthy samples. However, the detection rate on the training set was relatively low (66.7%), suggesting that the model may carry some risk of overfitting, and its robustness should be further improved by increasing sample diversity or optimizing algorithm parameters. In the black/green-skin potato model, the KNN algorithm achieved detection rates of 71.4% and 94.3% on the training set and the test set, respectively, indicating that the model has good classification performance and generalization ability. This result is consistent with previous studies, further validating the effectiveness of the KNN algorithm in hyperspectral data classification. In the scab/mechanical-damage/broken-skin potato model, the BPNN algorithm achieved the best classification effect, with detection rates of 75.5% and 93.1% on the training set and the test set, respectively. This result shows that the BPNN algorithm has strong feature extraction and classification ability when dealing with complex defect types, but its detection rate on the training set still has room for improvement, and further optimization of the network structure or the introduction of more characteristic information may be needed. For LDA, the training-set detection rates of all three qualitative models were below 64%, so its performance was poor. The three qualitative discriminant models with the highest detection rates are shown in Figure 10.
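To make the KNN classification step concrete, the following sketch labels a spectrum by majority vote among its k nearest training spectra. The Euclidean distance, k = 3, the function name `knn_predict`, and the three-value toy spectra are assumptions for illustration; the study's actual settings are not given in this section.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, sample, k=3):
    """Label a sample by majority vote among its k nearest training spectra (Euclidean distance)."""
    ranked = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], sample),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Made-up reflectance values at three characteristic wavelengths.
train_X = [
    [0.20, 0.35, 0.60], [0.22, 0.33, 0.58], [0.21, 0.36, 0.61],
    [0.10, 0.15, 0.30], [0.12, 0.14, 0.28], [0.11, 0.16, 0.31],
]
train_y = ["healthy"] * 3 + ["black/green-skin"] * 3

print(knn_predict(train_X, train_y, [0.19, 0.34, 0.59]))
print(knn_predict(train_X, train_y, [0.13, 0.15, 0.29]))
```

Because KNN stores the training spectra rather than fitting parameters, its behavior depends mainly on the choice of k and on how well the selected wavelengths separate the classes.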

3.4. Experimental Verification and Result Analysis

To obtain a clear method for distinguishing potato external defects, a systematic comparative analysis of the above models was carried out. A general method suitable for healthy potatoes, black/green-skin potatoes, and scab/mechanical-damage/broken-skin potatoes was explored and screened so that a universal detection scheme could be obtained. Using the SG–SNV combined preprocessing method and SPA to extract characteristic wavelengths, the three optimal models (CART, KNN, and BPNN) were used to test and verify the three groups of potatoes. The LDA model was excluded because of its poor performance. Thirty potatoes were selected for each group, and the results are displayed as confusion matrices.

The confusion matrices show the classification results for the three types of potato samples under the three optimal models. As shown in Figure 11a–c, the numbers on the diagonal from the upper left to the lower right are the numbers of samples that were correctly predicted. For example, in (a), all 30 healthy potatoes, 21 black/green-skin potatoes, and 24 potatoes with scab/mechanical damage/broken skin were correctly predicted. Most samples were predicted correctly, which indicates that the established models perform well in identifying external defects of potatoes. The prediction accuracy of the three experimental verification models is shown in Figure 11d. The prediction accuracy of the experimental verification models is almost the same as that of the qualitative models, indicating that the models have good robustness. A comparison of the prediction results of all qualitative models shows that the qualitative model based on the KNN algorithm performed best and most stably. However, a small number of samples were misidentified, so the ability of the model to distinguish different types of potato damage needs to be further improved.
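The relationship between a confusion matrix and per-class accuracy can be sketched in a few lines. The labels and predictions below are made up for illustration and are not the study's results; rows are taken as true classes and columns as predicted classes, which is an assumed convention.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def per_class_accuracy(matrix):
    """Diagonal count divided by the number of true samples in each row."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]

labels = ["healthy", "black/green", "scab/damage/broken"]
# Made-up predictions for illustration only.
y_true = ["healthy"] * 4 + ["black/green"] * 4 + ["scab/damage/broken"] * 4
y_pred = [
    "healthy", "healthy", "healthy", "healthy",
    "black/green", "black/green", "black/green", "scab/damage/broken",
    "scab/damage/broken", "scab/damage/broken", "healthy", "black/green",
]

cm = confusion_matrix(y_true, y_pred, labels)
acc = per_class_accuracy(cm)
print(cm, acc)
```

With 30 samples per group, as in the verification experiment, a diagonal count of 21 corresponds to 21/30 = 70% and a count of 24 to 24/30 = 80%.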

4. Discussion

It can be concluded from the experimental results that hyperspectral imaging technology has high accuracy in the nondestructive detection of external defects in red-skin potatoes and that the established quantitative and qualitative models are reliable. Whereas previous studies mainly focused on yellow-skin potatoes, this study systematically explored the application of hyperspectral imaging technology to the detection of external defects in red-skin potatoes, which further supports the general applicability of hyperspectral imaging technology in potato defect detection.

In terms of methodology, this study established and compared preprocessing models based on SG, SNV, MSC, and normalization algorithms, proposed a characteristic wavelength extraction model with the SPA algorithm, and designed qualitative analysis models for defect detection with different machine learning models such as CART, KNN, and BPNN. These methods are similar to those used in previous research on yellow-skin potatoes [4,5,16]. However, in previous studies on external defect detection of potatoes, independent models were usually established for each type of damage or defect. Although different types of potato defects may require different preprocessing and machine learning modeling methods for accurate detection, an excessive number of models makes it difficult to meet the demand for efficient detection of external defects in potatoes in practical applications of hyperspectral imaging technology.

Combining the models from each step of potato external defect detection described above shows that adopting the SG–SNV combined preprocessing model, the SPA characteristic wavelength extraction model, and the KNN qualitative model allows the detection accuracy to reach 93%, 93%, and 83% for healthy potatoes, black/green-skin potatoes, and scab/mechanical-damage/broken-skin potatoes, respectively. The experimental results in Figure 11 show that, taken across the three types of potatoes, the model proposed in this study has a higher detection accuracy than the other models. The model combination in this study realizes a complete defect recognition process, which can simplify recognition and improve efficiency, and it is more suitable for practical production applications than the approaches in previous research.

However, it is worth noting that although the constructed quantitative model shows high prediction accuracy (R² > 0.99), there may be a risk of model overfitting, which needs to be further verified in practical applications by expanding the sample size and adding validation steps. In addition, the results were obtained under laboratory conditions, and classification failures may be more frequent in the actual production environment because of sample diversity and environmental complexity; for example, light conditions and potato surface humidity may affect the accuracy of spectral measurements.

Future research should focus on verifying the feasibility of the technology in real production environments and exploring more robust algorithms to reduce the impact of environmental factors. At the same time, expanding the sample size and increasing the number of defect types will help to improve the generalization ability of the model. These improvements will lay a solid foundation for the practical application of hyperspectral imaging technology in the potato industry, thus promoting the further development of the red-skin potato industry.

5. Conclusions

To address the low efficiency and accuracy of ordinary spectroscopy and machine vision in potato external defect detection, hyperspectral imaging technology and machine learning algorithms were used in this study to achieve nondestructive detection of healthy potatoes and of external defects such as scab, black skin, broken skin, green skin, and mechanical damage in red-skin potatoes. The detection experiments verified the feasibility and accuracy of the recognition model proposed in this study. The main conclusions and contributions of this study are as follows.

By comparing and analyzing the hyperspectral data processing methods for healthy potatoes and potatoes with different external defects, this study proposes a hyperspectral data preprocessing model based on the SG–SNV combined algorithm, a characteristic wavelength extraction model based on the SPA algorithm, and a qualitative analysis model based on the KNN machine learning method. Combining these three models yields a complete detection method and model for potato external defects. Experimental results show that the combined method proposed in this study can achieve a detection accuracy of 93%, 93%, and 83% for healthy potatoes, black/green-skin potatoes, and scab/mechanical-damage/broken-skin potatoes, respectively.

It can be concluded that the potato external defect detection method and model proposed in this study have high accuracy and are more suitable for practical production. Further evaluation is still needed by expanding the sample dataset and validating the method in the actual production environment. Future research should also focus on combining more features or introducing more advanced algorithms, such as deep learning, to further improve accuracy and generality.

Overall, the study results provide a theoretical basis for the design and application of potato defect detection equipment based on hyperspectral imaging technology, and also provide a reference for quality detection of other varieties of potatoes, fruits, and vegetables using hyperspectral imaging technology and machine learning.

Author Contributions

Conceptualization, P.Z. and X.W.; data curation, Q.Z. and X.N.; formal analysis, X.W.; investigation, X.W., Y.S. and X.N.; project administration, P.Z.; validation, Q.X.; writing—original draft, P.Z., X.W., Q.Z., Q.X. and Y.S.; writing—review and editing, Q.Z., Q.X. and Y.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Basic Research Program Project of Liaoning Province of China (2023JH2/101300117).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

All applicable data are published and referenced in the paper.

Acknowledgments

We would like to thank the Agrotechnical Extension Center of Jianping County of Liaoning Province in China, for providing test materials.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

SG	Savitzky–Golay
MSC	multiplicative scatter correction
SNV	standard normal variate transformation
SVM	support vector machine
PLSR	partial least squares regression
PCA	principal component analysis
PCR	principal component regression
LSSVM	least squares support vector machine
BPNN	backpropagation neural network
CART	classification and regression tree
KNN	k-nearest neighbors
LDA	linear discriminant analysis
RMSEC	root mean squared error of calibration
RMSEP	root mean squared error of prediction
RPD	relative prediction deviation
SPA	successive projections algorithm
RS	random sampling
PLS-DA	partial least squares discriminant analysis
MDDNet	multi-type defect detection network
HSI	hyperspectral imaging
MSI	multi-spectral imaging
IWSA	improved watershed segmentation algorithm
VNIR	visible near infrared
SWIR	short wave infrared
CS	conventional selection
KS	Kennard–Stone
SPXY	sample set partitioning based on joint x–y distance
ROI	region of interest

References

1. Ke, J.; Yang, B.; Jiao, D.; Yang, L.; Zhou, J. Current situation and countermeasures of potato mechanization production in China. South China Agric. 2017, 11, 71–72.
2. Riza, D.F.A.; Widodo, S.; Yamamoto, K.; Ninomiya, K.; Suzuki, T.; Ogawa, Y.; Kondo, N. External defects and severity level evaluation of potato using single and multispectral imaging in near infrared region. Inf. Process. Agric. 2024, 11, 80–90.
3. Al Riza, D.F.; Suzuki, T.; Ogawa, Y.; Kondo, N. External Defects and Soil Deposits Identification on Potato Tubers using 2CCD Camera and Principal Component Images. Ind. J. Teknol. Manaj. Agroindustri 2023, 12, 13.
4. Yu, Y.; Zhenfang, L.; Min, H.; Qibing, Z.; Xin, Z. Automatic detection of multi-type defects on potatoes using multispectral imaging combined with a deep learning model. J. Food Eng. 2023, 336, 111213.
5. Li, X.; Tao, H.; Gao, H.; Li, P.; Huang, T.; Ren, J. Nondestructive detection method of potato scab based on multi-sensor information fusion technology. Trans. Chin. Soc. Agric. Eng. 2013, 29, 277–284.
6. Cozzolino, D. Advantages and limitations of using near infrared spectroscopy in plant phenomics applications. Comput. Electron. Agric. 2023, 212, 108078.
7. Yuanyuan, S.; Yi, L.; Guantao, X.; Yukang, S.; Quankai, L.; Zhichao, H. Detection and analysis of sweet potato defects based on hyperspectral imaging technology. Infrared Phys. Technol. 2022, 127, 104403.
8. Faqeerzada, M.A.; Kim, Y.N.; Kim, H.; Akter, T.; Kim, H.; Park, M.S.; Kim, M.S.; Baek, I.; Cho, B.K. Hyperspectral imaging system for pre- and post-harvest defect detection in paprika fruit. Postharvest Biol. Technol. 2024, 218, 113151.
9. Akter, T.; Faqeerzada, M.A.; Kim, Y.; Pahlawan, M.F.R.; Aline, U.; Kim, H.; Kim, H.; Cho, B.-K. Hyperspectral imaging with multivariate analysis for detection of exterior flaws for quality evaluation of apples and pears. Postharvest Biol. Technol. 2025, 223, 113453.
10. Shanthini, K.S.; Francis, J.; George, S.N.; George, S.; Devassy, B.M. Early bruise detection, classification and prediction in strawberry using Vis-NIR hyperspectral imaging. Food Control 2025, 167, 110794.
11. Keresztes, J.C.; Diels, E.; Goodarzi, M.; Nguyen-Do-Trong, N.; Goos, P.; Nicolai, B.; Saeys, W. Glare based apple sorting and iterative algorithm for bruise region detection using shortwave infrared hyperspectral imaging. Postharvest Biol. Technol. 2017, 130, 103–115.
12. Wang, H.; Yang, X.; Zhang, C.; Guo, D.; Bao, Y.; He, Y.; Liu, F. Fast Identification of Transgenic Soybean Varieties Based Near Infrared Hyperspectral Imaging Technology. Spectrosc. Spectr. Anal. 2016, 36, 1843–1847.
13. Gold, K.M.; Townsend, P.A.; Herrmann, I.; Gevens, A.J. Investigating potato late blight physiological differences across potato cultivars with spectroscopy and machine learning. Plant Sci. 2020, 295, 110316.
14. Chao, Q.; Murilo, S.; Jesper, C.W.; Ea, H.R.S.; Merethe, B.; Erik, A.; Junfeng, G. In-field classification of the asymptomatic biotrophic phase of potato late blight based on deep learning and proximal hyperspectral imaging. Comput. Electron. Agric. 2023, 205, 495.
15. Li, Q.; Fu, X.; Li, H.; Zhou, H. Advancing County-Level Potato Cultivation Area Extraction: A Novel Approach Utilizing Multi-Source Remote Sensing Imagery and the Shapley Additive Explanations–Sequential Forward Selection–Random Forest Model. Agriculture 2025, 15, 92.
16. Zhao, M.; Liu, Z.; Zou, X.; Wu, L.; Zhang, F.; Long, J. Detection of defects on potatoes by hyperspectral imaging technology. Laser J. 2016, 37, 20–24.
17. He, L.; Pan, Q.; Di, W.; Li, Y. Research Advance on Target Detection for Hyperspectral Imagery. Acta Electron. Sin. 2009, 37, 2016–2024.
18. Peraza-Alemán, C.M.; López-Maestresalas, A.; Jarén, C.; Rubio-Padilla, N.; Arazuri, S. A Systematized Review on the Applications of Hyperspectral Imaging for Quality Control of Potatoes. Potato Res. 2024, 67, 1539–1561.
19. Morales, A.; Horstrand, P.; Guerra, R.; Leon, R.; Ortega, S.; Díaz, M.; Melián, J.M.; López, S.; López, J.F.; Callico, G.M.; et al. Laboratory Hyperspectral Image Acquisition System Setup and Validation. Sensors 2022, 22, 2159.
20. Rahman, M.H.; Busby, S.; Ru, S.; Hanif, S.; Saez, A.S.; Zheng, J.; Rehman, T.U. Transformer-Based hyperspectral image analysis for phenotyping drought tolerance in blueberries. Comput. Electron. Agric. 2025, 228, 109684.
21. Sun, J.; Jin, X.; Mao, H.; Wu, X.; Yang, N. Application of hyperspectral imaging technology for detecting adulterate rice. Trans. Chin. Soc. Agric. Eng. 2014, 30, 301–307.
22. Leila, L.; Amir, D.M.; Asim, B.; Shahrokh, F.; Thomas, S. Spectral prediction of soil salinity and alkalinity indicators using visible, near-, and mid-infrared spectroscopy. J. Environ. Manag. 2023, 345, 118854.
23. Wang, W.; Feng, W.; Chang, N.; Liu, Q.; Li, Z.; Chen, Y.; Li, C.; Chen, X.; Zhang, Y. Prediction of chlorophyll content in flue-cured tobacco based on spectral pretreatment and machine learning algorithm. Soil Fertil. Sci. China 2023, 3, 194–201.
24. Diwu, P.; Bian, X.; Wang, Z.; Liu, W. Study on the Selection of Spectral Preprocessing Methods. Spectrosc. Spectr. Anal. 2019, 39, 2800–2806.
25. Yu, G.J.; Zheng, W.D.; Xiang, W.; Cheng, T.J.; Tong, X.J.; Ping, Z.; Feng, N.X. Detection of Powdery Mildew of Bitter Gourd Based on NIR/Fluorescence Spectra. J. Biosyst. Eng. 2023, 48, 319–328.
26. Sujitra, F.; Chanat, T.; Parichat, T.; Sila, K. Development of new fruit quality indices through aggregation of fruit quality parameters and their predictions using near-infrared spectroscopy. Postharvest Biol. Technol. 2023, 204, 112438.
27. Dong, X.; Michele, K.; Busolo, W.; Karl, H.; James, C. Development of robust quantitative methods by near-infrared spectroscopy for rapid pharmaceutical determination of content uniformity in complex tablet matrix. Analyst 2009, 134, 1405–1415.
28. Morais, C.L.M.; Santos, M.C.D.; Lima, K.M.G.; Martin, F.L. Improving data splitting for classification applications in spectrochemical analyses employing a random-mutation Kennard-Stone algorithm approach. Bioinformatics 2019, 35, 5257–5263.
29. Cai, Y.; Ma, X.; Huang, B.; Zhang, R.; Wang, X. LIBS combined with SG-SPXY spectral data pre-processing for cement raw meal composition analysis. Appl. Opt. 2024, 63, A24–A31.
30. Chen, S.; Huang, D.; Yu, S.; Gao, X.; Zhen, J.; Chen, X. Developing a rapid COD detection method based on the fusion strategy of multi-depth hyperspectral data. Biochem. Eng. J. 2025, 215, 109630.
31. Asghari, A.; Khorrami, M.K.; Garmarudi, A.B. Comparison between partial least square and support vector regression with a genetic algorithm wavelength selection method for the simultaneous determination of some oxygenate compounds in gasoline by FTIR spectroscopy. Infrared Phys. Technol. 2020, 105, 103177.
32. Guo, L.; Yin, Y.; Yuan, Y.; Yu, H. A robust characteristic wavelength extraction strategy for hyperspectral information: Three cases of potato quality evaluation. Microchem. J. 2024, 200, 110346.
33. Yujie, L.; Benxue, M.; Cong, L.; Guowei, Y. Accurate prediction of soluble solid content in dried Hami jujube using SWIR hyperspectral imaging with comparative analysis of models. Comput. Electron. Agric. 2022, 193, 106655.
34. Berliana, E.V.; Riasetiawan, M. Comparative Analysis of Naïve Bayes Classifier, Support Vector Machine and Decision Tree in Rainfall Classification Using Confusion Matrix. Int. J. Adv. Comput. Sci. Appl. 2024, 15, 560–567.

Figure 1. Potato samples: (a) healthy potato; (b) green-skin potato; (c) black-skin potato; (d) scab disease potato; (e) broken-skin potato; (f) mechanical-damage potato.

Figure 2. Image binarization processing.

Figure 3. Extraction of hyperspectral data from the region of interest.

Figure 4. Average hyperspectral curve at 550–920 nm.

Figure 5. Spectrogram of different preprocessing methods for healthy potatoes: (a) original; (b) SG; (c) MSC; (d) SNV; (e) normalization; (f) SG–MSC; (g) SG–SNV; and (h) SG–normalization.

Figure 6. Spectrogram of different preprocessing methods for black/green-skin potatoes: (a) original; (b) SG; (c) MSC; (d) SNV; (e) normalization; (f) SG–MSC; (g) SG–SNV; and (h) SG–normalization.

Figure 7. Spectrogram of different preprocessing methods for scab/mechanical-injury/broken-skin potatoes: (a) original; (b) SG; (c) MSC; (d) SNV; (e) normalization; (f) SG–MSC; (g) SG–SNV; and (h) SG–normalization.

Figure 8. The scatterplot of variance for PCA of three types of potatoes: (a) healthy potatoes; (b) black/green-skin potatoes; (c) scab/mechanical-damage/broken-skin potatoes.

Figure 9. Characteristic wavelength curves of selected potatoes under the optimal hyperspectral model.

Figure 10. Discriminant accuracy of the three spectral qualitative models.

Figure 11. Confusion matrix chart of prediction results: (a) CART; (b) KNN; (c) BPNN; (d) experimental verification of model detection rate.

Table 1. Optimal spectral quantitative model for healthy potatoes.

Model Method	Preprocess Algorithms	R²	RMSEC	RMSEP	RPD
LSSVM	Original	0.9546	0.0018	0.1219	4.8032
	SNV	0.9702	0.0018	0.0061	6.3696
	SG	0.9671	0.0018	0.0081	5.5457
	Normalization	0.7432	0.0017	0.0319	2.0644
	MSC	0.7785	0.3186	0.0291	2.1572
	SG–SNV	0.7853	0.8436	0.0224	2.2613
	SG–normalization	0.9313	0.0016	0.0158	3.8431
	SG–MSC	0.7649	0.0017	0.0278	2.0839
PCR	Original	0.9673	0.0285	0.0407	5.5344
	SNV	0.9972	0.0180	0.0194	11.0049
	SG	0.9583	0.3214	0.4051	6.5369
	Normalization	0.7492	0.0475	0.0272	3.0568
	MSC	0.7587	0.0248	0.0446	1.6601
	SG–SNV	0.9499	0.0184	0.0154	14.4893
	SG–normalization	0.9099	0.0696	0.0447	42.9731
	SG–MSC	0.8776	0.0256	0.0574	3.3721
PLSR	Original	0.9872	0.2668	0.4506	7.4802
	SNV	0.9862	0.0238	0.0204	8.9260
	SG	0.9772	0.4085	0.4306	4.2544
	Normalization	0.9288	0.1245	0.0352	3.5671
	MSC	0.9635	0.0424	0.0309	5.2369
	SG–SNV	0.9789	0.2261	0.2896	6.5714
	SG–normalization	0.9884	0.0367	0.3239	64.8022
	SG–MSC	0.8145	0.1083	0.0353	2.1637
SVM	Original	0.9979	0.9513	0.8594	1.1013
	SNV	0.9569	0.5406	0.4807	6.1628
	SG	0.2515	0.8727	0.2085	1.2220
	Normalization	0.9884	0.4645	0.0979	1.0323
	MSC	0.9984	0.0433	0.0192	1.0764
	SG–SNV	0.8632	0.0517	0.0822	2.8107
	SG–normalization	0.9956	0.0049	0.0048	50.7900
	SG–MSC	0.9798	0.6107	0.0135	1.3007

The bold number is the value of the evaluation index of the optimal quantitative model.

Table 2. Optimal spectral quantitative model for black/green-skin potatoes.

Model Method	Preprocess Algorithms	R²	RMSEC	RMSEP	RPD
LSSVM	Original	0.9995	0.0022	0.0031	36.0454
	SNV	0.9971	0.0315	0.0196	13.6751
	SG	0.9985	0.0018	0.0038	26.0084
	Normalization	0.9975	0.0022	0.0008	63.6733
	MSC	0.9987	0.0023	0.0037	11.8601
	SG–SNV	0.9971	0.0041	0.0163	16.8621
	SG–normalization	0.9972	0.0184	0.0112	17.6351
	SG–MSC	0.9954	0.0014	0.0041	14.8136
PCR	Original	0.9989	0.0032	0.0038	30.9618
	SNV	0.9955	0.0194	0.0177	15.3956
	SG	0.9986	0.0028	0.0037	32.5557
	Normalization	0.9994	0.0009	0.0034	47.2952
	MSC	0.9946	0.0339	0.0031	16.8008
	SG–SNV	0.9883	0.0240	0.0167	9.1554
	SG–normalization	0.9975	0.0013	0.0023	64.9067
	SG–MSC	0.9976	0.0034	0.0029	21.7409
PLSR	Original	0.9873	0.0127	0.3576	27.8184
	SNV	0.9953	0.0196	0.0052	14.4648
	SG	0.9494	0.0167	0.0043	32.0425
	Normalization	0.9982	0.0374	0.0016	74.0153
	MSC	0.9949	0.0016	0.0042	13.5957
	SG–SNV	0.9983	0.0076	0.0126	23.1621
	SG–normalization	0.9989	0.0017	0.0683	82.8501
	SG–MSC	0.9952	0.0015	0.0040	14.1826
SVM	Original	0.4547	0.0032	0.0807	1.3548
	SNV	0.9699	0.0065	0.0446	6.2774
	SG	0.7278	0.0036	0.0628	1.917
	Normalization	0.9952	0.0031	0.0069	15.8604
	MSC	0.7765	0.0017	0.0278	2.334
	SG–SNV	0.9942	0.0071	0.0193	13.2256
	SG–normalization	0.9963	0.0037	0.0087	16.6649
	SG–MSC	0.7411	0.0017	0.0271	1.983

The bold number is the value of the evaluation index of the optimal quantitative model.

Table 3. Optimal spectral quantitative model for scab/mechanical-damage/broken-skin potatoes.

Model Method	Preprocess Algorithms	R²	RMSEC	RMSEP	RPD
LSSVM	Original	0.9982	0.0789	0.0353	23.6735
	SNV	0.9976	0.0044	0.0291	20.4414
	SG	0.9449	0.0021	0.0032	42.2290
	Normalization	0.9885	0.0245	0.0204	8.1345
	MSC	0.9986	0.0021	0.0039	11.8122
	SG–SNV	0.9970	0.0936	0.0317	18.3182
	SG–normalization	0.9995	0.0019	0.0015	84.5735
	SG–MSC	0.9984	0.0021	0.0039	25.3372
PCR	Original	0.9980	0.0034	0.0034	30.8110
	SNV	0.9975	0.0277	0.0316	21.1826
	SG	0.9988	0.0038	0.0033	28.9727
	Normalization	0.9876	0.0357	0.0289	8.2371
	MSC	0.9969	0.0037	0.0035	18.4770
	SG–SNV	0.9981	0.0256	0.0331	16.2945
	SG–normalization	0.9985	0.0014	0.0021	24.7903
	SG–MSC	0.9969	0.0314	0.0036	17.9501
PLSR	Original	0.9989	0.0031	0.0016	29.6294
	SNV	0.9605	0.0064	0.0351	15.9038
	SG	0.9989	0.0019	0.0036	30.9011
	Normalization	0.9494	0.0014	0.0017	12.8549
	MSC	0.9918	0.0012	0.0038	11.0638
	SG–SNV	0.9599	0.0113	0.0332	16.8489
	SG–normalization	0.9992	0.0001	0.0015	104.4002
	SG–MSC	0.9911	0.0018	0.0047	9.9476
SVM	Original	0.5005	0.0399	0.0778	1.4450
	SNV	0.9212	0.0148	0.1612	3.6033
	SG	0.6776	0.0397	0.0548	1.7792
	Normalization	0.9695	0.0037	0.0314	5.8785
	MSC	0.9300	0.0024	0.0145	3.9679
	SG–SNV	0.9857	0.0151	0.0693	8.4233
	SG–normalization	0.9921	0.0041	0.0149	11.5070
	SG–MSC	0.7721	0.0020	0.0284	2.1677

The bold number is the value of the evaluation index of the optimal quantitative model.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
