Early Detection of Heat-Damaged Soybean Seeds Based on Hyperspectral Imaging Technology

Tan, Kezhu; Zhuo, Zonghui; Sun, Weiqi; Gan, Hao; Lu, Chao; Liu, Qi; Zhang, Xihai

doi:10.3390/agriengineering8020045


by Kezhu Tan ¹, Zonghui Zhuo ¹, Weiqi Sun ¹, Hao Gan ², Chao Lu ¹, Qi Liu ¹ and Xihai Zhang ¹,³,*

¹ College of Electrical Engineering and Information, Northeast Agricultural University, Harbin 150030, China

² Department of Biosystems Engineering and Soil Science, University of Tennessee, Knoxville, TN 37922, USA

³ National Key Laboratory of Smart Farm Technology and System, Northeast Agricultural University, Harbin 150030, China

* Author to whom correspondence should be addressed.

AgriEngineering 2026, 8(2), 45; https://doi.org/10.3390/agriengineering8020045

Submission received: 30 November 2025 / Revised: 23 January 2026 / Accepted: 28 January 2026 / Published: 2 February 2026


Abstract

Heat damage caused by elevated temperature and humidity during storage significantly affects soybean seed quality and viability. Early detection remains challenging due to the lack of visible symptoms in mildly damaged seeds. In this study, hyperspectral imaging (HSI) was adopted to capture detailed spectral information associated with internal physiological changes in soybean seeds. To simulate realistic thermal stress scenarios, soybean seeds were subjected to two temperature conditions: a control group stored at 25 °C and 55% RH and a heat-treated group stored at 45 °C and 80% RH. Within the high-temperature group, different durations (20 and 30 days) were used to generate mildly and severely damaged seeds, respectively. After image acquisition using a 400–1000 nm VNIR-HSI system, multiplicative scatter correction (MSC) was selected as the optimal preprocessing method to reduce scattering effects. Spectral dimensionality was then reduced using a recursive feature elimination–principal component analysis (RFE-PCA) cascade to retain key discriminative features. Finally, a support vector classifier was constructed and optimized using a Lévy–Sine-enhanced Egret Swarm Optimization Algorithm (LSESOA), yielding a classification accuracy of 96.75% and a macro-F1 score of 96.76% on the test set. This study demonstrates the feasibility of applying HSI combined with metaheuristic optimization to achieve accurate, non-destructive evaluation of heat-damaged soybean seeds, providing technical support for quality control during storage and logistics.

Keywords:

heat-damaged soybean seeds; support vector classifier; hyperspectral; non-destructive detection

1. Introduction

Soybean seeds are rich in protein and oil but low in cholesterol, making them a key global source of food and plant-based oil [1]. In 2023, global soybean production reached approximately 370 million tons, necessitating large-scale storage and long-distance transportation [2]. During both warehouse storage and transoceanic shipping, poor ventilation, elevated temperatures, and excess moisture often result in heat accumulation within bulk soybean piles [3,4]. Under these conditions, intensified respiration and microbial activity release heat and moisture, which are difficult to dissipate, particularly in the core of the stack. This self-heating effect can lead to protein degradation, lipid oxidation, and other forms of irreversible quality loss, commonly referred to as heat damage [5].

Previous studies have shown that soybean seeds stored at ambient temperatures (25 °C) for 30 days experience minimal deterioration, with protein solubility decreasing by only 2–5% [6,7]. However, in real-world storage environments, internal pile temperatures can rise 15–20 °C above ambient due to heat generated by seed respiration and microbial metabolism. In extreme cases, such as during summer storage or in sealed ship holds, temperatures may exceed 45 °C [8,9,10,11,12]. After 30 days at 45 °C, soybeans typically exhibit visible signs of heat damage, including seed coat wrinkling, discoloration, reduced protein and oil extractability, and lower unsaturated fatty acid content [6,13]. For this study, 25 °C was chosen to prepare normal soybean samples, while 45 °C was used to induce heat damage, establishing a controlled and practically relevant thermal damage model.

Traditional methods for detecting heat-damaged soybean seeds often rely on RGB imaging and visual classification [14,15,16]. These approaches identify damaged seeds based on external traits, such as color and texture. Healthy seeds typically have smooth, yellowish surfaces, while severely damaged seeds appear brown or reddish-brown [17]. However, these methods are less effective at detecting mildly heat-damaged seeds, where internal biochemical changes occur without visible surface alterations. This limitation highlights the need for more sensitive detection techniques.

Hyperspectral imaging (HSI) offers a powerful alternative to traditional methods, as it captures spectral data across hundreds of narrow bands, revealing subtle chemical and structural changes that are invisible to RGB imaging [18,19]. Previous studies have demonstrated HSI’s effectiveness in seed quality evaluation, such as predicting sunflower seed vigor or assessing the oil content of soybean seeds [18,20]. However, the practical application of HSI is often hindered by high data dimensionality, which increases computational costs and reduces model generalization. As a result, effective dimensionality reduction becomes essential for its efficient use [17,21,22,23,24].

To address this challenge, various dimensionality reduction methods have been explored, and they can be broadly categorized into feature selection and feature extraction techniques. Feature selection methods, such as the Successive Projections Algorithm (SPA) and Competitive Adaptive Reweighted Sampling (CARS), focus on retaining the most informative wavelengths while discarding redundant ones. For instance, Zhang et al. used SPA combined with a Random Forest (RF) model to predict grain filling rate [25], and Wang et al. applied CARS to identify key wavelengths in sweet corn seed evaluation [26]. On the other hand, feature extraction methods, such as Principal Component Analysis (PCA) [27], transform the data into uncorrelated variables, improving compactness at the cost of interpretability. Huang et al. used a combination of methods, including Recursive Feature Elimination (RFE), Correlation Analysis (CA), and Variable Importance in Projection (VIP), to select a small number of relevant wavelengths for evaluating maize seed viability [28].

Support Vector Machines (SVM) are known for their strong generalization ability and robustness [29], making them widely used in hyperspectral data modeling. For example, Almoujahed et al. applied SVM to detect fusarium head blight in wheat using visible–near-infrared spectral data, achieving a classification accuracy of 95.6% [30]. Feng et al. developed an SVM model to predict nitrogen content in winter wheat based on hyperspectral vegetation indices [31], while Liu et al. used support vector regression combined with band selection to estimate chlorophyll content in peanut leaves [32]. These studies confirm the suitability of SVM for agricultural spectral analysis.

However, the performance of SVM models depends heavily on the selection of hyperparameters, particularly the penalty parameter c and the kernel parameter g. To address this, we employed a modified version of the Egret Swarm Optimization Algorithm (ESOA), which we refer to as the Lévy–Sine-enhanced Egret Swarm Optimization Algorithm (LSESOA). By integrating a Lévy flight strategy and a sine–cosine search mechanism, LSESOA enhances global exploration and mitigates the risk of premature convergence. This metaheuristic was used to optimize the parameters of the support vector classifier (SVC), yielding the LSESOA–SVC model for classifying heat-damaged soybean seeds.

In this work, we integrate hyperspectral imaging with a support vector classifier (SVC) for the detection of heat-damaged soybean seeds. The methodological framework comprises: (1) an RFE-PCA cascade for spectral dimensionality reduction, (2) the LSESOA metaheuristic for SVC hyperparameter optimization, and (3) a comparative evaluation against other optimization-classifier combinations.

2. Materials and Methods

2.1. Heat Treatment of Soybean Seeds

In this study, 1200 soybean seeds of the widely cultivated variety “Heihe 43” were manually selected. The seeds were harvested on 2 October 2024, from the Xiangyang Experimental Farm of Northeast Agricultural University. After air-drying under natural ventilation, they were stored in a dry laboratory at ambient temperature. All selected seeds were uniform in size, had smooth surfaces, and were free from pests or disease symptoms. The basic information and pretreatment conditions are summarized in Table 1.

The seeds were randomly divided into three groups of 400 seeds each. The first group was stored at 25 °C and 55% relative humidity (RH) for 30 days as the control group. The second group was stored at 45 °C and 80% RH for 20 days to produce mildly heat-damaged samples. The third group was stored at 45 °C and 80% RH for 30 days to produce severely heat-damaged samples. Figure 1 shows the sample preparation process.

To ensure that the thermal damage level of soybean seeds after treatment met the requirements of the experimental design, this study followed the National Agricultural Standard NY/T 1599-2008 [13], issued by the Ministry of Agriculture and Rural Affairs of China in 2008. The standard color chart provided in this guideline was used for manual visual assessment of the seed samples. Two researchers with experience in agricultural testing conducted the evaluation independently under consistent lighting and background conditions. Based on the colorimetric assessment, a total of 1080 seeds (360 seeds per category) were selected from 1200 treated samples for subsequent hyperspectral image acquisition and modeling analysis.

2.2. Hyperspectral Data Acquisition

Hyperspectral images were acquired using a hyperspectral imaging system (Headwall Photonics, Bolton, MA, USA) comprising a Hyperspec® VNIR (visible and near-infrared) hyperspectral imager equipped with a charge-coupled device (CCD) camera (Figure 2a), a 150 W adjustable fiber-optic halogen lamp (Figure 2b), and a motorized sample translation platform (Figure 2c,d). Detailed system specifications are listed in Table 2. The effective spectral range of the system was 400–1000 nm.

Prior to image acquisition, the system was preheated for 20–30 min to ensure light source stability. To prevent displacement or vibration of the soybean seeds during the translational movement of the platform, a black, perforated, light-absorbing tray was used to hold each seed securely in place. The tray contained 108 circular holes arranged in a 9 × 12 grid, with each hole accommodating one seed (Figure 2c). The platform moved horizontally at a constant speed of 3.6 mm/s over a 150 mm range to capture hyperspectral images. If any image exhibited significant distortion, the acquisition was repeated.

After image acquisition, spectral data were extracted using ENVI software (version 5.3, NV5, USA). Each hyperspectral band contained both sample-specific information and background noise. As shown in Figure 3, a 25 × 25-pixel region of interest (ROI) was manually selected from the center of each soybean seed, and the mean reflectance of the ROI pixels was calculated in each spectral band to represent the spectral feature of that seed. A total of 1080 samples were analyzed, resulting in a spectral matrix with 184 wavelengths (rows) and 1080 samples (columns).
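The ROI averaging step can be sketched in NumPy as follows. This is a minimal illustration, assuming the calibrated scan is available as a (rows, cols, bands) array and that each seed's centre pixel is known; the function name `roi_mean_spectrum`, the synthetic cube, and the centre coordinates are hypothetical, since the paper selected ROIs manually in ENVI. The sketch stacks spectra as samples × bands, the transpose of the 184 × 1080 layout described above, because most machine-learning libraries expect that orientation.

```python
import numpy as np

def roi_mean_spectrum(cube, center_row, center_col, size=25):
    """Mean spectrum of a size x size ROI centred on one seed.

    cube: calibrated image of shape (rows, cols, bands).
    center_row, center_col: hypothetical seed-centre pixel coordinates.
    """
    half = size // 2
    r0, c0 = center_row - half, center_col - half
    if r0 < 0 or c0 < 0 or r0 + size > cube.shape[0] or c0 + size > cube.shape[1]:
        raise ValueError("ROI extends beyond the image")
    roi = cube[r0:r0 + size, c0:c0 + size, :]           # (size, size, bands)
    return roi.reshape(-1, cube.shape[2]).mean(axis=0)  # average pixels per band

# Synthetic stand-in for one scan with 184 bands
rng = np.random.default_rng(0)
cube = rng.random((100, 160, 184), dtype=np.float32)
centers = [(30, 40), (30, 100), (70, 40)]  # hypothetical seed centres
X = np.vstack([roi_mean_spectrum(cube, r, c) for r, c in centers])
print(X.shape)  # (3, 184)
```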

To correct for sensor and illumination inconsistencies, black-and-white calibration was performed using standard reference images [33]. The corrected spectral reflectance $R_{c}$ was calculated using Equation (1):

$$R_{c} = \frac{R_{s} - R_{d}}{R_{w} - R_{d}} \tag{1}$$

where $R_{c}$ is the corrected spectral data, $R_{s}$ is the raw spectral data, $R_{w}$ is the white reference data, and $R_{d}$ is the dark current data.
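Equation (1) translates directly into array arithmetic. The sketch below is an illustration rather than the authors' code; the `eps` guard against zero denominators and the synthetic reference spectra are assumptions added for the example.

```python
import numpy as np

def calibrate(raw, white, dark, eps=1e-12):
    """Black-and-white calibration, Equation (1): Rc = (Rs - Rd) / (Rw - Rd).

    raw, white, dark must broadcast together, e.g. a raw cube (rows, cols, bands)
    with per-band reference spectra of shape (bands,). eps replaces near-zero
    denominators (an assumption; the paper does not discuss dead bands).
    """
    raw, white, dark = (np.asarray(a, dtype=float) for a in (raw, white, dark))
    denom = white - dark
    denom = np.where(np.abs(denom) < eps, eps, denom)
    return (raw - dark) / denom

rng = np.random.default_rng(1)
dark = np.full(184, 100.0)    # hypothetical dark-current spectrum
white = np.full(184, 4100.0)  # hypothetical white-reference spectrum
raw = rng.uniform(100, 4100, size=(10, 20, 184))
Rc = calibrate(raw, white, dark)  # broadcasts over the trailing band axis
```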

Subsequently, envelope correction was applied to eliminate baseline shifts and enhance spectral contrast. This step reduced the effects of uneven sample geometry and light scattering, thereby improving the identification of spectral absorption features. The entire preprocessing procedure—from raw hyperspectral image to corrected reflectance curve—is illustrated in Figure 4. The original and corrected spectral curves are shown in Figure 5. The corrected spectral data were used for all subsequent analyses.
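The paper does not detail its envelope correction. Assuming it refers to continuum removal, a common reading of the term in which each spectrum is divided by its upper convex hull, a minimal NumPy sketch looks like this (the function name is hypothetical):

```python
import numpy as np

def continuum_removed(wavelengths, reflectance):
    """Divide a spectrum by its upper convex hull (continuum removal).

    wavelengths must be strictly increasing. Hull points map to 1.0 and
    absorption features become dips below 1.0.
    """
    x = np.asarray(wavelengths, dtype=float)
    y = np.asarray(reflectance, dtype=float)
    hull = []  # indices of upper-hull vertices, built left to right
    for i in range(len(x)):
        # Pop the last vertex while it lies on or below the chord from hull[-2] to point i
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = (x[b] - x[a]) * (y[i] - y[a]) - (y[b] - y[a]) * (x[i] - x[a])
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    continuum = np.interp(x, x[hull], y[hull])  # piecewise-linear envelope
    return y / continuum
```

Because every spectrum is normalized to its own envelope, the result emphasizes the depth and position of absorption features rather than the overall reflectance level.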

2.3. Data Preprocessing

Despite efforts to reduce interference during the acquisition of soybean spectral images and data extraction, spectral noise remains inevitable. Therefore, four preprocessing techniques—Z-score normalization, first derivative, multiplicative scatter correction (MSC), and standard normal variate transformation (SNV)—were compared to improve data quality [34,35,36].
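For reference, the four preprocessing transforms can be sketched in NumPy as below. These are textbook formulations, not the authors' code: the first derivative uses central differences via `np.gradient` (the paper does not state whether a Savitzky–Golay filter was used), and Z-score normalization is assumed to standardize each wavelength across samples.

```python
import numpy as np

def snv(X):
    """Standard normal variate: centre and scale each spectrum (row) by its own mean and SD."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

def msc(X, reference=None):
    """Multiplicative scatter correction.

    Each spectrum x is regressed on a reference, x ~ a + b * ref, and corrected
    as (x - a) / b. The default reference is the mean spectrum of X.
    """
    X = np.asarray(X, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for k, x in enumerate(X):
        b, a = np.polyfit(ref, x, deg=1)  # polyfit returns [slope, intercept]
        corrected[k] = (x - a) / b
    return corrected

def first_derivative(X, wavelengths):
    """First derivative along the wavelength axis (central differences)."""
    return np.gradient(np.asarray(X, dtype=float), np.asarray(wavelengths, dtype=float), axis=1)

def zscore(X):
    """Column-wise standardization: each wavelength to zero mean, unit SD across samples."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)
```

MSC and column-wise Z-score depend on statistics of the whole dataset, so in a leakage-free workflow those statistics (the MSC reference spectrum, the per-band means and SDs) would be computed on the training set only and reused for the test set; the optional `reference` argument of `msc` allows this. SNV and the derivative act on each spectrum independently.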

The efficacy of each method was assessed using a Random Forest (RF) classifier. RF is well-suited for this benchmarking role due to its robustness in handling high-dimensional spectral data and its relative insensitivity to hyperparameter tuning [37,38].

The dataset was divided into training and test subsets at an 8:2 ratio. The mean classification accuracy from 5-fold cross-validation on the training set served as the primary metric for comparing the preprocessing methods. This objective comparison identified the most effective preprocessing strategy before proceeding to dimensionality reduction.
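The benchmarking protocol described above (8:2 split, 5-fold cross-validation of a Random Forest on the training portion) might look like this with scikit-learn. Stratified splitting, the random seeds, the forest size, and the helper name `compare_preprocessing` are assumptions for illustration; the synthetic data stand in for the 1080 × 184 spectral matrix.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

def snv(X):
    """Row-wise standard normal variate (one of the candidate transforms)."""
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

def compare_preprocessing(X, y, methods, seed=0):
    """Mean 5-fold CV accuracy of a Random Forest on the 80% training split, per method."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    scores = {}
    for name, transform in methods.items():
        rf = RandomForestClassifier(n_estimators=200, random_state=seed)
        scores[name] = cross_val_score(
            rf, transform(X_train), y_train, cv=cv, scoring="accuracy").mean()
    return scores, (X_test, y_test)  # test split is held out for the final model only

rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 60)
X = rng.normal(size=(180, 40)) + 5.0  # synthetic positive "spectra"
X[:, :10] += y[:, None]               # class signal confined to 10 bands
scores, (X_test, y_test) = compare_preprocessing(X, y, {"raw": lambda A: A, "SNV": snv})
```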

2.4. Data Dimensionality Reduction

To systematically evaluate the effects of different dimensionality reduction strategies and their combinations on the classification performance of hyperspectral data and determine the optimal number of dimensions for reduction, four groups of comparative experiments were designed: PCA alone, RFE alone, PCA-RFE cascade, and RFE-PCA cascade. In all experiments, the Random Forest classifier was used as the benchmark classification model, and the average classification accuracy obtained under 5-fold cross-validation was used as the core evaluation metric to quantify the dimensionality reduction effect. The specific objectives and procedures of each experiment group are as follows:

PCA alone: In principal component analysis, every possible number of retained components (from 1 to 184, the full original dimensionality) is traversed, the Random Forest model is evaluated at each setting, and the optimal dimension $\mathrm{PCA\_opt\_dim}$ is determined.
RFE alone: In recursive feature elimination, every possible number of retained features (from 1 to 184) is traversed, the Random Forest model is evaluated at each step, and the optimal number of features, denoted $\mathrm{RFE\_opt\_dim}$, is selected.
PCA-RFE cascade: The data are first reduced to $\mathrm{PCA\_opt\_dim}$ principal components, the optimal dimension from the PCA-alone experiment. RFE is then applied to these components, traversing the number of retained components $k$ ($1 \leq k \leq \mathrm{PCA\_opt\_dim}$) and evaluating the Random Forest model at each setting to determine the cascade optimal dimension $\mathrm{PCA\_RFE\_opt\_dim}$.
RFE-PCA cascade: A subset of $\mathrm{RFE\_opt\_dim}$ bands, the optimal size from the RFE-alone experiment, is first selected. PCA is then applied to this subset, traversing the number of retained components $m$ ($1 \leq m \leq \mathrm{RFE\_opt\_dim}$) and evaluating the Random Forest model at each setting to determine the cascade optimal dimension $\mathrm{RFE\_PCA\_opt\_dim}$.
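The RFE-PCA cascade maps naturally onto a scikit-learn `Pipeline`. The sketch below is one possible realization, not the authors' code: it assumes a Random Forest serves as the RFE ranking estimator (through its `feature_importances_`), uses an elimination `step` of 5 rather than 1 to keep run time manageable, and the helper names are hypothetical. As in the text, the first-stage size `n_bands` is fixed and only the PCA size is traversed.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline

def rfe_pca_pipeline(n_bands, n_components, n_trees=100, seed=0):
    """RFE keeps n_bands wavelengths, PCA compresses them, a Random Forest classifies."""
    return Pipeline([
        # Stage 1: recursive elimination ranked by Random Forest feature importances
        ("rfe", RFE(RandomForestClassifier(n_estimators=n_trees, random_state=seed),
                    n_features_to_select=n_bands, step=5)),
        # Stage 2: PCA fitted only on the retained bands
        ("pca", PCA(n_components=n_components)),
        ("rf", RandomForestClassifier(n_estimators=n_trees, random_state=seed)),
    ])

def best_second_stage(X, y, n_bands, n_trees=100, seed=0):
    """Traverse m = 1..n_bands principal components on top of a fixed RFE stage."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    scores = {
        m: cross_val_score(rfe_pca_pipeline(n_bands, m, n_trees, seed), X, y, cv=cv).mean()
        for m in range(1, n_bands + 1)
    }
    best_m = max(scores, key=scores.get)  # highest mean CV accuracy
    return best_m, scores

# Synthetic example: 90 "spectra" of 20 bands, class signal in the first 3 bands
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 30)
X = rng.normal(size=(90, 20))
X[:, :3] += y[:, None]
best_m, scores = best_second_stage(X, y, n_bands=4, n_trees=20)
```

Placing RFE and PCA inside the pipeline means both are refitted on each training fold, so the validation fold never influences band selection or the principal axes. The price is that RFE is recomputed for every candidate `m`; the `memory` parameter of `Pipeline` can cache fitted transformers to avoid this.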

To maintain model interpretability and reduce computational complexity, the dimensionality used in the first stage of each cascaded method was fixed to the optimal value obtained from its respective standalone approach. This ensures that each combined method contributes the most informative feature subset or principal components to the second stage. Such a strategy balances performance and practicality by avoiding extensive multi-stage parameter tuning.

Following the execution of the above comparative framework, the most effective dimensionality reduction strategy was identified based on the highest achieved 5-fold cross-validation accuracy. The feature subset produced by this optimal strategy was then fixed and served as the exclusive input for all subsequent classifier training and hyperparameter optimization experiments (Section 2.5 and Section 2.6). This approach ensures that all models are evaluated on a consistent and objectively selected feature space, enabling a fair comparison of algorithmic performance.

2.5. Lévy–Sine-Enhanced Egret Swarm Optimization Algorithm (LSESOA)

The Egret Swarm Optimization Algorithm is a swarm intelligence optimization algorithm [39]. It simulates two predatory behaviors of egrets, the sit-and-wait strategy of the snowy egret and the aggressive strategy of the great egret, and balances them dynamically through a discrimination mechanism, thereby combining global exploration with local exploitation. However, ESOA relies on a single local perturbation mechanism and has limited ability to jump out of its current region, which makes it prone to becoming trapped in local optima. To address these limitations, this paper proposes an improved strategy that combines a nonlinear sine–cosine perturbation with a Lévy flight mechanism.

To enhance the local search capability, and inspired by the Sine-Cosine Algorithm (SCA), a nonlinear sine learning factor (Equation (2)) is introduced into the position update formula of the great egret (Equation (3)), allowing the search range to vary nonlinearly over the course of the iterations [40]. This strategy is intended to preserve sufficient population diversity near local optima, reducing the likelihood of becoming trapped in suboptimal regions.

$$w = w_{\min} + (w_{\max} - w_{\min}) \cdot \sin\left(\frac{\pi t}{T_{\max}}\right) \tag{2}$$

where $t$ is the current iteration number; $T_{\max}$ is the maximum number of iterations; and $w_{\min}$ and $w_{\max}$ are the minimum and maximum values of the factor, respectively, typically set to 0.1 and 0.9.

$${X_g}_{i,j}^{t+1} = \begin{cases} w \cdot {X_g}_{i,j}^{t} + (1 - w) \cdot r_{1} \cdot \sin(r_{2}) \cdot \left({X_g}_{best,j}^{t} - {X_g}_{i,j}^{t}\right), & \text{if } r < \delta \\ w \cdot {X_g}_{i,j}^{t} + (1 - w) \cdot r_{1} \cdot \cos(r_{2}) \cdot \left({X_g}_{best,j}^{t} - {X_g}_{i,j}^{t}\right), & \text{otherwise} \end{cases} \tag{3}$$

where ${X_g}_{i,j}^{t+1}$ is the position of the $j$th dimension of the $i$th individual at iteration $t+1$; $r \sim U(0,1)$ is a random number for strategy switching; $\delta \in (0,1)$ is a fixed decision threshold; $r_{1} \in [0,2]$ and $r_{2} \in [0, 2\pi]$ are random perturbation parameters controlling the amplitude and phase of the step, respectively; and ${X_g}_{best,j}^{t}$ is the $j$th dimension of the global best position of the current population.
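Equations (2) and (3) can be sketched in NumPy as follows. This is an illustrative implementation under stated assumptions: δ = 0.5 is a placeholder (the text only requires δ ∈ (0, 1)), and r, r1 and r2 are drawn independently for every individual and dimension, which the per-(i, j) form of Equation (3) permits but the paper does not spell out. The function names are hypothetical.

```python
import numpy as np

def sine_weight(t, t_max, w_min=0.1, w_max=0.9):
    """Nonlinear sine learning factor, Equation (2); peaks at w_max when t = t_max / 2."""
    return w_min + (w_max - w_min) * np.sin(np.pi * t / t_max)

def great_egret_update(X, x_best, t, t_max, delta=0.5, rng=None):
    """Sine-cosine position update for the great-egret individuals, Equation (3).

    X: (n_individuals, n_dims) current positions; x_best: (n_dims,) best position.
    delta = 0.5 is an assumed placeholder threshold.
    """
    rng = np.random.default_rng() if rng is None else rng
    w = sine_weight(t, t_max)
    r = rng.random(X.shape)                          # strategy-switching draw
    r1 = rng.uniform(0.0, 2.0, size=X.shape)         # step amplitude in [0, 2]
    r2 = rng.uniform(0.0, 2.0 * np.pi, size=X.shape) # phase in [0, 2*pi]
    wave = np.where(r < delta, np.sin(r2), np.cos(r2))
    return w * X + (1.0 - w) * r1 * wave * (x_best - X)
```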

The Lévy flight strategy (Equations (5) and (6)) is introduced into the position update formula of the snowy egret [41]. When the random number $p$ generated for an individual satisfies $p < p_{t}$, the Lévy-flight position update is executed; otherwise, the exponential perturbation update is executed (Equation (4)). This strategy allows an individual to jump out of its current region with a certain probability when the local search stalls, thereby improving the global search ability of the algorithm.

$${X_s}_{i,j}^{t+1} = \begin{cases} {X_s}_{i,j}^{t} + Levy, & \text{if } p < p_{t} \\ {X_s}_{i,j}^{t} + \alpha \cdot \exp\left(-\frac{\left|{X_s}_{i,j}^{t} - {X_s}_{i,j}^{worst}\right|}{2}\right), & \text{otherwise} \end{cases} \tag{4}$$

where ${X_s}_{i,j}^{worst}$ represents the worst position in the current population; $p \sim U(0,1)$ is a random number uniformly distributed in the interval $[0,1]$; $p_{t} \in (0,1)$ is the control probability; $\alpha$ is the coefficient for adjusting the search amplitude; and $Levy$ denotes the jump step generated by the Lévy flight strategy, defined as follows:

$$Levy = 0.01 \cdot \frac{r_{1} \cdot \sigma}{|r_{2}|^{1/\xi}}, \quad r_{1}, r_{2} \sim N(0,1), \quad \xi = 1.5 \tag{5}$$

$$\sigma = \left(\frac{\Gamma(1 + \xi) \cdot \sin(\pi \xi / 2)}{\Gamma\left(\frac{1 + \xi}{2}\right) \cdot \xi \cdot 2^{(\xi - 1)/2}}\right)^{1/\xi} \tag{6}$$

where $r_{1}, r_{2} \sim N(0,1)$ are random variables drawn from the standard normal distribution (distinct from the uniformly distributed $r_{1}$ and $r_{2}$ in Equation (3)); $\xi$ is the stability index; $\sigma$ is the scaling factor of the Lévy distribution; and $\Gamma(\cdot)$ is the Gamma function.
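Likewise, the Lévy step of Equations (5) and (6) and the snowy-egret update of Equation (4) can be sketched as below. The values p_t = 0.5 and α = 1.0 are placeholders rather than the paper's settings, and drawing one p per individual follows the phrase "individual-generated random number" but is otherwise an assumption; the function names are hypothetical. For ξ = 1.5, Equation (6) gives σ ≈ 0.6966.

```python
import math
import numpy as np

def levy_sigma(xi=1.5):
    """Scale factor sigma of Equation (6) (Mantegna's algorithm)."""
    num = math.gamma(1 + xi) * math.sin(math.pi * xi / 2)
    den = math.gamma((1 + xi) / 2) * xi * 2 ** ((xi - 1) / 2)
    return (num / den) ** (1 / xi)

def levy_step(shape, xi=1.5, rng=None):
    """Lévy jump of Equation (5): 0.01 * r1 * sigma / |r2|^(1/xi), with r1, r2 ~ N(0, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    r1 = rng.standard_normal(shape)
    r2 = rng.standard_normal(shape)
    return 0.01 * r1 * levy_sigma(xi) / np.abs(r2) ** (1 / xi)

def snowy_egret_update(X, x_worst, p_t=0.5, alpha=1.0, rng=None):
    """Equation (4): Lévy jump if p < p_t, otherwise exponential perturbation.

    X: (n_individuals, n_dims); x_worst: (n_dims,) worst position in the population.
    p_t and alpha defaults are placeholders, not the paper's values.
    """
    rng = np.random.default_rng() if rng is None else rng
    p = rng.random((X.shape[0], 1))  # one draw per individual, broadcast over dims
    levy = levy_step(X.shape, rng=rng)
    expo = alpha * np.exp(-np.abs(X - x_worst) / 2.0)
    return np.where(p < p_t, X + levy, X + expo)
```

The heavy-tailed ratio in `levy_step` produces mostly small moves with occasional long jumps, which is what lets a stalled individual escape its current region.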

2.6. Model Construction

To obtain a classification model with optimal performance, this study systematically compared optimization algorithms for tuning key model parameters. Four optimization strategies, the Genetic Algorithm (GA), Particle Swarm Optimization (PSO), the original ESOA, and the improved LSESOA, were each paired with two classifiers, giving eight sets of experiments. Each algorithm was used to optimize the regularization coefficient $c$ and the kernel function parameter $g$ of the SVC, as well as the number of base learners $n\_estimators$ and the learning rate $lr$ of the Gradient Boosting Decision Tree (GBDT).

All experiments were conducted on feature datasets that had undergone dimensionality reduction using the RFE-PCA method, as outlined in Section 2.4. Model performance was evaluated using 5-fold cross-validation, with the average classification accuracy serving as the primary evaluation metric. The detailed parameter settings and search space configurations are summarized in Table 3.
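Since Section 3.3 defines fitness as the 5-fold cross-validation error rate, the objective that GA, PSO, ESOA and LSESOA minimize for the SVC can be written as a single function. The sketch assumes that g is the RBF kernel's gamma (as in LIBSVM's `-g` option) and that the optimizer searches log2(c) and log2(g); both are assumptions, and the actual search ranges are those in Table 3. The function name `svc_fitness` is hypothetical.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

def svc_fitness(params, X, y, seed=0):
    """Fitness = 1 - mean 5-fold CV accuracy (lower is better).

    params = (log2_c, log2_g), mapped to SVC's C and RBF gamma (assumed encoding).
    """
    log2_c, log2_g = params
    model = SVC(C=2.0 ** log2_c, gamma=2.0 ** log2_g, kernel="rbf")
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    return 1.0 - cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()

rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 30)
X = rng.normal(size=(90, 9)) + 3.0 * y[:, None]  # synthetic 9-feature data
error = svc_fitness((0.0, -3.0), X, y)
```

Fixing the cross-validation seed makes the fitness deterministic for a given parameter vector, so differences between optimizers reflect their search behavior rather than fold-to-fold noise.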

3. Results and Discussion

3.1. Pre-Processing of Spectral Data

Figure 6 shows the effects of the four spectral preprocessing techniques. As shown in Figure 6a, the first derivative enhances local spectral variations but also amplifies noise. The SNV-processed spectra (Figure 6b) are more tightly clustered, indicating effective correction of scattering effects. The MSC-processed spectra (Figure 6c) not only show higher consistency but also retain the shape of the original curves, reflecting the method's advantage in reducing baseline drift. In contrast, Z-score normalization (Figure 6d) helps unify the data scale, but some sample curves fluctuate markedly, indicating greater sensitivity to outliers.

As the quantitative results in Table 4 show, all preprocessing methods outperform the raw data in classification performance. MSC achieves the highest classification accuracy (82.0%) and macro-F1 score (80.6%), outperforming the other methods, including SNV and the first derivative. Therefore, based on both the quantitative results in Table 4 and the visual evaluation in Figure 6, MSC was selected as the optimal preprocessing strategy for subsequent modeling.

3.2. Dimensionality Reduction Result Analysis

Figure 7 illustrates how the classification accuracy of the RF model varies with the number of features for the four dimensionality reduction methods: PCA, RFE, PCA-RFE, and RFE-PCA. The choice of strategy leads to notable differences in model performance. With 26 selected features, the RFE method achieves an accuracy of 86.2%, demonstrating strong feature selection capability. PCA reaches 87.4% accuracy with 20 principal components, but its performance fluctuates considerably across dimensions, indicating a degree of instability. In contrast, the RFE-PCA method yields the highest accuracy, 88.2%, with only 9 features, effectively preserving key discriminative information while exhibiting better generalization and stability. The PCA-RFE combination shows slightly lower performance under the same dimensional setting, with an accuracy of 87.1%. The optimal results for all four methods are summarized in Table 5.

To interpret the optimal RFE-PCA features, we analyzed the PCA loading matrix. RFE first selected 26 discriminative spectral bands, and PCA then compressed these bands into nine principal components, which formed the final feature set. To relate these abstract principal components to physically meaningful wavelengths, we calculated the integrated contribution of each of the 26 candidate bands, defined as the sum of its squared loadings across all nine principal components. The results show that the feature space is dominated by a small set of core wavelengths, the five most influential being 670.5 nm, 696.5 nm, 761.6 nm, 849.5 nm, and 969.9 nm. These wavelengths fall within three key spectral regions, namely the visible range (445–670 nm), the red-edge region (around 696 nm), and the near-infrared range (760–1000 nm), which correspond, respectively, to specific heat-damage mechanisms: changes in surface texture and pigment, chlorophyll degradation, and moisture migration or internal structural breakdown [17,42]. This physicochemical interpretation offers a plausible rationale for the model's high classification accuracy.
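The integrated-contribution measure can be computed from the principal axes in a few lines. The sketch assumes the loadings are the unit-norm principal axes (right singular vectors of the centred data); some authors instead scale loadings by the square root of each component's eigenvalue, and the paper does not say which convention it used. Function names are hypothetical.

```python
import numpy as np

def integrated_contribution(X_bands, n_components):
    """Per-band sum of squared PCA loadings over the first n_components.

    X_bands: (n_samples, n_bands) matrix of the RFE-selected bands.
    """
    Xc = X_bands - X_bands.mean(axis=0)                # PCA requires centred data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt: unit principal axes
    loadings = Vt[:n_components]                       # (n_components, n_bands)
    return (loadings ** 2).sum(axis=0)                 # (n_bands,)

def top_bands(contrib, wavelengths, k=5):
    """Wavelengths of the k bands with the largest integrated contribution."""
    order = np.argsort(contrib)[::-1][:k]
    return [wavelengths[i] for i in order]
```

With unit-norm axes, the contributions over all bands sum to the number of retained components, which gives a convenient sanity check.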

3.3. Classification Model Analysis

Figure 8 illustrates the average fitness curves of eight model configurations over 100 iterations. In this study, the fitness value is defined as the classification error rate derived from 5-fold cross-validation, where a lower value indicates better model performance. As shown in the figure, the LSESOA–SVC and LSESOA–GBDT models exhibit a sharp decline in fitness during the first 10 iterations, followed by a gradual stabilization. This trend suggests that both models are capable of efficient global exploration in the early stages and progressively fine-tune their parameter search as the optimization proceeds.

In contrast, models optimized using GA and PSO demonstrate slower convergence and tend to plateau at higher fitness values, indicating a limited capacity for effective parameter space exploration. The models based on the original ESOA display moderate convergence behavior—superior to GA and PSO, yet still falling short of the performance achieved by LSESOA-based models. Notably, the relatively flat fitness curves observed in GA–SVC and PSO–GBDT suggest the occurrence of premature convergence or inadequate exploration in the later stages of optimization.

To further evaluate classification performance, Figure 9 presents the confusion matrices on the independent test set, while Table 6 summarizes the corresponding accuracy and macro-F1 scores. Among all tested configurations, LSESOA–SVC achieved the best performance, with a classification accuracy of 96.8% and a macro-F1 score of 96.8%, demonstrating robust generalization ability. The confusion matrix reveals strong diagonal dominance, indicating accurate identification across all three damage categories, with only seven samples misclassified. Particularly noteworthy is this model’s reduced confusion between mildly damaged and normal seeds, which are typically the most challenging to distinguish.
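For reference, accuracy and macro-F1 both follow from a confusion matrix. The matrix below is hypothetical, chosen only to exercise the arithmetic; it does not reproduce Figure 9, and the function name is illustrative.

```python
import numpy as np

def accuracy_and_macro_f1(cm):
    """cm[i, j] = number of samples of true class i predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp          # predicted as class k but belonging elsewhere
    fn = cm.sum(axis=1) - tp          # belonging to class k but predicted elsewhere
    f1 = 2 * tp / (2 * tp + fp + fn)  # per-class F1, the harmonic mean of precision and recall
    return tp.sum() / cm.sum(), f1.mean()

# Hypothetical 3-class counts (normal, mild, severe); NOT the paper's results
cm = [[50, 3, 0],
      [4, 45, 1],
      [0, 2, 48]]
acc, macro_f1 = accuracy_and_macro_f1(cm)
```

Macro-F1 averages the per-class scores with equal weight, so a model cannot score well on it by doing well only on the easiest category.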

In comparison, models optimized using GA and PSO generally exhibited lower classification accuracy, especially for mildly damaged samples. For example, both PSO–SVC and GA–SVC showed evident misclassification between the mildly damaged and normal categories. These results suggest that the traditional optimizers were less effective at locating well-performing hyperparameters when the differences between classes are subtle.

Overall, the results demonstrate the effectiveness of the improved swarm intelligence optimizer in enhancing classification performance for hyperspectral soybean analysis. Among all tested methods, LSESOA–SVC delivered the most accurate and reliable results.

4. Conclusions

This study proposed a non-destructive framework for the early detection of heat-damaged soybean seeds by integrating hyperspectral imaging with a support vector classifier optimized using a novel Lévy–Sine-enhanced Egret Swarm Optimization Algorithm (LSESOA). The application of HSI enabled the effective identification of internal biochemical changes, particularly in mildly damaged seeds where visible symptoms were lacking. To reduce spectral dimensionality while preserving classification performance, multiplicative scatter correction (MSC) was combined with a recursive feature elimination–principal component analysis (RFE-PCA) cascade, reducing the 184 spectral bands to 9 principal components. This processing chain achieved substantial data compression: from approximately 32 GB of raw hyperspectral images for the 1200 seeds to a final discriminative feature set of only about 0.7 MB, a compression ratio exceeding 45,000:1. Consequently, the “digital footprint” of the classification model is reduced to approximately 3.1 MB per kilogram of seeds. The optimized LSESOA–SVC model achieved a classification accuracy of 96.8% on the test set.
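The stated compression ratio can be checked with simple arithmetic. Whether GB and MB are meant in decimal (10⁹ and 10⁶ bytes) or binary (2³⁰ and 2²⁰ bytes) units is not stated, so both readings are shown; either way the ratio exceeds 45,000:1.

```python
def compression_ratio(raw_bytes, feature_bytes):
    """Ratio of raw image size to final feature-set size."""
    return raw_bytes / feature_bytes

GB_DEC, MB_DEC = 1e9, 1e6        # decimal (SI) units
GB_BIN, MB_BIN = 2**30, 2**20    # binary units (GiB, MiB)

ratio_si = compression_ratio(32 * GB_DEC, 0.7 * MB_DEC)   # about 45,714
ratio_bin = compression_ratio(32 * GB_BIN, 0.7 * MB_BIN)  # about 46,811
```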

These results indicate that the proposed method has potential for rapid and non-invasive assessment of soybean seed quality under heat stress. Since the current validation was conducted under laboratory conditions, further testing in large-scale storage and in-line inspection scenarios is needed. Future work should also consider representative sampling strategies and broader temperature–humidity conditions to improve model robustness. In addition, establishing clearer links between spectral responses, heat damage severity, and physiological indicators such as germination performance and seed vigor will help enhance the practical relevance of this approach.

Author Contributions

Conceptualization, K.T. and X.Z.; methodology, K.T., X.Z. and W.S.; software, K.T. and Z.Z.; validation, X.Z., Z.Z. and W.S.; formal analysis, K.T. and X.Z.; investigation, K.T., Q.L. and C.L.; resources, K.T., Z.Z. and X.Z.; data curation, W.S. and Z.Z.; writing—original draft preparation, K.T.; writing—review and editing, X.Z. and H.G.; supervision, X.Z.; project administration, K.T.; funding acquisition, X.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of Heilongjiang Province, China, grant number JJ2024ZL0087.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

All authors thank Northeast Agricultural University for providing instrument support.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

CA	Correlation Analysis
CARS	Competitive Adaptive Reweighted Sampling
CCD	Charge-coupled device
CV	5-fold Cross-Validation
ESOA	Egret Swarm Optimization Algorithm
GA	Genetic algorithm
GBDT	Gradient Boosting Decision Tree
HSI	Hyperspectral imaging
LSESOA	Lévy–Sine-enhanced Egret Swarm Optimization Algorithm
MSC	Multiplicative scatter correction
PCA	Principal Component Analysis
PCA-RFE	Principal component analysis–recursive feature elimination cascade
PSO	Particle Swarm Optimization
RF	Random Forest
RFE	Recursive Feature Elimination
RFE-PCA	Recursive feature elimination–principal component analysis cascade
RGB	Red Green Blue
RH	Relative humidity
ROI	Region of interest
SCA	Sine-Cosine Algorithm
SD	Standard Deviation
SNV	Standard normal variate transformation
SPA	Successive Projections Algorithm
SVM	Support Vector Machines
SVC	Support vector classifier
VIP	Variable Importance in Projection
VNIR	Visible and near-infrared

References

1. Yoosefzadeh-Najafabadi, M.; Torabi, S.; Tulpan, D.; Rajcan, I.; Eskandari, M. Genome-Wide Association Studies of Soybean Yield-Related Hyperspectral Reflectance Bands Using Machine Learning-Mediated Data Integration Methods. Front. Plant Sci. 2021, 12, 777028.
2. Home | USDA Foreign Agricultural Service. Available online: https://www.fas.usda.gov/home (accessed on 24 July 2025).
3. International Marine Insurance | The Swedish Club. Available online: https://www.swedishclub.com/ (accessed on 24 July 2025).
4. Hou, H.J.; Chang, K.C. Storage Conditions Affect Soybean Color, Chemical Composition and Tofu Qualities. J. Food Process. Preserv. 2004, 28, 473–488.
5. Samadi; Yu, P. Dry and Moist Heating-Induced Changes in Protein Molecular Structure, Protein Subfraction, and Nutrient Profiles in Soybeans. J. Dairy Sci. 2011, 94, 6092–6102.
6. GB/T 7415-2008; Seed Storage of Agricultural Crops. Ministry of Agriculture and Rural Areas: Beijing, China, 2008.
7. Chen, P.; He, H.; Shan, Z.; Li, P.; Zhao, S. Influences of Storage Temperature and Moisture Content of Imported Soybean on Heat-Damage of Soybean. China Oils Fats 2015, 40, 5.
8. Hartman, G.L.; West, E.D.; Herman, T.K. Crops That Feed the World 2. Soybean—Worldwide Production, Use, and Constraints Caused by Pathogens and Pests. Food Secur. 2011, 3, 5–17.
9. Huang, Z. Safe Storage Technology of Soybean. Agric. Eng. 2016, 6, 57–60.
10. Liu, H. Effect of Ventilation on the Damage of Imported Soybean. China Oils Fats 2021, 46, 147–149.
11. Staton, M.J. Reducing Soybean Harvest Losses. In Encyclopedia of Digital Agricultural Technologies; Zhang, Q., Ed.; Springer International Publishing: Cham, Switzerland, 2023; pp. 1109–1119. ISBN 978-3-031-24861-0.
12. Zhu, D.; Guan, D.; Fan, B.; Sun, Y.; Wang, F. Untargeted Mass Spectrometry-Based Metabolomics Approach Unveils Molecular Changes in Heat-Damaged and Normal Soybean. LWT 2022, 171, 114136.
13. NY/T 1599-2008; Determination of Heat Damage Rate of Soybeans. Ministry of Agriculture and Rural Areas: Beijing, China, 2008.
14. Furlanetto, R.H.; Crusiol, L.G.T.; Nanni, M.R.; de Oliveira Junior, A.; Sibaldelli, R.N.R. Hyperspectral Data for Early Identification and Classification of Potassium Deficiency in Soybean Plants (Glycine max (L.) Merrill). Remote Sens. 2024, 16, 1900.
15. Jeong, S.W.; Lyu, J.I.; Jeong, H.; Baek, J.; Moon, J.-K.; Lee, C.; Choi, M.-G.; Kim, K.-H.; Park, Y.-I. SUnSeT: Spectral Unmixing of Hyperspectral Images for Phenotyping Soybean Seed Traits. Plant Cell Rep. 2024, 43, 164.
16. Xiao, D.; Zhang, L. An Effective Adaptive Deep Learning Method Combined with a Hyperspectral System to Identify the Soybeans Quality from Different Regions. Sens. Actuators A Phys. 2024, 374, 115470.
17. Liu, Y.; Li, M.; Wang, S.; Wu, T.; Jiang, W.; Liu, Z. Identification of Heat Damage in Imported Soybeans Based on Hyperspectral Imaging Technology. J. Sci. Food Agric. 2020, 100, 1775–1786.
18. Huang, P.; Yuan, J.; Yang, P.; Xiao, F.; Zhao, Y. Nondestructive Detection of Sunflower Seed Vigor and Moisture Content Based on Hyperspectral Imaging and Chemometrics. Foods 2024, 13, 1320.
19. Kalpana, R.; Murugesh, K.A.; Kumara Perumal, R.; Mithilasri, M.; Kannan, B. Spectral Analysis of Pink Mealy Bug Damaged Mulberry Plants Using Hyperspectral Radiometry. J. Adv. Biol. Biotechnol. 2024, 27, 81–91.
20. Yuan, W.; Zhou, H.; Zhang, C.; Zhou, Y.; Jiang, X.; Jiang, H. Prediction of Oil Content in Camellia oleifera Seeds Based on Deep Learning and Hyperspectral Imaging. Ind. Crops Prod. 2024, 222, 119662.
21. Aulia, R.; Amanah, H.Z.; Lee, H.; Kim, M.S.; Baek, I.; Qin, J.; Cho, B.-K. Protein and Lipid Content Estimation in Soybeans Using Raman Hyperspectral Imaging. Front. Plant Sci. 2023, 14, 1167139.
22. Graham Ram, B.; Zhang, Y.; Costa, C.; Raju Ahmed, M.; Peters, T.; Jhala, A.; Howatt, K.; Sun, X. Palmer Amaranth Identification Using Hyperspectral Imaging and Machine Learning Technologies in Soybean Field. Comput. Electron. Agric. 2023, 215, 108444.
23. Xiaobo, Z.; Jiewen, Z.; Holmes, M.; Hanpin, M.; Jiyong, S.; Xiaopin, Y.; Yanxiao, L. Independent Component Analysis in Information Extraction from Visible/Near-Infrared Hyperspectral Imaging Data of Cucumber Leaves. Chemom. Intell. Lab. Syst. 2010, 104, 265–270.
24. Zheng, L.; Zhao, M.; Zhu, J.; Huang, L.; Zhao, J.; Liang, D.; Zhang, D. Fusion of Hyperspectral Imaging (HSI) and RGB for Identification of Soybean Kernel Damages Using ShuffleNet with Convolutional Optimization and Cross Stage Partial Architecture. Front. Plant Sci. 2023, 13, 1098864.
25. Zhang, B.; Wu, W.; Zhou, J.; Dai, M.; Sun, Q.; Sun, X.; Chen, Z.; Gu, X. A Spectral Index for Estimating Grain Filling Rate of Winter Wheat Using UAV-Based Hyperspectral Images. Comput. Electron. Agric. 2024, 223, 109059.
26. Wang, Y.; Peng, Y.; Qiao, X.; Zhuang, Q. Discriminant Analysis and Comparison of Corn Seed Vigor Based on Multiband Spectrum. Comput. Electron. Agric. 2021, 190, 106444.
27. Lymperopoulou, T.; Balta-Brouma, K.; Tsakanika, L.-A.; Tzia, C.; Tsantili-Kakoulidou, A.; Tsopelas, F. Identification of Lentils (Lens culinaris Medik) from Eglouvi (Lefkada, Greece) Based on Rare Earth Elements Profile Combined with Chemometrics. Food Chem. 2024, 447, 138965.
28. Huang, X.; Guan, H.; Bo, L.; Xu, Z.; Mao, X. Hyperspectral Proximal Sensing of Leaf Chlorophyll Content of Spring Maize Based on a Hybrid of Physically Based Modelling and Ensemble Stacking. Comput. Electron. Agric. 2023, 208, 107745.
29. Oliveri, P. Class-Modelling in Food Analytical Chemistry: Development, Sampling, Optimisation and Validation Issues—A Tutorial. Anal. Chim. Acta 2017, 982, 9–19.
30. Almoujahed, M.B.; Rangarajan, A.K.; Whetton, R.L.; Vincke, D.; Eylenbosch, D.; Vermeulen, P.; Mouazen, A.M. Detection of Fusarium Head Blight in Wheat under Field Conditions Using a Hyperspectral Camera and Machine Learning. Comput. Electron. Agric. 2022, 203, 107456.
31. Feng, H.; Li, Y.; Wu, F.; Zou, X. Estimating Winter Wheat Nitrogen Content Using SPAD and Hyperspectral Vegetation Indices with Machine Learning. Trans. Chin. Soc. Agric. Eng. 2023, 40, 227–237.
32. Liu, X.; Suo, T.; Lei, B.; Zhu, F.; Di, J.; Cheng, M.; Liangquan, X.U.; Renyi, W. Inverse Model for the Photosynthetic Pigment Content of Peanut Leaves Using Coupling Algorithm. Trans. Chin. Soc. Agric. Eng. 2023, 39, 198–207.
33. Burger, J.; Geladi, P. Hyperspectral NIR Image Regression Part I: Calibration and Correction. J. Chemom. 2005, 19, 355–363.
34. Cui, H.; Bing, Y.; Zhang, X.; Wang, Z.; Li, L.; Miao, A. Prediction of Maize Seed Vigor Based on First-Order Difference Characteristics of Hyperspectral Data. Agronomy 2022, 12, 1899.
35. Dhanoa, M.S.; Lister, S.J.; Sanderson, R.; Barnes, R.J. The Link between Multiplicative Scatter Correction (MSC) and Standard Normal Variate (SNV) Transformations of NIR Spectra. J. Near Infrared Spectrosc. 1994, 2, 43–47.
36. Peng, L.; Lu, Z.; Lei, T.; Jiang, P. Dual-Structure Elements Morphological Filtering and Local Z-Score Normalization for Infrared Small Target Detection against Heavy Clouds. Remote Sens. 2024, 16, 2343.
37. Wang, Z.; Zhao, Z.; Yin, C. Fine Crop Classification Based on UAV Hyperspectral Images and Random Forest. ISPRS Int. J. Geo-Inf. 2022, 11, 252.
38. Xia, J.; Falco, N.; Benediktsson, J.A.; Du, P.; Chanussot, J. Hyperspectral Image Classification with Rotation Random Forest via KPCA. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 1601–1609.
39. Huang, Z.; Zhang, Z.; Hua, C.; Liao, B.; Li, S. Leveraging Enhanced Egret Swarm Optimization Algorithm and Artificial Intelligence-Driven Prompt Strategies for Portfolio Selection. Sci. Rep. 2024, 14, 26681.
40. Gabis, A.B.; Meraihi, Y.; Mirjalili, S.; Ramdane-Cherif, A. A Comprehensive Survey of Sine Cosine Algorithm: Variants and Applications. Artif. Intell. Rev. 2021, 54, 5469–5540.
41. Seyyedabbasi, A. WOASCALF: A New Hybrid Whale Optimization Algorithm Based on Sine Cosine Algorithm and Levy Flight to Solve Global Optimization Problems. Adv. Eng. Softw. 2022, 173, 103272.
42. Hu, M.; Li, R.; Zhong, X.; Zheng, Z.; Luo, S.; Zhao, Y. Chemometric Evaluation and Machine Learning Classification for Near-Infrared Hyperspectral Analysis of Heat-Damaged Soybeans. J. Food Compos. Anal. 2026, 149, 108789.

Figure 1. Sample preparation process: (A) Original soybean seeds; (B) Soybean seeds after thermal treatment; (C) Hyperspectral image acquisition system; (D) RGB images of three categories of soybean seeds; (E) Hyperspectral pseudo-color images generated using the 475.1170 nm, 536.9730 nm, and 637.8950 nm bands, including: (a) Normal soybean seeds (stored at 25 °C, 55% RH for 30 days); (b) Slightly heat-damaged soybean seeds (stored at 45 °C, 80% RH for 20 days); (c) Severely heat-damaged soybean seeds (stored at 45 °C, 80% RH for 30 days).
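Panel (E) of Figure 1 renders pseudo-color images from three bands. A minimal NumPy sketch of how such a composite can be formed is given below. It assumes a hypercube stored as (rows, cols, bands) and an evenly spaced 810-band wavelength axis over 400–1000 nm (taken from Table 2; the instrument's real band centers may differ), maps the longest of the three bands to red and the shortest to blue, and applies a per-channel min–max stretch. These choices are illustrative assumptions, not the authors' rendering procedure.

```python
import numpy as np

# Assumption: 810 evenly spaced band centers over 400-1000 nm (Table 2),
# not the instrument's own wavelength calibration file.
WAVELENGTHS = np.linspace(400.0, 1000.0, 810)

def nearest_band(wavelengths, target_nm):
    """Index of the band whose center is closest to target_nm."""
    return int(np.argmin(np.abs(np.asarray(wavelengths) - target_nm)))

def pseudo_color(cube, wavelengths=WAVELENGTHS,
                 rgb_nm=(637.8950, 536.9730, 475.1170)):
    """Stack three bands as R, G, B and min-max stretch each channel to [0, 1].

    cube: hypercube of shape (rows, cols, bands).
    rgb_nm: band centers assigned to the red, green and blue channels
            (the channel assignment is an assumption).
    """
    channels = []
    for nm in rgb_nm:
        band = cube[:, :, nearest_band(wavelengths, nm)].astype(float)
        lo, hi = band.min(), band.max()
        # A constant band would divide by zero, so map it to all zeros.
        channels.append((band - lo) / (hi - lo) if hi > lo else np.zeros_like(band))
    return np.stack(channels, axis=-1)
```

The result is a (rows, cols, 3) array in [0, 1] that can be passed directly to an image viewer.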

Figure 2. Headwall Photonics Hyperspec® VNIR equipment: (a) CCD sensor; (b) Halogen light source; (c) Soybean seeds sample; (d) Servo motor controller.

Figure 3. The spectral information was extracted from each sample using a 25 × 25-pixel region of interest (ROI).

Figure 4. Flowchart of spectral curve extraction from raw image acquisition to corrected reflectance curve.
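Figures 3 and 4 describe the route from a raw image to one reflectance spectrum per seed. A minimal sketch of that route follows, assuming the common black/white reference correction R = (I − D)/(W − D), where I is the raw image, W a white-reference image and D a dark image, followed by averaging a 25 × 25-pixel window into a mean spectrum. The function names, the centered placement of the ROI, and the small epsilon guard are hypothetical choices for illustration.

```python
import numpy as np

def calibrate(raw, white, dark, eps=1e-6):
    """Black/white reference correction: R = (raw - dark) / (white - dark).

    Assumption: standard flat-field correction. Arrays share shape
    (rows, cols, bands) or broadcast to it; eps guards dead pixels
    where white == dark.
    """
    raw, white, dark = (np.asarray(a, dtype=float) for a in (raw, white, dark))
    return (raw - dark) / np.maximum(white - dark, eps)

def roi_mean_spectrum(cube, center_row, center_col, size=25):
    """Mean spectrum of the size x size window centered on (center_row, center_col).

    Assumption: the ROI is a square window centered on the seed.
    """
    half = size // 2
    r0, c0 = center_row - half, center_col - half
    if r0 < 0 or c0 < 0 or r0 + size > cube.shape[0] or c0 + size > cube.shape[1]:
        raise ValueError("ROI falls outside the image")
    window = cube[r0:r0 + size, c0:c0 + size, :]
    # Flatten the pixels and average them band by band.
    return window.reshape(-1, cube.shape[2]).mean(axis=0)
```

Averaging 625 pixels per seed suppresses pixel-level noise, at the cost of discarding within-seed spatial variation.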

Figure 5. Spectral curves of soybean samples: (a) Original spectral curves of the soybean seeds; (b) Spectral curves of the soybean seeds after envelope (continuum) removal.
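Panel (b) of Figure 5 shows the spectra after envelope removal. The caption does not specify the procedure, so the sketch below assumes the usual convex-hull continuum removal: the upper convex hull of each spectrum is linearly interpolated across all bands, and the spectrum is divided by it. The result equals 1 at hull vertices and drops below 1 inside absorption features, which makes band depths comparable across seeds.

```python
import numpy as np

def upper_hull_indices(x, y):
    """Indices of the upper convex hull of points (x[i], y[i]); x must be increasing."""
    hull = []
    for i in range(len(x)):
        # Drop the last hull point while it lies on or below the segment
        # from hull[-2] to point i (it cannot be on the upper hull).
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = (x[b] - x[a]) * (y[i] - y[a]) - (y[b] - y[a]) * (x[i] - x[a])
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    return hull

def continuum_removed(wavelengths, reflectance):
    """Divide a positive spectrum by its convex-hull continuum (envelope).

    Assumption: envelope removal is implemented as convex-hull continuum removal.
    """
    x = np.asarray(wavelengths, dtype=float)
    y = np.asarray(reflectance, dtype=float)
    idx = upper_hull_indices(x, y)
    continuum = np.interp(x, x[idx], y[idx])  # piecewise-linear envelope
    return y / continuum
```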

Figure 6. Spectral curves of heat-damaged soybean seeds after different preprocessing methods: (a) First-order derivative; (b) SNV; (c) MSC; (d) Z-score Normalization.

Figure 7. Model accuracy versus feature dimensionality under different reduction methods: (a) RFE; (b) PCA; (c) RFE-PCA; (d) PCA-RFE.

Figure 8. Average fitness convergence curves of eight optimized models.

Figure 9. Confusion matrices of different models on the independent test set: (a) GA-SVC; (b) GA-GBDT; (c) PSO-SVC; (d) PSO-GBDT; (e) ESOA-SVC; (f) ESOA-GBDT; (g) LSESOA-SVC; (h) LSESOA-GBDT.

Table 1. Basic characteristics and pretreatment conditions of soybean seed samples.

Property	Description/Condition
Cultivar	Heihe 43
Harvest date	2 October 2024
Production site	Xiangyang Experimental Farm, Northeast Agricultural University
Sample source	College of Agriculture, Northeast Agricultural University
Post-harvest treatment	Air-dried naturally, then stored in a dry laboratory at ambient temperature
Visual selection criteria	Uniform size, smooth and intact surface, no visible defects or disease

Table 2. Basic technical specifications of the Headwall Photonics Hyperspec® VNIR-A system.

Properties	Parameters
Aperture	F/2.0
Spectral resolution (FWHM)	2–3 nm
Spectral sampling interval	0.74 nm
Wavelength range	400–1000 nm
Maximum number of spectral bands	810
Frame rate	>90 frames per second (fps)
Illumination source	21 V, 150 W fiber optic halogen lamp
Platform scanning speed	3.6 mm/s

Table 3. Parameter settings and search space for model evaluation.

Parameter	Value/Search Space
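Population size	50
Maximum iterations	100
$c$ (SVC penalty parameter)	0.01–10
$g$ (SVC kernel coefficient $\gamma$)	0.001–10
$n_{\text{estimators}}$ (GBDT number of trees)	50–100
$lr$ (GBDT learning rate)	0.001–0.3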
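Table 3 fixes the optimizer population (50) and iteration budget (100) and bounds each hyperparameter. As a hedged illustration of how such a bounded search is commonly wired up, the sketch below encodes the c and g ranges (read here as the SVC penalty C and RBF kernel coefficient gamma), scores a candidate by 5-fold cross-validated accuracy with scikit-learn, and perturbs candidates with a Lévy-flight step drawn by Mantegna's algorithm and clipped to the bounds. The Lévy step is a generic building block only; it does not reproduce the LSESOA update rule or its sine component, and β = 1.5, the step scale 0.01, and accuracy as the fitness are assumptions.

```python
import math
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Search bounds for (C, gamma), taken from the c and g rows of Table 3.
LOWER = np.array([0.01, 0.001])
UPPER = np.array([10.0, 10.0])

def levy_step(dim, beta=1.5, rng=None):
    """Draw a Lévy-distributed step with Mantegna's algorithm (beta assumed 1.5)."""
    rng = np.random.default_rng() if rng is None else rng
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma_u, dim)
    v = rng.normal(0.0, 1.0, dim)
    return u / np.abs(v) ** (1 / beta)

def levy_perturb(position, rng, scale=0.01):
    """Move a candidate by a Lévy step scaled to the search range, then clip to bounds."""
    step = scale * levy_step(position.size, rng=rng) * (UPPER - LOWER)
    return np.clip(position + step, LOWER, UPPER)

def fitness(position, X, y):
    """Mean 5-fold CV accuracy of an RBF-kernel SVC at (C, gamma) = position (assumed fitness)."""
    c, g = position
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    model = SVC(C=c, gamma=g, kernel="rbf")
    return cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
```

Heavy-tailed Lévy steps are usually motivated by their mix of many small moves and occasional long jumps, which helps a swarm escape local optima; clipping keeps every candidate inside the Table 3 ranges.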

Table 4. Cross-validation results for different preprocessing methods.

Preprocessing Method	CV ¹ Accuracy (±SD ²)	CV Macro-F1 (±SD)
Original data	77.9% ± 2.2%	76.3% ± 2.4%
Z-score Normalization	78.9% ± 2.0%	77.2% ± 2.2%
First-order Derivative	79.6% ± 1.8%	78.1% ± 2.1%
MSC	82.0% ± 1.5%	80.5% ± 1.8%
SNV	80.7% ± 1.7%	79.3% ± 1.9%

¹ CV: 5-fold Cross-Validation. ² SD: Standard Deviation.
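Table 4 compares four preprocessing options, with MSC scoring best. The NumPy sketch below implements textbook forms of each, assuming a matrix with one spectrum per row. MSC regresses each spectrum on a reference (by default the mean spectrum) and removes the fitted offset and slope; SNV centers and scales each spectrum by its own mean and standard deviation; the first derivative is a plain band-to-band difference (smoothed derivatives such as Savitzky–Golay are a common alternative); and z-score normalization standardizes each band across samples. Which exact variants the authors used, such as the MSC reference or the z-score axis, is an assumption.

```python
import numpy as np

def msc(spectra, reference=None):
    """Multiplicative scatter correction; rows are spectra.

    Each spectrum x is fitted as x ≈ a + b * ref by least squares and
    corrected as (x - a) / b. ref defaults to the mean spectrum.
    """
    X = np.asarray(spectra, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, x in enumerate(X):
        b, a = np.polyfit(ref, x, deg=1)  # polyfit returns slope first, then intercept
        corrected[i] = (x - a) / b
    return corrected

def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row) by itself."""
    X = np.asarray(spectra, dtype=float)
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

def first_derivative(spectra):
    """Band-to-band first difference along the spectral axis (one fewer band)."""
    return np.diff(np.asarray(spectra, dtype=float), axis=1)

def zscore(spectra):
    """Assumption: standardize each band (column) across samples."""
    X = np.asarray(spectra, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)
```

In a train/test split, the MSC reference would normally be the training-set mean spectrum, passed through `reference=` when correcting test spectra, so that no test information leaks into the correction.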

Table 5. Cross-validation results of the features selected by different dimensionality reduction methods.

Dimensionality Reduction Method	Number of Features	CV Accuracy (±SD)	CV Macro-F1 (±SD)
RFE	26	86.2% ± 1.4%	84.9% ± 1.6%
PCA	20	87.4% ± 1.2%	86.1% ± 1.5%
RFE-PCA	9	88.2% ± 1.1%	87.2% ± 1.3%
PCA-RFE	9	87.1% ± 1.3%	86.0% ± 1.5%
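Table 5 shows that chaining RFE and PCA gives the best cross-validated accuracy with only nine features. One plausible way to express such a cascade is a scikit-learn `Pipeline` in which RFE first keeps a subset of bands and PCA then compresses them to nine components before the classifier. The standardization step, the linear-SVM ranking estimator (RFE needs coefficients or feature importances), the 26 bands kept by RFE (borrowed from the RFE-only row), the step size, and the RBF-kernel SVC head are all assumptions for illustration, not the authors' reported configuration.

```python
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

def build_rfe_pca_svc(n_bands=26, n_components=9):
    """RFE band selection -> PCA compression -> RBF SVC (all settings assumed)."""
    return Pipeline([
        ("scale", StandardScaler()),
        # RFE repeatedly drops the bands with the smallest linear-SVM weights,
        # removing up to 10 bands per round until n_bands remain.
        ("rfe", RFE(LinearSVC(dual=False, max_iter=5000),
                    n_features_to_select=n_bands, step=10)),
        # PCA then projects the surviving bands onto n_components axes.
        ("pca", PCA(n_components=n_components)),
        ("svc", SVC(kernel="rbf")),
    ])

def cv_accuracy(X, y, seed=0):
    """5-fold CV accuracy; each fold refits RFE and PCA on its training split only."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    return cross_val_score(build_rfe_pca_svc(), X, y, cv=cv, scoring="accuracy").mean()
```

Wrapping both reduction steps in the pipeline matters for the CV numbers: band selection and PCA are learned inside each training fold, so the held-out fold never influences which features are kept.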

Table 6. Performance of all models on the cross-validation and independent test sets.

Model	CV Accuracy (±SD)	CV Macro-F1 (±SD)	Test Accuracy	Test Macro-F1
LSESOA-SVC	97.1% ± 0.6%	97.2% ± 0.6%	96.8%	96.8%
LSESOA-GBDT	96.2% ± 0.6%	96.3% ± 0.6%	95.8%	95.9%
ESOA-SVC	91.4% ± 0.9%	91.3% ± 0.9%	90.7%	90.8%
ESOA-GBDT	91.0% ± 0.9%	90.8% ± 0.8%	90.3%	90.3%
PSO-SVC	92.1% ± 1.1%	91.9% ± 1.1%	87.5%	87.5%
PSO-GBDT	88.5% ± 0.9%	88.6% ± 1.0%	88.0%	88.4%
GA-SVC	89.0% ± 0.8%	89.0% ± 0.8%	88.4%	88.5%
GA-GBDT	91.8% ± 0.9%	91.7% ± 0.9%	86.6%	87.0%
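Table 6 and Figure 9 report accuracy and macro-F1. For reference, both can be computed from a confusion matrix as sketched below, assuming rows are true classes and columns are predicted classes. Per class, F1 = 2PR/(P + R), which equals 2TP/(predicted + actual); macro-F1 is the unweighted mean over classes, so each of the three seed categories counts equally. The example matrix is hypothetical and is not the study's data.

```python
import numpy as np

def accuracy_and_macro_f1(cm):
    """Accuracy and macro-F1 from a square confusion matrix (rows = true, cols = predicted)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    predicted = cm.sum(axis=0)  # column sums: how often each class was predicted
    actual = cm.sum(axis=1)     # row sums: how many samples truly belong to each class
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return tp.sum() / cm.sum(), f1.mean()

# Hypothetical 3-class matrix (normal, slight, severe); not the study's results.
example = [[40, 2, 0],
           [3, 37, 1],
           [0, 1, 41]]
acc, macro_f1 = accuracy_and_macro_f1(example)  # acc = 118 / 125 = 0.944
```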

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Tan, K.; Zhuo, Z.; Sun, W.; Gan, H.; Lu, C.; Liu, Q.; Zhang, X. Early Detection of Heat-Damaged Soybean Seeds Based on Hyperspectral Imaging Technology. AgriEngineering 2026, 8, 45. https://doi.org/10.3390/agriengineering8020045

