1. Introduction
Soybean seeds are rich in protein and oil but low in cholesterol, making them a key global source of food and plant-based oil [1]. In 2023, global soybean production reached approximately 370 million tons, necessitating large-scale storage and long-distance transportation [2]. During both warehouse storage and transoceanic shipping, poor ventilation, elevated temperatures, and excess moisture often result in heat accumulation within bulk soybean piles [3,4]. Under these conditions, intensified respiration and microbial activity release heat and moisture, which are difficult to dissipate, particularly in the core of the stack. This self-heating effect can lead to protein degradation, lipid oxidation, and other forms of irreversible quality loss, commonly referred to as heat damage [5].
Previous studies have shown that soybean seeds stored at an ambient temperature of 25 °C for 30 days experience minimal deterioration, with protein solubility decreasing by only 2–5% [6,7]. However, in real-world storage environments, internal pile temperatures can rise 15–20 °C above ambient due to heat generated by seed respiration and microbial metabolism. In extreme cases, such as during summer storage or in sealed ship holds, temperatures may exceed 45 °C [8,9,10,11,12]. After 30 days at 45 °C, soybeans typically exhibit visible signs of heat damage, including seed coat wrinkling, discoloration, reduced protein and oil extractability, and lower unsaturated fatty acid content [6,13]. In this study, storage at 25 °C was therefore used to prepare normal soybean samples and storage at 45 °C was used to induce heat damage, establishing a controlled and practically relevant thermal damage model.
Traditional methods for detecting heat-damaged soybean seeds often rely on RGB imaging and visual classification [14,15,16]. These approaches identify damaged seeds based on external traits, such as color and texture. Healthy seeds typically have smooth, yellowish surfaces, while severely damaged seeds appear brown or reddish-brown [17]. However, these methods are less effective at detecting mildly heat-damaged seeds, where internal biochemical changes occur without visible surface alterations. This limitation highlights the need for more sensitive detection techniques.
Hyperspectral imaging (HSI) offers a powerful alternative to traditional methods, as it captures spectral data across hundreds of narrow bands, revealing subtle chemical and structural changes that are invisible to RGB imaging [18,19]. Previous studies have demonstrated HSI’s effectiveness in seed quality evaluation, such as predicting sunflower seed vigor or assessing the oil content of soybean seeds [18,20]. However, the practical application of HSI is often hindered by high data dimensionality, which increases computational costs and reduces model generalization. As a result, effective dimensionality reduction becomes essential for its efficient use [17,21,22,23,24].
To address this challenge, various dimensionality reduction methods have been explored, and they can be broadly categorized into feature selection and feature extraction techniques. Feature selection methods, such as the Successive Projections Algorithm (SPA) and Competitive Adaptive Reweighted Sampling (CARS), focus on retaining the most informative wavelengths while discarding redundant ones. For instance, Zhang et al. used SPA combined with a Random Forest (RF) model to predict grain filling rate [25], and Wang et al. applied CARS to identify key wavelengths in sweet corn seed evaluation [26]. On the other hand, feature extraction methods, such as Principal Component Analysis (PCA) [27], transform the data into uncorrelated variables, improving compactness at the cost of interpretability. Huang et al. used a combination of methods, including Recursive Feature Elimination (RFE), Correlation Analysis (CA), and Variable Importance in Projection (VIP), to select a small number of relevant wavelengths for evaluating maize seed viability [28].
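The trade-off between compactness and interpretability in PCA can be made explicit with a short worked formulation. This is the standard textbook construction, not a description of the specific implementation in the cited works; the symbols (X for the n-by-p matrix of spectra, k for the number of retained components) are notation introduced here for illustration.

```latex
\begin{aligned}
X_c &= X - \mathbf{1}\,\bar{x}^{\top}
      && \text{(center each of the $p$ wavelengths)} \\
S   &= \tfrac{1}{n-1}\, X_c^{\top} X_c = W \Lambda W^{\top}
      && \text{(eigendecomposition of the covariance matrix)} \\
Z   &= X_c W_k
      && \text{(scores on the first $k$ eigenvectors)} \\
\operatorname{cov}(Z) &= \tfrac{1}{n-1}\, Z^{\top} Z
      = W_k^{\top} S\, W_k = \Lambda_k
      && \text{(diagonal, so the new variables are uncorrelated)}
\end{aligned}
```

Because each column of Z is a weighted sum over all p wavelengths, the scores no longer correspond to individual bands, which is the loss of interpretability noted above. Feature selection methods such as SPA, CARS, or RFE instead keep a subset of the original columns of X.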
Support Vector Machines (SVM) are known for their strong generalization ability and robustness [29], making them widely used in hyperspectral data modeling. For example, Almoujahed et al. applied SVM to detect Fusarium head blight in wheat using visible–near-infrared spectral data, achieving a classification accuracy of 95.6% [30]. Feng et al. developed an SVM model to predict nitrogen content in winter wheat based on hyperspectral vegetation indices [31], while Liu et al. used support vector regression combined with band selection to estimate chlorophyll content in peanut leaves [32]. These studies support the suitability of SVM for agricultural spectral analysis.
However, the performance of SVM models depends heavily on the selection of hyperparameters, particularly the penalty parameter c and the kernel parameter g (often written C and γ). To tackle this issue, we employed a modified version of the Egret Swarm Optimization Algorithm (ESOA), referred to as the Lévy–Sine-enhanced Egret Swarm Optimization Algorithm (LSESOA). By integrating the Lévy flight strategy and a sine–cosine search mechanism, LSESOA enhances global exploration and mitigates the risk of premature convergence. This metaheuristic was used to optimize the parameters of the Support Vector Classifier (SVC), leading to the development of the LSESOA–SVC model for classifying heat-damaged soybean seeds.
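To illustrate why a Lévy flight helps a swarm escape local optima, the following minimal sketch draws heavy-tailed steps with Mantegna's algorithm, a common way to generate Lévy-distributed step lengths. It is not the LSESOA update rule, which is not reproduced here; the function names, the stability index beta = 1.5, the step scale, and the log10 search bounds for c and g are all assumptions chosen for illustration.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def mantegna_sigma(beta: float) -> float:
    """Standard deviation of the numerator Gaussian in Mantegna's algorithm."""
    num = math.gamma(1.0 + beta) * math.sin(math.pi * beta / 2.0)
    den = math.gamma((1.0 + beta) / 2.0) * beta * 2.0 ** ((beta - 1.0) / 2.0)
    return (num / den) ** (1.0 / beta)


def levy_step(beta: float = 1.5, rng: Optional[random.Random] = None) -> float:
    """Draw one heavy-tailed step: mostly small moves, occasionally a long jump."""
    rng = rng if rng is not None else random.Random()
    u = rng.gauss(0.0, mantegna_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # guard against an (extremely unlikely) division by zero
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1.0 / beta)


def levy_perturb(
    position: Sequence[float],
    bounds: Sequence[Tuple[float, float]],
    scale: float = 0.1,  # hypothetical step scale, as a fraction of each range
    beta: float = 1.5,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Move a candidate (log10 c, log10 g) by Levy steps, clipped to the bounds."""
    rng = rng if rng is not None else random.Random()
    moved = []
    for x, (lo, hi) in zip(position, bounds):
        candidate = x + scale * (hi - lo) * levy_step(beta, rng)
        moved.append(min(max(candidate, lo), hi))
    return moved


# Hypothetical search box: c in [1e-2, 1e3], g in [1e-4, 1e1], on a log10 scale.
BOUNDS = [(-2.0, 3.0), (-4.0, 1.0)]
```

In a full optimizer, each perturbed position would be scored by the cross-validated accuracy of an SVC trained with the corresponding c and g, and kept only if it improves on the current best. Searching on a log scale is a design choice that reflects how both parameters typically span several orders of magnitude.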
In this work, we integrate hyperspectral imaging with an SVC to detect heat-damaged soybean seeds. The methodological framework comprises: (1) an RFE-PCA cascade for spectral dimensionality reduction, (2) the LSESOA metaheuristic for SVC hyperparameter optimization, and (3) a comparative evaluation against other optimizer–classifier combinations.
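The structure of such a framework can be sketched with scikit-learn on synthetic data. This is an illustrative outline under stated assumptions, not the authors' implementation: the data are generated by `make_classification` rather than taken from real spectra, the number of wavelengths kept by RFE (30) is hypothetical, a linear SVC is assumed as the RFE ranking estimator, and c and g are left at placeholder values where a metaheuristic such as LSESOA would supply tuned ones.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

N_BANDS = 184       # spectral features before reduction (from the text)
N_COMPONENTS = 9    # features after the cascade (from the text)
N_RFE_KEPT = 30     # hypothetical: the intermediate count is an assumption


def build_pipeline(c=1.0, g="scale"):
    """RFE (wavelength selection) -> PCA (extraction) -> RBF-kernel SVC."""
    return Pipeline([
        # Assumed ranking estimator: a linear SVC, whose coefficients rank bands.
        ("rfe", RFE(SVC(kernel="linear"), n_features_to_select=N_RFE_KEPT, step=10)),
        ("pca", PCA(n_components=N_COMPONENTS)),
        # c and g are placeholders; an optimizer would tune them.
        ("svc", SVC(kernel="rbf", C=c, gamma=g)),
    ])


def run_demo(seed=0):
    """Fit on synthetic two-class data; return reduced shape and test accuracy."""
    X, y = make_classification(
        n_samples=300, n_features=N_BANDS, n_informative=20,
        n_redundant=40, random_state=seed,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed,
    )
    pipe = build_pipeline()
    pipe.fit(X_train, y_train)
    # Apply the two fitted reduction stages by hand to inspect their output.
    selected = pipe.named_steps["rfe"].transform(X_test)
    reduced = pipe.named_steps["pca"].transform(selected)
    return reduced.shape, pipe.score(X_test, y_test)
```

Calling `run_demo()` returns the shape of the reduced test matrix (one row per held-out sample, nine columns) together with the test accuracy on the synthetic data. Wrapping all stages in one pipeline means the selection and projection are fitted on the training split only, which avoids leaking test information into the reduction step.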
4. Conclusions
This study proposed a non-destructive framework for the early detection of heat-damaged soybean seeds by integrating hyperspectral imaging with a support vector classifier optimized using a novel Lévy–Sine-enhanced Egret Swarm Optimization Algorithm (LSESOA). HSI enabled the identification of internal biochemical changes, particularly in mildly damaged seeds that lacked visible symptoms. To reduce spectral dimensionality while preserving classification performance, multiplicative scatter correction (MSC) was combined with a recursive feature elimination–principal component analysis (RFE-PCA) cascade, reducing the number of features from 184 to 9. This processing chain compressed the data from approximately 32 GB of raw hyperspectral images for the 1200 seeds to a final discriminative feature set of only about 0.7 MB, a compression ratio exceeding 45,000:1. Consequently, the “digital footprint” for the classification model is reduced to a practical level of approximately 3.1 MB per kilogram of seeds. The optimized LSESOA–SVC model achieved a classification accuracy of 96.8% on the test set.
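The compression figures can be checked with simple arithmetic. The sketch below uses only the rounded values quoted in the text; the implied sample mass and per-seed mass are back-calculated from those values and are not measurements reported here.

```python
RAW_GB = 32.0        # raw hyperspectral images for all seeds (from the text)
FEATURES_MB = 0.7    # final feature set (from the text)
N_SEEDS = 1200       # number of seeds (from the text)
MB_PER_KG = 3.1      # reported footprint per kilogram (from the text)


def compression_ratio(raw_gb: float, reduced_mb: float, mb_per_gb: float) -> float:
    """Ratio of raw to reduced data volume, for a given GB-to-MB convention."""
    return raw_gb * mb_per_gb / reduced_mb


ratio_decimal = compression_ratio(RAW_GB, FEATURES_MB, 1000.0)  # about 45,714
ratio_binary = compression_ratio(RAW_GB, FEATURES_MB, 1024.0)   # about 46,811

# Back-calculated quantities (assumptions derived from the rounded figures).
implied_mass_kg = FEATURES_MB / MB_PER_KG                 # about 0.226 kg
implied_seed_mass_g = implied_mass_kg * 1000.0 / N_SEEDS  # about 0.19 g per seed
```

Under either unit convention the ratio exceeds 45,000:1, and the implied mass of roughly 0.19 g per seed is consistent with a footprint of about 3.1 MB per kilogram.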
These results indicate that the proposed method has potential for rapid and non-invasive assessment of soybean seed quality under heat stress. Since the current validation was conducted under laboratory conditions, further testing in large-scale storage and in-line inspection scenarios is needed. Future work should also consider representative sampling strategies and broader temperature–humidity conditions to improve model robustness. In addition, establishing clearer links between spectral responses, heat damage severity, and physiological indicators such as germination performance and seed vigor will help enhance the practical relevance of this approach.