Identifying Optimal Wavelengths from Visible–Near-Infrared Spectroscopy Using Metaheuristic Algorithms to Assess Peanut Seed Viability

Peanuts, owing to their composition of complex carbohydrates, plant protein, unsaturated fatty acids, and essential minerals (magnesium, iron, zinc, and potassium), hold significant potential as a vital component of the human diet. Additionally, their low water requirements and nitrogen fixation capacity make them an appropriate choice for cultivation in adverse environmental conditions. Because the germination ability of seeds profoundly impacts the final yield of the crop, assessing seed viability is of extreme importance. Conventional methods for assessing seed viability and germination are both time-consuming and costly. To address these challenges, this study investigated Visible–Near-Infrared (Vis/NIR) spectroscopy in the wavelength range of 500–1030 nm as a nondestructive and rapid method to determine the viability of two varieties of peanut seeds: North Carolina-2 (NC-2) and Spanish flower (Florispan). The seeds were subjected to three levels of artificial aging through heat treatment, involving incubation in a controlled environment at a relative humidity of 85% and a temperature of 50 °C over 24 h intervals. Noise in the absorbance spectra was substantially reduced by combining Savitzky–Golay (SG) smoothing and multiplicative scatter correction (MSC). To identify the optimal wavelengths for seed viability assessment, a range of metaheuristic algorithms were employed, including world competitive contest (WCC), league championship algorithm (LCA), genetic algorithm (GA), particle swarm optimization (PSO), ant colony optimization (ACO), imperialist competitive algorithm (ICA), learning automata (LA), heat transfer optimization (HTS), forest optimization (FOA), discrete symbiotic organisms search (DSOS), and cuckoo optimization


Introduction
Peanuts hold significant economic importance globally, serving as a stable and cost-effective source of complex carbohydrates, plant protein, unsaturated fatty acids, and essential minerals such as magnesium, iron, zinc, and potassium [1]. Their low water requirements make peanuts a favorable choice for cultivation in arid and semi-arid regions [2]. Moreover, the plant is resistant to adverse environmental conditions and performs adequately in poor soils [3]. Additionally, peanuts' excellent nitrogen fixation properties make them a suitable rotation option with cereal crops in successive planting seasons, contributing to soil fertility improvement [4]. Since peanuts are cultivated from seed, evaluating seed germination vigor and viability is of paramount importance.
Seeds are considered the fundamental elements in agriculture and forestry, as they are directly or indirectly involved in establishing fields for various crops, vegetables, fruits, fodder, and economic forest products [5]. The quality of seeds profoundly impacts crop growth uniformity, yield, and overall crop quality. Furthermore, the safety and quality of seeds and their products directly affect human health [6]. Among the parameters associated with seed quality, seed vigor is a key criterion because, compared to standard germination tests, it reflects the potential for seed germination, germination in the field, resistance to biotic and abiotic stresses, and the ability to withstand different storage conditions [7]. Furthermore, it is well established that seeds with desirable viability, which deliver significant yield performance for farmers and reduced crop diversity, are profitable for seed industries [8]. A vigorous seed has the potential to thrive in environmental conditions that may not be optimal for its species. Such seeds exhibit high and uniform germination rates, germinate quickly, and produce robust seedlings, ultimately leading to higher field yields [9]. The study of the relationship between the growth rates of different parts of an organism, or of the organism as a whole, is known as allometry. Under identical environmental conditions, the growth and functioning of the root and aerial systems are closely interrelated, and this relationship can be quantified through allometric relationships. Specifically, the allometric coefficient, which is calculated from the length of the aerial parts and roots, represents the ratio of shoot length to root length [10]. Several studies have indicated that the allometric coefficient is influenced by seed vigor and can serve as an indicator for diagnosing seed quality [11,12].
Traditionally, various methods, including standard germination tests, electrical conductivity tests, seedling growth tests, accelerated aging tests, cold tests, and tetrazolium tests, have been proposed and employed to evaluate seed germination. However, these methods typically require significant time, are not automated, may destroy the seeds, and often necessitate specialized training and expertise. Consequently, they are not well suited for large-scale applications or for protecting endangered species. Therefore, nondestructive and high-throughput screening methods are essential for the seed industry to ensure the supply of high-quality, high-germination seeds to farmers before planting [13].
In recent years, there have been significant advancements in electronic technologies and equipment, leading to notable improvements in the resolution and accuracy of light- and image-based systems. These systems are now capable of determining qualitative indicators of the chemical components of materials, either in a static setting or online in production lines. This progress has enabled the fast and precise classification of materials with reduced labor requirements [14]. Light- and image-based detection systems have been successfully employed in assessing the quality of agricultural food products, offering reliable and accurate results. By minimizing the influence of human intervention, these systems have become a preferred approach due to their consistency and stability [15][16][17]. Among the noninvasive methods used for identifying the chemical components of agricultural products, near-infrared spectroscopy (NIR) has gained widespread popularity in recent years. NIR operates based on the absorption of electromagnetic radiation within the wavelengths of 780 to 2500 nm [18]. When agricultural products are exposed to this radiation, their spectral response varies depending on the wavelength due to scattering and absorption processes. The tissue structures of these products, consisting of cells and intracellular/extracellular environments, are responsible for radiation scattering. Additionally, the absorption of electromagnetic rays is mainly influenced by C-H, O-H, and N-H bonds present in major compounds such as water, sugars, chlorophylls, carotenoids, and so on. The NIR spectrum comprises broad wavebands resulting from the overlapping of absorption bands, which are closely associated with the overtones and combinations of these chemical bonds. As a result, organic and biological substances can be effectively detected using NIR spectroscopy [19].
An investigation of artificially aged soybean seeds in comparison to healthy seeds revealed that changes in radiation absorption within the wavelength range of 1000-2500 nm can effectively distinguish between healthy and old seeds [20]. Similarly, viable and nonviable soybean seeds, which underwent accelerated aging through heat treatment, were differentiated using NIR reflectance spectra in the wavelength range of 400-2500 nm. Partial least squares discriminant analysis (PLS-DA) was employed in that research to classify viable and nonviable seeds, with the best model achieving an accuracy of 95% in the short-wave infrared (SWIR) region of 750-2500 nm [21]. In the case of tomato seeds subjected to accelerated aging, NIR spectroscopy in absorption mode, within the wavelength range of 911-2258 nm, was utilized to classify viable and nonviable seeds. Both PLS-DA and interval partial least squares discriminant analysis (iPLS-DA) were employed to construct corresponding models. Specific spectral regions (1160-1170, 1383-1397, 1647-1666, 1884-1860, and 1915-1940 nm) were identified via iPLS-DA for the classification of viable and nonviable tomato seeds, resulting in a classification accuracy of 94% [22]. In a study focusing on spinach seeds, NIR spectroscopy within the wavelength range of 833-1667 nm was used to differentiate between viable and nonviable seeds. The optimal wavelengths were selected using the successive projections algorithm (SPA), and classification models built with the 10 selected wavelengths demonstrated satisfactory accuracy in distinguishing viable seeds from nonviable ones [23].
Recently, several studies have investigated the feasibility of Fourier transform infrared spectroscopy (FTIR) and laser-induced breakdown spectroscopy (LIBS) for detecting the seed vigor of soybean and Brachiaria. These studies report that FTIR, being sensitive to molecular changes, provides information about carbohydrates, proteins, amides, and lipids in seeds. Because these are the main molecules influencing seed viability, FTIR was able to accurately determine the seed vigor of soybean [24] and Brachiaria [25]. LIBS, in turn, identifies the elements present in a substance. Studies using LIBS to assess the seed vigor of soybean and Brachiaria found that the presence of calcium (Ca) in the seeds was the main characteristic responsible for the major variance in the data. Because Ca plays an important role in plant enzyme activity during germination, LIBS was able to determine the seed vigor of soybean [26] and Brachiaria [27] by identifying Ca.
Although previous studies revealed that NIR spectroscopy diagnoses seed viability with acceptable accuracy, using the entire wavelength range is not economical in practical or commercial settings. This can be explained by the high cost associated with producing spectroscopic instruments based on the full spectrum [28]. Therefore, there is a need to explore methods for identifying optimal wavelengths that would enable the production of industrial-commercial tools at a lower cost. The application of chemometrics techniques in analyzing spectroscopic data poses a fundamental challenge due to the high dimensionality of the data set. This refers to situations where the number of features greatly exceeds the size of the data set itself [29]. For instance, in spectroscopic applications with a large number of wavelengths, the number of classification parameters also increases, leading to a significant decrease in the performance of the classification tool [30]. When obtaining a substantial amount of training data is impractical, reducing the size of the feature subset becomes crucial, as it reduces the amount of training data required and, in turn, enhances the performance of the classification algorithm [31]. Dimension reduction is a common way to address this challenge by removing noise and unnecessary features. It proves to be an efficient approach for improving accuracy, reducing computational complexity, building more generalized models, and reducing storage space requirements [32]. The main idea behind feature selection is to select a subset of features by eliminating those with little or no informative value and removing highly correlated features [33]. Generally, feature selection methods aim to optimize two conflicting objectives: maximizing the association with the target class and minimizing redundancy (correlation) among the selected features [34].
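The trade-off between relevance and redundancy described above can be illustrated with a minimal sketch. It is not one of the algorithms used in this study: the function name `greedy_relevance_redundancy`, the use of absolute Pearson correlation for both objectives, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def greedy_relevance_redundancy(X, y, k):
    """Greedily pick k features, scoring each candidate as
    |corr(feature, y)| minus its mean |corr| with the features already picked."""
    n_features = X.shape[1]
    relevance = np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)]
    )
    feature_corr = np.abs(np.corrcoef(X, rowvar=False))
    selected = [int(np.argmax(relevance))]  # start with the most relevant feature
    while len(selected) < k:
        best_j, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            score = relevance[j] - feature_corr[j, selected].mean()
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected


# Hypothetical data: f1 nearly duplicates f0, f2 is informative, f3 is pure noise.
rng = np.random.default_rng(1)
n = 1000
y = rng.integers(0, 2, size=n).astype(float)
f0 = y + 0.5 * rng.normal(size=n)
f1 = f0 + 0.05 * rng.normal(size=n)
f2 = y + 0.6 * rng.normal(size=n)
f3 = rng.normal(size=n)
X = np.column_stack([f0, f1, f2, f3])

picked = greedy_relevance_redundancy(X, y, k=2)
```

In this toy setting the near-duplicate feature is penalized for its redundancy, so the second pick is expected to be the independent informative feature rather than the copy of the first.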
The current research aimed to develop an intelligent model based on NIR spectroscopy data and machine learning analysis to assess the viability of peanut seeds. To achieve this, two peanut cultivars, North Carolina-2 (NC-2) and Florispan, were selected and exposed to three levels of artificial aging. The NIR spectroscopy data of the samples were collected, and deep-learning approaches were employed to create predictive models for germination indicators. Moreover, metaheuristic variable selection methods such as world competitive contest (WCC), league championship algorithm (LCA), genetic algorithm (GA), particle swarm optimization (PSO), ant colony optimization (ACO), imperialist competitive algorithm (ICA), learning automata (LA), heat transfer optimization algorithm (HTS), forest optimization algorithm (FOA), discrete symbiotic organism search (DSOS), and cuckoo optimization (CUK) were used to select optimal wavelengths based on seed age and qualitative classification of peanuts.

Materials and Methods
Because this research comprises several stages, a flow chart of the process is given in Figure 1. The rest of this section describes the steps in Figure 1 in order. Because the measurement methods used here are accepted standard methods, their detailed description is omitted, and the reader is referred to the original sources. Likewise, the variable selection algorithms and machine learning methods are established methods available in various software packages, so their mathematical details are omitted; to keep the article concise, references to the original articles describing these methods are provided.

Seed Selection and Aging Treatment
Two peanut seed cultivars commonly grown in Iran, namely North Carolina 2 (NC-2) and Florispan, were chosen for the experiments. Seeds from the last crop year were selected to ensure optimal conditions for survival and germination vigor. Seeds of similar mass and size were selected to minimize the impact of unfavorable factors on the experiment results. Seeds with length in the range of 17-18 mm, width in the range of 8-9 mm, thickness in the range of 9-10 mm, and mass in the range of 1.1-1.2 g were considered. To induce artificial senescence, 300 seeds of each cultivar were subjected to accelerated aging treatment over three successive 24 h intervals. The seeds were placed in a single layer on aluminum nets positioned above water containers in an incubator. Before placing the seeds, the containers were thoroughly cleaned with a 15% sodium hypochlorite solution to prevent fungal contamination. The incubator was set to maintain a relative humidity of 85% and a temperature of 50 °C. At 24 h intervals, one-third of the samples were removed from the incubator, resulting in three different aging periods for the seeds [35].
Agronomy 2023, 13, x FOR PEER REVIEW

Preparation of Vis/NIR Spectra from Samples
A PS-100 model spectroradiometer (Apogee Instruments, Inc., Logan, UT, USA) was utilized to acquire the spectra of peanut seeds. This spectroradiometer is compact, lightweight, and portable, equipped with a sputtering-type monochromator with a resolution of 1 nm and a linear silicon CCD array detector containing 2048 pixels, covering the spectral range of 250-1150 nm (Vis/NIR). Furthermore, the PS-100 can be connected to a computer via an optical fiber, and the acquired spectra are displayed and stored in the SpectraWiz® spectrometer software through a USB port. To obtain the absorption spectra of the samples, a probe-detector sensor was employed in which the light source is positioned at a 45° angle to the detector, so that internal diffuse reflection rather than specular reflection is acquired from the sample. Internal diffuse reflection occurs when the radiation penetrates the cell structure of the sample, scatters at the interfaces of the cell walls, and then travels back and leaves the sample surface [36]. In this measuring method, light penetrates the sample, part of the radiation is absorbed depending on the molecular structure of the sample, and the rest exits the sample at an angle of 45° and is detected by the detector. In this way, the effects of the external and internal composition of the sample on radiation absorption can be measured simultaneously, and the internal diffuse reflection provides information about the internal contents of the sample. Figure 2 illustrates the process of acquiring the spectrum using the probe-detector sensor.

Standard Germination Test
For the standard germination test, 100 samples were selected for each treatment. The germination test was conducted by placing the seeds between wet papers. The samples were placed in a germinator at a constant temperature of 25 °C and kept in these conditions for 10 days to facilitate germination. Before conducting the test, the containers used were disinfected with a 15% hypochlorite solution, and the peanut seeds were treated with 1% mercury chloride [37]. The emergence of a 2 mm radicle was taken as the criterion for seed germination. Normal and abnormal seedlings were identified and counted according to the guidelines of the International Seed Testing Association (ISTA) from the fifth day up to the tenth day; abnormal seedlings include those without a primary root system, with weak secondary roots, with necrotic spots in the tissue, or with a damaged terminal bud or a missing cotyledon. On the tenth day, the seedlings were placed in a dryer at a temperature of 60 °C for 24 h [10]. The mass and length of the seedlings were measured using a scale with an accuracy of 0.0001 g and a caliper with an accuracy of 0.01 mm, respectively. Based on the counts and measurements, various indicators related to seed germination were calculated for each treatment group. These indicators include germination energy (GE), mean daily germination (MDG), germination value (GV), daily germination speed (DGS), and germination vigor (GVI). Additionally, the allometric coefficient (AC) was calculated for each seed. The relationships used to calculate the seed germination indices are presented in Table 1.

Table 1. Relationships used to calculate the seed germination indices [40].
In Table 1, MCGP is the maximum percentage of cumulative germination, N is the total number of seeds sown, ti is the number of days after the start of germination, GP is the final germination percentage, T is the length of the germination period (days), SFW is the seedling wet weight (grams), SDW is the seedling dry weight (grams), PL is the seedling length (centimeters), and PR is the root length (centimeters).
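As a small worked illustration of how such indices follow from the counts and measurements, the sketch below computes three of them. The formulas of Table 1 are not reproduced in this text, so the expressions are assumptions: germination percentage as germinated seeds over N, MDG as GP divided by T (a common definition), and AC as the ratio of shoot length to root length, as described in the Introduction. The function names and numbers are hypothetical.

```python
def germination_percentage(germinated, sown):
    """Final germination percentage GP: germinated seeds out of N seeds sown."""
    return 100.0 * germinated / sown


def mean_daily_germination(final_gp, period_days):
    """MDG, assumed here to be GP divided by the germination period T (days)."""
    return final_gp / period_days


def allometric_coefficient(shoot_length_cm, root_length_cm):
    """AC as the ratio of shoot length to root length."""
    return shoot_length_cm / root_length_cm


# Hypothetical treatment: 82 of 100 seeds germinate over a 10-day test.
gp = germination_percentage(germinated=82, sown=100)
mdg = mean_daily_germination(gp, period_days=10)
ac = allometric_coefficient(shoot_length_cm=6.0, root_length_cm=4.0)
```

The remaining indices (GE, GV, DGS, GVI) are left out because their exact definitions in this study cannot be recovered from the text.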

Preprocessing of Vis/NIR Spectra
After the spectra were acquired and transferred to the computer, Excel 2013 was used to average the two spectra acquired from the two sides of each peanut into a single spectrum, the index spectrum of that sample. Spectral data may contain irrelevant information and noise, such as fluorescence background, stray light, detector noise, cosmic rays, instrument noise, laser power fluctuations, and so on. To extract accurate information and enhance subtle differences between samples, spectral preprocessing is a crucial step in spectral data analysis [39].
In this study, the first step of spectral preprocessing involved using the Savitzky-Golay (SG) method to smooth the curves and remove random noise. This method effectively smooths out slight fluctuations caused by noise in the curve while enhancing spectral peaks related to changes in the sample components [39,40]. Subsequently, multiplicative scatter correction (MSC) was employed to eliminate the noise caused by light scattering. MSC removes baseline translations and displacements caused by scattering effects between samples, thereby improving the signal-to-noise ratio of the original spectrum [41]. The Unscrambler 10.4 software was used for spectral data preprocessing. Figure 3 depicts the original spectral curves of the samples and their preprocessed curves for each cultivar. As is clear in Figure 3c,d, preprocessing reduced the unrealistic variance between the curves. In the wavelength ranges of 520-530 and 590-610 nm, the curves inflect in different directions. In the ranges of 555-565 and 945-955 nm, some curves are at a relative maximum while others are at a relative minimum. Such differences, revealed by preprocessing, help to identify the optimal wavelengths.
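The two preprocessing steps can be sketched in a few lines. This is only an illustration under stated assumptions, not the Unscrambler procedure used in the study: the window length of 11 points, the polynomial order of 2, the use of the mean spectrum as the MSC reference, and the synthetic spectra are all hypothetical choices.

```python
import numpy as np
from scipy.signal import savgol_filter


def sg_smooth(spectra, window_length=11, polyorder=2):
    """Savitzky-Golay smoothing along the wavelength axis (axis 1)."""
    return savgol_filter(spectra, window_length=window_length,
                         polyorder=polyorder, axis=1)


def msc(spectra, reference=None):
    """Multiplicative scatter correction.

    Each spectrum x is regressed on a reference r (the mean spectrum by
    default), x ~ a + b * r, and corrected as (x - a) / b.
    """
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, x in enumerate(spectra):
        b, a = np.polyfit(ref, x, deg=1)  # slope b, intercept a
        corrected[i] = (x - a) / b
    return corrected


# Synthetic spectra: one absorbance band with per-sample gain, offset, and noise.
rng = np.random.default_rng(0)
wavelengths = np.arange(500, 1031)                 # 500-1030 nm in 1 nm steps
base = np.exp(-((wavelengths - 760) / 80.0) ** 2)
gains = rng.uniform(0.8, 1.2, size=(20, 1))        # multiplicative scatter
offsets = rng.uniform(-0.1, 0.1, size=(20, 1))     # additive baseline shift
raw = offsets + gains * base + rng.normal(0, 0.01, size=(20, wavelengths.size))

preprocessed = msc(sg_smooth(raw))
```

After both steps the between-sample spread that comes only from gain and offset differences shrinks, which corresponds to the reduction in variance described for the preprocessed curves.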

Methods of Choosing the Optimal Wavelength
Optimization methods and algorithms are divided into two categories: exact algorithms and approximate algorithms. Exact algorithms find the optimal solution accurately, but they are not efficient enough for hard optimization problems, because their execution time increases exponentially with the dimensions of the problem. Approximate algorithms find good (near-optimal) solutions to hard optimization problems in a short time [42]. Approximate algorithms are divided into three categories: heuristic, metaheuristic, and hyper-heuristic. The two main problems of heuristic algorithms are that they get stuck in local optima and that they converge prematurely to these points. Metaheuristic algorithms were introduced to solve these problems. They are approximate optimization algorithms that include mechanisms for escaping local optima and can be applied to a wide range of problems [43].
To address complex optimization problems with numerous variables, metaheuristic algorithms are employed as suitable and efficient approaches. These algorithms rapidly provide approximate solutions, avoiding the time-consuming search for exact optimal solutions [44].
A key advantage of metaheuristic algorithms is their ability to escape local optima, which allows a more comprehensive search for optimal solutions [45]. These algorithms start from a set of randomly generated initial solutions. A fitness function is then calculated to assess the optimality of each solution in the initial population. If the statistical criteria for optimization quality are not met, the algorithm produces a new generation of solutions, repeating the cycle until the desired optimization criteria are satisfied [46]. Metaheuristic approaches are typically categorized into two main groups: evolutionary algorithms (EA) and swarm intelligence (SI) [47]. Evolutionary algorithms simulate the genetic evolution of organisms or communities using mathematical principles. EA draws inspiration from biological evolution mechanisms such as reproduction, mutation, recombination, and selection. In the context of optimization problems, the candidate solutions represent individuals within a population, and the fitness function evaluates the quality and accuracy of these solutions. Through iterative steps, the evolutionary algorithm drives the initial population toward an overall optimum [48]. In contrast, swarm intelligence optimization methods involve a population of simple artificial agents with no individual complexity. In general, swarm intelligence algorithms use mathematical principles to simulate the paths that different organisms follow to reach food. The concept behind SI algorithms is inspired by natural systems, where each agent performs a basic task. However, the interaction, cooperation, and somewhat random responses of these agents lead to emergent intelligent behavior that is not achievable by any individual agent alone [49]. SI-based feature selection methods have been utilized in previous research, and their operational principles have been thoroughly described in the literature [30].
In this study, the optimal wavelengths were selected using variable selection methods based on various metaheuristic algorithms, including WCC [50], LCA [51], GA [52], PSO [53], ACO [54], ICA [55], LA [56], HTS [57], and FOA [58]. Wavelength selection was carried out using the FeatureSelect software package within MATLAB 2017 [59].
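The generate, evaluate, and regenerate cycle described above can be made concrete with a minimal genetic algorithm for wavelength selection. This is a generic sketch, not the FeatureSelect implementation: the synthetic data, the nearest-centroid fitness with a penalty of 0.01 per selected wavelength, the tournament selection, and all parameter values are assumptions chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in for spectra: 80 samples x 30 wavelengths, two classes.
n_samples, n_wavelengths = 80, 30
informative = [4, 12, 21]                       # hypothetical informative bands
y = rng.integers(0, 2, size=n_samples)
X = rng.normal(size=(n_samples, n_wavelengths))
X[:, informative] += 2.0 * y[:, None]           # class shift on informative bands


def fitness(mask):
    """Nearest-centroid training accuracy minus a small penalty per wavelength."""
    if not mask.any():
        return 0.0
    Xs = X[:, mask]
    c0, c1 = Xs[y == 0].mean(axis=0), Xs[y == 1].mean(axis=0)
    pred = (np.linalg.norm(Xs - c1, axis=1) < np.linalg.norm(Xs - c0, axis=1)).astype(int)
    return float((pred == y).mean()) - 0.01 * mask.sum()


def genetic_selection(pop_size=30, generations=40, p_mut=0.05):
    # Random initial population of binary masks (True = wavelength kept).
    pop = rng.random((pop_size, n_wavelengths)) < 0.5
    history = []
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        order = np.argsort(scores)[::-1]        # sort from fittest to least fit
        pop, scores = pop[order], scores[order]
        history.append(scores[0])
        children = [pop[0].copy()]              # elitism: keep the best mask
        while len(children) < pop_size:
            # Selection: two binary tournaments (lower index = fitter).
            a, b = rng.integers(0, pop_size, size=2)
            c, d = rng.integers(0, pop_size, size=2)
            p1, p2 = pop[min(a, b)], pop[min(c, d)]
            # Recombination: single-point crossover.
            cut = rng.integers(1, n_wavelengths)
            child = np.concatenate([p1[:cut], p2[cut:]])
            # Mutation: flip each bit with probability p_mut.
            child ^= rng.random(n_wavelengths) < p_mut
            children.append(child)
        pop = np.array(children)
    scores = np.array([fitness(ind) for ind in pop])
    return pop[np.argmax(scores)], history + [scores.max()]


best_mask, best_history = genetic_selection()
selected = np.flatnonzero(best_mask)            # indices of the kept wavelengths
```

Because the best mask of each generation is copied unchanged into the next, the best fitness recorded in `best_history` never decreases. Swarm-based methods such as PSO or ACO replace the crossover and mutation steps with their own update rules but follow the same evaluate-and-regenerate loop.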

Modeling Methods to Predict Seed Viability
Traditional artificial intelligence methods typically employed a mathematical representation to describe optimization problems and discover optimal solutions under specific constraints based on logical mathematical principles. Such an approach is commonly referred to as knowledge-based [60]. Nevertheless, for most natural phenomena in which trends must be predicted or occurrences classified, logical-mathematical laws may not describe the phenomena adequately. This limitation arises from the fact that these phenomena are often abstract and not easily captured with mathematical formulations [61]. To overcome the limitations of the knowledge-based approach, an alternative strategy inspired by human processes has been developed. Humans are capable of learning from repeated tasks, receiving feedback, and adjusting their decisions or actions accordingly to achieve favorable outcomes [60]. This approach, based on iterative learning from experience, is referred to as a learning-based approach, in contrast to the knowledge-based approach [61]. Similarly, machines can be developed to perform specific tasks using a learning-based approach, known as machine learning (ML). In the current study, machine learning methods such as LR [62], DT [63], MP, SVM [64], k-NN [65], and NB [66] were employed in the WEKA 3.8.6 software package for detecting and classifying the viability of seeds.

Evaluation Criteria of Optimal Wavelength Selection Algorithms and Machine Learning Models
To assess the effectiveness and performance of the optimal wavelength selection algorithms, two statistical measures were utilized: the root mean square error (RMSE) and the coefficient of determination (R²) [67,68]. These measures were calculated using Equations (1) and (2) [68-71], where y_i, ý_i, and ȳ are the predicted, actual, and mean values, respectively.
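For reference, the standard forms of these two measures can be written with the symbols defined above. Two assumptions are made here that the text does not state explicitly: n denotes the number of samples, and the mean ȳ is taken over the actual values.

```latex
\begin{align}
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \acute{y}_i\right)^{2}} \tag{1}\\
R^{2} &= 1 - \frac{\sum_{i=1}^{n}\left(\acute{y}_i - y_i\right)^{2}}{\sum_{i=1}^{n}\left(\acute{y}_i - \bar{y}\right)^{2}} \tag{2}
\end{align}
```

A lower RMSE and an R² closer to 1 both indicate that the predicted values follow the actual values more closely.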
To evaluate the accuracy of the classification models for peanut seed viability, several statistical criteria were employed: accuracy, precision, sensitivity, specificity, and the receiver operating characteristic (ROC). Accuracy refers to how close the measured value is to the true value. Precision indicates the closeness of successive measurements to each other (i.e., the consistency of errors across measurements). Sensitivity represents the fraction of positive cases correctly identified, and specificity denotes the fraction of negative cases correctly identified. These metrics were calculated using Equations (3)-(6), where True Positive is the number of samples in the i-th category that the algorithm correctly recognized, False Negative is the number of samples in the i-th category that the algorithm misclassified, False Positive is the number of samples outside the i-th category that the algorithm placed in the i-th category, and True Negative is the number of samples outside the i-th category that the algorithm did not place in the i-th category.
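The four counts defined above can be read directly from a confusion matrix. The following Python sketch assumes that Equations (3)-(6) follow the standard one-vs-rest definitions of these metrics; the function name `per_class_metrics` and the example matrix are hypothetical and do not reproduce any result of the study.

```python
def per_class_metrics(confusion, i):
    """Compute one-vs-rest metrics for category i.

    confusion[r][c] is the number of samples whose true category is r
    and whose predicted category is c.
    """
    total = sum(sum(row) for row in confusion)
    tp = confusion[i][i]                          # in category i, recognized as i
    fn = sum(confusion[i]) - tp                   # in category i, assigned elsewhere
    fp = sum(row[i] for row in confusion) - tp    # outside i, placed in i
    tn = total - tp - fn - fp                     # outside i, not placed in i
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# Hypothetical 3-category confusion matrix (illustrative counts only).
demo = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]
metrics = per_class_metrics(demo, 0)
```

For category 0 of this illustrative matrix, the counts are TP = 8, FN = 2, FP = 1, and TN = 19, giving an accuracy of 0.90, a sensitivity of 0.80, and a specificity of 0.95.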

Examination of Seed Viability Indices
Table 2 presents the seed viability indices for each treatment. The accelerated aging test significantly affected the seed groups, and there were noticeable differences in seed viability indices among the treatments. As the aging period increased, the germination percentage of both seed varieties decreased significantly, and in the third period, almost half of the seeds failed to germinate.

Comparison of the Efficiency of Optimal Wavelength Selection Methods
In this study, wavelength selection was framed as a regression problem, with the seed allometric coefficient as the continuous output (target) variable and the wavelengths as the input (independent) variables. The higher the seed allometric coefficient, the higher the likelihood of germination [10]. Accordingly, optimization algorithms were employed to search for the wavelengths that produce regression models with the highest correlation between actual and predicted values. Table 3 provides descriptive statistical measures of algorithm accuracy and the number of wavelengths selected by each algorithm. Based on Table 3, all variable selection algorithms exhibit a high level of accuracy (CR > 0.98) and low error (RMSE < 0.003) in predicting the allometric coefficient. Therefore, the most logical and appropriate criterion for selecting the optimal algorithm appears to be its execution time. Algorithms that require less computational time are more practical for commercial-scale implementation [67-70]. This time difference is highly significant in commercial and practical applications. In research contexts, the seed recognition system encounters only one seed at a time, whereas at the commercial scale it must handle millions of seeds. The shorter the execution time of the algorithm, the lower the computing load on the hardware, the shorter the seed separation time, and the higher the system performance. The ranking of variable selection algorithms based on execution time is as follows: ICA < PSO < GA < HTS < FOA < WCC < DSOS < ACO < CUK < LA < LCA. Previous research has also indicated the high popularity and practicality of the ICA [72], PSO [73], and GA [74] algorithms due to their low execution time.
On the other hand, the number of selected wavelengths is also a crucial criterion in industrial-scale applications because the cost of producing spectroscopic tools for practical purposes depends directly on the number of wavelengths the instrument can detect. As the number of wavelengths decreases, the production cost decreases accordingly [14]. Additionally, instruments capable of detecting fewer wavelengths can provide higher accuracy and resolution in their measurements [75]. Therefore, the variable selection algorithms are ranked based on the number of optimal wavelengths as follows: LCA < FOA < CUK < WCC < PSO < GA < DSOS < ACO < HTS < LA < ICA. The LCA and FOA algorithms outperform the others by identifying 10 optimal wavelengths. Considering that the FOA algorithm runs twice as fast as LCA, it can be considered the optimal method for wavelength selection.
In Figure 4, the performance of the algorithms is compared in terms of correlation and RMSE in each round of algorithm execution. The LA algorithm showed a strong relationship between the number of executions and its performance, achieving lower error and higher correlation with each round of execution. Conversely, the DSOS algorithm is relatively unaffected by additional executions, as re-executing the algorithm does not significantly change its error rate or correlation. These findings align with the results presented by Masoudi-Sobhanzadeh et al. [59], who implemented and compared the algorithms using different datasets.

Examination of Averages of Vis/NIR Absorption Spectra and Evaluation of the Location of the Selected Wavelength
Figure 5 displays the mean Vis/NIR absorption spectra for the three seed aging periods along with the locations of the optimal selected wavelengths. As seed age increases, the absorption levels decrease along the entire curve. This phenomenon can be attributed to the reduction of water in the sample due to the aging process, because the amount of radiation absorbed depends largely on the amount of water in the chemical components [76]. Similar trends have been reported by other researchers [47,75,77]. The absorption changes in the spectral range of 500-550 nm can be attributed to carotenoids and anthocyanins [78]. Furthermore, changes in the range of 650-700 nm are related to the presence of chlorophyll [79], whereas changes in the 680-760 nm range are associated with the presence of amino acids in the seed [77]. Changes in the region of 750-850 nm are attributed to the water content of the sample [80]. The alterations in the 860-910 nm range indicate the presence of CH and OH bonds in carbohydrates [81]. Finally, changes in the 930-1030 nm range are explained by the presence of protein compounds in the sample [82]. The presence of each of these substances in the seed composition increases the probability of germination, highlighting the importance of their detection [78]. Hence, it can be concluded that the variable selection algorithm yields an optimal solution that includes at least one wavelength in each of the ranges mentioned.
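The coverage check behind this conclusion can be sketched in a few lines of Python. The spectral regions and their attributed constituents are taken from the paragraph above. The helper names `band_coverage` and `uncovered`, and the example wavelength list, are hypothetical placeholders rather than the wavelengths actually selected in the study.

```python
# Spectral regions (nm) and the constituents attributed to them in the text.
BANDS = {
    "carotenoids and anthocyanins": (500, 550),
    "chlorophyll": (650, 700),
    "amino acids": (680, 760),
    "water": (750, 850),
    "CH and OH bonds in carbohydrates": (860, 910),
    "protein compounds": (930, 1030),
}


def band_coverage(wavelengths):
    """Map each region to the selected wavelengths that fall inside it."""
    return {
        name: [w for w in wavelengths if low <= w <= high]
        for name, (low, high) in BANDS.items()
    }


def uncovered(wavelengths):
    """List the regions that contain none of the selected wavelengths."""
    return [name for name, hits in band_coverage(wavelengths).items()
            if not hits]


# Hypothetical selection of 10 wavelengths (placeholder values only).
example = [512, 540, 668, 695, 730, 790, 845, 880, 950, 1010]
missing = uncovered(example)
```

Because the chlorophyll and amino acid regions overlap between 680 and 700 nm, a single wavelength such as 695 nm counts toward both. An empty `missing` list corresponds to a selection that has at least one wavelength in every region.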

Results of Seed Viability Classification Modeling based on Selected Wavelengths via Machine Learning Methods
Table 4 presents the results of the seed viability classification models based on the wavelengths selected using the FOA algorithm (lowest number of wavelengths) and the DSOS algorithm (highest number of wavelengths). As shown in Table 4, all classifiers performed well in determining the viability of the seeds. Overall, it can be concluded that seed classification using the wavelengths selected via the FOA algorithm performed better. However, to check and compare the classifiers' performance more precisely, the results are presented graphically in Figure 6.
As shown in Figure 6, except for the DT and MP methods, the effectiveness of the methods decreased as the number of variables increased. The MP method provided consistent results across both sets of variables (10 and 16 variables), and the DT method performed best as the number of variables increased. The LR, SVM, and k-NN methods were strongly affected by the increased number of variables, which led to a sharp drop in their performance. Previous research has indicated that two methods, MP and DT, are well suited to mining large-scale data with many variables. The drop can be explained by the fact that the LR and SVM methods are affected by collinearity among variables [73]. In contrast, the DT method, with its node-branch expansion capability, can examine different features in different branches and thereby overcome the collinearity problem, which is essential for achieving good performance [61,69]. The MP method, in turn, has a layered structure comprising several neurons in each layer. Each neuron models a set of features, thus providing a solution to the problem of nonlinearity among variables [83]. However, the LR approach, when executed with a limited number of variables, has an advantage over the other methods, as it can explicitly describe the relationship between each wavelength and the response variable and rank the importance of wavelengths in the target classification [60]. Hence, considering the high accuracy of all algorithms in seed classification, no particular algorithm can be considered superior to the others. Consequently, the appropriate algorithm can be chosen based on specific research or operational goals.
Various classification methods have been employed in other studies to classify seed viability. For instance, corn seed viability was classified using SVM, k-NN, random forest, and a deep convolutional neural network (CNN), with the best accuracy achieved using CNN [35]. Peanut seed viability was classified using SVM, DT, and LDA, and the best accuracy was obtained using the DT classifier [35]. Hyperspectral images were used to identify damaged rice seeds with SVM, k-NN, DT, and deep forest (DF) classifiers, with the new DF classifier, developed based on the DT classifier, providing higher accuracy than the other classifiers [84]. The germination vigor of sugar beet seeds was predicted using hyperspectral images and k-NN, SVM, and RF classifiers, with the SVM classifier providing the best performance [77].
Tables 5-8 display the confusion matrices for the DT-DSOS, MP-FOA, LR-DSOS, and KNN-DSOS classifications, respectively, enabling a clear comparison between the best and worst classifications in each treatment. The most accurate diagnosis was obtained for the third senescence period for both seed varieties, and seeds with poor viability were correctly identified and distinguished from healthy seeds. Additionally, the correct identification of healthy seeds was acceptable, with most misclassified seeds belonging to the second senescence period. As a result, there was no significant difference in the accuracy of the correct diagnosis based on the seed variety.

Figure 2. The spectrum acquisition process along with the probe-detector sensor.

Figure 4. Comparison of correlation and RMSE of variable selection algorithms based on convergence, mean convergence, and stability.

Figure 5. Averages of Vis/NIR absorption spectra and the location of the selected wavelengths.

Figure 6. Results of classification for (a) seed viability identification and (b) FOA algorithm selected wavelength.

Table 1. Calculation relationships of the studied indicators.

Table 2. Results of seed viability indices for peanut seeds.

Table 3. The results obtained from the variable selection algorithms for the regression problem. AL is algorithm, NOF is the number of features, ET is elapsed time, RMSE is the root mean squared error, and CR (R²) is the squared correlation coefficient.

Table 4. Classification results to determine seed viability.