# TPE-RBF-SVM Model for Soybean Categories Recognition in Selected Hyperspectral Bands Based on Extreme Gradient Boosting Feature Importance Values


## Abstract


## 1. Introduction

- Dataset construction: hyperspectral images covering 400 nm to 1000 nm were collected from 2299 soybean seeds, and four categories were established;
- Feature selection using a boosting algorithm: an extreme gradient boosting algorithm was introduced to reduce redundant dimensionality in the hyperspectral data; ten feature bands were selected from the original 203 hyperspectral bands to form a subset;
- Optimized RBF-SVM model with TPE: a support vector machine with a Gaussian radial basis kernel function was built for the multi-classification pattern recognition task on the soybean datasets, and the tree-structured Parzen estimator method was introduced to improve the model as TPE-RBF-SVM.

## 2. Materials and Methods

#### 2.1. Soybean Material and Hyperspectral Dataset

#### 2.2. Support Vector Machine Model and Optimization

#### 2.2.1. Support Vector Machine with Gaussian Radial Basis Kernel

Given a dataset {(**x**$_{i}^{[m]}$, y$_{i}$), i = 1, 2, 3, …, n}, composed of n samples of m dimensions and corresponding real values $y$, the support vector machine needs to solve the following, Equation (1), iteratively:

where **ω** and b are the parameters of the point-normal equation of a hyperplane, y = **ωx** + b; **loss**$_{i}$ = max(0, 1 − y$_{i}$(**ω**$^{T}$ψ(**x**$_{i}$) − b)) is the hinge loss between the true values and the predicted ones in the sample space; C is the strength of the regularization of the objective function; and ψ(**x**$_{i}$) is the map function between the sample space of **x**$_{i}$ and the high-dimensional Hilbert space, which introduces the kernel function method later.
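Consistent with the hinge loss and regularization terms just defined, Equation (1) takes the canonical soft-margin form (a standard reconstruction, not necessarily the authors' exact typesetting):

```latex
\min_{\boldsymbol{\omega},\,b}\;
  \frac{1}{2}\lVert\boldsymbol{\omega}\rVert^{2}
  + C\sum_{i=1}^{n}\mathrm{loss}_{i},
\qquad
\mathrm{loss}_{i}
  = \max\!\bigl(0,\;1 - y_{i}\bigl(\boldsymbol{\omega}^{T}\psi(\mathbf{x}_{i}) - b\bigr)\bigr)
\tag{1}
```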

In the dual form, Equation (2), **Q** is the transformed positive semi-definite Hermitian matrix, whose elements are Q$_{ij}$ = y$_{i}$y$_{j}$ψ(**x**$_{i}$)$^{T}$ψ(**x**$_{j}$); **e** is the all-ones vector, which ensures the validity of the calculation; and **α** is the dual coefficient vector.

Solving the dual yields the coefficient vector **α** and the support vectors **SV** = {**x**$_{k}$, k ∈ QP(α)*}. In the solution and application process, the Gaussian radial basis kernel function (RBF) is introduced for mapping the samples into the Hilbert space. For the prediction $\widehat{y}$ of a new sample (**x**, y), it can be expressed in the following form (3):

K(**x**$_{i}$, **x**) = ψ(**x**$_{i}$)$^{T}$ψ(**x**) = exp(−$\frac{{\left|\left|{\mathbf{x}}_{i}-\mathbf{x}\right|\right|}^{2}}{2{\sigma}^{2}}$), where σ is a free parameter that controls the mapping process in the Gaussian radial basis function [24].

#### 2.2.2. Optimization of SVM with Tree-Structured Parzen Estimator

Conventionally, these hyperparameters are tuned by grid search over candidate values such as 2$^{k}$ (k ∈ **N**), evaluated experimentally on the validation dataset or by cross-validation [25].

In essence, hyperparameter tuning is a black-box optimization of **z** = (C, γ) over the search space [19]. We implement it with a Bayesian optimization method, the tree-structured Parzen estimator (TPE), a sequential model-based optimization (SMBO) method; Ozaki et al. extended it in 2020 to optimize large-scale neural networks [20]. The pseudo-code of the tree-structured Parzen estimator method is given in Algorithm 1:

**Algorithm 1** The pseudo-code of the tree-structured Parzen estimator algorithm (for RBF-SVM)

1: Initialize H$_{0}$ = ∅, z = z$_{0}$

2: For k = 1 to I$_{max}$

3: Update hyperparameters: z* = argmin(EI$_{k}$(P, z$_{k}$[H$_{k−1}$]))

4: Refit and evaluate the RBF-SVM: L$_{k}$

5: Update the optimization history: H$_{k}$ = H$_{k−1}$ ∪ ⟨EI$_{k}$, L$_{k}$⟩

6: End

7: Return $\mathbf{z}*=\mathit{argmin}\,L({y}_{i},\widehat{y}=f({\mathbf{x}}_{i},C,\gamma))$

In each iteration, TPE proposes new hyperparameters **z** by the surrogate function, using the EI values and threshold P from the iteration history, and evaluates the result of the tuned values as **L** on the validation dataset or by cross-validation. The surrogate function is as follows [20]:

where g(**z**) and h(**z**) are the probability distributions when the value of **L** is greater than or less than the threshold P, respectively; these distributions come from the historical information **H** accumulated in the previous k − 1 iterations.

#### 2.3. Crux Spectrum Feature Selection Based on Extreme Gradient Boosting

The objective function of extreme gradient boosting for samples (**x**$_{i}$, y$_{i}$) is as follows:

The fitted boosting model provides feature importance values (**FIV**) to measure feature importance in the model [28]. The FIVs are defined as follows (10):

where **FIV**(s) represents the information gain of the feature s across all meta-learners during the iterations of extreme gradient boosting. When the performance of the model reaches an acceptable level, **FIV**s can be used for both feature selection and data interpretation. The value is a floating-point number in (0.00, 1.00); the closer it is to the right boundary, the more important the feature is for the ensemble model. This paper uses the FIVs from the extreme gradient boosting model for feature selection, compresses the high-dimensional spectral data into fewer crux bands and then builds an RBF-SVM model with the TPE method described above.
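The select-by-importance step can be sketched as follows, using scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost (its `feature_importances_` attribute plays the role of the FIVs and likewise sums to 1.0); the data here are synthetic placeholders for the 203-band spectra:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for the 203-band hyperspectral matrix D(m = 203).
X, y = make_classification(n_samples=300, n_features=203,
                           n_informative=15, random_state=615)

# Gradient-boosted trees; feature_importances_ acts as the FIV vector.
booster = GradientBoostingClassifier(n_estimators=60, random_state=615)
booster.fit(X, y)

fiv = booster.feature_importances_
top10 = np.argsort(fiv)[::-1][:10]   # the 10 "crux bands" with largest FIVs
X_sub = X[:, top10]                  # dimension-reduced sub-band matrix

print(X_sub.shape, round(float(fiv[top10].sum()), 4))
```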

#### 2.4. Feature Selection and Optimized RBF-SVM Modelling

#### 2.4.1. Feature Selection Based on Feature Importance

Feature selection compresses the full-band dataset **D**(m = 203) into a wavelength band subset **s** with 10 crux spectral bands. An extreme gradient boosting model is fitted on the training portion of **D**(m). The fitting process is validated on the **valid** dataset of **D**(m) to control possible overfitting and underfitting, yielding the full-band extreme gradient boosting model **FS**$_{D(m=203)}$. After performance evaluation, the **FIV**s of the accepted model **FS**$_{D(m=203)}$ are extracted and used to mask a subset: the top 10 spectral band wavelengths, sorted by descending FIV, form **s** together with their feature interpretation.

#### 2.4.2. Modelling and Optimizing RBF-SVM with TPE in Sub-Dataset

The sub-band data selected from **D**(m = 203) will be scaled as follows:

where **x**$_{mean}$ and **x**$_{std}$ are the mean and standard deviation of the **train** dataset, respectively; these values are then used directly as constants in the normalized scaling of **valid** and **test**, to avoid data leaks that would pollute the training dataset with accidental information.
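The leak-free scaling described above can be sketched in a few lines of NumPy (the splits below are hypothetical placeholders for the soybean data):

```python
import numpy as np

rng = np.random.default_rng(615)
# Hypothetical splits over the 10 selected crux bands.
train = rng.normal(5.0, 2.0, size=(300, 10))
valid = rng.normal(5.0, 2.0, size=(75, 10))
test = rng.normal(5.0, 2.0, size=(125, 10))

# Statistics come from train ONLY and are reused as constants for
# valid/test, so no information from those splits leaks into training.
x_mean = train.mean(axis=0)
x_std = train.std(axis=0)

train_norm = (train - x_mean) / x_std
valid_norm = (valid - x_mean) / x_std
test_norm = (test - x_mean) / x_std

print(train_norm.shape, valid_norm.shape, test_norm.shape)
```

Only the training split ends up with exactly zero mean and unit variance; valid/test are merely close to it, which is the intended behavior.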

The normalized sub-band dataset **D**$^{norm}$(m* = 10) is constructed from the sub-dataset according to (11) above. The RBF-SVM is then built and fitted on **train*** of **D**$^{norm}$(m* = 10), with the TPE algorithm searching for better performance metrics on the **valid*** dataset.

#### 2.5. Baseline Models and Evaluation Metrics Design

#### 2.5.1. Vanilla and Control Group Models

All baseline models are trained on the same sub-band dataset **D**(m* = 10). Further, a vanilla RBF-SVM (svc2) is considered to isolate the effect of the TPE; in addition, six other machine learning algorithms serve as comparative models:

- CART Tree (tree)

- Random Forest (rdrf)

- Logistic Regression (lgst)

- Multilayer Perceptron (mlp2&mlp4)

- Convolution Neural Network (conv)

#### 2.5.2. Evaluation Metrics and Analysis Environment

where the subscript i indexes samples and n$_{sample}$ is the number of samples in the dataset. Further, an ACC-based confusion matrix is introduced for detailed per-category prediction evaluation.
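The accuracy metric and confusion matrix can be computed with scikit-learn; the labels below are hypothetical predictions for the four soybean categories, not results from the paper:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical predictions for the four soybean labels (0-3).
y_true = np.array([0, 0, 1, 1, 2, 2, 3, 3, 3, 2])
y_pred = np.array([0, 1, 1, 1, 2, 2, 3, 3, 2, 2])

acc = accuracy_score(y_true, y_pred)  # correct predictions / n_sample
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2, 3])

print(acc)  # 0.8
print(cm)   # row = true category, column = predicted category
```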

## 3. Results

#### 3.1. Feature Selection Based on FIVs

The training dataset **train**, covering the range of 400 nm to 1000 nm, is used to fit the extreme gradient boosting model cited above, with the separate validation dataset **valid** applied for early stopping. The boosting was set to a maximum of 1000 estimator rounds, and the iterations finally stopped at the 880th round; at that point the loss metric on the validation dataset had not improved for 10 rounds, so the model is deemed fully fitted.

The fitted boosting model **FS**$_{D(m=203)}$ and its corresponding **FIV**s can then be extracted. In the resulting **FIV** distribution, the horizontal axis represents the band wavelength of the spectral data and the vertical axis is the **FIV** value, with the distribution summing to 1.00 over the extreme gradient boosting model. According to the **FIV** values, the top 10 wavelengths are extracted as the selected crux wavelength bands **s**, whose distribution is shown covered in the band marked with the dotted line.

The ten selected bands contribute 30.9631% of the total **FIV** in the model. The average weight of each band is also transformed into a relative FIV based on the sum of the selected bands' values.

#### 3.2. Optimization of SVM by Tree-Structured Parzen Estimator

The optimal hyperparameter combination, with C on the order of 10$^{4}$ and γ = 2.1056 × 10$^{−4}$, appeared during the 32nd iteration. At this point, the ACC on the validation dataset is 0.9072 and the ACC after testing on the independent dataset **test*** is 0.9165. The two values are similar, so the generalization ability of the TPE-RBF-SVM model, balancing structural risk and empirical risk, is demonstrated.

The optimum (C on the order of 10$^{4}$, γ = 2.1056 × 10$^{−4}$) is close to other strong candidates in the search space. According to the algorithmic theory of SVM, when the training dataset is complex multi-dimensional data, the relationship between the parameters of the SVM and its hyperparameters during training is complex. Therefore, tuning by the metric yields multiple local optimal solutions in the search space. This is confirmed by the several convergence–mutation cycles of the validation metric as the number of TPE iterations increases.

In summary, the hyperparameter C is determined to be on the order of 10$^{4}$ and the hyperparameter γ is determined to be 2.1056 × 10$^{−4}$ for our datasets. The accuracy of the TPE-RBF-SVM model under this configuration is 0.9072 on the validation dataset and 0.9165 on the test dataset.

#### 3.3. Comparison with Vanilla Model and Other Algorithms

**FIV**s based on the extreme gradient boosting method can select crux bands from the 400 nm to 1000 nm range to build dimension-reduced sub-band datasets. Furthermore, on the sub-band dataset selected by these FIVs, the SVM model optimized by the TPE algorithm effectively improves test-dataset performance on the soybean multi-classification task compared with the vanilla RBF-SVM and XGBoost models; compared with other machine learning algorithms, the method still achieves high accuracy as well.

## 4. Discussion

## 5. Conclusions

(1) **FIVs** based on the extreme gradient boosting method can select fewer crux bands from the 400–1000 nm range for the dimension-reduced sub-band dataset. (2) The TPE algorithm can determine the hyperparameters of the RBF-SVM model on the sub-band spectral dataset to perform better than the vanilla SVM. (3) The combination of TPE-RBF-SVM and the sub-band dataset selected by **FIVs** significantly improves the accuracy and F-score metrics compared with the two vanilla models and other machine learning algorithms.

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## References

- Fehily, A.M. SOY (SOYA) BEANS|Dietary Importance. In Encyclopedia of Food Sciences and Nutrition; Elsevier: Amsterdam, The Netherlands, 2003; pp. 5392–5398. ISBN 978-0-12-227055-0. [Google Scholar]
- Lee, T.; Tran, A.; Hansen, J.; Ash, M. Major Factors Affecting Global Soybean and Products Trade Projections. Amber Waves Econ. Food Farming Nat. Resour. Rural. Am.
**2016**, 4. [Google Scholar] [CrossRef] - Zhao, G.; Quan, L.; Li, H.; Feng, H.; Li, S.; Zhang, S.; Liu, R. Real-Time Recognition System of Soybean Seed Full-Surface Defects Based on Deep Learning. Comput. Electron. Agric.
**2021**, 187, 106230. [Google Scholar] [CrossRef] - Abdel-Rahman, E.M.; Mutanga, O.; Odindi, J.; Adam, E.; Odindo, A.; Ismail, R. A Comparison of Partial Least Squares (PLS) and Sparse PLS Regressions for Predicting Yield of Swiss Chard Grown under Different Irrigation Water Sources Using Hyperspectral Data. Comput. Electron. Agric.
**2014**, 106, 11–19. [Google Scholar] [CrossRef] - Folch-Fortuny, A.; Prats-Montalbán, J.M.; Cubero, S.; Blasco, J.; Ferrer, A. VIS/NIR Hyperspectral Imaging and N-Way PLS-DA Models for Detection of Decay Lesions in Citrus Fruits. Chemom. Intell. Lab. Syst.
**2016**, 156, 241–248. [Google Scholar] [CrossRef] - Rapaport, T.; Hochberg, U.; Shoshany, M.; Karnieli, A.; Rachmilevitch, S. Combining Leaf Physiology, Hyperspectral Imaging and Partial Least Squares-Regression (PLS-R) for Grapevine Water Status Assessment. J. Photogramm. Remote Sens.
**2015**, 109, 88–97. [Google Scholar] [CrossRef] - Osco, L.P.; Ramos, A.P.M.; Faita Pinheiro, M.M.; Moriya, É.A.S.; Imai, N.N.; Estrabis, N.; Ianczyk, F.; Araújo, F.F.D.; Liesenberg, V.; Jorge, L.A.D.C.; et al. A Machine Learning Framework to Predict Nutrient Content in Valencia-Orange Leaf Hyperspectral Measurements. Remote Sens.
**2020**, 12, 906. [Google Scholar] [CrossRef] - Erkinbaev, C.; Derksen, K.; Paliwal, J. Single Kernel Wheat Hardness Estimation Using near Infrared Hyperspectral Imaging. Infrared Phys. Technol.
**2019**, 98, 250–255. [Google Scholar] [CrossRef] - Zhang, X.; Sun, J.; Li, P.; Zeng, F.; Wang, H. Hyperspectral Detection of Salted Sea Cucumber Adulteration Using Different Spectral Preprocessing Techniques and SVM Method. LWT
**2021**, 152, 112295. [Google Scholar] [CrossRef] - Jahed Armaghani, D.; Asteris, P.G.; Askarian, B.; Hasanipanah, M.; Tarinejad, R.; Huynh, V.V. Examining Hybrid and Single SVM Models with Different Kernels to Predict Rock Brittleness. Sustainability
**2020**, 12, 2229. [Google Scholar] [CrossRef] - Ahmad, A.S.; Hassan, M.Y.; Abdullah, M.P.; Rahman, H.A.; Hussin, F.; Abdullah, H.; Saidur, R. A Review on Applications of ANN and SVM for Building Electrical Energy Consumption Forecasting. Renew. Sustain. Energy Rev.
**2014**, 33, 102–109. [Google Scholar] [CrossRef] - Zeng, N.; Qiu, H.; Wang, Z.; Liu, W.; Zhang, H.; Li, Y. A New Switching-Delayed-PSO-Based Optimized SVM Algorithm for Diagnosis of Alzheimer’s Disease. Neurocomputing
**2018**, 320, 195–202. [Google Scholar] [CrossRef] - Li, Y.; Yang, K.; Gao, W.; Han, Q.; Zhang, J. A Spectral Characteristic Analysis Method for Distinguishing Heavy Metal Pollution in Crops: VMD-PCA-SVM. Spectrochim. Acta Part A Mol. Biomol. Spectrosc.
**2021**, 255, 119649. [Google Scholar] [CrossRef] [PubMed] - Pal, M.; Foody, G.M. Feature Selection for Classification of Hyperspectral Data by SVM. IEEE Trans. Geosci. Remote Sens.
**2010**, 48, 2297–2307. [Google Scholar] [CrossRef] - Kour, V.P.; Arora, S. Particle Swarm Optimization Based Support Vector Machine (P-SVM) for the Segmentation and Classification of Plants. IEEE Access
**2019**, 7, 29374–29385. [Google Scholar] [CrossRef] - Nader, A.; Azar, D. Searching for Activation Functions Using a Self-Adaptive Evolutionary Algorithm. In Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion, ACM, Cancún, Mexico, 8 July 2020; pp. 145–146. [Google Scholar]
- Tharwat, A.; Hassanien, A.E. Quantum-Behaved Particle Swarm Optimization for Parameter Optimization of Support Vector Machine. J. Classif.
**2019**, 36, 576–598. [Google Scholar] [CrossRef] - Young, S.R.; Rose, D.C.; Karnowski, T.P.; Lim, S.-H.; Patton, R.M. Optimizing Deep Learning Hyper-Parameters through an Evolutionary Algorithm. In Proceedings of the Workshop on Machine Learning in High-Performance Computing Environments, ACM, Austin, TX, USA, 15 November 2015; pp. 1–5. [Google Scholar]
- Bergstra, J.; Bengio, Y. Random Search for Hyper-Parameter Optimization. J. Mach. Learn. Res.
**2012**, 13, 281–305. [Google Scholar] - Ozaki, Y.; Tanigaki, Y.; Watanabe, S.; Onishi, M. Multiobjective Tree-Structured Parzen Estimator for Computationally Expensive Optimization Problems. In Proceedings of the 2020 Genetic and Evolutionary Computation Conference, ACM, Cancún, Mexico, 25 June 2020; pp. 533–541. [Google Scholar]
- Cortes, C.; Vapnik, V. Support-Vector Networks. Mach. Learn.
**1995**, 20, 273–297. [Google Scholar] [CrossRef] - Boser, B.E.; Guyon, I.M.; Vapnik, V.N. A Training Algorithm for Optimal Margin Classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory COLT ’92, ACM, Pittsburgh, PA, USA, 27–29 July 1992; pp. 144–152. [Google Scholar]
- Chang, C.-C.; Lin, C.-J. LIBSVM: A Library for Support Vector Machines. ACM Trans. Intell. Syst. Technol.
**2011**, 2, 1–27. [Google Scholar] [CrossRef] - Herrero-Lopez, S. Multiclass Support Vector Machine. In GPU Computing Gems Emerald Edition; Elsevier: Amsterdam, The Netherlands, 2011; pp. 293–311. ISBN 978-0-12-384988-5. [Google Scholar]
- Abdiansah, A.; Wardoyo, R. Time Complexity Analysis of Support Vector Machines (SVM) in LibSVM. IJCA
**2015**, 128, 28–34. [Google Scholar] [CrossRef] - Friedman, J.H. Stochastic Gradient Boosting. Comput. Stat. Data Anal.
**2002**, 38, 367–378. [Google Scholar] [CrossRef] - Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, San Francisco, CA, USA, 13 August 2016; pp. 785–794. [Google Scholar]
- Adler, A.I.; Painsky, A. Feature Importance in Gradient Boosting Trees with Cross-Validation Feature Selection. Entropy
**2022**, 24, 687. [Google Scholar] [CrossRef] [PubMed] - Li, H.; Zhang, L.; Sun, H.; Rao, Z.; Ji, H. Identification of Soybean Varieties Based on Hyperspectral Imaging Technology and One-dimensional Convolutional Neural Network. J. Food Process. Eng.
**2021**, 44, e13767. [Google Scholar] [CrossRef] - Zhu, S.; Zhou, L.; Zhang, C.; Bao, Y.; Wu, B.; Chu, H.; Yu, Y.; He, Y.; Feng, L. Identification of Soybean Varieties Using Hyperspectral Imaging Coupled with Convolutional Neural Network. Sensors
**2019**, 19, 4065. [Google Scholar] [CrossRef] - Alsahaf, A.; Petkov, N.; Shenoy, V.; Azzopardi, G. A Framework for Feature Selection through Boosting. Expert Syst. Appl.
**2022**, 187, 115895. [Google Scholar] [CrossRef] - Wan, Z.; Xu, Y.; Šavija, B. On the Use of Machine Learning Models for Prediction of Compressive Strength of Concrete: Influence of Dimensionality Reduction on the Model Performance. Materials
**2021**, 14, 713. [Google Scholar] [CrossRef] [PubMed] - Zhang, N.; Yang, G.; Pan, Y.; Yang, X.; Chen, L.; Zhao, C. A Review of Advanced Technologies and Development for Hyperspectral-Based Plant Disease Detection in the Past Three Decades. Remote Sens.
**2020**, 12, 3188. [Google Scholar] [CrossRef] - Dai, Q.; Cheng, J.-H.; Sun, D.-W.; Zeng, X.-A. Advances in Feature Selection Methods for Hyperspectral Image Processing in Food Industry Applications: A Review. Crit. Rev. Food Sci. Nutr.
**2015**, 55, 1368–1382. [Google Scholar] [CrossRef]

**Figure 5.** Subset and visual feature images for soybeans in this paper. (**a**) DongSheng-1 subset (Label 0). (**b**) ChangNong-33 subset (Label 1). (**c**) ChangNong-38 subset (Label 2). (**d**) ChangNong-39 subset (Label 3).

**Figure 7.** Hyperparameter changes over iterations. (**a**) Change in accuracy with C over iterations. (**b**) Change in accuracy with γ over iterations.

Category | Crude Protein | Crude Fat | Shape | Seed Coat Luster | Seed Hilum | Train & Valid | Test Dataset | Label | Sum
---|---|---|---|---|---|---|---|---|---
Dongsheng-1 | 41.30% | 19.97% | Spherical | Shiny | Yellow | 375 | 125 | 0 | 500
Changnong-33 | 37.57% | 23.00% | Ellipsoid | Shiny | Yellow | 374 | 125 | 1 | 499
Changnong-38 | 37.26% | 21.33% | Ellipsoid | Slight | Yellow | 450 | 150 | 2 | 600
Changnong-39 | 40.91% | 20.15% | Spherical | Dull | Brown | 525 | 175 | 3 | 700

Hyperparameter | Data Type | Search Space | Minimize Step
---|---|---|---
C | float | [1 × 10^{−2}, 1 × 10^{5}] | 1 × 10^{−8}
γ | float | [1 × 10^{−8}, 1] | 1 × 10^{−10}

Compute Environment | | Analysis Tools
---|---|---
CPU | Intel^{®} Core™ i5-10400F (2.90 GHz) | Pandas 1.3.3, Numpy 1.19.3, Scikit-learn 0.24.1, XGBoost 1.4.2, Scipy 1.6.2, liblinear 3.23.0.4, CUDA 11.2, Keras 2.9.0
GPU | Nvidia GeForce RTX 3070 |
RAM | DDR4 3000 MHz, 48 GB (2 × 8 GB + 2 × 16 GB) |
Operating System | Windows LTSC 21H2 |
Random Seed | 615 |

Band Group | Violet | Violet | Green | Green | Green | Green | Green | Orange | NIR | NIR
---|---|---|---|---|---|---|---|---|---|---
Center Wavelength (nm) | 421.66 | 424.63 | 516.49 | 525.38 | 528.35 | 534.27 | 537.24 | 617.25 | 824.69 | 866.18
Full-Band FIV | 2.73% | 1.86% | 1.48% | 5.23% | 3.52% | 9.51% | 1.92% | 1.25% | 2.16% | 1.31%
Relative FIV | 8.80% | 6.01% | 4.77% | 16.88% | 11.38% | 30.71% | 6.19% | 4.04% | 6.98% | 4.22%


© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Zhao, Q.; Zhang, Z.; Huang, Y.; Fang, J.
TPE-RBF-SVM Model for Soybean Categories Recognition in Selected Hyperspectral Bands Based on Extreme Gradient Boosting Feature Importance Values. *Agriculture* **2022**, *12*, 1452.
https://doi.org/10.3390/agriculture12091452
