Application of Hyperspectral Imaging for Identification of Melon Seed Variety Using Deep Learning

Hong, Zhiqi; Zhang, Chu; Song, Wenjian; Nie, Xiangbo; Ye, Hongxia; He, Yong

doi:10.3390/agriculture15111139

by Zhiqi Hong ^1,2,†, Chu Zhang ^3,†, Wenjian Song ^2, Xiangbo Nie ^4, Hongxia Ye ^5,* and Yong He ^1,*

^1 College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310058, China

^2 The Rural Development Academy & Agricultural Experiment Station, Zhejiang University, Hangzhou 310058, China

^3 School of Information Engineering, Huzhou University, Huzhou 313000, China

^4 Shaoxing Jinshuo Agricultural Technology Co., Ltd., Shaoxing 312000, China

^5 Institute of Vegetable Science, Zhejiang University, Hangzhou 310058, China

^* Authors to whom correspondence should be addressed.

^† These authors contributed equally to this work.

Agriculture 2025, 15(11), 1139; https://doi.org/10.3390/agriculture15111139

Submission received: 3 April 2025 / Revised: 20 May 2025 / Accepted: 22 May 2025 / Published: 25 May 2025

(This article belongs to the Special Issue Optics and Image Analysis in Modern Agriculture: Transforming Practices and Unveiling Opportunities)

Abstract

The accurate identification of melon seed varieties is essential for improving seed purity and the overall quality of melon production. In this study, hyperspectral imaging was used to identify six varieties of melon seeds. Both hyperspectral images and RGB images were generated during hyperspectral image acquisition. The spectral features of seeds were extracted from the hyperspectral images, and the image features of the corresponding seeds were manually extracted from the RGB images. Five datasets were formed using the spectral features and RGB images of the seeds: seed spectral features, manually extracted seed image features, seed images, the fusion of seed spectral features with manually extracted image features, and the fusion of seed spectral features with seed images. Logistic Regression (LR), Support Vector Classification (SVC), and Extreme Gradient Boosting (XGBoost) were used to establish classification models using the spectral features and the manually extracted image features. Convolutional Neural Network (CNN) models were established using all five datasets. The CNN models achieved good performance on all five datasets, with classification accuracies exceeding 90% for the training, validation, and test sets. The CNN models using the fused datasets performed best, achieving classification accuracies exceeding 97% for the training, validation, and test sets. The results indicated that both spectral features and image features can be used to identify the six varieties of melon seeds, and that the fusion of spectral features and image features can improve classification performance. These findings provide an alternative approach for melon seed variety identification, which can also be extended to other seed types.

Keywords: hyperspectral imaging; image feature; data fusion; convolutional neural network; melon seeds

1. Introduction

Melon (Cucumis melo L.) is one of the most economically important horticultural crops, widely grown in many countries. The quality of melon is intrinsically associated with its variety. Various regions, climatic conditions, and soil types impose specific requirements on the cultivated melon varieties. Due to their unique taste, aroma, and nutrients, numerous melon varieties have been developed to satisfy the different demands of growers and consumers. In addition to being used for the propagation of melon plants, melon seeds can also serve as traditional Chinese medicine [1] and can be processed to produce melon seed oils [2]. Considering these factors, the identification of melon seed varieties can help to ensure the use of appropriate melon varieties.

The large number of melon varieties, along with the significant morphological similarities among seeds from different varieties, complicates the accurate identification of distinct varieties through visual inspection alone. Gene-based methods can be used to identify melon seed varieties accurately [3,4]. However, these methods are destructive, inefficient, and expensive, requiring professional operation skills; thus, they are not suitable for the large-scale analysis of each individual seed. Various non-destructive techniques have been used for seed variety identification [5,6], including melon seeds [7]. Among these techniques, hyperspectral imaging has been proven to be an efficient and promising technique for seed variety identification [8]. Being capable of acquiring spectral and spatial information simultaneously, hyperspectral imaging has been studied in various fields. As for seed quality inspection, hyperspectral imaging can be used for both batch samples and single-kernel seeds [8].

The application of hyperspectral imaging in seed variety identification depends on the analysis of the hyperspectral images. Hyperspectral images can provide one-dimensional (1D) spectra, two-dimensional (2D) images, and three-dimensional (3D) hyperspectral data cubes. Seed variety identification has been successfully achieved based on the multi-dimensional data. Conventional data analysis methods have shown great success, including data preprocessing, feature selection/extraction, and modeling methods and strategies [8]. In addition to these data analysis methods, deep learning has emerged as one of the most prominent approaches for analyzing hyperspectral images [9,10]. Multi-dimensional data can be directly processed using deep learning models. Deep learning, primarily known for its remarkable feature learning and mining ability, has now been extended to various fields. For seed variety identification, deep learning-based methods have demonstrated strong performance. The combination of deep learning and hyperspectral imaging has emerged as a potential alternative for real-world applications of seed variety identification. The spectral analysis of hyperspectral images, which is primarily a 1D data analysis issue, is the most commonly used data analysis strategy in hyperspectral imaging. Spectral features can reflect the physical and chemical properties of the samples. Deep learning models using 1D spectral data have demonstrated great success across various tasks, including seed variety classification [5].

Moreover, image features can also provide useful information for seed quality inspection [11,12]. These features can provide information regarding morphology, color, and texture [11]. Image feature-based seed variety classification has also been reported for seeds exhibiting external differences, such as variations in morphology, color, and texture [11,12,13]. For certain seed varieties with differences in internal and external features, external features may strengthen classification performance. For image feature extraction, the manually defined features are extracted in advance using conventional approaches and then these features are used for classification [11]. Some studies have demonstrated that image feature-based classification obtained good performance [5,11,12,13]. When the seeds exhibit quite similar external features, image feature-based analysis might not achieve satisfactory results. Thus, attempts to fuse the image features and spectral features to enhance classification performance have been widely explored [14,15]. In the case of hyperspectral imaging, image features are generally obtained through a dimension reduction approach to reduce the amount of data and explore informative features.

Hyperspectral imaging instruments are significantly more expensive than RGB cameras; consequently, some researchers have combined spectral features with features extracted from RGB images [14,15]. For both hyperspectral images and RGB images, these features are predefined and must be extracted prior to further analysis [14,15,16,17,18].

Owing to its excellent feature learning ability, deep learning-based analysis can be performed in an end-to-end manner, eliminating the need for the prior extraction of predefined features. However, manually extracted features are quick to compute, whereas feature learning by deep learning models may take relatively longer. It is therefore worth comparing the performance of manually extracted features (image features computed with equations defined and validated in previous studies) with that of features automatically learned by deep learning models. For hyperspectral imaging, deep learning models using the 3D data cube can mine both spectral and image features. Generally, image features of hyperspectral images are extracted from gray-scale images at selected wavelengths. Improving seed variety classification performance remains challenging.

In this research, hyperspectral imaging with deep learning was used to identify melon seed varieties. The fusion of spectral features and image features was explored. The specific objectives were to (1) compare the classification performance of LR, SVC, XGBoost, and CNN models for melon seed variety identification using 1D spectral data; (2) identify significant wavelengths and important image features for melon seed variety classification using SHapley Additive exPlanations (SHAP) analysis; (3) establish 1D CNN models using image features extracted from RGB images and gray-scale images at selected feature wavelengths and 2D CNN models using RGB and spectral images at selected wavelengths, and compare the performance of these models; and (4) establish end-to-end CNN models to fuse spectral features and image features, and compare these models with the established models.

2. Materials and Methods

2.1. Sample Preparation

Melon seeds of six varieties (2A-234, CX-264, DFM-268, Zhetian103, Zhetian105, and Zhetian501) were collected from Zhejiang University, Hangzhou, Zhejiang Province, China, in 2023. These six melon varieties exhibit close genetic relationships. All seeds were intact and clean and were used for hyperspectral image acquisition. Representative images of seeds of the six melon varieties are shown in Figure 1. Detailed seed information is listed in Table 1. The melon seeds of 2A-234, CX-264, DFM-268, Zhetian103, Zhetian105, and Zhetian501 were assigned the category values of 0, 1, 2, 3, 4, and 5, respectively.

2.2. Hyperspectral Image Acquisition and Correction

A laboratory-based hyperspectral imaging system was used to acquire hyperspectral images. The system included an FX10 hyperspectral camera (Spectral Imaging Ltd., Oulu, Finland) covering the spectral range of 400–1000 nm with a spectral resolution of 5.5 nm. The effective pixel size was 19.9 × 9.97 μm. A light source with six halogen lamps (OSRAM, Munich, Germany), each with a power of 35 W, was used for illumination. The lamps were placed symmetrically on either side of the camera, with three lamps in a row on each side. A mobile platform was used to transport the samples. The distance between the lower edge of the light source and the moving plate was 220 mm. The official software LUMO-Scanner 2020 (Spectral Imaging Ltd., Oulu, Finland) was used to control the entire hyperspectral imaging system. During image acquisition, the distance between the seeds and the camera lens was set at 300 mm, and the speed of the moving platform was set at 24.70 mm/s. The seeds were placed randomly on the moving platform, with no two seeds in contact with one another.

Before image acquisition, a dark reference image was collected by covering the camera lens with a black cover. During image acquisition, a white Teflon board was placed in front of the samples to acquire the white reference image. The raw hyperspectral image was then converted to a reflectance image using the dark reference image and the white reference image according to the following equation:

I_{C} = \frac{I_{R} - I_{D}}{I_{W} - I_{D}} \qquad (1)

where I_R is the raw hyperspectral image, I_D is the dark reference image, I_W is the white reference image, and I_C is the corrected image.
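As an illustration of Equation (1), the correction can be sketched with NumPy as follows. This is a minimal sketch rather than the authors' code: the function name `correct_reflectance`, the `eps` guard against a zero denominator, and the toy intensity values are assumptions introduced here.

```python
import numpy as np

def correct_reflectance(raw, dark, white, eps=1e-8):
    """Apply I_C = (I_R - I_D) / (I_W - I_D) element-wise."""
    raw = np.asarray(raw, dtype=np.float64)
    dark = np.asarray(dark, dtype=np.float64)
    white = np.asarray(white, dtype=np.float64)
    denom = white - dark
    # Assumption: replace near-zero denominators to avoid division by zero.
    denom = np.where(np.abs(denom) < eps, eps, denom)
    return (raw - dark) / denom

# Toy cube with 2 x 2 pixels and 3 bands (hypothetical sensor counts).
dark = np.full((2, 2, 3), 100.0)
white = np.full((2, 2, 3), 4100.0)
raw = np.full((2, 2, 3), 2100.0)
reflectance = correct_reflectance(raw, dark, white)
```

In this toy case every pixel lies halfway between the dark and white references, so the corrected reflectance is 0.5 everywhere.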

2.3. Spectra Extraction

After image acquisition and correction, the hyperspectral images were preprocessed to remove the background by establishing and applying masks. The hyperspectral images were then cut into sub-images so that each sub-image contained only one intact seed. The spectral information was then extracted from the hyperspectral image of each single melon seed. Each seed was defined as a region of interest (ROI). Each pixel within the ROI (seed) contained a spectrum, and the reflectance values of all pixels within the seed at each wavelength were averaged to represent the reflectance of the seed at that wavelength. The average reflectance values at all wavelengths were then combined to form the average spectrum of the seed. Although the number of pixels differed from seed to seed, averaging the spectra helped to reduce the influence of seed size. Because the head and tail of the spectra contained noise caused by the sensor response and the environment, the spectra at these wavelengths were removed, and only the range of 424–987 nm was used for analysis. As for spectral preprocessing, models were first established using the raw spectra in preliminary trials, and good performance was obtained. Thus, no further spectral preprocessing was conducted.
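The ROI averaging and wavelength trimming described above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `mean_seed_spectrum`, the array layout (rows, columns, bands), and the toy wavelength grid are hypothetical; only the 424–987 nm range comes from the text.

```python
import numpy as np

def mean_seed_spectrum(cube, mask, wavelengths, lo=424.0, hi=987.0):
    """Average the spectra of all ROI pixels, then keep bands in [lo, hi] nm."""
    cube = np.asarray(cube, dtype=np.float64)        # (rows, cols, bands)
    mask = np.asarray(mask, dtype=bool)              # (rows, cols), True = seed pixel
    wavelengths = np.asarray(wavelengths, dtype=np.float64)
    if not mask.any():
        raise ValueError("mask contains no seed pixels")
    spectrum = cube[mask].mean(axis=0)               # one value per band
    keep = (wavelengths >= lo) & (wavelengths <= hi)
    return wavelengths[keep], spectrum[keep]

# Toy example: 3 x 3 sub-image, 7 bands from 400 to 1000 nm, two seed pixels.
wavelengths = np.linspace(400.0, 1000.0, 7)
cube = np.zeros((3, 3, 7))
cube[1, 1, :] = 0.2
cube[1, 2, :] = 0.4
mask = np.zeros((3, 3), dtype=bool)
mask[1, 1] = mask[1, 2] = True
kept_wl, seed_spectrum = mean_seed_spectrum(cube, mask, wavelengths)
```

Here the bands at 400 nm and 1000 nm fall outside the retained range and are dropped, and the remaining five bands hold the mean of the two seed pixels.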

2.4. Image Feature Extraction

During the acquisition of hyperspectral images, the software LUMO-Scanner automatically generated RGB images of the melon seeds with the same spatial size as the hyperspectral images. The RGB image was then cut into sub-images matching those of the hyperspectral images. Each sub-image contained only the seed and the black moving plate, without any foreign material, so the scene was simple. Because the seed and the black plate differed clearly, binarization was used to remove the background: each sub-image was segmented with a single global threshold, determined by the Otsu method for that sub-image, producing a binary mask used to remove the background. Then, the image features were extracted according to the literature [11]. A total of 45 features were extracted. There were 12 color features from the R, G, and B images, namely the mean, standard deviation, maximum value, and minimum value of each of the R, G, and B images. The RGB images were then transformed into the HSV color space, and 12 analogous color features were extracted from the HSV images. A monochrome image was constructed for gray-scale feature extraction, resulting in four features (mean, variance, standard deviation, and population mean). In addition to the color features, 11 morphological features were extracted: area, perimeter, convex hull perimeter, maximum feret diameter, major axis length, minor axis length, aspect ratio, ellipse ratio, thinness ratio, hydraulic radius, and orientation (the definitions of these features can be found in the literature [11]). Six texture features were also extracted from the monochrome image: contrast, dissimilarity, homogeneity, energy, correlation, and angular second moment (ASM). The extracted features were used to establish the classification models. Moreover, the manually extracted image features were used for data fusion with the spectral features.
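The background removal and the 12 RGB color features can be sketched as follows. This is a minimal sketch, not the authors' OpenCV-Python pipeline: the Otsu threshold is re-implemented in NumPy, the gray-scale image is taken as the plain channel mean (an assumption), and the function names and toy image are hypothetical. The exact feature definitions used in the study follow reference [11] and may differ.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the 8-bit threshold that maximizes the between-class variance."""
    hist = np.bincount(np.asarray(gray).astype(np.uint8).ravel(), minlength=256)
    prob = hist.astype(np.float64) / hist.sum()
    omega = np.cumsum(prob)                     # probability of class "<= t"
    mu = np.cumsum(prob * np.arange(256))       # cumulative mean up to t
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = np.where(denom > 0, (mu[-1] * omega - mu) ** 2 / denom, 0.0)
    return int(np.argmax(sigma_b))

def rgb_color_features(rgb, mask):
    """Mean, standard deviation, maximum, and minimum of R, G, and B over the seed."""
    pixels = np.asarray(rgb, dtype=np.float64)[mask]   # (n_seed_pixels, 3)
    feats = []
    for c in range(3):
        ch = pixels[:, c]
        feats += [ch.mean(), ch.std(), ch.max(), ch.min()]
    return np.array(feats)

# Toy sub-image: dark plate (value 5) with a 2 x 2 "seed" of color (200, 150, 100).
rgb = np.full((4, 4, 3), 5, dtype=np.uint8)
rgb[1:3, 1:3] = (200, 150, 100)
gray = rgb.mean(axis=2)                  # assumption: plain channel mean as gray level
threshold = otsu_threshold(gray)
mask = gray > threshold                  # True for seed pixels
features = rgb_color_features(rgb, mask)
```

The same masked-pixel statistics could be computed on the HSV and monochrome images to obtain the remaining color and gray-scale features.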

The 2D images can be directly processed by deep learning models. Thus, in addition to the manually extracted features, the preprocessed seed images were also used as inputs of the CNN models and the fusion model.

2.5. Data Analysis Methods

In this study, five datasets were constructed, namely, spectral features, manually extracted image features, seed images, fusion of spectral features and manually extracted image features, and fusion of spectral features and images. The classification models were established using the five datasets. The samples were randomly divided into the training, validation, and test sets. The number of samples in the training, validation, and test sets are shown in Table 1. It should be noted that the orders of the samples in the spectral dataset, the image features dataset, and the fusion dataset were consistent. The conventional machine learning methods Logistic Regression (LR), Support Vector Classification (SVC), and Extreme Gradient Boosting (XGBoost) were used to establish models using spectral features and the manually extracted image features. The Convolutional Neural Network (CNN) was used to establish models based on the five datasets.
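One way to keep the sample order consistent across datasets is to draw a single random permutation of sample indices and apply it to every dataset. The sketch below illustrates this idea; the 60/20/20 split ratio, the random seed, and the toy arrays are assumptions introduced here, since the actual set sizes are those listed in Table 1.

```python
import numpy as np

def split_indices(n_samples, frac_train=0.6, frac_val=0.2, seed=0):
    """Return shuffled index arrays for the training, validation, and test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(frac_train * n_samples))
    n_val = int(round(frac_val * n_samples))
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]

# Toy datasets: row i of each array belongs to the same seed i.
n = 10
spectra = np.arange(n * 4).reshape(n, 4)        # hypothetical spectral features
image_feats = np.arange(n * 3).reshape(n, 3)    # hypothetical image features
train_idx, val_idx, test_idx = split_indices(n)

# The same indices are applied to both datasets, so rows stay aligned for fusion.
spectra_train, image_train = spectra[train_idx], image_feats[train_idx]
```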

2.6. Conventional Machine Learning Methods

2.6.1. Logistic Regression (LR)

LR is a widely used classification method [19]. LR is primarily a binary class classification method, and it calculates the probability of the sample to be one class based on the input features. The core of LR is the Sigmoid function to calculate the probability. LR can be extended to deal with multi-class classification problems.
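For reference, the standard binary form can be written as follows, where x is the feature vector and w and b are the learned weights and bias (the symbols are introduced here for illustration). The one-vs-rest rule shown is one common multi-class extension; the text does not state which extension was used.

```latex
P(y = 1 \mid x) = \sigma(w^{\top} x + b) = \frac{1}{1 + e^{-(w^{\top} x + b)}},
\qquad
\hat{y} = \arg\max_{c \in \{0, \dots, 5\}} \sigma(w_{c}^{\top} x + b_{c})
```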

2.6.2. Support Vector Classification (SVC)

SVC is the classification version of the support vector machine (SVM) [20]. It handles both linear and non-linear problems effectively. For linearly separable samples, SVC seeks the linear decision boundary that separates the classes with the maximum margin. For samples that are not linearly separable, SVC first maps the original data into a high-dimensional space using a kernel function and then constructs a maximum-margin hyperplane in that space. The selection of the kernel function is therefore important. In spectral data analysis, the radial basis function (RBF) kernel has shown good performance.
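In standard textbook notation (not taken from the source), the binary soft-margin problem and the RBF kernel can be written as follows, where φ is the feature map induced by the kernel K, C is the penalty parameter, and ξ_i are slack variables:

```latex
\min_{w, b, \xi} \; \frac{1}{2} \lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_{i}
\quad \text{s.t.} \quad
y_{i} \left( w^{\top} \phi(x_{i}) + b \right) \ge 1 - \xi_{i}, \;\; \xi_{i} \ge 0,
\qquad
K_{\mathrm{RBF}}(x, x') = \exp\left( -\gamma \lVert x - x' \rVert^{2} \right)
```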

2.6.3. Extreme Gradient Boosting (XGBoost)

XGBoost is an ensemble-based machine learning method [21]. XGBoost is based on the Gradient Boosting Decision Tree (GBDT) and ensembles several classification and regression trees (CARTs). The goal of XGBoost is to minimize a loss function that combines the prediction error of the summed outputs of all CARTs with a regularization term. This type of loss function is used to obtain better prediction performance, reduce model complexity, and avoid overfitting.
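In the usual notation of the XGBoost literature (symbols introduced here for illustration), the regularized objective over K trees f_k reads as follows, where l is the per-sample loss, T is the number of leaves of a tree, w is its vector of leaf weights, and γ and λ are regularization coefficients:

```latex
\mathcal{L} = \sum_{i=1}^{n} l\left( y_{i}, \hat{y}_{i} \right) + \sum_{k=1}^{K} \Omega(f_{k}),
\qquad
\hat{y}_{i} = \sum_{k=1}^{K} f_{k}(x_{i}),
\qquad
\Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^{2}
```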

2.6.4. Convolutional Neural Network (CNN)

Deep learning has achieved great success in various fields. CNN is one of the most widely used deep learning architectures in data analysis [22], including hyperspectral imaging [9,23]. A CNN model typically contains a number of convolutional layers, pooling layers, and fully connected layers, together with batch normalization layers, dropout layers, etc. Owing to its strong feature learning ability, CNN can handle 1D, 2D, and 3D data in an end-to-end manner, which makes it well suited to hyperspectral image analysis. In this study, a shallow CNN was used to process the 1D spectra, the 1D extracted image features, and the 2D seed images to classify the varieties of melon seeds.

2.6.5. Efficient Channel Attention (ECA)

ECA is a lightweight attention mechanism optimized for Convolutional Neural Networks (CNNs) [24]. ECA is designed to enhance performance with minimal model complexity. It avoids traditional dimensionality reduction after global average pooling to preserve the integrity of channel features, and employs a size-adaptive 1D convolution kernel to efficiently capture local cross-channel interactions between each channel and its k neighbors, thereby generating precise channel weights. Compared to methods like SENet, ECA significantly reduces parameter count and computational overhead while achieving an excellent balance of performance and efficiency.
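The mechanism can be sketched in NumPy as follows. This is a minimal sketch, assuming a single 1D feature map of shape (channels, length) and the commonly used defaults gamma = 2 and b = 1 for the adaptive kernel size; in a trained network the kernel weights are learned, and how ECA was placed in the models of this study is not specified here.

```python
import math
import numpy as np

def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive odd kernel size k = |log2(C) / gamma + b / gamma|_odd."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1

def eca_forward(x, weights):
    """Reweight channels of x (channels, length) with a shared 1D kernel of size k."""
    c = x.shape[0]
    k = weights.shape[0]
    pooled = x.mean(axis=1)                 # global average pooling, one value per channel
    padded = np.pad(pooled, k // 2)         # zero padding keeps the output length at c
    # Sliding dot product over the channel axis: each channel interacts with its k neighbors.
    mixed = np.array([padded[i:i + k] @ weights for i in range(c)])
    attn = 1.0 / (1.0 + np.exp(-mixed))     # sigmoid gate in (0, 1)
    return x * attn[:, None]

k64 = eca_kernel_size(64)                   # kernel size for 64 channels
x = np.arange(12, dtype=np.float64).reshape(4, 3)
out = eca_forward(x, np.zeros(3))           # zero weights give a uniform gate of 0.5
```

Note that no dimensionality reduction is applied to the pooled channel vector, and the only parameters are the k kernel weights.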

2.6.6. Data Fusion Using CNN

The fusion of spectral features and image features has been widely explored in hyperspectral image analysis [25,26]. To explore the performance of fusing spectral and image features, CNN models for data fusion were designed. Data fusion was carried out in two ways. The first was to fuse the spectral features with the manually extracted image features. The second was to fuse the spectral features with the seed images. In both cases, two-branch CNN models were established as end-to-end deep fusion networks.

2.7. Software and Model Performance Evaluation Metrics

The hardware used in this research was a computer with 16 GB RAM, an NVIDIA GeForce RTX 4060 Ti GPU, and an Intel i5-12400 CPU. Spectral extraction, image feature extraction, and data processing were carried out with OpenCV-Python (version: 4.9.0) in PyCharm (2021.1.1) based on Python (version: 3.9). LR, SVC, and XGBoost models were developed using Scikit-Learn (version: 1.4.2), and CNN models were developed using PyTorch (version: 2.2.2). The performance of the models was evaluated by classification accuracy.

3. Results

3.1. Spectral Profiles

Figure 2 shows the reflectance spectra of the six varieties of melon seeds. The colored shadow indicates the standard deviation of the reflectance at each wavelength, which shows the variation in reflectance among the samples of the corresponding melon seed variety. The general trends of the spectra of the six varieties of melon seeds were similar, and overlaps could be observed. However, differences could also be observed among these spectral profiles. The spectra of different varieties of melon seeds crossed between 600 nm and 650 nm. These differences would help to differentiate melon seed varieties.

3.2. Analysis of Image Features

In this study, a total of 45 features were manually extracted from the seed RGB images. The statistical analysis of the 45 image features can be found in Table S1 (in the Supplementary Materials). As shown in Table S1, there were differences in the manually extracted image features. Although the differences in some image features of some varieties might not be significant, at least three varieties had significant differences in these image features. Moreover, it should be noted that some of the image features showed significant differences among all six varieties of melon seeds, such as hydraulic radius and feret diameter. All these differences in the image features of the six varieties of sweet melon seeds laid the foundation for the good performance of the models using these image features.

3.2.1. Model Development Using Spectral Features

LR, SVC, XGBoost, and CNN models were established using the 1D spectral features. Figure 3 shows the structures of the CNN models. The LR, SVC, and XGBoost models were developed with Scikit-Learn (version: 1.4.2), and their parameters were tuned by trial to obtain the optimal settings. For LR, the solver was selected as ‘liblinear’, the max_iter was set as 20,000, and the C value was set as 1.0. For SVC, the kernel function was selected as ‘linear’, and the C value was 1.0. For XGBoost, the max_depth was set as 4, the min_child_weight was set as 12, the learning_rate was set as 0.008, the n_estimators was set as 1200, the subsample was set as 0.8, and the colsample_bytree was set as 0.8. To train the CNN model, the batch size, the learning rate, and the number of epochs were set as 64, 0.01, and 500, respectively. The results of the LR, SVC, XGBoost, and CNN models are shown in Table 2. Among the conventional machine learning models, XGBoost and SVC showed close results in the validation and test sets, whereas the LR model obtained relatively worse results. The CNN model performed much better, with classification accuracy over 95% in the training, validation, and test sets. The overall results indicated that the spectral features could be used for the classification of the six varieties of melon seeds.
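The reported settings can be written down as a configuration sketch. The hyperparameter values are those stated above; everything else is an assumption made for illustration: the synthetic six-class data, the explicit `OneVsRestClassifier` wrapper (added so that the ‘liblinear’ solver handles six classes consistently across Scikit-Learn versions), and the idea of passing `xgb_params` to `xgboost.XGBClassifier`, which is not imported here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Reported LR settings; the one-vs-rest wrapper is an assumption (see lead-in).
lr_model = OneVsRestClassifier(
    LogisticRegression(solver="liblinear", max_iter=20000, C=1.0)
)
# Reported SVC settings.
svc_model = SVC(kernel="linear", C=1.0)
# Reported XGBoost settings, e.g. for xgboost.XGBClassifier(**xgb_params).
xgb_params = {
    "max_depth": 4,
    "min_child_weight": 12,
    "learning_rate": 0.008,
    "n_estimators": 1200,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
}

# Synthetic stand-in for the spectral features: 6 well-separated classes, 10 features.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(6), 20)
centers = 5.0 * np.eye(6, 10)
X = centers[y] + rng.normal(scale=0.5, size=(120, 10))

lr_model.fit(X, y)
svc_model.fit(X, y)
lr_acc = lr_model.score(X, y)
svc_acc = svc_model.score(X, y)
```

The accuracies obtained on this synthetic data say nothing about the melon seed results; they only confirm that the configuration runs.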

3.2.2. Model Development Using Extracted Image Features

In addition to the spectral features, the manually extracted image features were also used to build classification models to explore the feasibility of using external image features to identify melon seed varieties. For LR, the solver was selected as ‘liblinear’, the max_iter was set as 20,000, and the C value was set as 1.0. For SVC, the kernel function was selected as ‘linear’, and the C value was 1.0. For XGBoost, the max_depth was set as 4, the min_child_weight was set as 12, the learning_rate was set as 0.008, the n_estimators was set as 1200, the subsample was set as 0.8, and the colsample_bytree was set as 0.8. To train the CNN model, the batch size, the learning rate, and the number of epochs were set as 64, 0.01, and 500, respectively. The classification results are shown in Table 3. All models performed well in melon seed variety identification, with classification accuracies over 90% for the training, validation, and test sets. Only slight differences were found between the machine learning and deep learning models. The results indicated that the manually extracted 1D image features could be used for the classification of the six varieties of melon seeds (Figure 4).

3.2.3. CNN Model Using Seed Images

In addition to the CNN model using 1D image features, the 2D images of melon seeds were also used to develop CNN models for variety identification. To train the CNN model, the batch size, the learning rate, and the number of epochs were set as 64, 0.01, and 500, respectively. To avoid interference from the background, the background was removed for each seed. The seed images were then fed into a CNN model. The CNN model using the seed images as inputs obtained good performance, with the classification accuracy of the training, validation, and test sets at 100%, 94.49%, and 95.08%, respectively. The F1-scores of the training, validation, and test sets were 1.0000, 0.9444, and 0.9504, respectively. The training time of the CNN model was 1089.5620 s. The results indicated that 2D seed images could be used to identify the six varieties of melon seeds (Figure 5).
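For clarity, the two metrics reported above can be computed as in the following sketch. The averaging scheme for the F1-score is not stated in the text; macro averaging over the classes is assumed here, and the labels are toy values rather than results from the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN) (assumed averaging)."""
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy labels for three classes; one sample of class 1 is predicted as class 2.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred, labels=[0, 1, 2])
```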

3.2.4. CNN Fusion Models Using Spectral Features and Image Features

As mentioned above, both spectral features and image features could be used for melon seed variety identification. The fusion of spectral features and image features was therefore also explored. Hyperspectral imaging can also provide the 3D data cube containing both spectral and image features. However, training CNN models on 3D data is computationally more complex and requires more powerful hardware. Considering that both spectral features and image features obtained good classification performance, the 3D data cube was not used for melon seed variety identification.

In this study, the fusion of spectral and image features was conducted using two strategies. First, the spectral features (1D) and the manually extracted image features (1D) were used as inputs of a two-branch CNN model for end-to-end fusion. The CNN architectures for the fusion of 1D spectral features and 1D manually extracted image features are shown in Figure 6. Second, the spectral features (1D) and the processed images (2D) were used as inputs of a two-branch CNN model for end-to-end fusion. The CNN architectures for the fusion of 1D spectral features and 2D seed images are shown in Figure 7. In both strategies, each type of feature was first learned by its own branch, and the learned features were then concatenated for the subsequent layers. During the training of the two CNN models using the fusion sets, the batch size, the learning rate, and the number of epochs were set as 64, 0.01, and 500, respectively. The results are shown in Table 4.
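The first strategy (1D spectra fused with 1D image features) can be sketched in PyTorch as follows. This is a minimal sketch and does not reproduce the architecture of Figure 6: the layer sizes, the class names `Branch1D` and `TwoBranchFusion`, the number of spectral bands (100), and the SGD optimizer are all assumptions. Only the two-branch layout, the concatenation, the six classes, the 45 image features, the batch size of 64, and the learning rate of 0.01 come from the text.

```python
import torch
import torch.nn as nn

class Branch1D(nn.Module):
    """Small 1D convolutional branch that maps a feature vector to an embedding."""
    def __init__(self, out_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=3, padding=1),
            nn.BatchNorm1d(8),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(4),    # fixed length regardless of input size
            nn.Flatten(),
            nn.Linear(8 * 4, out_dim),
            nn.ReLU(),
        )

    def forward(self, x):               # x: (batch, n_features)
        return self.net(x.unsqueeze(1)) # add a channel axis for Conv1d

class TwoBranchFusion(nn.Module):
    """Learn each input in its own branch, concatenate, then classify."""
    def __init__(self, n_classes=6, embed_dim=32):
        super().__init__()
        self.spectral_branch = Branch1D(embed_dim)
        self.image_branch = Branch1D(embed_dim)
        self.classifier = nn.Linear(2 * embed_dim, n_classes)

    def forward(self, spectra, image_feats):
        fused = torch.cat(
            [self.spectral_branch(spectra), self.image_branch(image_feats)], dim=1
        )
        return self.classifier(fused)

model = TwoBranchFusion()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # optimizer type is assumed

# One batch of 64 random samples: 100 hypothetical bands and 45 image features.
spectra = torch.randn(64, 100)
image_feats = torch.randn(64, 45)
logits = model(spectra, image_feats)
```

For the second strategy, the image branch would be replaced by a 2D convolutional branch that takes the seed image as input, with the same concatenation step.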

As shown in Table 4, both fusion strategies obtained good and close performance, with the classification accuracy of the training, validation, and test sets over 97%. The overall results illustrated the effectiveness of the fusion strategies and showed that spectral features and image features can provide complementary information. However, training the CNN model using the fusion of the 1D spectral features and 2D seed images required more computation resources. Given that the 1D manually extracted features can be used for the classification of the six varieties of melon seeds, the fusion of 1D spectral features and 1D manually extracted image features might be more practical. Figure 8 shows the confusion matrices of the training, validation, and test sets based on the CNN model using the fusion of 1D spectral features and 1D manually extracted image features. No particular pattern was found in the misclassifications.
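A confusion matrix of the kind shown in Figure 8 can be computed as in the following sketch, with rows as true categories and columns as predicted categories. The labels below are toy values, not results from the study, and the function name is hypothetical.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=6):
    """Count samples per (true category, predicted category) pair."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

# Toy labels for categories 0 to 5; one sample of category 5 is predicted as 3.
y_true = [0, 1, 2, 3, 4, 5, 5]
y_pred = [0, 1, 2, 3, 4, 5, 3]
cm = confusion_matrix(y_true, y_pred)
overall_accuracy = np.trace(cm) / cm.sum()

# Off-diagonal cells list which varieties are confused with which.
errors = [(int(t), int(p), int(cm[t, p])) for t, p in zip(*np.nonzero(cm)) if t != p]
```

Inspecting the off-diagonal cells in this way is how a systematic confusion between two varieties would show up, if one existed.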

3.3. Comparison of Results of Different Datasets

The classification models using the spectral feature dataset, the manually extracted image feature dataset, the image data, the fusion dataset of spectral features and manually extracted image features, and the fusion dataset of spectral features and the images all obtained good performance. Conventional machine learning algorithms (LR, SVC, and XGBoost) based on image features performed better than those based on spectral features. However, the CNN models using spectral features and image features obtained good and close results, indicating the effectiveness of CNN models in melon seed variety identification in this study. On the other hand, the number of image features was much lower than that of spectral features, indicating that there were differences in the external features of the melon seeds.

The CNN models using the seed images as inputs also obtained good performance, and the performance was similar to that of models using spectral features and image features. However, compared with the 1D features, the 2D images were more difficult for CNN training, requiring more computation power and time.

The CNN models using the two fusion datasets showed better performance than those using the other three datasets. Both spectral and image features contained distinctive features of the six varieties of melon seeds, and the fusion of the two types of features can provide more comprehensive information for melon seed variety identification.

Regarding the training time, it should be noted that the training time of the models using the fusion of 1D spectra and 2D seed image was the longest, followed by the CNN model using the seed images as input. The training times of CNN models using 1D spectra features or manually extracted image features were relatively longer than those of the conventional machine learning models. However, once the models were trained, the optimal model was saved and the saved models could be used for prediction. The prediction time was quite short for all models.

3.4. SHapley Additive exPlanations Analysis

The overall results show that the classification models using spectral features and image features obtained good performance. To further explore the important features contributing more to the melon seed variety identification, SHapley Additive exPlanations (SHAP) analysis was conducted based on the models established using spectral features and manually extracted image features. The features were ranked according to the mean absolute SHAP values of each model for each melon seed variety.

As for spectral features, the top 50 wavelengths with the highest mean absolute SHAP values of each variety in the LR, SVC, XGBoost, and CNN models are listed in Tables S2–S7. For each variety, any two models shared some important wavelengths, but only a few wavelengths were common to all four models. As for manually extracted image features, the top 20 image features with the highest mean absolute SHAP values of each variety are listed in Tables S8–S13. Again, for each variety, any two models shared some important image features, but only a few image features were common to all four models. Differences in feature importance could also be found among the different varieties of melon seeds. These differences in the important features of each variety might be attributed to the different principles of the models. Another possible reason might be the limited number of seeds, and more seeds should be studied to further explore the features contributing most to the classification.
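The ranking and the comparison across models can be sketched as follows, assuming that the SHAP values of one variety are already available as an array of shape (samples, features) for each model. The arrays below are toy stand-ins, not SHAP values from the study, and the function names are hypothetical.

```python
import numpy as np

def top_k_features(shap_values, k):
    """Rank features by mean absolute SHAP value and return the top-k indices."""
    importance = np.abs(shap_values).mean(axis=0)
    return np.argsort(importance)[::-1][:k]

def common_features(rankings):
    """Feature indices that appear in the top-k list of every model."""
    common = set(rankings[0].tolist())
    for ranking in rankings[1:]:
        common &= set(ranking.tolist())
    return sorted(common)

# Toy SHAP values for one variety: 2 samples, 4 features, 2 models.
shap_model_a = np.array([[0.9, -0.1, 0.5, 0.0],
                         [-0.7, 0.2, 0.4, 0.1]])
shap_model_b = np.array([[0.1, 0.8, -0.6, 0.0],
                         [0.0, -0.9, 0.7, 0.1]])

top_a = top_k_features(shap_model_a, k=2)
top_b = top_k_features(shap_model_b, k=2)
shared = common_features([top_a, top_b])
```

In this toy case the two models agree on only one of their two top-ranked features, which mirrors the limited overlap across models described above.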

4. Discussion

In this research, spectral features, image features, and the fusion of spectral and image features were explored for identifying melon seed varieties, achieving good performance. The classification results indicated that hyperspectral imaging combined with CNN can effectively identify the six varieties of melon seeds. The successful application of spectral features, manually extracted image features, seed images, and the fusion of spectral features and images has also been demonstrated for various other types of seeds [11,12,14,17,27].

In this research, conventional machine learning models using manually extracted image features performed better than those using spectral features, indicating significant differences in the external features of the six varieties of melon seeds. These differences may also explain why the CNN models using seed images and those using manually extracted features obtained similarly good results.

Although deep learning can directly process 3D hyperspectral images to mine spectral features and image features, the hardware requirements and computation time are relatively high. For this reason, the fusion of spectral features and image features, extracted separately from the hyperspectral images, is widely explored in hyperspectral data analysis [16,17,18]. In this research, the models using the fusion of spectral features with manually extracted features and the fusion of spectral features with seed images obtained similar results, which were better than those obtained from models using spectral features or image features alone. In previous studies aimed at identifying seed varieties, the models using fused datasets generally showed better performance than those using spectral features or image features independently [15,17,18].
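The idea of fusing separately extracted features can be illustrated with a simple feature-level concatenation. This is a simplified stand-in, not the CNN fusion architectures shown in Figures 6 and 7: it assumes both feature blocks are per-seed 2D arrays with matching row order, and the function names, array shapes, and random data are hypothetical.

```python
import numpy as np

def standardize(train_block, other_block):
    """Scale both blocks with the mean and std of the training block."""
    mean = train_block.mean(axis=0)
    std = train_block.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # avoid division by zero for constant columns
    return (train_block - mean) / std, (other_block - mean) / std

def fuse(spectral_block, image_block):
    """Concatenate per-seed spectral and image features along the feature axis."""
    if spectral_block.shape[0] != image_block.shape[0]:
        raise ValueError("both blocks must describe the same seeds, row for row")
    return np.concatenate([spectral_block, image_block], axis=1)

# Random placeholder data: 10 training seeds and 4 test seeds,
# 6 spectral variables and 3 image features per seed.
rng = np.random.default_rng(0)
spec_train, spec_test = rng.normal(size=(10, 6)), rng.normal(size=(4, 6))
img_train, img_test = rng.normal(size=(10, 3)), rng.normal(size=(4, 3))

spec_train_s, spec_test_s = standardize(spec_train, spec_test)
img_train_s, img_test_s = standardize(img_train, img_test)
fused_train = fuse(spec_train_s, img_train_s)
fused_test = fuse(spec_test_s, img_test_s)
```

Each block is scaled separately before concatenation so that neither the spectral variables nor the image features dominate purely because of their numeric range; the scaling statistics come from the training block only, so no information from the test seeds leaks into preprocessing.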

The good performance of the spectral and image feature-based classification of melon seed varieties indicated that there may be significant differences in both internal and external features among the six varieties of melon seeds. However, many more melon varieties exist than the six studied here. In the future, more varieties of melon seeds should be studied, and corresponding data analysis strategies should be developed. The spectral features and image features (including manually extracted image features and seed images) demonstrate great potential for identifying melon seed varieties, which is critical for the online real-time automatic detection of melon seed purity and the online real-time automatic sorting of suitable melon seeds. Furthermore, the fusion of spectral features and image features can provide comprehensive information about the seeds, thereby improving classification accuracy. The data analysis strategies employed in this study could also be applied to identify varieties of other plant seeds. Because spectral features and image features from seed RGB images both performed well, properly designed and optimized low-cost spectrometers and RGB cameras could potentially replace hyperspectral imaging for seed variety identification.

5. Conclusions

Hyperspectral imaging combined with deep learning models was successfully used to identify six varieties of melon seeds: 2A-234, CX-264, DFM-268, Zhetian103, Zhetian105, and Zhetian501. Five datasets were constructed from the hyperspectral images and the RGB images (obtained during hyperspectral image acquisition), namely, seed spectral features, manually extracted features from seed RGB images, seed RGB images, the fusion of seed spectral features with manually extracted features, and the fusion of seed spectral features with seed images. The classification models established using these datasets showed good performance, with CNN models achieving classification accuracy over 90% across the training, validation, and test sets for all five datasets. These results indicated that hyperspectral imaging combined with CNN has great potential for the classification of melon seed varieties. Moreover, the good performance of models using spectral features and image features suggested that there may be significant differences in chemical composition and external features among the six varieties of melon seeds. The relatively better performance of the models using fused datasets indicated that using both spectral features and image features can provide comprehensive information for the identification of the six varieties of melon seeds. The findings of this study could contribute to the development of models that use both spectral features and image features for rapid and accurate melon seed variety identification. In the future, more melon seed varieties with relatively minor chemical and physical variations should be studied, and more data analysis strategies should be explored to obtain better performance.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/agriculture15111139/s1.

Author Contributions

Conceptualization, Z.H., C.Z., H.Y. and Y.H.; data curation, Z.H.; formal analysis, Z.H. and C.Z.; funding acquisition, Y.H.; investigation, Z.H. and C.Z.; methodology, C.Z., W.S. and Y.H.; project administration, W.S.; resources, X.N. and H.Y.; software, C.Z. and Y.H.; supervision, H.Y.; validation, C.Z.; visualization, Z.H.; writing—original draft, Z.H. and C.Z.; writing—review and editing, Z.H., C.Z., W.S., X.N., H.Y. and Y.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (grant number 32071895); the New Variety Breeding Project of the Major Science and Technology Projects of Zhejiang Province (grant number 2021C02065-3-3); and the Zhejiang University Experimental Technology Research Project (grant number SYBJS202325).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

Author Xiangbo Nie was employed by the company Shaoxing Jinshuo Agricultural Technology Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Adebayo-Gege, G.; Alicha, V.; Omayone, T.O.; Nzekwe, S.C.; Irozuoke, C.A.; Ojo, O.A.; Ajayi, A.F. Anti-atherogenic and cardio-protective properties of sweet melon (Cucumis melo. L. Inodorus) seed extract on high fat diet induced in male wistar rats. BMC Complement. Med. Ther. 2022, 22, 334.
2. Shafi, A.; Farooq, U.; Akram, K.; Majeed, H.; Hakim, A.; Jayasinghy, M. Cucumis melo seed oil: Agro-food by-product with natural anti-hyperlipidemic potential. J. Sci. Food Agric. 2023, 103, 1644–1650.
3. Sabato, D.; Esteras, C.; Grillo, O.; Peña-Chocarro, L.; Leida, C.; Ucchesu, M.; Usai, A.; Bacchetta, G.; Picó, B. Molecular and morphological characterisation of the oldest Cucumis melo L. seeds found in the Western Mediterranean Basin. Archaeol. Anthropol. Sci. 2019, 11, 789–810.
4. Aierken, Y.; Akashi, Y.; Phan, T.P.N.; Halidan, Y.; Tanaka, K.; Long, B.; Nishida, H.; Long, C.L.; Wu, M.Z.; Kato, K. Molecular Analysis of the Genetic Diversity of Chinese Hami Melon and Its Relationship to the Melon Germplasm from Central and South Asia. J. Jpn. Soc. Hortic. Sci. 2011, 80, 52–65.
5. Jin, C.; Zhou, L.; Pu, Y.Y.; Zhang, C.; Qi, H.N.; Zhao, Y.Y. Application of deep learning for high-throughput phenotyping of seed: A review. Artif. Intell. Rev. 2025, 58, 76.
6. Liu, F.; Yang, R.; Chen, R.Q.; Guindo, M.L.; He, Y.; Zhou, J.; Lu, X.Y.; Chen, M.Y.; Yang, Y.H.; Kong, W.W. Digital techniques and trends for seed phenotyping using optical sensors. J. Adv. Res. 2024, 63, 1–16.
7. Makmuang, S.; Vilaivan, T.; Maher, S.; Ekgasit, S.; Wongravee, K. Discrimination of Thai melon seeds using near-infrared spectroscopy and adaptive self-organizing maps. Chemom. Intell. Lab. Syst. 2024, 245, 105060.
8. Feng, L.; Zhu, S.S.; Liu, F.; He, Y.; Bao, Y.D.; Zhang, C. Hyperspectral imaging for seed quality and safety inspection: A review. Plant Methods 2019, 15, 91.
9. Wang, C.Y.; Liu, B.H.; Liu, L.P.; Zhu, Y.J.; Hou, J.L.; Liu, P.; Li, X. A review of deep learning used in the hyperspectral image analysis for agriculture. Artif. Intell. Rev. 2021, 54, 5205–5253.
10. Kumar, V.; Singh, R.S.; Rambabu, M.; Dua, Y. Deep learning for hyperspectral image classification: A survey. Comput. Sci. Rev. 2024, 53, 100658.
11. Ruslan, R.; Khairunniza-Bejo, S.; Jahari, M.; Ibrahim, M.F. Weedy Rice Classification Using Image Processing and a Machine Learning Approach. Agriculture 2022, 12, 645.
12. Barrio-Conde, M.; Zanella, M.A.; Aguiar-Perez, J.M.; Ruiz-Gonzalez, R.; Gomez-Gil, J. A Deep Learning Image System for Classifying High Oleic Sunflower Seed Varieties. Sensors 2023, 23, 2471.
13. Shi, Y.Y.; Patel, Y.; Rostami, B.; Chen, H.W.; Wu, L.S.; Yu, Z.Y.; Li, Y. Barley Variety Identification by iPhone Images and Deep Learning. J. Am. Soc. Brew. Chem. 2022, 80, 215–224.
14. Yang, D.F.; Hu, J. Accurate Identification of Maize Varieties Based on Feature Fusion of Near Infrared Spectrum and Image. Spectrosc. Spectr. Anal. 2023, 43, 2588–2595.
15. Bi, C.G.; Zhang, S.; Chen, H.; Bi, X.H.; Liu, J.J.; Xie, H.; Yu, H.L.; Song, S.Z.; Shi, L. Non-Destructive Classification of Maize Seeds Based on RGB and Hyperspectral Data with Improved Grey Wolf Optimization Algorithms. Agronomy 2024, 14, 645.
16. Jiang, X.N.; Liu, Q.C.; Yan, L.; Cao, X.D.; Chen, Y.; Wei, Y.Q.; Wang, F.; Xing, H. Hyperspectral imaging combined with spectral-imagery feature fusion convolutional neural network to discriminate different geographical origins of wolfberries. J. Food Compos. Anal. 2024, 132, 106259.
17. Sun, J.; Zhang, L.; Zhou, X.; Yao, K.S.; Tian, Y.; Nirere, A. A method of information fusion for identification of rice seed varieties based on hyperspectral imaging technology. J. Food Process Eng. 2021, 44, e13797.
18. Wang, L.; Sun, D.W.; Pu, H.B.; Zhu, Z.W. Application of Hyperspectral Imaging to Discriminate the Variety of Maize Seeds. Food Anal. Methods 2016, 9, 225–234.
19. Stoltzfus, J.C. Logistic Regression: A Brief Primer. Acad. Emerg. Med. 2011, 18, 1099–1104.
20. Brereton, R.G.; Lloyd, G.R. Support Vector Machines for classification and regression. Analyst 2010, 135, 230–267.
21. Chen, T.Q.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
22. Li, Z.W.; Liu, F.; Yang, W.J.; Peng, S.H.; Zhou, J. A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 6999–7019.
23. Ranjan, P.; Girdhar, A. A comprehensive systematic review of deep learning methods for hyperspectral images classification. Int. J. Remote Sens. 2022, 43, 6221–6306.
24. Wang, Q.L.; Wu, B.G.; Zhu, P.F.; Li, P.H.; Zuo, W.M.; Hu, Q.H. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 11531–11539.
25. AlSuwaidi, A.; Grieve, B.; Yin, H.J. Combining spectral and texture features in hyperspectral image analysis for plant monitoring. Meas. Sci. Technol. 2018, 29, 104001.
26. Xu, P.; Fu, L.X.; Xu, K.; Sun, W.B.; Tan, Q.; Zhang, Y.P.; Zha, X.T.; Yang, R.B. Investigation into maize seed disease identification based on deep learning and multi-source spectral information fusion techniques. J. Food Compos. Anal. 2023, 119, 105254.
27. Zhang, C.; Zhao, Y.Y.; Yan, T.Y.; Bai, X.L.; Xiao, Q.L.; Gao, P.; Li, M.; Huang, W.; Bao, Y.D.; He, Y.; et al. Application of near-infrared hyperspectral imaging for variety identification of coated maize kernels with deep learning. Infrared Phys. Technol. 2020, 111, 103550.

Figure 1. Images of seeds of the six melon varieties. (A): 2A-234; (B): CX-264; (C): DFM-268; (D): Zhetian103; (E): Zhetian105; (F): Zhetian501.

Figure 2. Average spectrum with standard deviation for each wavelength of the six varieties of melon seeds.

Figure 3. CNN architectures for 1D spectral features.

Figure 4. CNN architectures for 1D manually extracted features.

Figure 5. CNN architectures using 2D seed images as inputs.

Figure 6. The CNN architectures for the fusion of 1D spectral features and 1D manually extracted image features.

Figure 7. The CNN architectures for the fusion of 1D spectral features and 2D seed images.

Figure 8. The confusion matrix of the training, validation, and test sets based on the CNN model using the fusion of 1D spectral features and 1D manually extracted image features. (a) the confusion matrix of the training set; (b) the confusion matrix of the validation set; (c) the confusion matrix of the test set.
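Accuracy and F1-score, the two metrics reported in Tables 2–4, can both be derived from a confusion matrix such as those in Figure 8. The sketch below is illustrative only: it assumes rows are true classes and columns are predicted classes, and it uses macro averaging for F1, which is an assumption because the averaging scheme is not stated in this section. The function name and the toy matrix are hypothetical.

```python
import numpy as np

def accuracy_and_macro_f1(confusion_matrix):
    """Compute accuracy and macro-averaged F1 from a square confusion matrix.

    Assumes rows are true classes and columns are predicted classes.
    """
    cm = np.asarray(confusion_matrix, dtype=float)
    true_positive = np.diag(cm)
    true_count = cm.sum(axis=1)       # number of samples per true class
    predicted_count = cm.sum(axis=0)  # number of samples per predicted class
    zeros = np.zeros_like(true_positive)
    precision = np.divide(true_positive, predicted_count,
                          out=zeros.copy(), where=predicted_count > 0)
    recall = np.divide(true_positive, true_count,
                       out=zeros.copy(), where=true_count > 0)
    denominator = precision + recall
    f1 = np.divide(2 * precision * recall, denominator,
                   out=zeros.copy(), where=denominator > 0)
    accuracy = true_positive.sum() / cm.sum()
    return accuracy, f1.mean()

# Toy two-class matrix (made-up counts, not taken from Figure 8).
toy_cm = [[8, 2],
          [1, 9]]
acc, macro_f1 = accuracy_and_macro_f1(toy_cm)
```

With unbalanced classes, as in Table 1, accuracy and macro F1 can diverge noticeably, because macro averaging weights each variety equally regardless of how many seeds it contributes.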

Table 1. Seed variety information and dataset split of melon seeds.

Variety	Total Number	Training	Validation	Test
2A-234	505	303	101	101
CX-264	665	399	133	133
DFM-268	1023	614	204	205
Zhetian103	1543	926	309	308
Zhetian501	1660	996	332	332
Zhetian105	499	299	100	100
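The counts in Table 1 can be checked with a few lines of arithmetic. The sketch below copies the table values and verifies that the three subsets add up to the total for each variety; summed over all varieties, the counts correspond to a 60%/20%/20% training/validation/test split. The variable names are illustrative.

```python
# Table 1 values: variety -> (total, training, validation, test)
split = {
    "2A-234": (505, 303, 101, 101),
    "CX-264": (665, 399, 133, 133),
    "DFM-268": (1023, 614, 204, 205),
    "Zhetian103": (1543, 926, 309, 308),
    "Zhetian501": (1660, 996, 332, 332),
    "Zhetian105": (499, 299, 100, 100),
}

# Per-variety consistency: the three subsets must account for every seed.
consistent = {
    name: train + val + test == total
    for name, (total, train, val, test) in split.items()
}

# Column sums over all varieties, in the same order as the tuples above.
grand_total, n_train, n_val, n_test = (sum(col) for col in zip(*split.values()))
fractions = (n_train / grand_total, n_val / grand_total, n_test / grand_total)
```

The varieties are far from balanced (499 to 1660 seeds each), which is worth keeping in mind when reading the accuracy values in Tables 2–4.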

Table 2. Classification results of different models using spectral features.

Dataset	Evaluation Metrics	LR	SVC	XGBoost	CNN-eca
Training set	Accuracy (%)	81.37	86.71	94.68	99.75
Training set	F1-score	0.7695	0.8579	0.9454	0.9992
Validation set	Accuracy (%)	81.42	87.27	87.62	96.27
Validation set	F1-score	0.7688	0.8639	0.8707	0.9633
Test set	Accuracy (%)	80.92	86.68	86.34	95.50
Test set	F1-score	0.7679	0.8582	0.8583	0.9566
	Training time (s)	0.8407	1.1486	19.5608	159.8209

Table 3. Classification results of different models using manually extracted features.

Dataset	Evaluation Metrics	LR	SVC	XGBoost	CNN-eca
Training set	Accuracy (%)	95.84	97.68	98.19	100.00
Training set	F1-score	0.9580	0.9768	0.9819	1.0000
Validation set	Accuracy (%)	96.10	94.15	94.06	95.34
Validation set	F1-score	0.9608	0.9418	0.9400	0.9530
Test set	Accuracy (%)	95.42	94.15	93.04	95.42
Test set	F1-score	0.9538	0.9418	0.9306	0.9541
	Training time (s)	0.6282	26.2554	3.6081	173.7988

Table 4. Classification results of CNN models using the fusion of spectral features and image features.

Fusion Strategy	Training Set Accuracy (%)	Training Set F1-Score	Validation Set Accuracy (%)	Validation Set F1-Score	Test Set Accuracy (%)	Test Set F1-Score	Training Time (s)
1D spectra + 1D image features	100.00	1.0000	98.81	0.9881	98.64	0.9664	241.5790
1D spectra + 2D image	100.00	1.0000	98.22	0.9822	97.63	0.9764	1342.0777


© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
