On the Efficacy of Handcrafted and Deep Features for Seed Image Classification

Computer vision techniques have become important in agriculture and plant sciences due to their wide variety of applications. In particular, the analysis of seeds can provide meaningful information on their evolution, the history of agriculture, the domestication of plants, and knowledge of diets in ancient times. This work aims to propose an exhaustive comparison of several different types of features in the context of multiclass seed classification, leveraging two public plant seed data sets to classify their families or species. In detail, we studied possible optimisations of five traditional machine learning classifiers trained with seven different categories of handcrafted features. We also fine-tuned several well-known convolutional neural networks (CNNs) and the recently proposed SeedNet to determine whether and to what extent using their deep features may be advantageous over handcrafted features. The experimental results demonstrated that CNN features are appropriate to the task and representative of the multiclass scenario. In particular, SeedNet achieved a mean F-measure of at least 96%. Nevertheless, in several cases the handcrafted features performed well enough to be considered a valid alternative. In detail, we found that the Ensemble strategy combined with all the handcrafted features achieves a mean F-measure of at least 90.93%, with considerably lower training times. We consider the obtained results an excellent preliminary step towards realising an automatic seed recognition and classification framework.


Introduction
The last few decades have seen considerable growth in the use of image processing techniques to solve various problems in agriculture and the life sciences because of their wide variety of applications [1]. This growth is mainly due to the fact that computer vision techniques have been combined with deep learning techniques. The latter have offered promising results in various application fields, such as haematology [2], biology [3,4], or botany [5,6]. Deep learning algorithms differ from traditional machine learning (ML) methods in that they require little or no preprocessing of images and can infer an optimal representation of data from raw images without the need for prior feature selection, resulting in a more objective and less biased process. Moreover, the ability to investigate the structural details of biological components, such as organisms and their parts, can significantly influence biological research. According to Kamilaris et al. [7], image analysis is a significant field of research in agriculture for seed, crop, or leaf classification, anomaly or disease detection, and other related activities.
In the agricultural area, the cultivation of crops is based on seeds, mainly for food production. In particular, this study concerns carpology, the field of plant science that examines seeds and fruits from a morphological and structural point of view. It generally faces two main challenges: reconstructing the evolution of a particular plant species, and recreating what the landscape was like and, therefore, how its flora and fauna appeared. Professionals in this field typically capture images of seeds using a digital camera or a flatbed scanner. The latter, in particular, provides quality and speed of workflow thanks to its constant illumination conditions and defined image size [8][9][10][11][12]. In this context, seed image classification can play a fundamental role for manifold reasons, from the recognition of crops, fruits, vegetables, and diseases, to the extraction of specific feature information for archaeobotanical purposes. One of the tools most used by biologists is ImageJ [13][14][15]. It is regarded as a standard piece of image analysis software, as it is freely available, platform-independent, and easily applicable by biological researchers to quantify laboratory tests.
A traditional image analysis procedure uses a pipeline of four steps: preprocessing, segmentation, feature extraction, and classification, although deep learning workflows have emerged since the proposal of the convolutional neural network (CNN) AlexNet in 2012 [16]. CNNs do not follow the typical image analysis workflow because they can extract features independently, without the need for feature descriptors or specific feature extraction techniques. In the traditional pipeline, image preprocessing techniques are used to prepare the image before analysing it, in order to eliminate possible distortions or unnecessary data, or to highlight and enhance distinctive features for further processing. Next, the segmentation step partitions the image into significant regions, i.e., sets of pixels with shared characteristics such as colour, intensity, or texture. The purpose of segmentation is to simplify and change the image representation into something more meaningful and easier to analyse. Extracting features from the regions of interest identified by segmentation is the next step. In particular, features can be based on shape, structure, or colour [17,18]. The last step is classification, assigning a label to the objects using supervised or unsupervised machine learning approaches. Compared to manual analysis, the use of seed image analysis techniques brings several advantages to the process: (i) It speeds up the analysis process; (ii) It minimises distortions created by natural light and microscopes; (iii) It automatically identifies specific features; (iv) It automatically classifies families or genera.
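As a rough illustration of these four steps, the following numpy sketch runs a tiny synthetic image through the whole pipeline. The normalisation, the global threshold, the two toy descriptors, and the nearest-centroid classifier are illustrative placeholders, not the methods used in this work:

```python
import numpy as np

def preprocess(img):
    """Step 1: normalise intensities to [0, 1] (stand-in for denoising, etc.)."""
    img = img.astype(float)
    return (img - img.min()) / (img.max() - img.min() + 1e-12)

def segment(img, threshold=0.5):
    """Step 2: binary mask separating seed pixels from the background."""
    return img > threshold

def extract_features(img, mask):
    """Step 3: toy shape/intensity descriptors of the segmented region."""
    area = mask.sum()
    mean_intensity = img[mask].mean() if area else 0.0
    return np.array([area, mean_intensity])

def classify(features, centroids):
    """Step 4: nearest-centroid rule as a stand-in for a trained model."""
    dists = np.linalg.norm(centroids - features, axis=1)
    return int(np.argmin(dists))

# Tiny synthetic "seed image": a bright blob on a dark background.
img = np.zeros((8, 8)); img[2:6, 2:6] = 200.0
x = preprocess(img)
mask = segment(x)
feats = extract_features(x, mask)
label = classify(feats, centroids=np.array([[16.0, 1.0], [4.0, 0.5]]))
```

Each function here corresponds to one stage of the pipeline described above, so a real system can swap in its own preprocessing, segmentation, descriptors, and classifier while keeping the same structure.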
In this work, we address the problem of multiclass classification of seed images from two different perspectives. First, we study possible optimisations of five traditional machine learning classifiers, as adopted in our previous work [6], trained with seven different categories of handcrafted (HC) features extracted from seed images with our proposed ImageJ tool [19]. Second, we train several well-known convolutional neural networks and a new CNN, namely SeedNet, recently proposed in our previous work [6], in order to determine whether, and to what extent, the features they extract may be advantageous over handcrafted features for training the same traditional machine learning methods described above. In particular, Table 1 summarises the study, contributions, and tools provided by our previous works and by the one presented here.
The overall aim of the work is to propose a comprehensive comparison of seed classification systems, based on both handcrafted and deep features, to produce an accurate and efficient classification of heterogeneous seeds. This is important, for example, for obtaining archaeobotanical information about seeds and effectively recognising their types. More specifically, the classification addressed in this work is fine-grained, oriented to single seeds rather than sets of seeds, as is done in [20]. In detail, our contribution is threefold: (i) We exploit handcrafted features, a combination of handcrafted features, and CNN-extracted features; (ii) We compare the classification results of five different models, trained with HC and CNN-extracted features; (iii) We evaluate the classification results from a multiclass perspective to assess which type of descriptor may be most suitable for this task. This research aims to classify individual seeds belonging to the same family or species from two different and heterogeneous seed data sets, where differences in colour, shape, and structure can be challenging to detect. We also want to highlight how traditional classification techniques trained with handcrafted features can outperform CNNs in training speed and achieve accuracy close to that of CNNs in this task.
The rest of the article is organised as follows. Section 2 reviews the state of the art in plant science, with a focus on seed image analysis. Section 3 presents the data sets used and the classification experiments. The experimental evaluation is discussed in Section 4, and finally, in Section 5, we give the conclusions of the work.

Related Work
This work aims to classify seeds of different families, or species, according to the data set used. In general, computer vision techniques have been applied to this or similar tasks, even though no studies address heterogeneous seed identification or classification. For example, several authors have proposed methods to detect or classify types of seeds [5,6,21], leaves [22][23][24], and crops [25], to identify the quality of crops [26] or diseased leaves or crops [1,25,[27][28][29], using both traditional and deep learning-based techniques. Table 2 gives a summary of the main methods and findings of the literature.

Leaf Detection and Classification
Several methods have been proposed for tasks similar to seed classification, such as leaf identification and recognition. Examples include a Support Vector Machine (SVM)-based method trained with leaf-related features such as shape, colour, and texture [22], and a mobile leaf detection system based on saliency computation to extract regions of interest, followed by segmentation with the region growing algorithm, which exploits both the saliency map and colour features [24]. Finally, Hall et al. [23] use a combination of handcrafted and deep features as part of a classification system based on Random Forest (RF). It can classify the leaves of different plant species using a data set of over 1900 images divided into 32 species. The last method is particularly relevant and could be applied to seed images; however, it uses images of leaves acquired under ideal conditions, as does Di Ruberto et al. [22], i.e., with an artificial background rather than the natural backgrounds of actual conditions, as in our investigation. As for Putzu et al. [24], the main focus is on the processing of leaves with complex artificial backgrounds in a mobile application scenario, which is far from the purposes of this work.

Leaf Diseases Identification
Works that are similar but oriented to the identification of diseases have been proposed by different authors [1,25,[27][28][29]. Some examples include the work of Slado et al. [27], which used CaffeNet (a single-GPU variant of AlexNet) to identify leaf diseases, while AlexNet and GoogleNet have been used to identify 14 crop species and 26 different diseases [25]. In addition, LeNet has been used for diseased banana leaf recognition in [28]. Barman et al. [1] proposed a real-time citrus leaf disease detection and classification system based on the MobileNet CNN [30]. Finally, Gajjar et al. [29] proposed a novel CNN as part of a framework for real-time identification of diseases in a crop, tested on 20 different healthy and diseased leaves of four different plants. Each of the works in this section uses convolutional neural networks suitable for leaf disease detection. Unlike the analysis carried out in our work, none of them studied handcrafted features, which are particularly important for discriminating different types of seeds.

Classification of Crops
A closely related task to the one faced in this work is the classification of crops. For example, Junos et al. [31] proposed a detection system based on an improved version of YOLOv3 to detect loose palm fruits from images acquired under various natural conditions; on the other hand, Zhu et al. [26] realised a system to recognise the appearance quality of carrots, based on an SVM classifier trained with features extracted by CNNs. In particular, the ResNet101 network offered excellent results. CNNs are also used in these works. In the first case, the task is different because the proposed system is a monitoring framework rather than a fine-grained classification system. Zhu et al. [26], however, employed features extracted from CNNs, as in our work, although they did not employ handcrafted features as a term of comparison for detecting the quality of carrots.

Seed Detection and Classification
The works most related to ours belong to this category. In particular, Sarigu et al. [5] performed plum variety identification employing seed endocarp shape, colour, and texture descriptors, followed by Linear Discriminant Analysis (LDA) to obtain the most representative features, while Przybylo et al. [21] and Loddo et al. [6] employed CNNs.
The former used a modified version of AlexNet [32] and focused on the task of classifying the germination ability of acorns based on the colour intensity of seed sections as a feature. The latter proposed a new CNN, called SeedNet, to classify and sort seeds belonging to different families or species [6]. As in the first two works, we aim to classify seeds; moreover, we investigated the same data sets employed in the latter work. The authors of the first two works focused on a single seed variety and employed only handcrafted features. As for our previous work [6], we proposed a new CNN for the classification and retrieval of fine-grained seeds without going into the details of handcrafted and deep features, which is one of the main purposes of the current work.

Materials and Methods
We leverage two data sets in this work. They contain images of heterogeneous seeds, both in number and in characteristics, and are publicly available upon request. Each one was preprocessed as described in our previous work [6] and used for seed family or species classification using handcrafted or deep features. In the following, we start by describing the data sets in Sections 3.1.1 and 3.1.2. We then provide the implementation details of the classification strategy. We validate the performance through an empirical evaluation with a 10-fold cross-validation strategy and visualise the overall and class-specific discriminative features of the process.

Data Sets Description
In this section, we describe the data sets used for the experiments.

Canada Data Set
The Canada data set is publicly available [33]. It contains 587 images of seeds, organised into families; every seed belongs to the Magnoliophyta phylum. Each image has one of the following three resolutions: 600 × 800, 600 × 480, or 600 × 400 pixels. We exploited this data set because (i) It provides several different families, and (ii) The background of the images is clean and requires a precise and unique preprocessing strategy.
In particular, for the experiments, we selected the six most represented families, namely Amaranthaceae, Apiaceae, Asteraceae, Brassicaceae, Plantaginaceae, and Solanaceae, for a total of 215 seed images. Figure 1 shows a sample for each family, and Table 3 indicates the number of samples. Each original image contains a scale marker as a dimensional reference for the seed; it was removed following the preprocessing procedure proposed in [6].

Cagliari Data Set
The basic collection of the Banca del Germoplasma della Sardegna (BG-SAR), University of Cagliari, Italy, was used to create the local data set. It consists of 3386 samples from 120 different plant species. Each seed is a member of the Fabaceae family and varies significantly in size and colour. The images have a resolution of 2125 × 2834 [34]. We used a preprocessing procedure defined by our previous work [6] to remove the background and extract single seeds for classification.

Evaluation Metrics
The following metrics are used to evaluate the performance of each classification model: accuracy (Acc), precision (Pre), specificity (Spec), and recall (Rec). Accuracy is defined as the proportion of correctly labelled instances to the total number of instances. Precision is the proportion of true positives in a set of positive results. Specificity is the proportion of negative results that are correctly identified, and recall is the proportion of positive results that are correctly identified. Denoting by $TP_i$, $TN_i$, $FP_i$, and $FN_i$ the true positives, true negatives, false positives, and false negatives for class $i$, they are defined as follows:

$$\mathrm{Acc}(i) = \frac{TP_i + TN_i}{TP_i + TN_i + FP_i + FN_i}, \qquad \mathrm{Pre}(i) = \frac{TP_i}{TP_i + FP_i},$$

$$\mathrm{Spec}(i) = \frac{TN_i}{TN_i + FP_i}, \qquad \mathrm{Rec}(i) = \frac{TP_i}{TP_i + FN_i},$$

where $i$ represents the current class and $J$ the total number of classes. The F-measure for class $i$ is defined as:

$$\text{F-measure}(i) = \frac{2 \cdot \mathrm{Pre}(i) \cdot \mathrm{Rec}(i)}{\mathrm{Pre}(i) + \mathrm{Rec}(i)}.$$

We pinpoint that each of the metrics shown in this work has been calculated as a macro average over the classes, e.g., the mean F-measure (MFM) is $\frac{1}{J}\sum_{i=1}^{J} \text{F-measure}(i)$.
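As a worked illustration of the macro averaging described above, the sketch below computes the per-class one-vs-rest counts and the macro-averaged precision, recall, and F-measure directly from label vectors; the toy labels are hypothetical:

```python
import numpy as np

def macro_metrics(y_true, y_pred, n_classes):
    """Per-class precision/recall/F-measure from one-vs-rest counts,
    macro-averaged over the n_classes classes."""
    pre, rec, f1 = [], [], []
    for i in range(n_classes):
        tp = np.sum((y_pred == i) & (y_true == i))
        fp = np.sum((y_pred == i) & (y_true != i))
        fn = np.sum((y_pred != i) & (y_true == i))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        pre.append(p); rec.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)
    return np.mean(pre), np.mean(rec), np.mean(f1)

# Hypothetical ground truth and predictions for a 3-class problem.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
mp, mr, mfm = macro_metrics(y_true, y_pred, 3)
```

Because each class contributes equally to the mean regardless of its size, this macro average penalises poor performance on small classes, which is exactly why the MFM is more informative than plain accuracy for the unbalanced multiclass task considered here.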

Seed Classification
The handcrafted features were extracted using the ImageJ tool described in [19], which can extract up to 64 different features: 32 morphological shape features, 16 texture features, and 16 colour intensity features. Among the texture features, Haralick's grey-level co-occurrence matrix (GLCM), which describes the joint distribution of pairs of pixel grey levels at a given offset [36], was used to extract local similarity information. All the GLCM statistics can be computed at the four typical angles: 0°, 45°, 90°, and 135°. More precisely, we extracted the following second-order statistics from the GLCM: energy, contrast, correlation, and homogeneity.
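For illustration, these four second-order GLCM statistics can be computed directly in numpy. The tiny two-level image and the angle-to-offset mapping below are assumptions of the sketch, not taken from the ImageJ tool:

```python
import numpy as np

def glcm(img, dx, dy, levels):
    """Grey-level co-occurrence matrix for the pixel offset (dx, dy), normalised."""
    m = np.zeros((levels, levels))
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                m[img[y, x], img[y2, x2]] += 1
    return m / m.sum()

def glcm_stats(p):
    """Energy, contrast, correlation, and homogeneity of a normalised GLCM."""
    i, j = np.indices(p.shape)
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    sd_i = np.sqrt(((i - mu_i) ** 2 * p).sum())
    sd_j = np.sqrt(((j - mu_j) ** 2 * p).sum())
    energy = (p ** 2).sum()
    contrast = ((i - j) ** 2 * p).sum()
    correlation = (((i - mu_i) * (j - mu_j) * p).sum() / (sd_i * sd_j)
                   if sd_i * sd_j else 1.0)
    homogeneity = (p / (1 + np.abs(i - j))).sum()
    return energy, contrast, correlation, homogeneity

# The four standard angles map to pixel offsets (dx, dy):
offsets = {0: (1, 0), 45: (1, -1), 90: (0, -1), 135: (-1, -1)}
img = np.array([[0, 0, 1], [0, 1, 1], [1, 1, 1]])  # toy image, 2 grey levels
feats = [glcm_stats(glcm(img, *offsets[a], levels=2)) for a in offsets]
```

In practice, libraries such as scikit-image provide equivalent routines, and real seed images would first be quantised to a manageable number of grey levels.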
The handcrafted descriptors have been compared to deep features extracted from several different well-known network architectures: Vgg16, Vgg19, AlexNet, GoogLeNet, InceptionV3, ResNet101, ResNet18, ResNet50, and SeedNet. AlexNet [16], Vgg16 [37], and Vgg19 [38] are the shallowest among the tested architectures, being composed of 8, 16, and 19 layers, respectively. In all three cases, we extracted the features from the second-to-last fully connected layer (fc7), for a total of 4096 features. GoogLeNet [39], Inception-v3 [40], ResNet18, ResNet50, and ResNet101 [41] are much deeper, being composed of 22, 48, 18, 50, and 101 layers, respectively. In all of these cases, we extracted the features from the last fully connected layer, for a total of 1000 features. Finally, SeedNet is a novel and lightweight CNN proposed in [6] for seed image classification and retrieval; we extracted the features from its last fully connected layer, for a total of 23 features. CNNs are known to have sufficient representational power and generalisation ability to perform different visual recognition tasks [42]. Nevertheless, we fine-tuned the above CNNs on both data sets before the feature extraction, in order to produce a fairer comparison with the standard machine learning classifiers trained with handcrafted features. In particular, we adopted the following classification strategy for both data sets: (i) We split the data into 60% for training, 20% for validation, and 20% for the test set; (ii) We fine-tuned the CNNs on the training set, using the validation set to avoid overfitting; (iii) We used 10-fold stratified cross-validation on the training and validation sets combined, in order to train the five classification algorithms; (iv) We finally evaluated the classification on the test set.
The extracted features have been used as input to different classification algorithms in order to produce different classification models. The models considered are the following: k-Nearest Neighbours (kNN), Decision Tree (DT), Naive Bayes (NB), Ensemble (Ens), and Support Vector Machine (SVM). kNN classifies an object by a majority vote among its k nearest training examples in the feature space. Decision trees create a model that predicts the value of a target variable by learning simple decision rules inferred from the characteristics of the data; the deeper the tree, the more complex the decision rules and the fitter the model. Naive Bayes classifiers are probabilistic models based on the application of Bayes' theorem with strong assumptions of independence between features. The Ensemble classifier relies on an ensemble of classifiers rather than a single one: each classifier in the ensemble predicts the class of every unseen instance, and their predictions are then combined using some form of voting system. Finally, SVM is a non-probabilistic binary linear classifier that assigns objects to a category by mapping the instances to points in space so as to maximise the width of the gap between categories.
Bearing in mind that we faced a class imbalance problem, we trained each classifier with 10-fold stratified cross-validation to ensure the heterogeneity of the training set, i.e., so that the proportion of positive and negative examples is respected in all folds and each fold contains a representative ratio of each class. For each case, we selected the model with the largest area under the ROC curve (AUC). The primary hyperparameters characterising each classifier were tuned in order to obtain a model with optimal performance. Furthermore, to make the results reproducible, we specify the values of the hyperparameters chosen for each model considered. For the kNN classifier, the distance metric adopted is the cityblock distance, and the number of nearest neighbours is 6, with a squared inverse distance weighting function. For the Decision Tree classifier, the maximum number of splits, which controls the depth of the trees, is 50. For the Naive Bayes classifier, the distribution chosen to model the data is normal, with a normal kernel smoother. For the Ensemble classifier, we chose the Adaptive Boosting (AdaBoost) method for multiclass classification; in particular, the learners are decision trees. Finally, for the SVM classifier, we used a polynomial kernel function of order 2, with an automatic kernel scale parameter and a box constraint parameter equal to 1, which controls the maximum penalty imposed on margin-violating observations and therefore helps prevent overfitting. We evaluated the performance of each classifier using the same hyperparameters on both data sets.
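The five classifiers and the 10-fold stratified protocol can be approximated in scikit-learn, although some of the options above have no exact counterpart (e.g., the squared-inverse kNN weighting and the Naive Bayes kernel smoother); the sketch below, on synthetic data, is therefore indicative only, and the mapping of the 50-split limit to `max_leaf_nodes=51` is an assumption:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Indicative scikit-learn counterparts of the tuned models described above.
models = {
    "kNN": KNeighborsClassifier(n_neighbors=6, metric="cityblock",
                                weights=lambda d: 1.0 / (d ** 2 + 1e-12)),
    "DT": DecisionTreeClassifier(max_leaf_nodes=51),  # roughly 50 splits
    "NB": GaussianNB(),                               # normal distribution assumption
    "Ens": AdaBoostClassifier(),                      # boosted decision stumps by default
    "SVM": make_pipeline(StandardScaler(),
                         SVC(kernel="poly", degree=2, C=1.0)),
}

# Synthetic stand-in for the seed feature vectors and class labels.
X, y = make_classification(n_samples=200, n_features=20, n_classes=3,
                           n_informative=6, random_state=0)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = {name: cross_val_score(m, X, y, cv=cv).mean()
          for name, m in models.items()}
```

`StratifiedKFold` preserves the class proportions in every fold, which mirrors the stratification we used to cope with the class imbalance.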

Results and Discussion
We report several results obtained from the experiments. First of all, four graphs are presented to show the general behaviour of the two sets of descriptors from two points of view. In fact, we report the best and average accuracies for both sets, as shown in Figures 3 and 4; these serve as a general indicator of the effectiveness of the features used for the task. In addition, we pinpoint their performance in the multiclass scenario: in particular, Figures 5 and 6 show the behaviour of the MFM computed for the different classifiers. Secondly, in Appendix A, we report Tables A1-A5, which detail the results of the individual descriptor categories obtained with each classifier. The graphs in Figures 3 and 4 show that each of the employed classifiers can achieve excellent classification accuracy on both data sets. However, from the results on the Canada data set, the Decision Tree classifier seems the least suitable for the task, especially when trained with CNN descriptors, being the only one below 90% on average in that case. Although the other four strategies achieve an accuracy above 90% in practically all cases, the Support Vector Machine seems the most appropriate in every experimental condition. It outperforms the others, averaging 98.38% and 99.49% on the Canada and Cagliari data sets, respectively, with best results of 99.58% and 99.73%, obtained with the features extracted from SeedNet, on the Canada and Cagliari data sets, respectively. In general, looking only at the accuracy, there seem to be no distinct performance differences between the two categories of descriptors that would justify one over the other.
However, the scenario changes considerably when observing the results of the multiclass classification performance, which we evaluated using the F-measure computed for all the classes, as indicated in Equation (6). In particular, Figure 5 shows that the best MFMs are generally lower than the accuracy on both data sets, even though the SVM reaches an MFM of more than 99% on the Canada data set with ResNet50 and SeedNet descriptors and of 96.07% with SeedNet-extracted features on the Cagliari data set. In the last graph, shown in Figure 6, the average performance obtained with the MFM again indicates the SVM trained with CNN descriptors as the most suitable choice for the task. Indeed, the SVM trained with CNN-extracted descriptors obtains 96.11% and 92.86% on the Canada and Cagliari data sets, respectively.
As a general rule, on the one hand, the results provided by the extensive experiments conducted seem to show that the SVM trained with CNN-extracted features can accomplish the multiclass seed classification task with performance that outperforms every other combination of descriptors and classifier analysed. Moreover, this solution seems to be robust, achieving the highest results in each comparative test. Among the deep features, those extracted from SeedNet performed excellently in all categories, proving very well suited to the task. On the other hand, the results produced using the HC descriptors are satisfactory, since they are generally comparable to those of the CNNs, only slightly lower than the best CNN descriptor case of the SVM. In general, the Ensemble strategy turns out to be the most appropriate when using HC descriptors, being able to reach 98.76% and 99.42% as the best accuracy and 95.24% and 90.93% as the best MFM on the Canada and Cagliari data sets, respectively, and 96.50% and 98.84% as the average accuracy and 84.88% and 81.59% as the average MFM. Nevertheless, the different numbers of features in the two categories should also be considered. In particular, the handcrafted features number 64 when combined, while the deep features number 4096 in the worst cases of AlexNet, Vgg16, and Vgg19, and 1000 in all the remaining CNNs; SeedNet is an exception, with only 23 features. Therefore, if we consider the number of discriminative features, the results obtained with the HC features are even more satisfactory and pave the way for possible combinations of heterogeneous features.
Since our investigation is the first attempt to study the problem of classifying fine-grained seed types with a large variety of seed classes (up to 23), we leveraged existing classification strategies that have been commonly used in the works closest to ours [22,24,26,43,44]. Specifically, we employed kNN, Decision Tree, Naive Bayes, the Ensemble classifier with the AdaBoost method, and SVM. SVM is the most suitable for this task, probably due to its excellent capacity for distinguishing classes with closely related elements. This condition is evident in the Canada data set, which contains seeds with heterogeneous shapes, colours, and textures. In contrast, the Cagliari data set is composed of similar classes, making the process more complex (see Astragalus, Medicago, and Melilotus as examples from Figure 2). For the same reasons, Decision Trees showed the most unsatisfactory results in this context because, most likely, the features produced are insufficient to adequately represent all possible conditions of the internal nodes and to realise an appropriate number of splits. Furthermore, as Figures 3 and 4 show, the overall performance of the system in terms of accuracy indicates that the HC and CNN features are comparable and, in some cases, that the former are better than the latter. This behaviour arises because, in general, both categories have high representational power for fine-grained seed classification [6], both in this context and on the same data sets. However, the accuracy metric does not capture the detail of the multiclass issue faced in this work. For this reason, we adopted the mean F-measure in order to have a more unambiguous indication of the most suitable features for the task, keeping in mind the multiclass scenario.
Considering that we addressed a multiclass classification problem, we provide Figures 7-10, which represent the classwise MFM for each class and both feature categories. In detail, regarding the Cagliari data set, Figures 9 and 10 show that the most difficult classes are Calicotome villosa (with F-measures of 77.78% with the SeedNet features and 65.38% with all the HC features) and Cytisus purgans (with F-measures of 82.93% with the SeedNet features and 68.35% with all the HC features), in both cases far below 90%. They are mostly misclassified as Hedysarum coronarium and Cytisus scoparius, respectively. In both cases, this is certainly due to their similar shapes, and in the latter, certain seeds also have similar colours. As regards the Canada data set, Figure 8 shows that the Amaranthaceae class (F-measure of 66%) is mainly misclassified as Solanaceae, and vice versa, although to a lesser extent (F-measure of 86.95%). Even in this case, this is probably due to the similar shapes, but it is necessary to remark that the Amaranthaceae class contains only ten samples. The remaining four classes obtained F-measures well above 95%. On the other hand, Figure 7 shows the excellent representational power of the ResNet50-extracted features, considering that the F-measure of all the classes is above 95% and, above all, that Amaranthaceae obtained 100%, overcoming the issues of the handcrafted features in discriminating it. A final remark should be devoted to the execution time. We did not indicate the training time of the different CNNs employed because it is out of the scope of this work. However, we note that the training time was never less than 22 min on the Canada data set (AlexNet) and 21 min on the Cagliari data set (GoogLeNet) for the known architectures, while the SeedNet training lasted 4 and 12 min, respectively.
Regarding the training time of the traditional classifiers, it was never above 1 min (the worst case being Naive Bayes). To sum up, the classification strategy based on the optimised SVM trained with SeedNet-extracted features is suitable for the seed classification task, even in a multiclass scenario. This work shows that SeedNet is not only a robust solution for classification [6] but also an outstanding feature extractor when coupled with the SVM classifier. The solution obtained here could also be more feasible than using SeedNet alone, considering the quicker training time of the SVM once it is provided with the selected features.
While interesting results have been shown, our work suffers from some limitations. First, the best-performing solution relies entirely on one combination of descriptor and classifier, even though other categories of descriptors produced satisfactory results. Considering the properties of handcrafted features, combining them with deep features could improve the results, particularly in distinguishing the different classes of seeds more specifically. Second, every experimental condition assumed a prior preprocessing step, which needs to be tuned according to the data set employed. As a result, the trained classifier could have issues if applied to other data sets with different image conditions. Third, the training time of the best classification system strictly depends on the training time of the CNN adopted for the feature extraction. Efforts should be made in this direction in order to obtain a real-time system for the task addressed. Fourth, the dimensionality of the feature vectors differs between handcrafted and deep descriptors: the former have a maximum of 64 features, while the latter can have up to 4096. In this context, SeedNet is an excellent solution with only 23 features. A reasonable combination of heterogeneous descriptors could be investigated for possible improvements, possibly followed by a feature reduction/selection step. Fifth, as shown by the classwise performance, some classes are harder to distinguish because of their similar shapes and colours. In the case of the Cagliari data set, not even the deep features have overcome this issue. For this reason, the combination of heterogeneous descriptors could help recognise the most challenging classes.

Conclusions
In this work, we focused on the problem of seed image classification. In this context, we specifically addressed an unbalanced multiclass task on two heterogeneous seed data sets, using both handcrafted and deep features. Based on shape, texture, and colour, the handcrafted features are general but dependent on the problem addressed, generating a feature vector with a maximum size of 64. The deep features were extracted from several known CNNs capable of performing different visual recognition tasks, generating a feature vector whose size is 1000 or 4096, except for SeedNet, whose feature vector has 23 elements. The features were then used to train five different classification algorithms: kNN, Decision Tree, Naive Bayes, SVM, and Ensemble. The experimental results show that the different feature categories perform best, and comparably, using the SVM or Ensemble models on the Canada data set, with average accuracy values above 96.5%. The best model for the Cagliari data set is the Ensemble for HC features and the SVM for deep features; in both cases, the average accuracy values are above 99.4%. The MFM values give us essential information about how well the considered features can solve the unbalanced multiclass task. For both types of features, the Ensemble model achieves the best, and comparable, performance, with average values higher than 95.2% and 91% for the Canada and Cagliari data sets, respectively. When comparing HC- and CNN-based features, especially when considering the size of the feature vector, HC descriptors outperformed deep descriptors in some cases, as they achieved similar performance with significant computational savings. In general, the results provided by the extensive experiments indicate that the SVM trained with CNN-extracted features can perform the multiclass seed classification task with a performance that outperforms any other combination of descriptors and classifiers analysed.
Moreover, this solution seems to be robust, obtaining the highest results in each comparative test. Among the deep features, those extracted by SeedNet performed excellently in all categories, establishing themselves as very well suited to the task and expressing SeedNet as a powerful tool for seed classification and feature extraction. It is also important to remark that the classwise performance highlighted that some classes are harder to distinguish because of their similar shapes and colours. For this reason, the combination of HC and deep descriptors could help recognise the most challenging classes.
In conclusion, SeedNet and CNNs in general have demonstrated their ability to offer convenient features for this task, achieving outstanding performance with both data sets. However, if we consider the size of the feature vectors as a computational cost, together with the training time involved in the initial process, the HC features performed satisfactorily, which is particularly desirable for a real-time framework.
As a future direction, we aim to further improve the results obtained by investigating the possibility of combining the HC and CNN features, particularly to overcome the difficulties in recognising some seed classes, and by adding a feature selection step to reduce the dimensionality of the features. Finally, we also plan to realise a complete framework that can manage all the steps involved in this task, from image acquisition to seed classification, broadening our approach to distinguishing between seeds' genera and species.

Data Availability Statement: All the features extracted, the models, and the material produced in this study are available at the following URL: GitHub repository.