A Novel Approach to the Authentication of Apricot Seed Cultivars Using Innovative Models Based on Image Texture Parameters

The different cultivars of apricot seeds may differ in their properties. To ensure economical and efficient seed processing, knowledge of the cultivars’ composition and physical properties may be necessary. Therefore, the correct identification of the cultivar of the apricot seeds may be very important. The objective of this study was to develop models based on selected textures of apricot seed images to distinguish different cultivars. The images of four cultivars of apricot seeds were acquired using a flatbed scanner. For each seed, approximately 1600 textures from the image, converted to the different color channels R, G, B, L, a, b, X, Y, and Z, were calculated. The models were built separately for the individual color channels; the color spaces Lab, RGB, XYZ; and all color channels combined based on selected texture parameters using different classifiers. The average accuracy of the classification of apricot seeds reached 99% (with an accuracy of 100% for the seeds of the cultivars ‘Early Orange’, ‘Bella’, and ‘Harcot’, and 96% for ‘Taja’) in the case of the set of textures selected from the color space Lab for the model built using the Multilayer Perceptron classifier. The same classifier produced high average accuracies for the color spaces RGB (90%) and XYZ (86%). For the set of textures selected from all color channels, i.e., R, G, B, L, a, b, X, Y, and Z, the average accuracy reached 96% (Multilayer Perceptron and Random Forest classifiers). In the case of individual color channels, the highest average accuracy was up to 91% for the models built based on a set of textures selected from color channel b (Multilayer Perceptron). The results proved the possibility of distinguishing apricot seed cultivars with a high probability using a non-destructive, inexpensive, and objective procedure involving image analysis.


Introduction
Apricot (Prunus armeniaca L.) belongs to the Prunus genus and the Rosaceae family. This stone fruit is grown around the world, mainly in temperate regions. Apricot is a fruit consisting of skin, flesh, and stone with seed (kernel). Stones with seeds can be used in a variety of ways, e.g., for biodiesel or energy production, as sorbent for wastewater and water cleanup, for active carbon preparation, in the food industry, and in the cosmetic industry. Apricot kernels contain proteins, minerals, amino acids, fatty acids, and cyanogenic glycosides [1]. These compounds may be necessary for human health [2,3]. The leading countries in terms of the production of apricot, as estimated for 2018-2025, are Turkey, Uzbekistan, Algeria, and Italy [4]. Apricot seeds have great industrial potential. However, economical and efficient seed utilization requires knowledge of the cultivars' composition and physical properties [5]. It is very important to recover by-products from apricot processing and to reduce the impact on the environment [6]. The properties of apricot stones and seeds must be considered in the design of equipment for transportation, sorting, separating, breaking, and processing [7]. The seeds of apricots belonging to different cultivars may differ in chemical composition [1,3,8,9]. Therefore, the correct identification of the cultivar of the apricot seeds may be very important.
The identification of different products has been achieved through both contact methods (e.g., random amplification of polymorphic DNA (RAPD) analysis and multisensory gas analysis) and non-contact methods (e.g., imaging, excitation systems, and vibration sensors). In contact methods, the product is sampled and transported to the laboratory, where it is identified [10-12]. In addition to their complexity and time-consuming nature, contact (destructive) methods have other limitations, the most important of which is the possibility of damaging the sample [13]. Therefore, in previous studies, computer vision systems were explored as an inexpensive, accurate, and objective approach to evaluating seed cultivars [14-16]. Since fruit is one of the main products in international markets and exports, its classification and grading are among the most important domains in agriculture [17]. Also, horticultural products have a higher added value than other crops, which doubles the importance of fruits.
In recent years, to reduce human labor and save time, image processing and computer vision algorithms have been utilized for different fruits. Most of the automatic sorting systems that are available are designed for various fruits, such as citrus fruits, oranges, apples, strawberries, mango, lemons, dates, etc. [18]. The main steps in the classification and grading of fruits according to images are the preparation and preprocessing of fruit images, segmentation, feature extraction, and the comparison and sorting of extracted features based on the classification criteria.
There are various methods for the classification and grading of fruits by algorithms. In the study performed by Capizzi et al. [19] to classify the defects of oranges, color and texture were used along with a neural network, and an accuracy of 88% was obtained. In another study, the classification of intra-class fruits according to color and texture characteristics using an ANN approach was carried out, with an accuracy of 83-98% [20]. New methods also use machine learning algorithms to identify and classify horticultural products. One study used shape features to identify apricot cultivars and employed six methods based on machine learning to determine their class, which proved to be successful [21]. It is worth noting that due to the development of new instrumentation and analysis tools, horticultural products can now be analyzed using image processing techniques. The usefulness of models based on features selected from images that were employed to perform cultivar discrimination of seeds, pits, and stones was also reported in the literature, e.g., for seeds and stones of peach [22], seeds of pepper [23], pits of sweet [24] and sour cherries [25], seeds of apples [16], and seeds of wheat [26]. Based on the promising results in the literature on the effectiveness of image analysis and machine learning to distinguish seeds and pits, the following research hypotheses were formulated to be tested by the present study: (1) Among the features of the external structure of apricot seeds, there are parameters that depend on the cultivar. (2) The application of selected image features combined with machine learning algorithms enables us to build discriminative models that can distinguish between seed samples belonging to different apricot cultivars.
The objective of this study was to develop models based on selected textures of apricot seed images to distinguish different cultivars with the use of a non-destructive, inexpensive, and objective procedure. The novelty of this study is the approach that uses texture parameters extracted from the individual color channels R, G, B, L, a, b, X, Y, and Z, along with machine learning algorithms from the Lazy, Meta, Functions, Trees, Rules, and Bayes groups, in order to discriminate between apricot seed cultivars. A large data set, including approximately 1600 textures, was considered. The textures, which varied depending on the apricot seed cultivar, were selected and used to build innovative discriminative models that may be very useful for the evaluation of apricot seed diversity, assessments of authenticity, and the detection of seed adulteration.

Materials
Mature apricots of the cultivars 'Taja', 'Early Orange', 'Harcot', and 'Bella' were collected from an orchard located in central Poland. The seeds were extracted manually from apricot stones obtained from fruits. For each of the four cultivars, twenty-five seeds were obtained. The total number of seeds subjected to image analysis was one hundred.

Image Analysis
The seeds were imaged with the use of a flatbed scanner, specifically a Canon CanoScan 9000F Mark II (Tokyo, Japan). In total, images of one hundred seeds were acquired. The obtained digital color images at a resolution of 800 dpi were saved in TIFF format and then were converted to BMP to enable processing using the MaZda software version 4.7 (Łódź University of Technology, Institute of Electronics, Łódź, Poland) [27]. The first step in the image processing was the conversion of the seed images to the color channels R, G, B, L, a, b, X, Y, and Z. Color images of apricot seeds for the cultivars 'Taja', 'Early Orange', 'Harcot', and 'Bella' are presented in Figure 1. Examples of apricot seed images from different color channels are shown in Figure 2. The images, which include the seeds on a black background, were segmented to separate the seeds from the background. Image thresholding was performed by manually determining brightness regions. The background was completely black, and the seeds were light, as shown in Figure 1, so the seeds were easily separated from the background, and the preferences of the researcher performing the brightness determination did not affect the results of segmentation. Each seed was one region of interest (ROI). For each ROI, about 1600 parameters were calculated, including about 180 textures for each color channel. The image features included textures extracted based on the gradient map (5 texture parameters), the co-occurrence matrix (132 texture parameters, including 11 features computed for 3 between-pixel distances and 4 various directions), the run-length matrix (20 texture parameters, including 5 features computed for 4 various directions), the histogram (9 textures), the Haar wavelet transform (10 textures), and the autoregressive model (5 textures). Textures were duplicated for each color channel. However, due to the differences in the images, the texture values were different for each color channel.
Due to the different number of textures for different matrices, most textures were unique to a matrix. Some textures, for example, Mean, Variance, Skewness, and Kurtosis, were repeated for several matrices but had different values.
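The texture counts listed above can be checked with simple arithmetic, and the role of the between-pixel distance and direction in the co-occurrence matrix can be illustrated on a toy image. The following Python sketch is only an illustration under stated assumptions: the function name cooccurrence_matrix and the 3 x 3 toy image are hypothetical, the matrix holds raw, non-symmetric pair counts, and the code does not reproduce the MaZda implementation.

```python
# Texture counts per feature family, as listed in the text.
TEXTURES_PER_CHANNEL = {
    "gradient map": 5,
    "co-occurrence matrix": 11 * 3 * 4,  # 11 features x 3 distances x 4 directions
    "run-length matrix": 5 * 4,          # 5 features x 4 directions
    "histogram": 9,
    "Haar wavelet transform": 10,
    "autoregressive model": 5,
}
COLOR_CHANNELS = ["R", "G", "B", "L", "a", "b", "X", "Y", "Z"]

# Sum over families gives the per-channel count ("about 180").
per_channel = sum(TEXTURES_PER_CHANNEL.values())
# Multiplying by the nine channels gives the total ("about 1600").
total = per_channel * len(COLOR_CHANNELS)


def cooccurrence_matrix(image, levels, dx, dy):
    """Count gray-level pairs (image[r][c], image[r + dy][c + dx]).

    Hypothetical helper: one displacement (dx, dy) encodes one
    between-pixel distance and one direction. Raw counts only,
    without symmetrization or normalization.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    return counts


# Toy 3-level image (assumed data, not a seed image).
toy_image = [
    [0, 0, 1],
    [0, 1, 1],
    [2, 2, 1],
]
# Distance 1 in the horizontal direction.
glcm = cooccurrence_matrix(toy_image, levels=3, dx=1, dy=0)

print(per_channel, total)
print(glcm)
```

The per-channel sum is 181 and the total over nine channels is 1629, which agrees with the rounded values of about 180 and about 1600 given in the text. Texture features such as contrast or entropy would then be computed from matrices of this kind for each distance and direction.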

Classification
The analysis to distinguish seeds belonging to different cultivars was performed using the WEKA machine learning software version 3.8.4 (University of Waikato, Hamilton, New Zealand) [28]. In the first step, models were built for the individual color channels R, G, B, L, a, b, X, Y, and Z. The next step of the analysis included developing models separately for each of the three color spaces: Lab, RGB, and XYZ. Finally, models were built for a set of textures selected from all color channels, i.e., R, G, B, L, a, b, X, Y, and Z. For each set of textures, attribute selection was carried out to choose features with the highest discriminative power. The best-first search method and CFS (correlation-based feature selection) attribute evaluator were applied. In the literature, it was reported that the number of samples should be at least 10 times greater than the number of attributes [29,30]. In the present study, images of one hundred seeds were acquired. Therefore, ten attributes were selected to build models for distinguishing the samples. Models including ten textures were used successfully to discriminate between different classes of kernels in previous studies [31]. The classification was performed using algorithms from the Lazy, Meta, Functions, Trees, Rules, and Bayes groups. A 10-fold cross-validation mode was used. The apricot seed dataset was randomly divided into 10 parts. Each part, in turn, was used as the test set, and the remaining 9 parts were used as the training sets. Thus, the learning was performed a total of 10 times with the use of different training sets. The results were presented as the average of 10 estimates [32]. Based on the obtained results, the classifiers producing the highest accuracies were selected. The accuracies (%) for each of the cultivars 'Taja', 'Early Orange', 'Harcot', and 'Bella' were determined, as was the average accuracy (%) of classification for all four cultivars.
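The sample-to-attribute rule and the 10-fold cross-validation scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the WEKA implementation: the function names max_attributes, k_fold_indices, and mean_accuracy are hypothetical, the split is a plain random partition with an arbitrary seed, and no classifier is trained here.

```python
import random


def max_attributes(n_samples, ratio=10):
    """Largest attribute count keeping n_samples >= ratio * attributes."""
    return n_samples // ratio


def k_fold_indices(n_samples, k=10, seed=0):
    """Randomly partition sample indices into k folds.

    Returns a list of (train_indices, test_indices) pairs in which
    each fold serves as the test set exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def mean_accuracy(fold_accuracies):
    """Average of the k per-fold estimates."""
    return sum(fold_accuracies) / len(fold_accuracies)


n_seeds = 100                      # 4 cultivars x 25 seeds, as in the study
n_selected = max_attributes(n_seeds)
splits = k_fold_indices(n_seeds, k=10)

print(n_selected)
print([(len(train), len(test)) for train, test in splits])
```

With one hundred seeds, the rule allows ten attributes, and each of the 10 iterations uses 90 seeds for training and 10 for testing. A classifier would be fitted on each training part and evaluated on the corresponding test part, and mean_accuracy would combine the 10 estimates.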

Results and Discussion
The most accurate results were obtained using three classifiers, namely IBk (Lazy), Multilayer Perceptron (Functions), and Random Forest (Trees), and the accuracies of these methods are presented in this paper. In the case of analysis performed separately for each color channel (Table 1), the average accuracy of classification of apricot seeds for the cultivars 'Taja', 'Early Orange', 'Harcot', and 'Bella' reached 91% for the model developed based on a set of textures selected from color channel b using the Multilayer Perceptron classifier. In this case, the individual seed cultivars were classified with accuracies ranging from 86% ('Early Orange') to 96% ('Bella'). Slightly lower average accuracies of seed classification for the four apricot cultivars were determined for the IBk and Random Forest classifiers. However, in the case of the IBk classifier, quite high average accuracies of 88% and 89% were obtained for the color channels b and X, respectively. The classification carried out using the Random Forest classifier provided an average accuracy of up to 88% for color channel b. The lowest average accuracies were noted for the color channel R for all classifiers. The models developed based on textures selected from color channel R provided average accuracies of 68% for IBk and 67% for both Multilayer Perceptron and Random Forest.
In the case of models built based on textures selected from individual color spaces, apricot seeds for the cultivars 'Taja', 'Early Orange', 'Harcot', and 'Bella' were classified with higher average accuracies (Table 2). Classification carried out using the Multilayer Perceptron classifier yielded the best results for all three color spaces (RGB, Lab, and XYZ). The set of textures selected from the color space Lab allowed for the classification of different cultivars of apricot seeds with an average accuracy of 99%. The seeds of the cultivars 'Early Orange', 'Bella', and 'Harcot' were correctly classified in 100% of cases. This meant that these apricot seeds were completely different from others in terms of a set of selected image textures, and all samples belonging to the actual classes 'Early Orange', 'Bella', and 'Harcot' were correctly included in the predicted classes 'Early Orange', 'Bella', and 'Harcot', respectively. For other color spaces (RGB and XYZ), the Multilayer Perceptron classifier provided average accuracies equal to 90% and 86%, respectively. Complete differentiation was not observed for any cultivar. The IBk classifier produced average accuracies of 81% for color space XYZ, 86% for color space RGB, and 95% for color space Lab. In the case of color space RGB, the apricot seeds for the 'Early Orange' cultivar were classified with 100% accuracy. The average accuracies obtained using the Random Forest classifier were equal to 85% for color space XYZ, 88% for color space RGB, and 91% for color space Lab. Satisfactory results yielding average accuracies in the range of 92% (IBk) to 96% (Multilayer Perceptron and Random Forest) were obtained for the model built based on a set of textures selected from all color channels, i.e., R, G, B, L, a, b, X, Y, and Z (Table 3). In the case of individual apricot seed cultivars, the accuracies were also very high. For models built using Multilayer Perceptron, accuracies were in the range of 93-100%.
The highest accuracy of 100% was observed for 'Bella'. All samples of 'Bella' were correctly classified as 'Bella'. For the IBk classifier, accuracies ranged from 89% ('Taja') to 96% ('Bella'), and for the Random Forest classifier, they ranged from 93% ('Taja') to 97% ('Harcot').

Table 3. The accuracies of cultivar classification of apricot seeds based on textures selected from a set of all color channels, i.e., R, G, B, L, a, b, X, Y, and Z.

The obtained results were very satisfactory and proved the usefulness of image features and machine learning for the evaluation of cultivar diversity of apricot seeds. It was found that models built for textures selected from individual color spaces and color channels, as well as a set of textures selected from all color channels, i.e., R, G, B, L, a, b, X, Y, and Z, provided high discrimination accuracies. The results fully confirmed the research hypotheses. The obtained results proved the possibility of classifying apricot seeds belonging to different cultivars with a high level of accuracy using image analysis. There are literature reports on the application of image processing and machine learning for the evaluation of apricot fruit and seeds (kernels). In the case of apricots, image analysis was used for developing a classification model based on the physical features (length, width, mass, thickness, and the projected area of three perpendicular surfaces) in order to distinguish five cultivars with a top accuracy of 87.7% [33]. The classification of four apricot cultivars based on shape features was also performed by Yang, Zhang, Zhai, Pang, and Jin [21], who achieved an accuracy reaching 90.7% for a test set using machine learning. Image analysis was useful for the classification of apricots into different maturity stages (unripe, ripe, and overripe), reaching an accuracy of 0.923 [34]. A hyperspectral imaging system and multivariate analysis were applied for the detection of adulteration of almonds with apricot seeds [35]. Our own research expanded the possibilities for using image analysis for the examination of apricot. The innovative models based on attributes selected from a set of approximately 1600 texture parameters from different color channels of images were built for distinguishing apricot seeds belonging to different cultivars.
The developed models can be used for the identification of the cultivar of apricot seeds and can contribute to a better understanding of apricot diversity. Due to the great usefulness of image features for distinguishing cultivars, research should be continued for more cultivars and carried out for other species. Further research may also require increasing seed numbers. In this study, the total number of apricot seeds belonging to four apricot cultivars was one hundred. Even though this number of samples was sufficient to achieve an average accuracy of classification of apricot seeds reaching 99%, increasing the number of samples may increase the accuracy. In the literature, it was reported that including features extracted from images of 30 kernels for each class can be sufficient for classification with an accuracy of up to 100% [30]. However, Shahinfar et al. [36] proved that increasing the number of training images from 10 to 1000 can lead to improved performance metrics. Although the increase in accuracy was slight in some cases, larger numbers of samples for examined apricot seeds could allow the use of deep learning, which could result in increased accuracy.
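The relation between the per-cultivar accuracies and the average accuracy reported in this section for the Lab color space and the Multilayer Perceptron classifier can be checked with a short Python sketch. The per-cultivar values are taken from the text. The confusion matrix, in contrast, is a hypothetical example chosen only to show how such accuracies are derived from counts; it is not the matrix obtained in the study, and the function names are assumptions made for illustration.

```python
CULTIVARS = ["Taja", "Early Orange", "Harcot", "Bella"]


def per_class_accuracy(confusion):
    """Percentage of each actual class assigned to the correct predicted class.

    confusion[i][j] is the number of samples of actual class i
    predicted as class j.
    """
    return [100.0 * row[i] / sum(row) for i, row in enumerate(confusion)]


def average_accuracy(per_class):
    """Unweighted mean of the per-class accuracies (%)."""
    return sum(per_class) / len(per_class)


# Per-cultivar accuracies reported in the text (Lab, Multilayer Perceptron).
reported = {"Taja": 96.0, "Early Orange": 100.0, "Harcot": 100.0, "Bella": 100.0}
reported_average = average_accuracy([reported[name] for name in CULTIVARS])

# Hypothetical confusion matrix with 25 seeds per cultivar (illustration only).
hypothetical_confusion = [
    [24, 1, 0, 0],   # 'Taja': one seed assigned to another cultivar
    [0, 25, 0, 0],   # 'Early Orange'
    [0, 0, 25, 0],   # 'Harcot'
    [0, 0, 0, 25],   # 'Bella'
]
example_per_class = per_class_accuracy(hypothetical_confusion)

print(reported_average)
print(example_per_class, average_accuracy(example_per_class))
```

The mean of the reported per-cultivar accuracies is 99%, in agreement with the average accuracy given in the text. Because each cultivar is represented by the same number of seeds, the unweighted mean of the per-class accuracies coincides with the overall fraction of correctly classified seeds.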

Conclusions
This study proposed a new image-processing-based method that can distinguish and classify different apricot cultivars. The use of models based on selected textures is crucial for determining the cultivar of apricot and classifying it accordingly. The textures extracted from the images allowed for cultivar discrimination of apricot seeds with an average accuracy of up to 99% for a model built using the Multilayer Perceptron classifier based on textures selected from color space Lab. In this case, three of the four cultivars were correctly classified in 100% of cases. Thus, the formulated research hypotheses were confirmed. The presence of cultivar-dependent textures on the outer surface of apricot seed was verified. After selecting the image features, the textures with the highest discriminatory power allowed for the development of models using machine learning algorithms to distinguish apricot seed cultivars with high accuracy. The obtained results are very promising. Models based on textures from images converted to different color channels can be used in practice to identify seed cultivars. However, only some models provided a satisfactory accuracy above 95%. Therefore, it may be useful to extend the present study by including more cultivars from different seasons and locations to confirm the usefulness of the applied approach to discriminating apricot seed.

Funding: There was no financial support.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement: The data presented in this study are available on request from the corresponding author.

Conflicts of Interest: The authors declare no conflict of interest.