Classification and Analysis of Pistachio Species with Pre-Trained Deep Learning Models

Abstract: Pistachio is a shelled fruit from the Anacardiaceae family, native to the Middle East. Kirmizi and Siirt pistachios are the major types grown and exported in Turkey. Since the prices, tastes, and nutritional values of these types differ, the type of pistachio becomes important in trade. This study aims to identify these two types of pistachios, which are frequently grown in Turkey, by classifying them via convolutional neural networks. Within the scope of the study, images of Kirmizi and Siirt pistachio types were obtained through a computer vision system. The dataset includes a total of 2148 images, 1232 of the Kirmizi type and 916 of the Siirt type. Three different convolutional neural network models were used to classify these images. The models were trained with the transfer learning method, using the pre-trained AlexNet, VGG16, and VGG19 models. The dataset was divided into 80% training and 20% test sets. As a result of the performed classifications, the success rates obtained from the AlexNet, VGG16, and VGG19 models are 94.42%, 98.84%, and 98.14%, respectively. The models' performances were evaluated through sensitivity, specificity, precision, and F-1 score metrics, as well as ROC curves and AUC values. The highest classification success was achieved with the VGG16 model. The obtained results reveal that these methods can be used successfully in the determination of pistachio types.


Introduction
Pistachio (Pistacia vera L.) is an agricultural product native to the Middle East and Central Asia. The world's major pistachio producers, Iran, the USA, Turkey, and Syria, contribute close to 90% of the total production worldwide [1]. Pistachio production in Turkey covers a large number of types under different names. Among these pistachio types, the most preferred ones are Kirmizi, Siirt, and Halabi. Red Aleppo, widely grown in Syria, is also grown in Turkey, though it is a less preferred variety. Similarly, the Syria-specific types Achoury, Alemi, El Bataury, Obaid, and Ayimi are also grown in Turkey [2][3][4].
Pistachio kernel, which is a good source of fat (50%-60%), contains unsaturated fatty acids necessary for human nutrition. It is widely used in the manufacture of confectionery and snack foods. Due to the dark green color of its kernel, pistachio is highly preferred in the ice cream and pastry industries [5]. The shell (endocarp) of the pistachio is hulled along its seams. This is desirable since pistachios are often marketed in their shells to be eaten by hand as a kind of snack food [2][3][4][6].
As an expensive agricultural product, the pistachio's reflected price to the consumer depends on the quality of the product. Therefore, determining the quality of shelled pistachios is an important issue in terms of economy, export, and marketing. Increased quality will lead to improvements in consumption and marketing. Besides, it is equally important to determine the quality of pistachios with high accuracy and easy applications via smart systems to prevent economic losses in terms of export and marketing [7,8]. Consequently, new methods and technologies are needed for the separation and classification of pistachios.
The aim of this study is to classify pistachio types quickly and effectively by using their images. The contributions of the study are as follows:
1. Kirmizi and Siirt pistachio kernels were collected for this study, and each pistachio image was created by the authors with a specially designed computer vision system.
2. A dataset of 2148 images was obtained from the collected images of the two commonly grown pistachio types.
3. Classifications were performed with different CNN architectures, and the most successful model was identified in order to determine the most suitable classification approach.
4. A comprehensive analysis of CNN models was carried out, and preliminary preparations were made for future studies.
The remainder of this study is organized as follows: in the second section, studies in the literature related to this subject are mentioned, followed by information about the image acquisition, dataset, confusion matrix, performance metrics, and CNN architectures used in the study. In the third section, transfer learning is given, and in the fourth section, the experimental results are presented. Finally, in the fifth section, the evaluation of the experimental results and suggestions are given.

Related Works
In recent years, there have been many studies in the literature focusing on the classification of agricultural products with deep learning and machine learning methods [9][10][11][12][13]. However, the number of studies about the classification of pistachio types, especially studies where deep learning is used, is quite limited. These studies in the literature are summarized below.
Mahdavi-Jafari et al. introduced an intelligent system for the classification of pistachios (open shell, filled and closed shell, empty and closed shell) using ANN. The ANN was trained based on the analysis of acoustic signals generated from pistachio impacts with a steel plate. Fast Fourier transform (FFT), discrete cosine transform (DCT), and discrete wavelet transform (DWT) were used for signal processing. According to the results of the study, the proposed method achieved an accuracy higher than 99.89% [7].
In their study, Farazi et al. used deep learning and machine learning architectures to distinguish open-shelled pistachios from other rotten pistachios and shells. A set of 1000 individual images obtained from pistachios and shells was increased to 20,000 by data augmentation. From these images, features were extracted using the architectures of AlexNet and GoogleNet, and the 300 most effective features were selected with PCA, which is among these features. Finally, these features were given as input to the SVM classifier and the classification operations were performed. The highest classification accuracy of 99% was achieved with the features obtained from the GoogleNet architecture [14].
Omid et al., in their study, proposed a system based on image processing and machine learning techniques to classify peeled pistachios. The peeled pistachios were graded into five classes using artificial neural networks (ANN) and support vector machine (SVM) for classification. While the classification accuracy is 99.4% with ANN, this rate is 99.8% with SVM [15].
In their study, Abbaszadeh et al. used deep auto-encoder neural networks to classify pistachios as defective and flawless. As a result of the study, a classification accuracy of 80.3% was obtained in the detection of defective pistachios [16].
In another study on the classification of defective and perfect pistachios, Dini et al. benefited from pre-trained CNN-based algorithms. The results obtained from a total of 958 images show that classification accuracy of 95.8%, 97.2%, and 95.83% was achieved from the models GoogleNet, ResNet, and VGG16, respectively [17].
In the study carried out by Dheir et al., CNN algorithms were utilized to classify nuts. Within the scope of the study, five types of nuts, namely, chestnut, hazelnut, nut forest, nut pecan, and walnut, were classified with a dataset including 2868 images. As a result, an accuracy of 98% was achieved via pre-trained ConvNet [18].
In their study, Vidyarthi et al. used the random forest (RF) machine learning technique to estimate the size and mass of pistachio kernels. It was stated that the estimated mass showed a close correlation with the manually measured kernel mass at the 95% confidence level [19].
Rahimzadeh and Attar proposed a system with computer vision to determine whether different pistachio types are open-mouth or closed-mouth. CNN-based ResNet50, ResNet152, and VGG16 models were used to extract features and classify pistachio images. The average classification success achieved via these models was 85.28%, 85.19%, and 83.32%, respectively [20].
In the study conducted by Ozkan et al., image processing and machine learning techniques were utilized to classify two different types of pistachios. A total of 16 feature extractions were performed by using pistachio images. At the same time, principal component analysis (PCA) was applied to improve accuracy performance, reduce the number of features, and improve the distribution of samples. For classification, a k-nearest neighbors (KNN) algorithm was used and a classification accuracy of 94.18% was obtained [21]. Table 1 summarizes the results obtained from the studies conducted on pistachios.

Image Acquisition
Kirmizi and Siirt pistachio kernels were collected for this study. In the computer vision system developed for this study, the images of the pistachios, placed in a special lighting box, were obtained with a Prosilica GT2000C camera. The computer vision system provides a high level of repeatability at a relatively low cost and, more importantly, high-quality images without compromising accuracy. Equipped with a CMOS sensor, the Prosilica GT2000C is an RGB camera capable of capturing images at a resolution of 2048 × 1088 and a maximum frame rate of 53.7 fps. During image acquisition, the camera was fixed at a certain distance and a background surface was used in order to prevent shadows. These images were used by the authors with different features in a different study [21].

Pistachio Image Dataset
A total of 2148 images were obtained, 1232 of which were Kirmizi pistachios and 916 Siirt pistachios. Acquired images contain original pistachio images for deep learning models [21]. Each image used in the study was sized as 600 × 600 pixels. The sample pistachio images in the dataset are given in Figure 1.

Convolutional Neural Network
Deep learning, a popular machine learning technique that has been studied broadly in recent years, is a multi-layered method used to extract and define features from large number of data [22]. It contains separate layers with specific tasks such as convolution layer, activation layer, pooling, flatten layer, and fully connected layers [23].
Convolution layer: The convolution layer is a functional layer used to extract features from the input data. The input is scanned with a defined filter, and the data are transformed into a feature space via locally weighted sums [24]. In the first convolution layer, which is directly connected to the image set, low-level features such as colors and edges can be extracted [25].
Activation Layer (Non-linearity layer): An activation layer is the layer on which a nonlinear function is applied for each pixel on the images [26]. In recent studies, the rectified linear units (ReLu) activation function has started to be used instead of the most commonly used sigmoid and hyperbolic tangent activation functions [27].
Pooling (Down-sampling) layer: Another building block of the CNN architecture, the pooling layer, reduces the number of parameters and the amount of computation in the network, which has two benefits. The first is to reduce the amount of computation for the next layer, and the second is to prevent the network from memorizing. Average, maximum, sum, and mean pooling are the methods commonly used for the pooling layer [28].
Flatten layer: The task of this layer is simply to prepare the input data for the last layer. Since neural networks take input data as one-dimensional arrays, this is the layer where matrix-type data from other layers are converted into one-dimensional arrays. As each pixel of the image is represented in a single row, this process is called flattening [28].
Fully connected layers: This layer is dependent on all fields of the previous layer. The number of this layer may vary in different architectures. At the nodes in these layers, the features are kept, and the learning process is carried out by changing the weight and bias values. This layer, which is responsible for performing the actual processing by taking input from all the various feature extraction stages, analyzes the outputs of all the processing layers [29].
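The layer operations described above can be sketched with plain NumPy on a toy input. The array sizes and filter values below are illustrative only, not taken from the models used in the study:

```python
import numpy as np

def conv2d(image, kernel):
    """Convolution layer: slide the filter over the image and take locally weighted sums."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Activation layer: negative activations are set to zero."""
    return np.maximum(x, 0)

def max_pool(x, size=2):
    """Pooling layer: keep the maximum of each size x size window."""
    h, w = x.shape[0] - x.shape[0] % size, x.shape[1] - x.shape[1] % size
    return x[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))    # toy single-channel "image"
kernel = rng.standard_normal((3, 3))   # one learned filter

features = max_pool(relu(conv2d(image, kernel)))  # 8x8 -> 6x6 -> 3x3
flat = features.reshape(-1)            # flatten layer: 3x3 map -> 9-element vector
```

In a full CNN, `flat` would then feed the fully connected layers, which produce the class scores.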

Transfer Learning
Transfer learning, which aims to use the knowledge learned during the training phase in solving different or similar problems, has recently become more popular. Transfer learning is basically the process of transferring weights to networks using features previously learned from a large dataset [30]. Using a pre-trained CNN architecture allows researchers to feed the features extracted before the fully connected layers into different classifiers. Pre-training these architectures can yield positive results in terms of performance [31].

Pre-Trained CNN Models
There are many CNN architectures in the literature. Classification successes, model sizes, and speeds are taken into account when deciding which of these architectures to use. The models used in the study were decided after many trials, and the most successful and fastest models were preferred.

AlexNet
Developed by Krizhevsky, Ilya Sutskever, and Geoff Hinton, AlexNet was the first to become popular among CNN architectures [32]. AlexNet performed significantly well in the 2012 ILSVRC competition. Having an architecture similar to LeNet, AlexNet is one of the deepest and most important architectures where all convolution layers are brought together [33].

VGG16
The VGG16 architecture, developed by Karen Simonyan and Andrew Zisserman, was introduced at the ILSVRC 2014 competition. The highlight of this architecture is that network depth can be a major factor in achieving better performance. VGG16 has 16 convolutional and fully connected layers in total. Its difference from the AlexNet architecture is that it uses fixed-size filters. Due to the extensive training of the network, VGG16 is able to offer excellent accuracy even on datasets with a small number of images [34,35].

VGG19
VGG19 has a total of 24 main layers: 16 convolutional, 5 pooling, and 3 fully connected layers. The convolution layers use 3 × 3 filters with padding that preserves the spatial dimensions. Each convolution is followed by the ReLu activation function, and maximum pooling is applied after each block to reduce the spatial dimensions. The pooling layers use a 2 × 2 filter with a stride of 2 and no padding [36,37].

Confusion Matrix
The confusion matrix is utilized to evaluate the predictive performance on training and test data. The values in the matrix are widely used for performance measurement in classification problems [38]. The confusion matrix of the two-class classification problem in this study is given in Table 2. The meanings of the rows and columns are as follows.

Performance Metrics
Performance metrics are used to evaluate classifier performances. Within the scope of this study, true-positive (TP), true-negative (TN), false-positive (FP), and false-negative (FN) values, and basic performance metrics of accuracy, F-1 score, sensitivity, precision, and specificity, were calculated [39][40][41][42]. The calculation of these metrics is shown in Table 3.
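As a minimal illustration, the metrics in Table 3 can be computed directly from the four confusion-matrix counts. The counts used below are made-up examples, not the study's results:

```python
def classification_metrics(tp, tn, fp, fn):
    """Basic classification metrics derived from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)      # true-positive rate (recall)
    specificity = tn / (tn + fp)      # true-negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "f1": f1}

# Hypothetical counts for a two-class problem:
m = classification_metrics(tp=240, tn=180, fp=10, fn=10)
```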

ROC and AUC
One of the commonly used techniques to evaluate a classification rule's performance in classification problems is the receiver operating characteristic (ROC) curve. The x-axis of this curve represents the false-positive rate (1 − specificity) and the y-axis represents sensitivity. The area under the curve (AUC) is a generally accepted measure of how well the model separates the classes on the test data. An AUC value close to 1 indicates that the model has a good performance [43].
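The AUC has an equivalent rank-based interpretation that avoids plotting: it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. A small pure-Python sketch, using made-up scores rather than the study's model outputs:

```python
def roc_auc(labels, scores):
    """Rank-based AUC: fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Perfectly separated scores give AUC = 1; reversed scores give AUC = 0.
perfect = roc_auc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
```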

Experimental Results
In this section, classification analyses of three different models created with transfer learning are presented in order to classify pistachio images. The models were trained on the pistachio images with the pre-trained models AlexNet, VGG16, and VGG19 using the transfer learning method. The outputs of the models were adjusted to two classes, in accordance with the pistachio dataset used in the study. The experiments were carried out on a computer with an Intel i5 10200H 2.4 GHz processor, 16 GB RAM, and a GTX 1650 Ti graphics card. The working environment was MATLAB 2020b. The models were trained for eight epochs with a mini-batch size of 11. The learning rate was set to 0.0001 and the SGDM optimization method was chosen as the solver.
The pistachio dataset used in the training of AlexNet, VGG16, and VGG19 models is divided as 80% training and 20% validation. Figure 2 shows the block diagram of the processes from data acquisition to classification.
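An 80/20 split like the one used here is typically done per class so that both pistachio types keep their proportions in the training and validation sets. A hedged sketch of such a stratified split (the helper name, seed, and placeholder sample IDs are our own choices, not the study's code):

```python
import random

def stratified_split(samples_by_class, train_frac=0.8, seed=42):
    """Shuffle each class separately and cut it at train_frac."""
    rng = random.Random(seed)
    train, test = [], []
    for label, samples in samples_by_class.items():
        shuffled = samples[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_frac)
        train += [(s, label) for s in shuffled[:cut]]
        test += [(s, label) for s in shuffled[cut:]]
    return train, test

# With the dataset's class sizes (1232 Kirmizi, 916 Siirt images):
data = {"Kirmizi": list(range(1232)), "Siirt": list(range(916))}
train, test = stratified_split(data)
```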
Due to the difference in the number of layers in the AlexNet, VGG16, and VGG19 architectures, the learning level and classification success of the models may vary. When a new CNN model is created from scratch, it may take a long time to find suitable convolution, pooling, and activation layer counts and parameter values. Hence, new models were created via the transfer learning method by using previously trained models with high classification success. In the transfer learning method, the model can be trained successfully by making changes in the layers of the existing architectures.
In order to train with pistachio images by using the weights of the pre-trained models used in the study, some layers needed to be revised. In all models, the input of the fc8 layer was 4096. However, since there were two classes in the dataset used in the study, the output size of the fc8 layer was changed to two for all models. As the softmax and classification layers are connected to the fc8 layer, these layers were also replaced in all models.
The AlexNet architecture consists of one image input, five convolution, seven ReLu, two normalization, five pooling, three fully connected, two dropout, one softmax, and one classification layer. Figure 3 shows the fine-tuning performed on the AlexNet model used in the study. The VGG16 architecture has 1 image input, 13 convolution, 12 ReLu, 1 normalization, 5 pooling, 3 fully connected, 2 dropout, 1 softmax, and 1 classification layer. Figure 4 shows the fine-tuning performed on the VGG16 model. Unlike the VGG16 architecture, the VGG19 architecture has extra convolution and ReLu layers in blocks 3, 4, and 5. Figure 5 shows the fine-tuning performed on the VGG19 model.
As can be seen in Figures 3-5, the number of layers differs across the three models; therefore, the features extracted from the images also differ. After each layer of the models, different features of the image are extracted and classified. Figure 6 gives the views of some activation maps of the sample image given as input to the AlexNet, VGG16, and VGG19 models.
As a result of the fine-tuning performed on the AlexNet, VGG16, and VGG19 architectures, the models were trained by using pistachio images via the transfer learning method. In order to measure the performance of the models objectively, the parameters were not changed during training. The training times also differed due to the different numbers of layers in the models. Table 4 shows the training times of the three models.
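The fc8 change described above amounts to freezing the pre-trained layers and training only a new two-class head on the features they produce. The idea can be sketched framework-free: a fixed random matrix stands in for the frozen feature extractor, and only the small head is updated by gradient descent. All names, sizes, and the binary logistic head below are illustrative simplifications, not the study's actual MATLAB setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the frozen pre-trained network: a fixed random projection with ReLu.
# (Real fc7 features are 4096-d; 16-d keeps this toy example fast.)
W_frozen = rng.standard_normal((10, 16))

def extract_features(x):
    return np.maximum(x @ W_frozen, 0)  # frozen weights, never updated

# Toy two-class data, labeled so it is separable in the frozen feature space.
x = rng.standard_normal((200, 10))
feats = extract_features(x)
true_w = rng.standard_normal(16)
y = (feats @ true_w > 0).astype(float)

# New trainable head (a binary stand-in for the replaced two-class fc8 + softmax).
w, b, lr = np.zeros(16), 0.0, 0.1
for _ in range(500):
    p = 1 / (1 + np.exp(-(feats @ w + b)))  # sigmoid output
    grad = p - y                            # gradient of the log loss
    w -= lr * feats.T @ grad / len(y)       # only the head is updated
    b -= lr * grad.mean()

accuracy = ((feats @ w + b > 0) == (y == 1)).mean()
```

Training only the head is what makes transfer learning fast: the expensive convolutional weights are reused as-is.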
As a result of the training of all three models, a classification success of 94.42%, 98.84%, and 98.14% was obtained from the AlexNet, VGG16, and VGG19 models, respectively. Figure 7 shows the training, validation, and loss graphs for all three models.
The dataset is divided as 80% training and 20% testing. Figure 8 shows the confusion matrices obtained as a result of the classifications performed through the models.
When Figure 8 is examined, it can be seen that the number of misclassified pistachio images is 23 in the AlexNet model, while this number is 5 in the VGG16 model and 8 in the VGG19 model. The number of correctly classified pistachio images in the AlexNet, VGG16, and VGG19 models is 406, 425, and 422, respectively. Performance metrics of the models were calculated by using the confusion matrix data. The models' performance metrics are shown in Table 5 and the graphs for these metrics are given in Figure 9.
According to Table 5, the highest classification success belongs to the VGG16 model. In the VGG16, it can be seen that sensitivity, specificity, precision, and F-1 score metrics are also the highest as in the classification success. Table 6 shows the classification success of the models in percent.
ROC curves provide information about the learning levels of the models. Figure 10 gives the ROC curves of all models, created by using the sensitivity and specificity metrics. AUC, the area under the ROC curve, indicates the learning level of a model and takes a value between 0 and 1; the closer the AUC value is to 1, the higher the learning level of the model. The AUC values of the AlexNet, VGG16, and VGG19 models are 0.989, 1, and 0.996, respectively.

Discussion and Conclusions
This study aimed to classify pistachio images by using three different convolutional neural network models. The dataset of the study includes a total of 2148 images of the Kirmizi and Siirt pistachio types. In the creation of the models, pre-trained CNN architectures with proven classification success were used. The pre-trained models AlexNet, VGG16, and VGG19 were fine-tuned and trained on the pistachio images. The dataset was split into 80% training and 20% test sets. As a result of the performed classifications, 94.42%, 98.84%, and 98.14% classification success was obtained from the AlexNet, VGG16, and VGG19 models, respectively. The highest classification success was achieved with the VGG16 model. Confusion matrices and performance metrics were used to analyze the models' performances in more detail; again, the highest values in these metrics were obtained with the VGG16 model. In CNN architectures, the number of layers is not always directly proportional to classification success; high success can be achieved with a model containing the optimum number of layers for the dataset used. The VGG16 architecture came to the fore as the most suitable CNN architecture for the pistachio dataset, which is why the highest classification success was obtained from it. Using the confusion matrices, the numbers of correctly and incorrectly classified images of each pistachio type were observed.
When the literature is examined, there are studies on classifying pistachios as open- or closed-shell [7,20], detecting defective pistachios [14,16], grading peeled pistachios [15], and classifying pistachio varieties with machine learning [21]. Only one of these studies [21] is directly related to ours. Ozkan, Koklu, and Saracoglu achieved a highest classification success of 94.18% in their study, whereas in this study the highest classification success was 98.84%.
The methods used in the study set an example for future studies in this field. On the other hand, it is possible to achieve different classification successes by utilizing different artificial intelligence methods, and, depending on the number of images in the dataset, different models may yield different results. With these methods, pistachio types can be distinguished quickly and effectively. A broader classification study can be conducted by collecting images of additional pistachio types, and the application can be made mobile and used to determine the pistachio type in the field.