Tea Category Identification Using a Novel Fractional Fourier Entropy and Jaya Algorithm

Abstract: This work proposes a tea-category identification (TCI) system that automatically determines the tea category from images captured by a three charge-coupled device (3-CCD) digital camera. Three hundred tea images were acquired as the dataset. In addition to 64 traditional color histogram features, we introduced a relatively new feature, fractional Fourier entropy (FRFE), and extracted 25 FRFE features from each tea image. Kernel principal component analysis (KPCA) was then used to reduce the 64 + 25 = 89 features. The four reduced features were fed into a feedforward neural network (FNN), whose optimal weights were obtained by the Jaya algorithm. A 10 × 10-fold stratified cross-validation (SCV) showed that our TCI system obtains an overall average sensitivity rate of 97.9%, which is higher than seven existing approaches. In addition, we used only four features, a number less than or equal to those used by state-of-the-art approaches. Our proposed system is efficient in terms of tea-category identification.

Tea-category identification (TCI) is important for controlling fermentation time so as to obtain the expected tea. Six main types of tea are produced: white tea, yellow tea, green tea, Oolong tea, black tea, and post-fermented tea. In China, green, black, and Oolong tea are the most popular categories [10]. Table 1 lists their characteristics.
In the past decade, two types of methods, viz., hardware and software, have been developed for tea classification. The former aims at devising new devices, while the latter aims at designing new algorithms based on computer vision, which has proven to be better than human vision in many fields [11][12][13].
Some researchers proposed the use of new measurement devices. Zhao et al. [14] employed near-infrared (NIR) spectroscopy to identify three categories of tea. The identification accuracies were all above 90%. Chen et al. [15] employed NIR reflectance spectroscopy to classify three different teas. Herrador and Gonzalez [16] chose eight metals as chemical descriptors, with the aim of differentiating three categories of tea. Later, Chen et al. [17] designed an electronic nose based on odor imaging sensor arrays to identify fermentation degrees. Szymczycha-Madeja et al. [18] developed a method using flame atomic absorption spectrometry (FAAS) and inductively coupled plasma optical emission spectrometry (ICP OES) to determine (non-)essential elements in black and green teas. Liu et al. [19] utilized an electronic tongue to process 43 pieces of data of black and green tea. Dai et al. [20] employed an E-nose for quality classification of Xihu-Longjing tea. The aforementioned methods yielded excellent classification results; nevertheless, they needed expensive sensor devices and suffered from lengthy procedures.
In the last decade, computer vision based approaches were introduced to help develop TCI systems. However, computer vision systems are composed of interwoven parts that interact dynamically at different scales in space. In addition, the complexity of a TCI system gives rise to large-scale dynamics that cannot be summarized by a single rule. These modeling issues require the development of novel computational tools. For instance, Chen et al. [21] employed 12 color features and 12 texture features. Then, they employed principal component analysis (PCA) and linear discriminant analysis (LDA) to build the classification system. Jian et al. [22] classified and graded tea on the basis of color and shape parameters. A genetic neural network (GNN) was used to build the identification system, which yielded promising identification results with eight color and shape parameters. Laddi et al. [23] suggested acquiring tea granule images using a 3-CCD camera under dual ring light illumination. Gill et al. [24] reviewed various computer vision based algorithms for texture and color analysis, with a special orientation towards monitoring and grading of made tea. Zhang et al. [25] combined three different types of features (shape, color, and texture) to identify fruit images. Tang et al. [26] extracted features from green tea leaves by combining local binary pattern (LBP) and gray level co-occurrence matrix (GLCM). Akar and Gungor [27] integrated multiple texture methods and the normalized difference vegetation index (NDVI) into the random forest algorithm in order to detect tea and hazelnut plantation areas. Wang et al. [28] utilized wavelet packet entropy (WPE). They then combined a fuzzy technique with a support vector machine (SVM) and named it fuzzy SVM (FSVM). Their method yielded a promising recall rate of 97.77%.
Those methods used simple digital cameras rather than expensive devices, but the identification performances did not meet the requirement of practical use.
In this study, we proposed a novel approach based on computer vision and machine learning, with the aim of improving the classification sensitivity of each category. To improve the accuracy of TCI, we introduced a relatively new feature descriptor, fractional Fourier entropy (FRFE) [29], which combines two successful components: the fractional Fourier transform (FRFT) and Shannon entropy (SE). We also introduced the Jaya algorithm, proposed by Rao [30], to train the classifier. Jaya has no algorithm-specific parameters (ASPs) that need tuning. The rest of this paper is organized as follows. Section 2 describes the sample preparation. Sections 3 and 4 present the feature extraction and reduction, respectively. Section 5 presents the classification method. Section 6 provides the experiments, results, and discussions. Finally, Section 7 concludes this study and provides future research directions. The acronyms are explained in the Abbreviations section.

Tea Sample Preparation
Three-hundred samples of three categories of tea (green, black, and Oolong) were prepared. The tea samples originated from different areas in China. In order to augment the generalization capability of our TCI system, each category contained different brands. Table 2 shows the sample settings. The green tea has 100 samples, originating from Guizhou, Henan, Anhui, Jiangxi, Jiangsu, and Zhejiang provinces. The black tea has 100 samples, originating from Yunnan, Fujian, Hunan, and Hubei provinces. The Oolong tea has 100 samples, originating from Guangdong and Fujian provinces.

Image Acquiring
Figure 1 shows the TCI system used in this study. It consisted of five parts: (1) an illumination system; (2) a camera; (3) a capture board; (4) computer hardware; and (5) computer software. We spread the tea leaves thinly over the tray and then employed a 3-CCD digital camera to acquire tea images. Compared to 1-CCD cameras, 3-CCD cameras provide high-resolution images with lower noise [23]. Lighting arrangements fall into two types: front lighting and back lighting [31]. Front lighting (FL) is used for surface extraction, whereas back lighting (BL) is used when a silhouette image is needed [24]. We used FL in this study.
The captured images are 256 × 256 pixels at 180 dpi. They are stored losslessly in TIFF format. We used two-dimensional median filters to remove noise. The tea leaves are not reused among the images. To ensure fresh tea samples, they were obtained when in stock within a four-month period.
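The noise-removal step can be illustrated with a small sketch. The text only states that two-dimensional median filters were used; the 3 × 3 window, the untouched border pixels, and the function name `median_filter_3x3` below are assumptions made for illustration, not details taken from the study.

```python
from statistics import median

def median_filter_3x3(image):
    """Apply a 3 x 3 median filter to a 2D list of intensities.

    Border pixels are copied unchanged (an assumption of this sketch).
    """
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]  # copy so the input is not modified
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Collect the 9 neighbors (including the pixel itself).
            window = [image[r + dr][c + dc]
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            out[r][c] = median(window)
    return out

# Hypothetical 3 x 3 patch with one impulse-noise pixel in the center.
noisy = [[10, 10, 10],
         [10, 255, 10],
         [10, 10, 10]]
cleaned = median_filter_3x3(noisy)
```

For a color image, one plausible choice is to filter each RGB channel separately, although the text does not say how the channels were handled.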

Color Histogram
Diniz et al. [32] used a color histogram (CH), computed with the software ImageJ 1.44p, as analytical information to screen teas. Yu et al. [33] combined CH and texture information to classify green teas. A CH represents the distribution of colors in a particular image [34] by counting the number of pixels that fall in each color range. The CH procedure contains three steps. First, we discretized each color channel (RGB) into four bins. Second, the whole image in RGB space was segmented into 4 × 4 × 4 = 64 bins. Third, we counted the number of pixels in each bin. The advantage of CH is its relative translation invariance and rotation invariance.
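The three steps above can be sketched as follows. This is a minimal illustration that assumes 8-bit channels, uniform bin edges, and a particular ordering of the 64 joint bins; the function name `color_histogram` and the toy pixel values are hypothetical.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Count pixels in each of bins_per_channel**3 joint RGB bins.

    `pixels` is an iterable of (r, g, b) tuples with 8-bit values (0-255).
    """
    width = 256 // bins_per_channel        # 64 intensity levels per bin
    hist = [0] * bins_per_channel ** 3     # 4 * 4 * 4 = 64 bins
    for r, g, b in pixels:
        # Step 1: discretize each channel into a bin index 0..3.
        rb, gb, bb = r // width, g // width, b // width
        # Step 2: map the (rb, gb, bb) triple to one of the 64 joint bins.
        index = (rb * bins_per_channel + gb) * bins_per_channel + bb
        # Step 3: count the pixel.
        hist[index] += 1
    return hist

# Toy "image" of four pixels (hypothetical values).
toy = [(0, 0, 0), (10, 20, 30), (255, 255, 255), (200, 10, 70)]
hist = color_histogram(toy)
```

Because only counts are kept, shifting or rotating the leaves within the frame leaves the histogram largely unchanged, which is the invariance noted above.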

Fractional Fourier Transform
The α-angle fractional Fourier transform (FRFT) [35,36] of a function x(t) is denoted by Y_α. It is obtained by integrating x(t) against a transform kernel T, where u denotes the frequency, t the time, and j the imaginary unit. For the tea images, we need to extend the FRFT to two-dimensional space. Two angles exist for the 2D-FRFT, α and β [37][38][39][40]. The results of the FRFT with angles from 0 to 1 are shown in Figure 2. A triangular signal tri(t) is used, and its frequency spectrum is sinc²(u) [41]. In the figure, black lines represent real components, and blue lines represent imaginary components.
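For reference, a commonly used textbook form of the FRFT and its kernel is given below. The equations of this section did not survive in the present text, so the normalization and the angle convention shown here are assumptions and may differ from the form the authors used. In this form, α = π/2 yields the standard Fourier transform; the angles "from 0 to 1" quoted above presumably correspond to a normalized order a with α = aπ/2.

```latex
Y_\alpha(u) = \int_{-\infty}^{\infty} T_\alpha(t, u)\, x(t)\, \mathrm{d}t,
\qquad
T_\alpha(t, u) = \sqrt{1 - j\cot\alpha}\;
\exp\!\left( j\pi \left( t^2 \cot\alpha - 2ut \csc\alpha + u^2 \cot\alpha \right) \right)
```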
Entropy 2016, 18, 77

Fractional Fourier Entropy
We introduced a relatively new image feature, fractional Fourier entropy (FRFE), denoted by the symbol E. Mathematically, we applied the Shannon entropy operator S to the spectra obtained by the FRFT Y:

$$E = S \cdot Y$$

When the FRFT was applied to the tea leaf images, 25 unified time-frequency spectra were generated with various combinations of α and β. Suppose I represents a tea image; the FRFE can then be written as a matrix:

$$E(I) = S \cdot \begin{bmatrix} Y_{0.6,0.6}(I) & Y_{0.6,0.7}(I) & \cdots & Y_{0.6,1}(I) \\ Y_{0.7,0.6}(I) & Y_{0.7,0.7}(I) & \cdots & Y_{0.7,1}(I) \\ \vdots & \vdots & \ddots & \vdots \\ Y_{1,0.6}(I) & Y_{1,0.7}(I) & \cdots & Y_{1,1}(I) \end{bmatrix}$$

Here, Y_{α,β} denotes an FRFT performed with angle α along the x-axis and angle β along the y-axis. We set both angles to vary from 0.6 to 1 in equal increments of 0.1, because angles close to 0 degrade the FRFT to an identity operation.
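The construction of the 25 FRFE features can be sketched as follows. The text does not state how the entropy of a complex spectrum is computed, so the sketch assumes the magnitudes are normalized into a probability distribution before applying Shannon entropy. The 2D FRFT itself is not implemented here: `frft2` is a hypothetical caller-supplied routine, and the identity stand-in in the usage lines exists only to make the example runnable.

```python
import math

def shannon_entropy(spectrum):
    """Shannon entropy (bits) of a 2D spectrum given as rows of numbers.

    Assumption: magnitudes are normalized to sum to 1 and used as probabilities.
    """
    mags = [abs(v) for row in spectrum for v in row]
    total = sum(mags)
    if total == 0:
        return 0.0
    probs = [m / total for m in mags if m > 0]
    return -sum(p * math.log2(p) for p in probs)

def frfe_features(image, frft2, angles=(0.6, 0.7, 0.8, 0.9, 1.0)):
    """Return the 5 x 5 FRFE matrix flattened row by row into 25 features.

    `frft2(image, alpha, beta)` is a hypothetical 2D FRFT supplied by the caller.
    """
    return [shannon_entropy(frft2(image, a, b)) for a in angles for b in angles]

def identity_stand_in(image, alpha, beta):
    """Placeholder for a real 2D FRFT; it ignores the angles entirely."""
    return image

features = frfe_features([[1, 1], [1, 1]], identity_stand_in)
```

With a real 2D FRFT in place of the stand-in, each (α, β) pair would yield a different spectrum and hence a different entropy value.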

Principal Component Analysis
Principal component analysis (PCA) reduces the dimension of a dataset composed of a large number of variables while preserving the most significant principal components (PCs). The common method to calculate PCA is based on the covariance matrix. Suppose a dataset X has p dimensions and N samples. We first preprocess the dataset to have zero mean. The empirical mean of each feature is obtained by

$$u_j = \frac{1}{N} \sum_{i=1}^{N} X(i, j), \quad j = 1, \ldots, p$$

Afterwards, the deviation is obtained by mean subtraction:

$$B = X - e u^{T}$$

Here, B stores the centered data and e is an N × 1 vector of all 1s. The p × p covariance matrix C is generated by

$$C = \frac{1}{N - 1} B^{*} B$$

where * denotes the conjugate transpose. The covariance matrix admits the eigendecomposition

$$V^{-1} C V = D$$

Here, D is a diagonal matrix of the eigenvalues of C, and V is the matrix of eigenvectors. We rearrange the eigenvector matrix V and the eigenvalue matrix D so that the eigenvalues are in decreasing order. Since each eigenvalue represents the share of the source data's variance along its eigenvector, we then select a subset of eigenvectors as basis vectors. This is done by computing the cumulative variance for each eigenvector:

$$G(L) = \sum_{i=1}^{L} D(i, i)$$

Finally, we select the L principal components (PCs) whose cumulative variance covers at least a threshold T of the original variance:

$$\frac{G(L)}{G(p)} \ge T$$
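The selection rule for L can be sketched as below, assuming the eigenvalues have already been computed by some eigendecomposition routine. The function name `select_components` and the eigenvalues in the usage lines are hypothetical.

```python
def select_components(eigenvalues, threshold):
    """Return the smallest L with G(L) / G(p) >= threshold.

    G(L) is the sum of the L largest eigenvalues; G(p) is the sum of all of them.
    """
    ordered = sorted(eigenvalues, reverse=True)  # decreasing order
    total = sum(ordered)                         # G(p)
    cumulative = 0.0                             # G(L), built incrementally
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return count
    return len(ordered)

# Hypothetical eigenvalues, deliberately given out of order.
eigs = [0.5, 6.0, 3.0, 0.5]
n_pcs = select_components(eigs, 0.85)
```

Raising the threshold can only keep the same number of components or add more, which is why a strict threshold such as 99.5% is a conservative choice.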

Kernel Principal Component Analysis
PCA can only extract linear structure information. Nevertheless, it cannot handle the dataset containing nonlinear structure. Kernel PCA (KPCA) is an extension of standard PCA. It transforms the input dataset into a higher dimensional space where PCA is then implemented. Note that the higher dimensional feature space is not required to be computed explicitly, because KPCA is simply implemented by computing the inner product of two vectors with a kernel function.
Several kernels are commonly used. The most common is the Gaussian radial basis function (RBF) kernel, which has the form

$$k(x, y) = \exp\left( -\frac{\lVert x - y \rVert^{2}}{2\sigma^{2}} \right)$$

where σ represents the scaling factor. The polynomial kernel has the form

$$k(x, y) = (x \cdot y + b)^{d}$$

where b and d are parameters. The parameters can be tuned by grid search: σ is searched in the range [1, 100] with an increment of 10, b is searched in the range [0, 1] with an increment of 0.1, and d is searched in the range [1, 5] with an increment of 1. We termed the PCAs based on the above polynomial kernel and RBF kernel as POL-KPCA and RBF-KPCA, respectively.
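The two kernels and the search grids can be sketched as follows. The kernel forms follow the common textbook definitions and are assumptions insofar as the original equations are not legible in this text. The σ grid shows one reading of "range [1, 100] with increment of 10" (1, 11, ..., 91); all function and variable names are hypothetical.

```python
import math

def rbf_kernel(x, y, sigma):
    """Gaussian RBF kernel: exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def polynomial_kernel(x, y, b, d):
    """Polynomial kernel: (x . y + b)^d."""
    return (sum(p * q for p, q in zip(x, y)) + b) ** d

def kernel_matrix(samples, kernel):
    """Pairwise kernel values; KPCA works on this matrix instead of raw features."""
    return [[kernel(u, v) for v in samples] for u in samples]

# Search grids (the sigma grid is one interpretation of the stated range).
sigma_grid = list(range(1, 101, 10))      # 1, 11, ..., 91
b_grid = [i / 10 for i in range(11)]      # 0.0, 0.1, ..., 1.0
d_grid = list(range(1, 6))                # 1, 2, 3, 4, 5

# Toy samples (hypothetical values).
samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
K = kernel_matrix(samples, lambda u, v: rbf_kernel(u, v, sigma=1.0))
```

A full KPCA would additionally center the kernel matrix and eigendecompose it; those steps are omitted from this sketch.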

Implementation
To recap the feature extraction and reduction procedures, we proposed a composite feature set composed of 64 color histogram (CH) features and 25 FRFE features. As a result, 89 features were extracted from each tea image. Afterwards, three feature reduction approaches (PCA, POL-KPCA, and RBF-KPCA) were employed to reduce the number of features. The threshold was set so that the retained PCs cover at least 99.5% of the variance of the original full feature set. Figure 3 illustrates the flowchart of our feature-processing technique.

Feed-Forward Neural Network
There are many canonical classification techniques. In this paper, we employed the feedforward neural network (FNN), because it needs neither information about the probability distribution [42] nor the a priori probabilities of different classes [43]. Figure 4 illustrates the general one-hidden-layer (OHL) FNN. This model has three types of layers: an input layer, a hidden layer, and an output layer. Their neuron numbers are denoted NI, NH, and NO, respectively. The activation functions are sigmoid for the hidden layer and linear for the output layer.
Building the FNN amounts to training the weights/biases of all its neurons, which is treated as an optimization problem: we need to find the weights/biases that minimize the mean-squared error (MSE) between real outputs and target outputs.
BP and MBP do not perform well on non-linear data [50,51]. Meta-heuristic approaches usually perform better; however, they require tuning of both common controlling parameters (CCPs) and algorithm-specific parameters (ASPs), which influence their performance and complicate their application. Rao et al. [52] were the first to propose a meta-heuristic approach that does not require any ASP. They named their method teaching-learning-based optimization (TLBO) [53]. Although TLBO is widely used in various fields [54][55][56], it must be implemented in two phases (teacher phase and learner phase) [57].
Recently, a novel approach called Jaya (a Sanskrit word meaning victory) was proposed. Jaya not only inherits the ASP-free property of TLBO but is also simpler than TLBO. In addition, Rao [30] compared Jaya with the latest approaches and found that Jaya ranks first for "best" and "mean" solutions on all 24 constrained benchmark problems and gives better performance on 30 unconstrained benchmark problems. In this study, we make a tentative proposal of using Jaya to train the weights/biases of the FNN, and we term the resulting classifier Jaya-FNN. Figure 5 shows the diagram of Jaya. Suppose b and w are the indices of the best and worst candidates in the population, and suppose i, j, and k are the indices of iteration, variable, and candidate, respectively. Then A(i, j, k) is the j-th variable of the k-th candidate in the i-th iteration. The modification formula of each candidate can be written as

$$A(i+1, j, k) = A(i, j, k) + r(i, j, 1)\left(A(i, j, b) - |A(i, j, k)|\right) - r(i, j, 2)\left(A(i, j, w) - |A(i, j, k)|\right)$$

where A(i, j, b) and A(i, j, w) are the best and worst values of the j-th variable in the i-th iteration, and r(i, j, 1) and r(i, j, 2) are two random numbers in the range [0, 1]. The second term, "r(i, j, 1)(A(i, j, b) − |A(i, j, k)|)", indicates that the candidate should move closer to the best one, while the third term, "−r(i, j, 2)(A(i, j, w) − |A(i, j, k)|)", indicates that the candidate should move further away from the worst one (note the minus sign before r). The new value A(i+1, j, k) is accepted if the modified candidate is better in terms of function value; otherwise, the previous A(i, j, k) is kept.
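The Jaya update and its greedy acceptance rule can be sketched on a generic minimization problem. This is an illustrative sketch, not the authors' Jaya-FNN code: the sphere objective stands in for the FNN's MSE, clamping candidates to the search bounds is an added assumption, random numbers are drawn afresh for each candidate and variable, and all names are hypothetical.

```python
import random

def jaya_minimize(objective, dim, lower, upper,
                  pop_size=20, iterations=200, seed=0):
    """Minimize `objective` with the Jaya update rule; returns (best_x, best_cost)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lower, upper) for _ in range(dim)]
           for _ in range(pop_size)]
    cost = [objective(c) for c in pop]
    for _ in range(iterations):
        # Best and worst candidates are fixed for the whole iteration.
        best = pop[cost.index(min(cost))]
        worst = pop[cost.index(max(cost))]
        for k in range(pop_size):
            trial = []
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                value = (pop[k][j]
                         + r1 * (best[j] - abs(pop[k][j]))    # move toward best
                         - r2 * (worst[j] - abs(pop[k][j])))  # move away from worst
                # Assumption: keep the candidate inside the search bounds.
                trial.append(min(max(value, lower), upper))
            trial_cost = objective(trial)
            if trial_cost < cost[k]:      # greedy acceptance
                pop[k], cost[k] = trial, trial_cost
    i = cost.index(min(cost))
    return pop[i], cost[i]

def sphere(x):
    """Toy objective standing in for the FNN's mean-squared error."""
    return sum(v * v for v in x)

best_x, best_cost = jaya_minimize(sphere, dim=3, lower=-5.0, upper=5.0)
```

To train an FNN in this way, each candidate would hold all weights and biases flattened into one vector, and `objective` would return the MSE of the network built from that vector. Apart from the population size and iteration count, which are common controlling parameters, nothing needs tuning.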

Statistical Setting
In this study, we used K-fold stratified cross validation (SCV) for statistical analysis. In K-fold SCV, the tea samples were partitioned into K mutually exclusive folds of approximately equal size and approximately similar class distribution. To give stricter results, the K-fold SCV was repeated 10 times. We set K to 10 for easy and fair comparison, since the recent literature all used 10-fold cross validation. In addition, the model was rebuilt each time the training set was regenerated.
Figure 6 shows one sample of each of the three tea categories, together with the corresponding color histograms (CH). The x-axis represents the 64 bins, while the y-axis represents the pixel number. Figure 6 shows that the color histograms of green, Oolong, and black teas are distinct from each other; therefore, the color histogram is an efficient measure for tea-category identification.
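The 10 × 10-fold SCV partition described in this section can be sketched as follows. The round-robin assignment within each class and the use of the run index as random seed are assumptions of this sketch; the function name `stratified_folds` is hypothetical.

```python
import random

def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with near-equal class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin over the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds

# 100 samples per class, as in the tea dataset.
labels = ["green"] * 100 + ["Oolong"] * 100 + ["black"] * 100

# Repeat the 10-fold partition 10 times with different shuffles.
runs = [stratified_folds(labels, k=10, seed=run) for run in range(10)]
```

In each run, every fold serves once as the validation set while the other nine folds form the training set, and the classifier is rebuilt for each training set.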

Figure 6. Color histogram of green, Oolong, and black tea [28].

FRFT Results
Figure 7 shows the FRFT result of an Oolong tea image. As it shows, the FRFT reduces to the standard FT when both α and β increase to one. The contents in the fractional Fourier domain (FRFD) suggest that tea images contain a large amount of information, so the FRFT can provide more information than the standard FT.

FRFE Results
Afterwards, we extracted entropy information from each FRFD. The mean and standard deviation of each class are listed in Table 3. We can see that the FRFE is an effective feature, such that the three tea categories can be separated by the 25 FRFE features.

Figure 8 shows how KPCA extends the data to non-linear feature maps for simulation data. Figure 8a shows a three-dimensional three-class dataset. Points in class 1, class 2, and class 3 are distributed on three spheres with the same origin and different radii of 1, 2, and 3, respectively. Both PCA and KPCA reduced the features from 3 to 2. Figure 8b-d shows the feature reduction results by PCA, KPCA with polynomial kernel, and KPCA with RBF kernel, respectively.
The two PCs obtained by PCA cannot separate the data, since the classes are entangled with each other. The reduced features obtained by POL-KPCA do not overlap with each other, so they may be separated by a nonlinear approach. Finally, RBF-KPCA performs the best, since its reduced data can be divided by linear methods. In addition, even if RBF-KPCA preserves only one reduced feature (i.e., PC1), the data can still be easily classified.
Although KPCA has the above advantage, it adds complexity and introduces additional model hyperparameters to tune.

KPCA over Tea Features
The 89 features of each tea image are reshaped as a row vector. Then, the 89-element feature vectors of all tea images were stacked to form one two-dimensional array. Figure 9 illustrates the curve of variance versus PC number for the PCA, POL-KPCA, and RBF-KPCA approaches. The optimal parameters of the POL and RBF kernels were chosen by grid search, with the aim of accumulating more variance with fewer PCs. They were implemented on the overall dataset. From the curves in Figure 9, we can see that the numbers of PCs selected under the same threshold of 99.5% are four, five, and seven for RBF-KPCA, POL-KPCA, and standard PCA, respectively. We can conclude that RBF-KPCA is superior to the other two approaches in reducing features in terms of cumulative variance. This result is consistent with the finding set out in Section 6.4.

Training Comparison
Since RBF-KPCA yields the fewest features while retaining the largest cumulative variance, we believe its reduced features are more effective and use it in the following experiments. The four reduced features are fed into the FNN. The number of input neurons is set to four, since the number of features is four. The number of output neurons is set to three, since we need to classify among three classes: green, Oolong, and black. The number of hidden neurons is determined by cross-validation. Each time the training set was updated, we performed a grid search to find the optimal number of hidden neurons.
We compared the Jaya method with different training methods: BP, MBP, GA, PSO, and SA. The maximum iterative epochs (MIEs) of all approaches were set to 1000, a rather large value for such a small network; in fact, some approaches converge at about 50 epochs. Hence, Figure 10 reflects the ability of each method to converge to a global optimal point. The population sizes were all set to 20. The ASPs of each algorithm were obtained by trial and error. Their values are listed in Table 4. The average and standard deviation (SD) of the MSEs are shown in Figure 10, and the detailed data are shown in Table 5. Table 5 shows that Jaya finds the lowest average MSE among all approaches. It achieves a mean MSE of 0.0072, less than BP (0.1242), MBP (0.0939), SA (0.1043), GA (0.0266), and PSO (0.0197). This again validates the superiority of Jaya. The reason may be that the other algorithms perform badly if their ASPs are not tuned well, while Jaya has no ASPs to tune. Since Jaya has shown better performance than the other approaches, we use Jaya as the default approach for training the FNN.

Training Comparison
Since RBF-KPCA yields the fewest features while retaining the largest cumulative variance, we consider its reduced features more effective and use them in the following experiments. The four reduced features are fed into the FNN. The number of input neurons is therefore set to four, one per feature. The number of output neurons is set to three, one per class: green, Oolong, and black. The number of hidden neurons is determined by cross-validation: each time the training set was updated, we performed a grid search to find the optimal number of hidden neurons.
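To make the network layout concrete, the sketch below shows a one-hidden-layer FNN with four inputs and three outputs whose parameters are stored in a single flat vector, the form that a population-based trainer would optimize. The activation functions (tanh in the hidden layer, logistic in the output layer), the hidden-layer size of five, and all function names are assumptions made for illustration; they are not specified in this section.

```python
import math


def count_weights(n_in, n_hidden, n_out):
    """Number of weights and biases in a one-hidden-layer FNN."""
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out


def fnn_forward(weights, x, n_hidden, n_out=3):
    """Forward pass with all parameters stored in one flat list."""
    n_in = len(x)
    if len(weights) != count_weights(n_in, n_hidden, n_out):
        raise ValueError("weight vector has the wrong length")
    pos = 0
    hidden = []
    for _ in range(n_hidden):
        w = weights[pos:pos + n_in]
        b = weights[pos + n_in]
        pos += n_in + 1
        # Assumed hidden activation: tanh.
        hidden.append(math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b))
    outputs = []
    for _ in range(n_out):
        w = weights[pos:pos + n_hidden]
        b = weights[pos + n_hidden]
        pos += n_hidden + 1
        z = sum(wi * hi for wi, hi in zip(w, hidden)) + b
        # Assumed output activation: logistic sigmoid.
        outputs.append(1.0 / (1.0 + math.exp(-z)))
    return outputs


def mse(weights, samples, n_hidden):
    """Mean squared error over (features, one_hot_target) pairs."""
    total, count = 0.0, 0
    for x, target in samples:
        outputs = fnn_forward(weights, x, n_hidden, n_out=len(target))
        total += sum((o - t) ** 2 for o, t in zip(outputs, target))
        count += len(target)
    return total / count
```

In a grid search over the hidden-layer size, `n_hidden` would be varied and the size with the lowest cross-validated error retained.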
We compared Jaya with five other training methods: BP, MBP, GA, PSO, and SA. The maximum iterative epochs (MIEs) of all approaches were set to 1000, a value that is rather large for such a small network; indeed, some approaches converge at about 50 epochs. Hence, Figure 10 reflects the ability of each method to converge to a global optimum. The population sizes were all set to 20. The ASPs of each algorithm were obtained by trial and error, and their values are listed in Table 4. The average and standard deviation (SD) of the MSEs are shown in Figure 10, and the detailed data are given in Table 5. Table 5 shows that Jaya finds the lowest average MSE among all approaches: it achieves a mean MSE of 0.0072, lower than BP (0.1242), MBP (0.0939), SA (0.1043), GA (0.0266), and PSO (0.0197). This again supports the superiority of Jaya. A possible reason is that the other algorithms may perform poorly if their ASPs are not well tuned, whereas Jaya has no ASPs to tune. Since Jaya outperformed the other approaches, we use it as the default method for training the FNN.
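The Jaya update itself is short enough to sketch. The code below is a minimal illustration, not our experimental implementation: each candidate moves toward the current best solution and away from the current worst, and is kept only if it improves the objective. Population size and epoch limit are the only settings, matching the values of 20 and 1000 used above. The search bounds, the clipping step, the seed, and the function name are assumptions introduced for this sketch. For FNN training, `objective` would be the MSE of the network as a function of its flattened weight vector.

```python
import random


def jaya_minimize(objective, dim, pop_size=20, max_epochs=1000,
                  lower=-1.0, upper=1.0, seed=0):
    """Minimize `objective` over [lower, upper]^dim with the Jaya update."""
    rng = random.Random(seed)
    population = [[rng.uniform(lower, upper) for _ in range(dim)]
                  for _ in range(pop_size)]
    scores = [objective(ind) for ind in population]
    for _ in range(max_epochs):
        # Best and worst are fixed at the start of each epoch.
        best = population[scores.index(min(scores))]
        worst = population[scores.index(max(scores))]
        for i, current in enumerate(population):
            candidate = []
            for j, xj in enumerate(current):
                r1, r2 = rng.random(), rng.random()
                # Move toward the best and away from the worst solution.
                xj_new = (xj + r1 * (best[j] - abs(xj))
                          - r2 * (worst[j] - abs(xj)))
                # Assumed bound handling: clip to the search interval.
                candidate.append(min(max(xj_new, lower), upper))
            cand_score = objective(candidate)
            if cand_score < scores[i]:  # greedy acceptance
                population[i], scores[i] = candidate, cand_score
    best_index = scores.index(min(scores))
    return population[best_index], scores[best_index]


def sphere(x):
    """Toy objective used only to exercise the optimizer."""
    return sum(v * v for v in x)
```

Because acceptance is greedy, the best score in the population never increases from one epoch to the next.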

Feature Comparison
We compared three types of features: (1) CH only; (2) FRFE only; and (3) CH combined with FRFE and then reduced by KPCA. The classifier was Jaya-FNN in all cases. The measure was the average sensitivity rate (ASR) based on a 10 × 10-fold SCV, defined as the average sensitivity rate over the 10 validation folds across 10 runs. Table 6 lists the feature comparison results for green, Oolong, and black tea. The results demonstrate that the combined feature set performs better than either single feature set; hence, the combination strategy is effective. Moreover, FRFE gives better results than CH, which suggests that fractional features are more suitable than color features for identifying tea categories.
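One way to compute the ASR is sketched below. It assumes that each run of the 10-fold SCV is summarized by one confusion matrix accumulated over its validation folds, with rows as true classes and columns as predicted classes; this aggregation order is an assumption and the definition above could also be read as averaging per fold. The function names and the two example matrices are hypothetical and do not reproduce Table 6.

```python
def class_sensitivities(confusion):
    """Per-class sensitivity TP / (TP + FN) from a square confusion matrix
    whose rows are true classes and whose columns are predicted classes."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]


def average_sensitivity_rate(run_confusions):
    """Average the per-class sensitivities over repeated runs."""
    n_classes = len(run_confusions[0])
    per_run = [class_sensitivities(c) for c in run_confusions]
    return [sum(run[k] for run in per_run) / len(per_run)
            for k in range(n_classes)]


# Hypothetical confusion matrices for two runs (class order: green,
# Oolong, black); illustration only, not the paper's results.
example_runs = [
    [[98, 1, 1], [2, 97, 1], [0, 3, 97]],
    [[96, 2, 2], [1, 99, 0], [1, 2, 97]],
]
example_asr = average_sensitivity_rate(example_runs)
```

An overall ASR can then be taken as the mean of the per-class values.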

Conclusions
Our contributions cover the following points: (i) we introduced a novel feature, fractional Fourier entropy (FRFE), and showed its effectiveness in extracting features from tea images; (ii) we proposed a novel classifier, Jaya-FNN, by combining a feedforward neural network with the Jaya algorithm; (iii) we demonstrated that our TCI system was better than seven state-of-the-art approaches in terms of overall ASR while using the fewest features.
In the future, we will test other newly proposed feature-extraction methods, such as displacement field [58,59] and image moments [60]. We will also try to improve the classification performance, either by introducing new swarm intelligence methods to train the FNN classifier or by introducing newly proposed classifiers, such as variants of SVMs. In addition, our method may be applied to X-ray images, magnetic resonance images, Alzheimer's disease images, and microglia images.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: