Intelligent Identification Method of Geographic Origin for Chinese Wolfberries Based on Color Space Transformation and Texture Morphological Features

Geographic origins play a vital role in traditional Chinese medicinal materials. Using the geo-authentic crude drug can improve the curative effect. The main producing areas of Chinese wolfberry are Ningxia, Gansu, Qinghai, and so on. The geographic origin of Chinese wolfberry can affect its texture, shape, color, smell, nutrients, etc. However, the traditional method for identifying the geographic origin of Chinese wolfberries is still based on human eyes. To efficiently identify Chinese wolfberries from different origins, this paper presents an intelligent identification method for Chinese wolfberries based on color space transformation and texture morphological features. The first step is to prepare the Chinese wolfberry samples and collect the image data. Then the images are preprocessed, and the texture and morphology features of single wolfberry images are extracted. Finally, the random forest algorithm is employed to establish a model of the geographic origin of Chinese wolfberries. The proposed method can accurately predict the origin information of a single wolfberry image and has the advantages of low cost, fast recognition speed, high recognition accuracy, and no damage to the sample.


Introduction
Chinese wolfberry is the dry and mature fruit of Lycium barbarum L., which is a plant of the Solanaceae family. Chinese wolfberry has the effects of nourishing the liver and kidneys and improving eyesight [1]. From 2011 to 2020, Chinese wolfberry consumption increased by an average of 9.7%. From 2012 to 2022, the Chinese national wolfberry planting scale increased rapidly, and the total output increased by more than two times [2]. At present, the Chinese wolfberry market has a large demand and a wide planting area. However, Chinese wolfberries from different origins are different in the growth environment, cultivated varieties, harvest period, processing, and storage, which leads to differences in the use of clinical compatibility [3]. This shows that studying the geo-herbalism of Chinese wolfberry has great significance for clinical treatment.
Nowadays, the methods for identifying the origin of food or medicinal materials mainly include artificial naked eye recognition [4], chemical composition analysis [5][6][7][8][9], near-infrared spectral analysis [10][11][12], image recognition, and other methods [13][14][15]. The traditional artificial naked eye method has low identification efficiency, and the screening results are affected by subjective factors. At present, there are many studies on the determination of the chemical composition of food or medicinal materials as identification features. Zinicovscai et al. have predicted the origin of wine by measuring trace elements in wine [5]. Takashima et al. have predicted the origin of the product by determining the DNA of several types of aquatic products [6]. Zhao et al. have analyzed the chemical components of Chinese wolfberry from different origins, and the experimental results showed that the overall characteristics of water-soluble nutrients in the samples were quite different [7]. Liu et al. proposed that there are significant differences in the polysaccharide, total flavonoids, and total phenolic contents of wolfberry leaves in different planting areas of wolfberry [8]. Bai et al. have established the HPLC (High Performance Liquid Chromatography) fingerprint of wolfberry, identified the main common peaks by LC-MS (Liquid Chromatography with Mass Spectrometry), and performed cluster analysis and discriminant analysis on the samples of wolfberry from four main producing areas [9]. However, these methods require trained technicians to extract the components of the sample through professional means and instruments. The process is complicated to operate and requires expensive instrumentation, so it is not widely adopted.
In academia, there is also a method of collecting near-infrared spectra of food or medicine as an identification feature. Zhu et al. have used near-infrared spectroscopy to predict and discriminate the types of tobacco origin [10]. Tang et al. have used near-infrared spectroscopy to scan forty samples of wolfberry from eight different origins [11]. Wang et al. have used near-infrared hyperspectral images to identify the origin of wolfberry from five sources [12]. However, the collection of near-infrared spectra requires grinding wolfberry into powder, resulting in the loss of experimental materials. Moreover, the near-infrared spectrometer is expensive, and it is difficult for small enterprises to afford the instrument's cost, so this method also has certain limitations.
It can be seen that whether it is to extract the chemical components of food or medicinal materials or to collect near-infrared spectra as identification features, there are limitations, such as high cost, strong professionalism, and low identification efficiency. Machine learning is a branch of artificial intelligence that enables computers to automatically learn the laws of data through models and algorithms, so as to realize the prediction and classification of new data. There are mainly supervised, unsupervised, semi-supervised, and reinforcement learning types, among which supervised learning is the most common. Algorithms are very important in machine learning, such as linear regression, decision trees, neural networks, etc., and feature engineering is also an important link. Machine learning technology is widely used in natural language processing, image recognition, medical diagnosis, and other fields. With the development of computer vision and machine learning technology, many researchers have established an image-based origin identification model to identify the origin of food or medicinal materials by collecting images of samples and conducting training and learning. De et al. have identified charcoal sources using macro images and deep learning algorithms [13]. Wang et al. have used the methods of image and visual information and machine learning to intelligently identify the origin of Angelica [14]. Wang et al. have presented an efficient and convenient identification method based on image processing [15]. The existing research can fully show that the method of origin identification based on images has the advantages of non-destructive samples, low cost, a high recognition rate, and fast recognition speed.
It has been proposed that texture morphology is an important feature for identifying objects or regions of interest in any image [16]. Studies have shown that Chinese wolfberries produced from different origins are significantly different in shape, size, and texture. For example, the texture of the Chinese wolfberry from Qinghai province is relatively vague, and the size is medium. The texture of the Chinese wolfberry from Qinghai province is particularly clear, and its size is relatively large [4]. Therefore, it is feasible to extract the texture and morphological features in the image to identify the geographic origin of Chinese wolfberry. The HSI (Hue, Saturation, Intensity) color space is based on the principle of human eye imaging and presents the hue, saturation, and brightness information of the image [17,18]. Converting the original image into HSI space can weaken the influence of light changes, thereby enhancing the stability of the algorithm. Based on HSI, the color image of standing trees in the forest area is segmented and accurately extracted from the background [19]. In this paper, after converting the original image into the HSI color space, the Gabor transformation is separately performed on the H channel, S channel, and I channel to extract the texture features of the Chinese wolfberry. Finally, the Hu invariant moments are calculated to extract the morphological features. The texture features of the image can be well extracted through the Gabor transformation at different angles [20]. The Hu invariant moment feature has the characteristics of translation, rotation, and scale invariance [21], and the morphological characteristics of wolfberries can be preserved by calculating the Hu invariant moment. The main contributions of this study are as follows:

• The established wolfberry image database contains images of the Chinese wolfberry from the four major production areas: Gansu, Inner Mongolia, Ningxia, and Qinghai. The images have been marked with labels of origin.

• The shape and texture features of the Chinese wolfberry are represented. The features can accurately reflect the characteristics of the Chinese wolfberry.

• An intelligent identification method for the geographic origin of the Chinese wolfberry is presented, and the accuracy of this method has increased by more than 60% compared with the existing related methods.

Methods
In this paper, the identification method of geographic origin for Chinese wolfberry based on color space transformation and texture morphological features is proposed. The flow chart is shown in Figure 1. First of all, the data on wolfberry samples was collected by a high-definition digital camera. Secondly, the image is cropped to a suitable size, and the background of the image is processed into a uniform black. Furthermore, after transforming the image into HSI space, the texture and shape of the image are extracted as recognition features. Finally, the random forest (RF) algorithm is employed to establish the identification model of the geographic origin of Chinese wolfberries.

Sample Preparation and Image Acquisition
In China, Ningxia, Gansu, and Qinghai are the main producing areas of the Chinese wolfberry [22]. A total of 90 single wolfberries were collected from each of four producing areas, i.e., Ningxia, Gansu, Qinghai, and Inner Mongolia, as experimental samples and image data. The images were captured with a Canon EOS 5DS R full-frame digital camera and a Sigma 24-35 mm wide-angle lens under the same conditions to ensure the consistency and integrity of the texture and shape of the Chinese wolfberries. The origin information is shown in Table 1. In Table 1, GS means Gansu, NM means Inner Mongolia, NX means Ningxia, and QH means Qinghai. The sample images of the Chinese wolfberries from the four origins are shown in Figure 2. The resolution of each of the four pictures is 8688 × 5792 pixels. From Figure 2, it is difficult to judge the origin of the Chinese wolfberry with the naked eye.

Image Preprocessing
The use of a wide-angle lens makes a single wolfberry appear too small in the frame. Therefore, the captured wolfberry images have a large amount of blank space, which has a significant impact on the training and recognition of the model. In this paper, the image is preprocessed to remove invalid background information. First, the RGB images are converted into grayscale images by Formula (1). In the formula, $R_{ij}$, $G_{ij}$, and $B_{ij}$ are the values of the pixel in row $i$ and column $j$ of the R channel, G channel, and B channel in the original image of wolfberry, respectively. $Gray_{ij}$ represents the grayscale information of the pixel in row $i$ and column $j$ of the image.
Then the OTSU (Maximum Between-Class Variance) algorithm is used to select a threshold that minimizes the intra-class variance of the thresholded black and white pixels. Let $\{0, 1, 2, \ldots, L-1\}$ represent the different gray levels in the digital image, and let $n_i$ represent the number of pixels with gray level $i$. The threshold is set as $k$ $(0 < k < L-1)$. The maximum separable measure $\eta_{max}$ can be calculated by Formulas (2)-(6), where $n$ represents the total number of pixels in the image and satisfies $n = n_0 + n_1 + \cdots + n_{L-1}$. $p_i$ represents the probability that the gray level of a pixel is $i$ and satisfies $p_i = n_i / n$. $P_1(k)$ represents the probability that a pixel is classified into the first class and satisfies $P_1(k) = \sum_{i=0}^{k} p_i$. $P_2(k)$ represents the probability that a pixel is classified into the second class and satisfies $P_2(k) = \sum_{i=k+1}^{L-1} p_i$. $m_1(k)$ indicates the average gray value of pixels classified in class 1, and $m_2(k)$ represents the average gray value of pixels classified in class 2. $\sigma_B^2$ represents the between-class variance, and $\sigma_G^2$ represents the global variance.
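The threshold search described above can be sketched in a few lines. This is a minimal illustration in plain Python (the experiments in this paper were run in MATLAB), and it assumes that Formulas (2)-(6) follow the textbook Otsu formulation, in which the separable measure is $\eta(k) = \sigma_B^2(k) / \sigma_G^2$ with $\sigma_B^2(k) = P_1(k) P_2(k) (m_1(k) - m_2(k))^2$. The function name `otsu_threshold` and the list-of-lists image layout are illustrative choices.

```python
def otsu_threshold(gray, levels=256):
    """Return (k, eta_max) for a 2-D list of integer gray levels in [0, levels)."""
    # n_i: number of pixels at each gray level
    hist = [0] * levels
    for row in gray:
        for v in row:
            hist[v] += 1
    n = sum(hist)
    total = sum(i * c for i, c in enumerate(hist))
    m_g = total / n  # global mean gray value
    sigma_g2 = sum((i - m_g) ** 2 * c for i, c in enumerate(hist)) / n

    best_k, eta_max = 0, 0.0
    count, weighted = 0, 0  # running sums for class 1 (levels 0..k)
    for k in range(levels - 1):
        count += hist[k]
        weighted += k * hist[k]
        if count == 0 or count == n:
            continue  # one class is empty, separability is undefined
        p1 = count / n
        p2 = 1.0 - p1
        m1 = weighted / count
        m2 = (total - weighted) / (n - count)
        sigma_b2 = p1 * p2 * (m1 - m2) ** 2  # between-class variance
        eta = sigma_b2 / sigma_g2            # separable measure
        if eta > eta_max:
            best_k, eta_max = k, eta
    return best_k, eta_max
```

When several thresholds give the same separability, this sketch keeps the smallest one; other implementations may average the tied thresholds instead.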
Next, the image is binarized according to the calculated maximum separable measure $\eta_{max}$: the gray value of each pixel greater than or equal to the corresponding threshold is set to 1, and the gray value of each pixel smaller than the threshold is set to 0. The obtained image $BW$ is shown in Figure 3a. Then the Sobel edge detection algorithm is used to extract the contour information of the two-dimensional image. Let $g_x$ denote the horizontal Sobel convolution factor and $g_y$ denote the vertical Sobel convolution factor. $G_x$ indicates the image grayscale value of horizontal edge detection and satisfies $G_x = g_x * BW$; $G_y$ indicates the image grayscale value of vertical edge detection and satisfies $G_y = g_y * BW$, where $*$ denotes two-dimensional convolution. According to Formulas (7)-(9), the new image $G$ can be obtained, as shown in Figure 3b.
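As an illustration of the edge detection step, the sketch below applies the two Sobel factors to a binary image and combines the responses. It assumes the standard 3 × 3 Sobel kernels and the gradient magnitude $G = \sqrt{G_x^2 + G_y^2}$; whether Formulas (7)-(9) use exactly this combination is an assumption, and the function names are illustrative.

```python
import math

# Standard 3x3 Sobel factors (assumed; the paper's Formulas (7)-(9) are not shown here).
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def filter3x3(img, kernel):
    """Slide a 3x3 kernel over img; border pixels are left at 0.

    This is cross-correlation. For Sobel kernels it differs from true
    convolution only by a sign, which does not change the magnitude.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            out[r][c] = float(sum(
                kernel[i][j] * img[r - 1 + i][c - 1 + j]
                for i in range(3) for j in range(3)
            ))
    return out


def sobel_magnitude(bw):
    """Return the gradient magnitude image G for a 2-D binary image bw."""
    gx = filter3x3(bw, SOBEL_X)
    gy = filter3x3(bw, SOBEL_Y)
    return [[math.hypot(a, b) for a, b in zip(row_x, row_y)]
            for row_x, row_y in zip(gx, gy)]
```

Non-zero entries of the returned image mark the contour, which is what the subsequent cropping step scans.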
The non-zero abscissas of the image $G$ are denoted as $x$, and $x_i$ represents the $i$th abscissa. Then $x_{i+1} - x_i$ is calculated; if the result is greater than 10, it means that there is noise, in which case $x_{min}$ is set at that position in the image. In the same way, $x_{max}$, $y_{min}$, and $y_{max}$ can be obtained. Finally, the above four values are combined to obtain four endpoints, and the cropped area is the rectangular area enclosed by these four points. Figure 3c shows the cropped result; compared with Figure 3a, the redundant blank area is clearly removed. After performing morphological dilation and erosion operations on Figure 3c, the white background of the wolfberry image is removed, and the image of the Chinese wolfberry with a unified background is obtained, as shown in Figure 3d.
During the entire image preprocessing process, the above algorithms are used to complete image cropping and background color unification. The purpose of this procedure is to reduce the interference of the background in subsequent model training. Figure 4 illustrates the preprocessed images of Chinese wolfberries from four origins. The wolfberries in Figure 4a-d are from Gansu, Inner Mongolia, Ningxia, and Qinghai, respectively. The resolutions of the four images are 886 × 947, 853 × 573, 514 × 1249, and 533 × 1024 pixels, respectively.
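The cropping rule can be read in more than one way, so the following is only one possible interpretation, written as a plain Python sketch. It assumes that isolated edge pixels separated from the fruit contour by a gap of more than 10 pixels are noise, and it keeps the longest run of non-zero coordinates whose neighbouring gaps do not exceed that limit. The helper names `longest_run_bounds` and `crop_box` are hypothetical, and the edge image is assumed to contain at least one non-zero pixel.

```python
def longest_run_bounds(coords, max_gap=10):
    """Return (lo, hi) of the longest run of coordinates with gaps <= max_gap."""
    coords = sorted(set(coords))
    start = coords[0]
    best = (coords[0], coords[0])
    for prev, cur in zip(coords, coords[1:]):
        if cur - prev > max_gap:
            start = cur  # a large gap: what came before is treated as noise
        if cur - start > best[1] - best[0]:
            best = (start, cur)
    return best


def crop_box(edge, max_gap=10):
    """Return (x_min, x_max, y_min, y_max) for a 2-D edge image (non-zero = edge)."""
    xs = [c for row in edge for c, v in enumerate(row) if v]
    ys = [r for r, row in enumerate(edge) if any(row)]
    x_min, x_max = longest_run_bounds(xs, max_gap)
    y_min, y_max = longest_run_bounds(ys, max_gap)
    return x_min, x_max, y_min, y_max
```

The four returned values define the rectangle used for cropping; the gap limit of 10 comes from the text, while the "longest run" tie-breaking is an assumption of this sketch.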

Gabor Transform Feature Extraction
The three-dimensional matrix of the HSI image ($H_{ij}$, $S_{ij}$, and $I_{ij}$) is obtained according to Formulas (10)-(13). Among them, $H_{ij}$ represents the hue of the pixel in row $i$ and column $j$, $\theta_{ij}$ represents the hue angle of the pixel in row $i$ and column $j$, $S_{ij}$ represents the saturation of the pixel in row $i$ and column $j$, and $I_{ij}$ represents the brightness of the pixel in row $i$ and column $j$.
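For readers who want to reproduce the conversion, a per-pixel sketch is given below. It assumes that Formulas (10)-(13) follow the commonly used RGB-to-HSI definition, with the hue angle $\theta = \arccos\left(\frac{\frac{1}{2}[(R-G)+(R-B)]}{\sqrt{(R-G)^2+(R-B)(G-B)}}\right)$, $H = \theta$ if $B \le G$ and $H = 360^\circ - \theta$ otherwise, $S = 1 - 3\min(R,G,B)/(R+G+B)$, and $I = (R+G+B)/3$. The function name and the convention of returning hue 0 for gray pixels are illustrative assumptions.

```python
import math


def rgb_to_hsi(r, g, b):
    """Convert one RGB pixel (components in [0, 1]) to (H in degrees, S, I)."""
    i = (r + g + b) / 3.0
    if i == 0:
        return 0.0, 0.0, 0.0  # black pixel: hue and saturation undefined
    s = 1.0 - min(r, g, b) / i
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:
        return 0.0, s, i  # gray pixel: hue undefined, 0 by convention
    # Clamp to guard against rounding slightly outside [-1, 1]
    theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
    h = theta if b <= g else 360.0 - theta
    return h, s, i
```

Applying the function to every pixel yields the three channel matrices used in the following steps.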
Then the HSI three-channel image information of the Chinese wolfberry can be obtained, as shown in Figure 5. Figure 5a illustrates a three-channel overlay image. Figure 5b is an H channel image; Figure 5c is an S channel image; and Figure 5d is the I channel image.
Following that, the Gabor transform is performed on each of the three channels of the HSI image. Firstly, a two-dimensional Gabor filter function is generated according to Formulas (14)-(16), where $x$ and $y$ represent the pixel coordinates, $x_p$ and $y_p$ represent the coordinate transformation variables (for example, $x_p = x \cos\theta + y \sin\theta$), and $\lambda$ represents the wavelength of the sinusoidal component. The wavelength controls the width of the Gabor function strips. $\theta$ controls the direction of the Gabor function, with zero degrees corresponding to the vertical position of the Gabor function. $\gamma$ controls the aspect ratio, or height, of the Gabor function. $\sigma$ controls the bandwidth, or the overall size of the Gabor envelope. $\phi$ indicates the phase offset of the sinusoidal component.
Finally, the image is convolved with the real part and the imaginary part of the Gabor filter function. The size of the convolution result is consistent with the original image matrix. The real part matrix and the imaginary part matrix are fused to obtain the filtered matrix. Then the histogram of the image for each channel is calculated. Finally, the 1 × 256-dimensional features obtained from the three channels are concatenated to get a 1 × 768 texture feature matrix.
The textures of Chinese wolfberries from different origins are varied [4]. Gabor transformation can extract the texture features in wolfberry images [20]. Figure 6 illustrates the images after Gabor transformation in the I channel. From Figure 6, the texture of the wolfberry samples from Inner Mongolia and Qinghai is more rugged, with large ravine stripes, while the wolfberry samples from Gansu and Ningxia are more delicate. Therefore, the histogram is calculated after Gabor transformation to reflect the distribution of different texture characteristics for the wolfberries from different origins.
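The texture feature pipeline can be sketched as follows. The sketch assumes the commonly used complex Gabor function $g(x, y) = \exp\left(-\frac{x_p^2 + \gamma^2 y_p^2}{2\sigma^2}\right)\exp\left(i\left(\frac{2\pi x_p}{\lambda} + \phi\right)\right)$ with $y_p = -x \sin\theta + y \cos\theta$, fuses the real and imaginary responses by taking their magnitude, and rescales the result to 256 bins before counting. These choices, the kernel size, and all parameter values are assumptions for illustration; the settings actually used are those of Table 2.

```python
import math


def gabor_kernel(size, lam, theta, sigma, gamma, phi):
    """Return (real, imag) size x size Gabor kernels (size should be odd)."""
    half = size // 2
    real, imag = [], []
    for y in range(-half, half + 1):
        real_row, imag_row = [], []
        for x in range(-half, half + 1):
            xp = x * math.cos(theta) + y * math.sin(theta)
            yp = -x * math.sin(theta) + y * math.cos(theta)
            env = math.exp(-(xp ** 2 + gamma ** 2 * yp ** 2) / (2 * sigma ** 2))
            arg = 2 * math.pi * xp / lam + phi
            real_row.append(env * math.cos(arg))
            imag_row.append(env * math.sin(arg))
        real.append(real_row)
        imag.append(imag_row)
    return real, imag


def convolve_same(img, kernel):
    """2-D convolution with zero padding; output has the same size as img."""
    h, w = len(img), len(img[0])
    half = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for i, krow in enumerate(kernel):
                for j, kv in enumerate(krow):
                    rr, cc = r - (i - half), c - (j - half)
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += kv * img[rr][cc]
            out[r][c] = acc
    return out


def gabor_histogram(channel, **params):
    """Filter one channel and return a 256-bin histogram of the response magnitude."""
    real, imag = gabor_kernel(**params)
    re = convolve_same(channel, real)
    im = convolve_same(channel, imag)
    mag = [[math.hypot(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(re, im)]
    peak = max(max(row) for row in mag) or 1.0  # avoid dividing by zero
    hist = [0] * 256
    for row in mag:
        for v in row:
            hist[min(255, int(255 * v / peak))] += 1
    return hist


def texture_features(h_ch, s_ch, i_ch, **params):
    """Concatenate the three 256-bin histograms into one 768-dimensional vector."""
    return (gabor_histogram(h_ch, **params)
            + gabor_histogram(s_ch, **params)
            + gabor_histogram(i_ch, **params))
```

A call such as `texture_features(H, S, I, size=5, lam=4.0, theta=0.0, sigma=2.0, gamma=0.5, phi=0.0)` returns the 768 counts for one image; the pure-Python loops are meant for clarity and would be slow on full-resolution images.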

Hu Invariant Moment Feature Extraction
The Hu invariant moment feature has the characteristics of translation, rotation, and scale invariance [19], which can well preserve the morphological characteristics of wolfberries. First, the RGB images are converted into grayscale images by Formula (1). Let the size of the image matrix be m × n, and then calculate the (p + q)-order ordinary moment $m_{pq}$ and central moment $\mu_{pq}$ of the grayscale image according to Formulas (17)-(19), where $\bar{x}$ and $\bar{y}$ are the centroid coordinates of the image. Although the central moment has translation invariance, it does not have scale invariance, so the central moment must be normalized to obtain $e_{pq}$ by Formula (20). Finally, the calculation of the Hu invariant moments is completed by Formulas (21)-(27), thus obtaining a 7-dimensional feature $\{H_1, H_2, H_3, H_4, H_5, H_6, H_7\}$.

$$m_{pq} = \sum_{x}\sum_{y} x^{p} y^{q} \, Gray_{xy}, \quad p, q = 0, 1, 2, \cdots \tag{17}$$

$$\bar{x} = \frac{m_{10}}{m_{00}}, \quad \bar{y} = \frac{m_{01}}{m_{00}} \tag{18}$$

According to the above formulas, the Hu invariant moments of the wolfberry images are calculated. Figure 7 shows the Hu invariant moments of the wolfberry images from the four origins. It can be seen that there are differences in the Hu invariant moment characteristics of the four origins.
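The moment computation can be written compactly. The sketch below assumes the textbook definitions, namely the central moment $\mu_{pq} = \sum_x \sum_y (x - \bar{x})^p (y - \bar{y})^q \, Gray_{xy}$, the normalization $e_{pq} = \mu_{pq} / \mu_{00}^{(p+q)/2 + 1}$, and Hu's seven classical invariants; that Formulas (19)-(27) match these term by term is an assumption. The image is assumed to have a non-zero total gray value.

```python
def hu_moments(gray):
    """Return the seven Hu invariant moments of a 2-D grayscale image (list of rows)."""
    pixels = [(x, y, v) for y, row in enumerate(gray) for x, v in enumerate(row) if v]
    m00 = sum(v for _, _, v in pixels)
    x_bar = sum(x * v for x, _, v in pixels) / m00
    y_bar = sum(y * v for _, y, v in pixels) / m00

    def e(p, q):
        # Normalized central moment e_pq (note that mu_00 equals m_00)
        mu = sum((x - x_bar) ** p * (y - y_bar) ** q * v for x, y, v in pixels)
        return mu / m00 ** ((p + q) / 2 + 1)

    e20, e02, e11 = e(2, 0), e(0, 2), e(1, 1)
    e30, e03, e21, e12 = e(3, 0), e(0, 3), e(2, 1), e(1, 2)
    a, b = e30 + e12, e21 + e03        # recurring sums
    c, d = e30 - 3 * e12, 3 * e21 - e03  # recurring differences

    h1 = e20 + e02
    h2 = (e20 - e02) ** 2 + 4 * e11 ** 2
    h3 = c ** 2 + d ** 2
    h4 = a ** 2 + b ** 2
    h5 = c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2)
    h6 = (e20 - e02) * (a ** 2 - b ** 2) + 4 * e11 * a * b
    h7 = d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2)
    return [h1, h2, h3, h4, h5, h6, h7]
```

Shifting the object inside the frame or rotating the image by 90 degrees leaves the seven values unchanged up to rounding, which is the property the feature relies on.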

Model Training
In this paper, the 1 × 7-dimensional Hu invariant moment feature is fused with the 1 × 768-dimensional feature after Gabor transformation to form a 1 × 775-dimensional feature matrix $F_{tr} = \{f_{tr\_1}, f_{tr\_2}, \ldots, f_{tr\_775}\}$. Since a total of 360 sample features were obtained from the four origins, a 360 × 775-dimensional data set was formed after all the features were fused. Let $S$ be the sample space; let $trainX$ be the features of the training set; let $trainY$ be the labels of the training set; let $testX$ be the features of the test set; and let $testY$ be the labels of the test set. The relationship between $trainX$ and $testX$ is shown in Formulas (28)-(30).
$$trainX \cap testX = \varnothing \tag{28}$$

$$trainX : testX = 8 : 2$$

The random forest algorithm is an ensemble learning algorithm based on decision trees that performs classification and regression by constructing multiple decision trees. Each decision tree in a random forest is built from randomly selected samples and features, so the model has good generalization ability. According to Formula (31), the RF algorithm was used to train on the sample features of the four origins of the Chinese wolfberry, and many decision trees were constructed to form the identification model.
Then $testX$ is input into the RF model according to Formula (32), and a series of predicted values, denoted as $PL$, is obtained.
Finally, according to Formulas (33)-(35), the accuracy rate $ACC$ of the model is obtained by comparing $testY$ with $PL$, where $m$ is the total number of samples in the test set and $n$ is the number of correct recognitions.
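The data split and the accuracy computation can be illustrated with a short standard-library sketch. It assumes that the accuracy is the ratio $ACC = n / m$ implied by the definitions above. To stay self-contained, the sketch uses a 1-nearest-neighbour rule as a stand-in classifier; it is not the random forest used in this paper, and any model with the same fit-and-predict role could be swapped in. All function names are illustrative.

```python
import random


def split_8_2(features, labels, seed=0, train_ratio=0.8):
    """Randomly split samples into disjoint training and test sets (8:2 by default)."""
    idx = list(range(len(features)))
    random.Random(seed).shuffle(idx)
    cut = round(train_ratio * len(idx))
    tr, te = idx[:cut], idx[cut:]
    train_x = [features[i] for i in tr]
    train_y = [labels[i] for i in tr]
    test_x = [features[i] for i in te]
    test_y = [labels[i] for i in te]
    return train_x, train_y, test_x, test_y


def predict_nearest(train_x, train_y, test_x):
    """Stand-in classifier (NOT random forest): label of the closest training sample."""
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [train_y[min(range(len(train_x)), key=lambda i: dist2(train_x[i], x))]
            for x in test_x]


def accuracy(test_y, pl):
    """ACC = n / m: correct predictions over the number of test samples."""
    m = len(test_y)
    n = sum(1 for t, p in zip(test_y, pl) if t == p)
    return n / m
```

With 360 samples the split yields 288 training and 72 test samples, and `accuracy(test_y, predict_nearest(train_x, train_y, test_x))` gives the fraction of correctly identified origins for the stand-in model.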

Experimental Results
In this experiment, MATLAB 2022a is used to construct the algorithm model. The CPU is an AMD Ryzen 5 4600U, the memory is 16 GB of DDR4 3200 MHz, and the operating system is Windows 10. The implementation of the RF algorithm uses the Classification Random Forest package [23]. The Gabor transformation function parameter settings are shown in Table 2. In this paper, a total of 360 images of Chinese wolfberry samples are used for experiments. There are four origins, namely Gansu, Inner Mongolia, Ningxia, and Qinghai, and each origin has 90 sample images.

Determination of the Hyperparameters of RF
In order to determine the optimized hyperparameters of RF, five groups of comparative experiments were executed. The ratio of the training set to the testing set in each experiment is 8:2. The hyperparameter settings of the RF model are shown in Table 3. It can be concluded that when nTree is set to 2000 and mtry is set to 50, the model works best. Thus, in the prediction model, nTree and mtry are set to 2000 and 50, respectively.

Sensitivity to Training Set Size
In the proposed method, the training set samples are trained to obtain the prediction model. Thus, the training set size can affect prediction accuracy. In order to analyze the stability of the proposed method, nine groups of comparative experiments were set up. Each experiment uses the following proportions of the training set: 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, and 90%. The train-test procedure is repeated 200 times. Figure 8 shows the average accuracy of the model for different training set ratios. It can be seen from Figure 8 that the model still has a high accuracy rate when the proportion of the training set is very low. This shows that the model proposed in this paper has high robustness.
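The repeated train-test procedure can be sketched as below. The sketch is only a harness: it uses a 1-nearest-neighbour rule as a stand-in for the RF model, and the sample layout (a list of `(feature_vector, label)` pairs), the seed, and the function names are illustrative assumptions.

```python
import random


def nn_predict(train, x):
    """Stand-in classifier (NOT random forest): label of the nearest training sample."""
    return min(train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))[1]


def mean_accuracy(samples, train_ratio, repeats=200, seed=0):
    """Average test accuracy over repeated random splits at one training ratio."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(repeats):
        shuffled = samples[:]
        rng.shuffle(shuffled)
        cut = round(train_ratio * len(shuffled))
        train, test = shuffled[:cut], shuffled[cut:]
        correct = sum(nn_predict(train, x) == y for x, y in test)
        total += correct / len(test)
    return total / repeats


def sensitivity_curve(samples, ratios=(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9),
                      repeats=200):
    """Map each training-set proportion to its average accuracy."""
    return {r: mean_accuracy(samples, r, repeats) for r in ratios}
```

Plotting the values returned by `sensitivity_curve` against the ratios gives a curve of the kind reported in Figure 8, with the caveat that the numbers depend on the classifier that is plugged in.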

Misjudgment Analysis
In this paper, the proportion of the training set is set to 80%, and the experiment is repeated 200 times. Then the number of errors, the number of selected samples, and the recognition error rate of each sample can be obtained. The error rates of samples from different origins are shown in Figure 9. The average error rate of samples from Inner Mongolia was the minimum of 5.51%, and the average error rate of samples from Qinghai was the maximum of 21.34%.
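The per-sample statistics described above can be collected with a small bookkeeping loop. The sketch below counts, for every sample, how often it was drawn into the test set and how often it was misclassified there; the error rate is the ratio of the two. A 1-nearest-neighbour rule again stands in for the RF model, and the names and the `(feature_vector, label)` layout are illustrative assumptions.

```python
import random


def nn_predict(train, x):
    """Stand-in classifier (NOT random forest): label of the nearest training sample."""
    return min(train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))[1]


def per_sample_error_rates(samples, train_ratio=0.8, repeats=200, seed=0):
    """Return one error rate per sample (None if it never landed in a test set)."""
    rng = random.Random(seed)
    selected = [0] * len(samples)  # times each sample was in the test set
    errors = [0] * len(samples)    # times it was misclassified there
    for _ in range(repeats):
        idx = list(range(len(samples)))
        rng.shuffle(idx)
        cut = round(train_ratio * len(idx))
        train = [samples[i] for i in idx[:cut]]
        for i in idx[cut:]:
            x, y = samples[i]
            selected[i] += 1
            if nn_predict(train, x) != y:
                errors[i] += 1
    return [e / s if s else None for e, s in zip(errors, selected)]
```

Averaging the returned rates per origin gives the kind of summary shown in Figure 9, and filtering for rates above 0.9 isolates the persistently misjudged samples.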

According to the analysis of the experimental data, 86% of the samples have a correct recognition rate of 100%, but 5% of the samples have a correct recognition rate of 0%. A total of 27 samples have an error rate higher than 90%, and their distribution across origins is shown in Figure 10: 13 belong to Qinghai, 5 to Gansu, 4 to Inner Mongolia, and 5 to Ningxia. The specific error conditions are listed in Table 4. As can be seen from Table 4, Qinghai wolfberry is more easily confused with Ningxia wolfberry, Inner Mongolian wolfberry and Ningxia wolfberry are easily predicted as Qinghai wolfberry, and Gansu wolfberry is easily predicted as Ningxia wolfberry.
Figure 11 shows the heat map of the confusion matrix obtained from one of the training runs. Although the proposed model has a high accuracy rate, Ningxia wolfberry is easily misjudged as Qinghai wolfberry.
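A confusion matrix of the kind visualized in a heat map can be computed as in the following sketch. The function names are hypothetical, and any labels passed in are assumed to be origin names; the sketch only illustrates how true and predicted origins are tallied, and is not the authors' code.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Count predictions; rows are true origins, columns are predicted origins."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, guess in zip(true_labels, predicted_labels):
        matrix[index[truth]][index[guess]] += 1
    return matrix


def most_frequent_misjudgment(matrix, classes):
    """For each true origin, return the wrong origin predicted most often.

    Returns None for an origin that was never misjudged.
    """
    result = {}
    for i, row in enumerate(matrix):
        off_diagonal = [(count, classes[j]) for j, count in enumerate(row) if j != i]
        count, name = max(off_diagonal)
        result[classes[i]] = name if count > 0 else None
    return result
```

Reading the largest off-diagonal entry in each row is what identifies statements such as one origin being easily misjudged as another.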

Performance Comparison
The proposed method is compared with six other algorithms: CNN [4], KFDA [24], SVM [25], BPNN [26], Gabor_SVM, and Gabor_BPNN. CNN (Convolutional Neural Network) is a representative algorithm that has been widely used in image recognition in recent years. For CNN [4], the number of epochs is set to 2000, the batch size to 64, and the learning rate to 0.001. In KFDA [24], a Gaussian kernel function is adopted, and a matrix similarity measure based on Euclidean distance is used to determine the optimal kernel function. In Gabor_SVM and Gabor_BPNN, the mean, contrast, and entropy of the Gabor transformation are taken as the second feature to train the model. For SVM, the loss (penalty) parameter is set to 100, and the gamma parameter of the kernel function is set to 0.1. In BPNN (Back Propagation Neural Network), the maximum number of iterations is set to 1000, the target training error is 0.00001, and the learning rate is 0.01.
The images are divided into a training set and a test set at a ratio of 8:2, giving 288 training images and 72 test images. Each algorithm is run 200 times, and the average accuracy is shown in Figure 12. The proposed method, Gabor_SVM, and Gabor_BPNN have higher average accuracy than the other algorithms, which indicates that the Gabor feature can effectively represent Chinese wolfberries of different origins. Moreover, the average accuracy of the proposed method is higher than that of the other six methods. Thus, the proposed method has clear advantages in identifying the origin of Chinese wolfberry. For the proposed method, Gabor_SVM, and Gabor_BPNN, the error rate for the Qinghai region is always the highest. Figure 13 illustrates the proportion of Qinghai samples among the images with an error rate of more than 70%. It can be seen that the proposed method effectively reduces the recognition error rate of Qinghai samples.

Conclusions
Chinese wolfberry is widely planted in China, is very popular, and has high nutritional value. Its origin has a major impact on its medicinal value. This paper proposes an intelligent identification method for the geographic origin of Chinese wolfberry, based on color space transformation and texture morphological features, that can quickly and efficiently identify the origin category of a single wolfberry image. First, the Chinese wolfberry samples are prepared, and their image data are collected. The images are then preprocessed, and the Hu invariant moment and Gabor transformation features of each single wolfberry image are extracted. Finally, the RF algorithm is used to establish the identification model of the Chinese wolfberry's origin. The experimental results show that, compared with other recognition algorithms, the proposed model achieves higher recognition accuracy and a better recognition effect.