Geographical Origin Discrimination of White Rice Based on Image Pixel Size Using Hyperspectral Fluorescence Imaging Analysis

Abstract: Geographical origin discrimination of white rice is important for preventing the illegal distribution of white rice and for regulating and standardizing food safety and quality assurance. The aim of this study was to develop a method for geographical origin discrimination between South Korean and Chinese rice using a hyperspectral fluorescence imaging technique and multivariate analysis. Hyperspectral fluorescence images of South Korean and Chinese rice samples were obtained in the wavelength range of 420 nm to 780 nm with intervals of 4.8 nm using 365 nm ultraviolet-A excitation light. Partial least squares discriminant analysis models were developed and applied to the acquired images to determine the geographical origins of the rice samples. In addition, various pre-processing techniques were applied to improve the discrimination accuracy, and the optimal pixel size of the hyperspectral image was determined. The results revealed that pixel sizes above 7 mm × 7 mm yielded a high discrimination accuracy. Moreover, the geographical origin discrimination model that applied the first-order derivative achieved a high discrimination accuracy of 98.89%. The results of this study showed that hyperspectral fluorescence imaging technology can be used to quickly and accurately discriminate the geographical origins of white rice.


Introduction
In South Korea, the geographic origin of food has been used for marketing to establish a fair distribution order and trust between producers and consumers in accordance with agricultural trade liberalization. However, a significant price difference exists between imported agricultural [...]

All rice samples were stored in a −80 °C ultra-low temperature freezer until they were used in the experiment, to prevent quality deterioration caused by temperature and humidity changes. The experiment was carried out in 2017. Two days before the experiment, the samples were tempered by maintaining a −20 °C storage temperature for 24 h, and then stored at 5 °C for an additional 24 h. The temperature was stabilized for 2 h at room temperature (20 °C) before acquisition of the hyperspectral fluorescence images. Approximately 50 g of rice was used as the sample in the experiment. Based on the results of previous experiments [23], a black acrylic rectangular sample cell with dimensions of 50 mm × 50 mm × 5 mm, which did not express fluorescence, was used to obtain the hyperspectral fluorescence images (Figure 1).

Hyperspectral Imaging System
In this study, a hyperspectral fluorescence imaging system was used to acquire the HSIs of the rice samples. It consisted of two UV-A excitation lights, a sample translation stage, and the hyperspectral fluorescence image acquisition unit (Figure 2). The latter was equipped with a slit for the line scan, a high-sensitivity electron-multiplying charge-coupled device (EMCCD; MegaLuca R, ANDOR Technology, South Windsor, CT, USA) with 1004 × 1002 pixels, an image spectrograph, and a C-mount lens (F1.9 35 mm compact lens, Schneider Optics, Hauppauge, NY, USA). The EMCCD has a pixel size of 8 × 8 µm, is cooled to −20 °C via thermoelectric cooling, and can capture 14-bit images at a frequency of 12.5 MHz. The image spectrograph and C-mount lens were fixed in front of the EMCCD. The UV-A excitation lights were vertically tilted at 15° to excite the samples. The HSI was captured in the spectral range of 420-780 nm, with intervals of 4.8 nm. The image field-of-view was determined using a 25 µm × 18 mm (width × length) slit. The line scan image, acquired through the slit, was dispersed onto the EMCCD surface through a diffraction grating to generate a spectral image for each wavelength band. The horizontal and vertical axes of each line scan image contained the spatial and spectral information, respectively.
In this experiment, an image was obtained three times for each sample using an exposure time of 0.02 s, an amplification value of ten, and a line-scan step of 0.5 mm. Since HSIs are affected by sample conditions such as the degree of milling, experiments were performed under the same conditions in this study to minimize these effects. The HSIs of the rice samples were corrected using dark reference images, to remove device noise, and fluorescent reference images. The fluorescence reference plate was white inkjet paper that exhibited uniform blue fluorescence [13]. The dark reference images were obtained without light source exposure. The HSIs of the rice samples were corrected according to Equation (1).

$$I(i) = \frac{I_s(i) - D(i)}{I_r(i) - D(i)} \tag{1}$$

where $I$ is the corrected relative hyperspectral fluorescence image, $I_s$ denotes the sample hyperspectral fluorescence image, $I_r$ represents the hyperspectral fluorescence image of the fluorescent reference plate, and $D$ is the hyperspectral image of the dark reference plate, all at the $i$th wavelength.
The average spectrum and pixel spectrum of the rice samples were extracted from the calibrated HSIs of the rice samples, excluding the container. Then, multivariate analysis was performed using the extracted HSI data [13].
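The correction and spectrum extraction described above can be illustrated with a minimal sketch. It assumes that Equation (1) takes the usual dark-subtracted ratio form, (I_s − D) / (I_r − D), applied band by band. The function names, the `eps` guard, and the synthetic arrays are hypothetical and are not taken from the study.

```python
import numpy as np

def calibrate_fluorescence(sample, reference, dark, eps=1e-9):
    """Return (sample - dark) / (reference - dark), computed element-wise.

    All inputs are arrays of shape (rows, cols, bands).
    `eps` is an assumption of this sketch: it replaces near-zero
    denominators to avoid division by zero.
    """
    sample = np.asarray(sample, dtype=float)
    reference = np.asarray(reference, dtype=float)
    dark = np.asarray(dark, dtype=float)
    denom = reference - dark
    denom = np.where(np.abs(denom) < eps, eps, denom)
    return (sample - dark) / denom

def mean_spectrum(cube, mask):
    """Average the spectra of the pixels selected by a boolean (rows, cols) mask."""
    return cube[mask].mean(axis=0)

# Synthetic 4 x 4 pixel, 3-band example (values are illustrative only).
dark = np.full((4, 4, 3), 10.0)
reference = np.full((4, 4, 3), 110.0)
sample = np.full((4, 4, 3), 60.0)
corrected = calibrate_fluorescence(sample, reference, dark)

rice_mask = np.zeros((4, 4), dtype=bool)
rice_mask[1:3, 1:3] = True  # pretend the centre pixels are rice, the rest container
avg = mean_spectrum(corrected, rice_mask)
```

The mask plays the role of "excluding the container": only pixels flagged as rice contribute to the average spectrum.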

Multivariate Data Analysis
A flowchart of the experiment is shown in Figure 3. To discriminate between the South Korean and Chinese rice, the HSI was analyzed by PLS-DA and PLSR. These are multivariate statistical methods that are used to develop a regression model between the measured spectrum and the dependent variable. In this study, PLS-DA and PLSR were developed using the mean value spectrum and the pixel spectrum of the rice samples, respectively. The values of the South Korean and Chinese rice groups were set to '0' and '1', respectively. The basic model of multivariate PLS-DA and PLSR is shown in Equations (2) and (3) [11].

$$X = TP^{\mathsf{T}} + E \tag{2}$$

$$Y = TQ^{\mathsf{T}} + F \tag{3}$$

where $X$ is an n × m predictor matrix, $Y$ is an n × p response matrix, $T$ is an n × 1 score matrix for n observations, $P$ is an m × 1 loading matrix for the spectral data, $Q$ is a p × 1 loading matrix for the responses, and $E$ and $F$ are matrices of random errors.
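To make the PLS decomposition concrete, the following is a minimal sketch of a single-response PLS model fitted with the NIPALS algorithm and then used as a discriminant with the 0/1 class coding described above. It is not the Unscrambler implementation used in the study. The synthetic spectra, the number of components, and the 0.5 cut-off are assumptions chosen only for illustration.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model (NIPALS).

    Returns (coef, intercept) such that y_hat = X @ coef + intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean          # mean-centred working copies
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                        # weight vector
        w /= np.linalg.norm(w)
        t = Xk @ w                           # scores
        tt = t @ t
        p = Xk.T @ t / tt                    # X loadings
        c = yk @ t / tt                      # y loading
        Xk = Xk - np.outer(t, p)             # deflate X
        yk = yk - c * t                      # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # regression vector in X space
    intercept = y_mean - x_mean @ coef
    return coef, intercept

# Synthetic spectra on a 420-780 nm grid with 4.8 nm spacing (76 bands).
rng = np.random.default_rng(0)
wavelengths = 420.0 + 4.8 * np.arange(76)
base = np.exp(-((wavelengths - 480.0) / 60.0) ** 2)
class0 = base + rng.normal(0.0, 0.02, (40, 76))        # coded '0'
class1 = 1.2 * base + rng.normal(0.0, 0.02, (40, 76))  # coded '1'
X = np.vstack([class0, class1])
y = np.r_[np.zeros(40), np.ones(40)]

coef, intercept = pls1_fit(X, y, n_components=2)
y_hat = X @ coef + intercept
predicted = (y_hat >= 0.5).astype(int)   # 0.5 cut-off is an assumption
accuracy = float((predicted == y).mean())
```

The continuous prediction `y_hat` is what a threshold is later applied to; the same `coef` vector can be applied pixel by pixel to form a PLS image.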

Image Post-Processing
Several previous studies reported that image noise caused by overlapping of samples affects the regression model development. A method for quantifying the accuracy of each pixel was recommended to improve the detection accuracy of the samples [24][25][26]. Therefore, the reflection spectrum of each rice sample was measured. Moreover, the effect of the HSI pixel size on the discrimination of the geographic origin of the rice was investigated based on the average value of the raw data. The initial raw data had a pixel size of 30 × 30 mm. The data were then reduced to six smaller pixel sizes (2 × 2 mm, 3 × 3 mm, 4 × 4 mm, 5 × 5 mm, 6 × 6 mm, and 7 × 7 mm) to obtain a spectrum for each pixel size. Using the acquired spectra, a PLSR model for determining the geographic origin of the rice samples according to pixel size was developed.
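One way to obtain spectra at several pixel sizes is to average non-overlapping blocks of native pixels. The sketch below illustrates this under the assumption of a square native pixel pitch of 0.5 mm (`NATIVE_MM`, a hypothetical value borrowed from the scan step). The source does not state how the resizing was actually implemented.

```python
import numpy as np

def block_average(cube, block):
    """Average non-overlapping block x block pixel groups of a (rows, cols, bands) cube.

    Rows and columns that do not fill a whole block are cropped.
    """
    rows, cols, bands = cube.shape
    r, c = rows // block, cols // block
    cropped = cube[: r * block, : c * block, :]
    return cropped.reshape(r, block, c, block, bands).mean(axis=(1, 3))

NATIVE_MM = 0.5  # hypothetical native pixel pitch (assumption of this sketch)

# Synthetic 30 mm x 30 mm region at 0.5 mm per pixel, 2 bands.
cube = np.arange(60 * 60 * 2, dtype=float).reshape(60, 60, 2)

spectra_by_size = {}
for size_mm in (2, 3, 4, 5, 6, 7):
    block = int(round(size_mm / NATIVE_MM))
    coarse = block_average(cube, block)
    # One spectrum per coarse pixel: shape (n_coarse_pixels, bands).
    spectra_by_size[size_mm] = coarse.reshape(-1, cube.shape[2])

whole_region = block_average(cube, 60)[0, 0]  # single 30 mm x 30 mm "pixel"
```

Larger blocks average over more grains, which trades spatial detail for a less noisy spectrum per pixel.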

Spectrum Pre-Treatment
Various spectrum pre-treatments were evaluated to improve the discrimination model performance. These pre-treatments were used to correct spectral distortion, light scattering, and noise components that may arise from the external environment. They were applied to the average spectrum of each sample to determine the optimal conditions for discriminating the geographic origin. The pre-treatments included smoothing, first-order and second-order derivatives, maximum normalization, mean normalization, range normalization, baseline correction, multiplicative scatter correction (MSC), and the standard normal variate (SNV). The performances of the PLS-DA models for each pre-treatment were compared and evaluated. Table 2 shows the number of data sets used for the model development and verification. The accuracy of each developed PLS model was evaluated by means of the coefficient of determination for calibration ($R_c^2$), root mean square error of calibration (RMSEC), coefficient of determination for validation ($R_v^2$), root mean square error of validation (RMSEV), and optimal factor (F). RMSE was calculated according to Equations (4)-(6) [27]. The overall predictive accuracy was determined based on the total accuracy of the respective South Korean and Chinese rice predictions. The sensitivity and specificity were calculated using the predictive accuracy results for each rice sample.
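Several of the listed pre-treatments have compact textbook definitions, sketched below for a matrix with one spectrum per row. These are common formulations. The exact variants implemented in the software used for the study may differ, and the function names are hypothetical.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum by its own mean and std."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference (default: mean spectrum)."""
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(spectra, dtype=float)
    for i, s in enumerate(spectra):
        slope, offset = np.polyfit(ref, s, 1)   # fit s ~ slope * ref + offset
        corrected[i] = (s - offset) / slope
    return corrected

def max_normalize(spectra):
    """Divide each spectrum by its largest absolute value."""
    return spectra / np.abs(spectra).max(axis=1, keepdims=True)

def mean_normalize(spectra):
    """Divide each spectrum by its mean value."""
    return spectra / spectra.mean(axis=1, keepdims=True)

def range_normalize(spectra):
    """Divide each spectrum by its range (max minus min)."""
    span = spectra.max(axis=1, keepdims=True) - spectra.min(axis=1, keepdims=True)
    return spectra / span

# Three synthetic spectra that differ only by a gain and an offset.
ref = np.linspace(1.0, 2.0, 76)
spectra = np.vstack([1.0 * ref + 0.0, 1.5 * ref + 0.2, 0.8 * ref - 0.1])
snv_out = snv(spectra)
msc_out = msc(spectra)
max_out = max_normalize(spectra)
```

Because the synthetic spectra differ only by gain and offset, SNV and MSC both collapse them onto a single curve, which is the scatter-removal behaviour these methods are meant to provide.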

Model Validation
In Equations (4)-(6), $\hat{y}_i$ and $y_i$ denote the predicted and measured values of the $i$th sample, $I_c$ is the number of samples in the calibration set, and $I_v$ represents the number of samples in the prediction set. Unscrambler X (v.10.4, CAMO Software, Oslo, Norway) was used for the PLS model development, validation, and spectrum pre-treatment. MATLAB (Ver. 2020, MathWorks, Natick, MA, USA) was used to obtain the PLS image and extract the spectrum data from the acquired HSIs.
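The evaluation quantities named above can be sketched with their standard definitions. The code below is an illustration rather than a reproduction of Equations (4)-(6). Treating class '1' as the positive class and the example threshold of 0.6 are assumptions, and the sample values are invented.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error over a calibration or validation set."""
    n = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def classification_summary(y_true, y_pred, threshold):
    """Threshold continuous predictions; class 1 is treated as positive (assumption)."""
    labels = [1 if p >= threshold else 0 for p in y_pred]
    tp = sum(1 for t, l in zip(y_true, labels) if t == 1 and l == 1)
    tn = sum(1 for t, l in zip(y_true, labels) if t == 0 and l == 0)
    fp = sum(1 for t, l in zip(y_true, labels) if t == 0 and l == 1)
    fn = sum(1 for t, l in zip(y_true, labels) if t == 1 and l == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Invented example: three samples coded 0 and three coded 1.
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0.1, 0.2, 0.7, 0.9, 0.8, 0.4]
error = rmse(y_true, y_pred)
r2 = r_squared(y_true, y_pred)
summary = classification_summary(y_true, y_pred, threshold=0.6)
```

Computing `rmse` on the calibration set gives RMSEC and on the validation set gives RMSEV; the same applies to the two coefficients of determination.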

PLS Image Analysis
The PLS image algorithm was developed by applying the regression coefficients of the developed PLS-DA model for discriminating the geographic origin of rice to the HSI.

Spectral Characteristics of South Korean and Chinese Rice
Figure 4 shows the fluorescence spectra of the South Korean and Chinese rice samples by year. The spectra of the rice samples from 2014 to 2016 show a similar trend in the entire wavelength range of 420 to 780 nm. The Chinese rice samples exhibit a higher reflectance than the South Korean rice samples at the overall spectral wavelength. Previous studies reported that the geographical origin can be discriminated by using the difference in the ratio of amylopectin to amylose in rice [23]. Generally, it has been reported that branched-chain amylopectin is identified by the crystallinity of the samples and is attenuated by the amount of amylose present [28]. In addition, a spectral difference appears owing to the apparent transparency and internal heterogeneity of the sample [29]. For the above reason, the South Korean rice samples had a higher amount of amylose compared to the Chinese rice samples, which had relatively low crystallinity; thus, the fluorescent intensity of the latter was low.
The raw spectra revealed fluorescence peaks at 465 nm and 489 nm, along with the relatively higher reflectance of the Chinese rice samples compared to the South Korean rice samples. The main component that is used for discriminating Chinese and South Korean rice is a saturated fatty acid that shows a fluorescence peak from 450 nm to 500 nm [30]. This main component is influenced by the lipid constituent and acetic acid (CH3COOH) containing hydroxyl (-OH), stearic acid reacting with -OH, glycine, L-aspartic acid, and L-glutamic acid monosodium salt monohydrate. The latter constituent is a type of amino acid.
These components are factors that affect the rice quality and are influenced by nitrogen fertilization, the production environment, and the cultivation environment (temperature) [31]. Chinese rice is cultivated at higher latitudes (33 to 47° N), and thus in colder regions, than South Korean rice. Therefore, it is believed that this study identified the spectral differences between the South Korean and Chinese rice samples. Chlorophyll a appeared mainly at 681 nm, and the higher reflectance of the South Korean rice samples was associated with the 681 nm peak [32,33].

Performances of the Geographical Origin Discrimination Models for South Korean and Chinese Rice by Pixel Dimensions
Various pixel dimensions were applied in the geographical origin discrimination models to determine the hyperspectral image pixel dimensions that could most effectively discriminate the geographical origin of the rice samples. The predictive accuracies according to the various pixel dimensions are presented in Table 4. The discrimination accuracy increases with the pixel dimensions. The discrimination accuracies are above 90% when the pixel dimensions are over 3 mm × 3 mm. The discrimination accuracies of the South Korean and Chinese rice samples at pixel dimensions of 30 mm × 30 mm with a classification value of 0.60 are above 99.9% and 97.78%, respectively. The results show that the hyperspectral image pixel dimensions of 30 mm × 30 mm obtain the highest discrimination accuracy. In general, the length and width of rice (Oryza sativa L. subsp. japonica) are 5.0-6.4 mm and 2.5-3.0 mm, respectively. Thus, development of a discrimination model with pixel dimensions over 7 mm × 7 mm, which can measure the overall rice sample spectrum with consideration of all the external characteristics of a rice grain, can improve the discrimination performance [34,35].

Discrimination Models for South Korean and Chinese Rice Using Pre-Treatments
Various pre-treatment methods were applied to develop the geographical origin discrimination models using the hyperspectral image pixel dimensions of 30 mm × 30 mm. The performances of the developed PLS-DA models are shown in Table 5. $R_v^2$ and RMSEV of the discrimination model with no applied pre-treatment are 0.7601 and 0.2456, respectively. When the Savitzky-Golay first-order derivative with a gap of 43.2 nm is applied, $R_v^2$ and RMSEV of the discrimination model are 0.7684 and 0.2416, respectively. When the Savitzky-Golay second-order derivative with a gap of 52.8 nm is applied, $R_v^2$, RMSEV, and optimal factor F are 0.7621, 0.2435, and 8, respectively. This optimal factor F is lower than that of the discrimination model with no applied pre-treatment. These results show that the discrimination model performance is improved by these pre-treatments compared to the model using no pre-treatment. The average spectra of the South Korean and Chinese rice samples pre-treated by the Savitzky-Golay first-order and second-order derivatives are shown in Figure 5. The first-order derivative spectrum reveals peaks at 660 nm and 695 nm, which are related to the amount of nitrogen fertilizer. This is associated with the relatively higher reflectance of the South Korean rice samples [36]. In the first-order derivative spectrum, the peaks at 660 nm to 695 nm distinguish the South Korean and Chinese rice samples more clearly than in the untreated spectrum (Figures 4 and 5a). As shown in Figure 5b, the spectrum after application of the Savitzky-Golay second-order derivative reveals lower reflectance peaks at 456, 533, 566, 605, and 624 nm for the South Korean rice samples, and higher reflectance peaks at 653, 681, and 706 nm compared to the peaks of the Chinese rice samples. The second-order derivative spectrum thus showed peaks in more wavelength bands than the untreated spectrum.
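A Savitzky-Golay derivative fits a low-order polynomial in a sliding window and takes the derivative of that polynomial at the window centre. The sketch below builds the filter weights by least squares. It assumes that the 43.2 nm and 52.8 nm gaps correspond to 9-point and 11-point windows at 4.8 nm spacing, and that a second-order polynomial is used. Neither the window interpretation nor the polynomial order is stated in the source.

```python
import math
import numpy as np

def savgol_coeffs(window, polyorder, deriv, delta):
    """Least-squares Savitzky-Golay weights for the deriv-th derivative at the window centre."""
    if window % 2 == 0 or window <= polyorder:
        raise ValueError("window must be odd and larger than polyorder")
    half = window // 2
    pos = np.arange(-half, half + 1, dtype=float)
    A = np.vander(pos, polyorder + 1, increasing=True)  # columns: pos^0, pos^1, ...
    # Row `deriv` of the pseudo-inverse estimates the deriv-th polynomial coefficient.
    return np.linalg.pinv(A)[deriv] * math.factorial(deriv) / delta ** deriv

def savgol_derivative(spectrum, window, polyorder, deriv, delta):
    """Apply the filter; only fully covered points are returned (edges are dropped)."""
    c = savgol_coeffs(window, polyorder, deriv, delta)
    # np.convolve flips its kernel, so reverse the weights to get a correlation.
    return np.convolve(spectrum, c[::-1], mode="valid")

# Check on a quadratic, whose derivatives are known exactly.
wavelengths = 420.0 + 4.8 * np.arange(76)
spectrum = 0.01 * wavelengths ** 2
d1 = savgol_derivative(spectrum, window=9, polyorder=2, deriv=1, delta=4.8)
d2 = savgol_derivative(spectrum, window=11, polyorder=2, deriv=2, delta=4.8)
```

With `mode="valid"`, the first-derivative output corresponds to `wavelengths[4:-4]` and the second-derivative output to `wavelengths[5:-5]`. A production pipeline would also need an edge-handling policy.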
The threshold values and discrimination accuracies of the South Korean and Chinese rice samples for the PLS-DA model using various pre-treatment methods are shown in Table 6. The discrimination accuracies obtained with SNV, MSC, mean normalization, and range normalization are lower than that of the discrimination model with no applied pre-treatment. However, the discrimination accuracies of these pre-treatment methods are still very high (over 95%).
The geographical discrimination models that included baseline pre-treatment, maximum normalization, smoothing pre-treatment, and Savitzky-Golay first-order and second-order derivatives yielded a total accuracy of 98.89% (Figure 6). Among these pre-treatment methods, when the Savitzky-Golay first-order and second-order derivatives were applied together, the discrimination model yielded a low optimal factor and a high $R_v^2$. Therefore, it is apparently the most efficient pre-treatment method in this model (Table 5).
Figure 5. Average spectra of the South Korean and Chinese rice samples pre-treated by Savitzky-Golay derivatives: (a) first-order derivative and (b) second-order derivative.
According to a previous study [1], the discrimination accuracies of a rice geographical origin model developed using one year of data (2016) were all above 99.99% for the South Korean rice samples. The discrimination accuracies of the PLS models developed in this study, which used three years of data, were lower than those previous results [1]. This may be because rice samples harvested over three years included more differences in cultivation environment, such as cultivation temperature, climate, and soil conditions, than samples harvested in a single year.
Considering these differences, the results show that the proposed models with pre-treatment provide excellent performance for geographical origin discrimination. Furthermore, the discrimination accuracies of these models are outstanding compared to those of another previous study [37], which showed 93.5% rice geographical origin discrimination accuracy by using an electronic nose combined with ICP-MS by PCA and linear discriminant analysis (LDA).

Results of PLS Image Discrimination of South Korean and Chinese Rice
A PLS image algorithm for geographical origin discrimination was developed. The development process of the image processing algorithm is shown in Figure 7. Binarized images were obtained to remove the background of the HSI at a 489.4 nm wavelength. From that point, the images of all wavelength bands were masked for feature extraction of the rice samples. Regression coefficients of the PLS model were applied to all wavelength band images, followed by acquisition of the PLS images. The South Korean and Chinese rice samples were discriminated from the obtained binary image by applying a threshold value to the PLS image.
Figure 8 illustrates the PLS image of geographical origin discrimination between the South Korean and Chinese rice samples using the Savitzky-Golay first-order derivative at a pixel dimension of 30 mm × 30 mm. As shown in Figure 8, PLS imaging shows potential for discriminating the South Korean and Chinese rice origins. The edge of the rice sample (the boundary between the sample cell and the rice sample) shows a higher pixel value. This may be because the edge portion of the rice in the sample cell has less overlap than the inside portion. Moreover, the edge is affected simultaneously by the fluorescence signal of the sample cell and that of the rice. As shown in Figure 8b, the white value of the South Korean sample shows the effect of the sample cell; thus, it may be appropriate to exclude the edge value to avoid the effect of the cell on the geographical origin discrimination.
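The PLS image steps described in this section (background masking at one band, applying the regression coefficients to every pixel spectrum, then thresholding) can be sketched as follows. The function name, the mask and class thresholds, and the toy coefficient vector are hypothetical. If the model was trained on pre-treated spectra, the same pre-treatment would have to be applied to each pixel spectrum first, which this sketch omits.

```python
import numpy as np

def pls_image(cube, wavelengths, coef, intercept,
              mask_band_nm=489.4, mask_threshold=0.1, class_threshold=0.5):
    """Return (scores, labels) for a (rows, cols, bands) cube.

    scores: per-pixel PLS prediction, NaN on the background.
    labels: 0 or 1 on sample pixels, -1 on the background.
    """
    band = int(np.argmin(np.abs(wavelengths - mask_band_nm)))  # nearest band to 489.4 nm
    mask = cube[:, :, band] > mask_threshold                   # binarized sample mask
    scores = cube @ coef + intercept                           # shape (rows, cols)
    scores = np.where(mask, scores, np.nan)
    labels = np.full(mask.shape, -1, dtype=int)
    labels[mask] = (scores[mask] >= class_threshold).astype(int)
    return scores, labels

# Synthetic scene: dark background, two rice-like regions with different intensity.
wavelengths = 420.0 + 4.8 * np.arange(76)
base = np.exp(-((wavelengths - 480.0) / 60.0) ** 2)
cube = np.zeros((6, 6, 76))
cube[1:5, 1:3, :] = base          # region meant to score 0
cube[1:5, 3:5, :] = 1.2 * base    # region meant to score 1

# Toy regression vector constructed so that base -> 0 and 1.2 * base -> 1.
coef = 5.0 * base / (base @ base)
intercept = -5.0
scores, labels = pls_image(cube, wavelengths, coef, intercept)
```

Excluding edge pixels, as suggested above, could be done by shrinking `mask` before classification, for example with a morphological erosion.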

Conclusions
In this study, PLS models of geographical origin discrimination using hyperspectral fluorescence imaging were developed. A pre-treatment was applied to improve the model accuracy and minimize discrimination errors due to spectral overlap. In addition, the effect of the HSI pixel dimensions on the models was evaluated to determine the geographical origins of the South Korean and Chinese rice samples. Various HSI pixel dimensions were implemented in the PLS models, and the optimal pixel dimensions (over 7 mm × 7 mm) were determined. Moreover, the PLS models that applied a baseline pre-treatment, maximum normalization, smoothing, or the Savitzky-Golay first-order and second-order derivatives each achieved a discrimination accuracy of 98.89%.
These results are considered noteworthy because the hyperspectral fluorescence image spectrum can be applied to effectively and accurately discriminate the geographical origin of rice. Nevertheless, more comprehensive studies are needed to obtain more accurate and extensive methods, because only South Korean and Chinese rice can be discriminated based on the results of this study. In future studies, a rice geographical origin discrimination model will be developed for various other geographical origins, such as Thailand, Vietnam, and the United States. In addition, in this study, quality factors such as moisture and protein content were not used as major factors in discriminating the geographical origin of rice because the contents of the quality factors generally differ for each variety of rice. In a future study, we plan to integrate the analysis of rice components with the discrimination of the geographical origin of rice.