3.1. Spectral Characteristics of South Korean and Chinese Rice
Figure 4 shows the fluorescence spectra of the South Korean and Chinese rice samples by year. The spectra of the rice samples from 2014 to 2016 show a similar trend over the entire wavelength range of 420 to 780 nm. The Chinese rice samples exhibit a higher reflectance than the South Korean rice samples across the whole spectral range. Previous studies reported that the geographical origin can be discriminated by using the difference in the ratio of amylopectin to amylose in rice [23]. Generally, it has been reported that branched-chain amylopectin is identified by the crystallinity of the samples and is attenuated by the amount of amylose present [28]. In addition, a spectral difference appears owing to the apparent transparency and internal heterogeneity of the sample [29].
Accordingly, the South Korean rice samples, which contained more amylose than the Chinese rice samples, had relatively low crystallinity, and their fluorescence intensity was therefore lower. The raw spectra revealed fluorescence peaks at 465 nm and 489 nm, along with the relatively higher reflectance of the Chinese rice samples compared to the South Korean rice samples. The main component used for discriminating Chinese and South Korean rice is a saturated fatty acid that shows a fluorescence peak from 450 nm to 500 nm [30]. This main component is influenced by the lipid constituent and acetic acid (CH3COOH) containing hydroxyl (-OH), stearic acid reacting with -OH, glycine, l-aspartic acid, and l-glutamic acid monosodium salt monohydrate. The latter constituent is a type of amino acid.
These components affect the rice quality and are influenced by nitrogen fertilization, the production environment, and the cultivation environment (temperature) [31]. Chinese rice is cultivated at a higher latitude (33–47° N) than South Korean rice, which is grown in colder regions. Therefore, the spectral differences identified in this study between the South Korean and Chinese rice samples are believed to reflect these differences in cultivation conditions. Chlorophyll a appeared mainly at 681 nm, and the higher reflectance of the South Korean rice samples was associated with the 681 nm peak [32, 33].
3.2. Performances of the Geographical Origin Discrimination Models for South Korean and Chinese Rice by Pixel Dimensions
Various pixel dimensions were applied in the geographical origin discrimination models to determine the hyperspectral image pixel dimensions that could most effectively discriminate the geographical origin of the rice samples.
Table 3 shows the performance of the developed PLSR model with pixel dimensions up to 30 mm × 30 mm. At 30 mm × 30 mm, $R_V^2$, RMSEV, and F of the discrimination model are 0.7601, 0.2456, and 9, respectively. The HSI pixel dimensions of 30 mm × 30 mm produced the highest discrimination accuracy. For the pixel dimensions of 2 mm × 2 mm, $R_V^2$, RMSEV, and F are 0.5752, 0.3247, and 10, respectively. The discriminative performance is therefore poorer when the pixel dimensions are small. Moreover, as the pixel dimensions increase, $R_V^2$ tends to increase while RMSEV and the optimal factor F decrease, which confirms that the model performance improves.
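As an illustration of how the two validation statistics quoted above relate to each other, the following minimal sketch computes a coefficient of determination and a root mean square error from reference class values and model predictions. The dummy coding (0 and 1), the sample values, and the function name `validation_metrics` are assumptions made for this example; the exact definition of the validation R² used by the modeling software in this study is not stated and may differ (for example, a squared correlation coefficient).

```python
import math

def validation_metrics(y_true, y_pred):
    """Return (R2, RMSE) for reference values y_true and predictions y_pred."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    # Residual and total sums of squares
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    return r2, rmse

# Hypothetical dummy coding: 0 for one origin, 1 for the other
y_ref = [0, 0, 0, 1, 1, 1]
y_hat = [0.1, -0.1, 0.2, 0.9, 1.1, 0.8]
r2_v, rmse_v = validation_metrics(y_ref, y_hat)
print(f"R2 = {r2_v:.4f}, RMSE = {rmse_v:.4f}")
```

Under this definition both statistics depend on the same residual sum of squares, so for a fixed validation set a higher R² necessarily corresponds to a lower RMSE.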
The predictive accuracies for the various pixel dimensions are presented in Table 4. The discrimination accuracies increase as the pixel dimensions increase, and they are above 90% when the pixel dimensions exceed 3 mm × 3 mm. At pixel dimensions of 30 mm × 30 mm with a classification value of 0.60, the discrimination accuracies of the South Korean and Chinese rice samples are above 99.9% and 97.78%, respectively.
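The way a classification value turns continuous PLS predictions into per-origin accuracies can be sketched as follows. This is only an illustration under stated assumptions: the coding of the two origins as 0 and 1, the rule that predictions at or above the threshold are assigned to class 1, the sample predictions, and the function name `class_accuracies` are all hypothetical; only the threshold of 0.60 is taken from the text.

```python
def class_accuracies(y_pred, labels, threshold=0.60):
    """Per-class and total accuracy (%) for thresholded predictions.

    Predictions >= threshold are assigned to class 1, the rest to class 0.
    """
    correct = {0: 0, 1: 0}
    total = {0: 0, 1: 0}
    for p, lab in zip(y_pred, labels):
        assigned = 1 if p >= threshold else 0
        total[lab] += 1
        correct[lab] += int(assigned == lab)
    acc0 = 100.0 * correct[0] / total[0]
    acc1 = 100.0 * correct[1] / total[1]
    acc_all = 100.0 * (correct[0] + correct[1]) / (total[0] + total[1])
    return acc0, acc1, acc_all

# Hypothetical predicted values for four samples of each class
preds = [0.10, 0.35, 0.65, 0.20, 0.90, 0.70, 0.55, 1.05]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
acc0, acc1, acc_all = class_accuracies(preds, labels)
print(acc0, acc1, acc_all)
```

Moving the threshold trades accuracy in one class against the other, which is why a per-class accuracy is reported together with the classification value.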
The results show that the hyperspectral image pixel dimensions of 30 mm × 30 mm obtain the highest discrimination accuracy. In general, the length and width of rice (Oryza sativa L. subsp. japonica) are 5.0–6.4 mm and 2.5–3.0 mm, respectively. Thus, developing a discrimination model with pixel dimensions over 7 mm × 7 mm, which can capture the overall rice sample spectrum together with all the external characteristics of a rice grain, can improve the discrimination performance [34, 35].
3.3. Discrimination Models for South Korean and Chinese Rice Using Pre-Treatments
Various pre-treatment methods were applied to develop the geographical origin discrimination models using the hyperspectral image pixel dimensions of 30 mm × 30 mm. The performances of the developed PLS-DA model are shown in Table 5. $R_V^2$ and RMSEV of the discrimination model with no applied pre-treatment are 0.7601 and 0.2456, respectively. When the Savitzky–Golay first-order derivative with a gap of 43.2 nm is applied, $R_V^2$ and RMSEV of the discrimination model are 0.7684 and 0.2416, respectively. When the Savitzky–Golay second-order derivative with a gap of 52.8 nm is applied, $R_V^2$, RMSEV, and optimal factor F are 0.7621, 0.2435, and 8, respectively. This optimal factor F is lower than that of the discrimination model with no applied pre-treatment. These results show that the performance of the discrimination model improves with these pre-treatments compared to the model with no pre-treatment.
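For readers unfamiliar with the derivative pre-treatment, the sketch below shows a Savitzky–Golay first-order derivative obtained from a local quadratic least-squares fit, for which the filter weights at window offset k are proportional to k. It is a simplified illustration, not the software routine used in this study: the band spacing of 4.8 nm, the 9-point window, the synthetic spectrum, and the function name `savgol_first_derivative` are assumptions, and how the reported "gap" maps to a window length is not specified in the text.

```python
def savgol_first_derivative(spectrum, half_width, spacing):
    """First derivative from a local quadratic least-squares fit.

    For a symmetric window, the Savitzky-Golay first-derivative weight at
    offset k is k / sum(k^2), divided by the band spacing.
    Edge bands without a full window are returned as None.
    """
    offsets = range(-half_width, half_width + 1)
    norm = sum(k * k for k in offsets) * spacing
    n = len(spectrum)
    out = [None] * n
    for i in range(half_width, n - half_width):
        out[i] = sum(k * spectrum[i + k] for k in offsets) / norm
    return out

# Hypothetical band spacing and wavelength axis
spacing_nm = 4.8
wavelengths = [420.0 + spacing_nm * i for i in range(20)]
# Synthetic quadratic "spectrum"; its exact derivative is 0.002 * (w - 500)
spectrum = [0.001 * (w - 500.0) ** 2 for w in wavelengths]
deriv = savgol_first_derivative(spectrum, half_width=4, spacing=spacing_nm)
print(deriv[10], 0.002 * (wavelengths[10] - 500.0))
```

A wider window smooths more noise but also flattens narrow peaks, so the window (or gap) is usually tuned against the validation statistics.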
The average spectra of the South Korean and Chinese rice samples pre-treated by the Savitzky–Golay first-order and second-order derivatives are shown in Figure 5. The first-order derivative spectrum reveals peaks at 660 nm and 695 nm, which are related to the amount of nitrogen fertilizer. This is associated with the relatively higher reflectance of the South Korean rice samples [36]. In the first-order derivative spectrum, the peaks at 660 nm and 695 nm distinguish the South Korean and Chinese rice samples more clearly than the untreated spectrum does (Figure 4 and Figure 5a). As shown in Figure 5b, the spectrum after application of the Savitzky–Golay second-order derivative reveals lower reflectance peaks at 456, 533, 566, 605, and 624 nm for the South Korean rice samples, and higher reflectance peaks at 653, 681, and 706 nm, compared to the peaks of the Chinese rice samples. The spectrum pre-treated by the Savitzky–Golay second-order derivative showed peaks in more wavelength bands than the untreated spectrum.
The threshold values and discrimination accuracies of the South Korean and Chinese rice samples for the PLS-DA model using various pre-treatment methods are shown in Table 6. The discrimination accuracies with application of SNV, MSC, mean normalization, and range normalization are lower than those of the discrimination model with no applied pre-treatment. However, the discrimination accuracies with these pre-treatment methods are still very high (over 95%).
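As a brief illustration of one of the scatter-correction pre-treatments named above, the following sketch applies the standard normal variate (SNV) transform to a single spectrum: each spectrum is centered on its own mean and divided by its own standard deviation. The spectrum values and the function name `snv` are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since conventions differ between software packages.

```python
import statistics

def snv(spectrum):
    """Standard normal variate: center a spectrum and scale it by its own SD."""
    mean = statistics.fmean(spectrum)
    sd = statistics.stdev(spectrum)  # sample standard deviation (n - 1)
    return [(v - mean) / sd for v in spectrum]

# Hypothetical spectrum and a copy with a multiplicative and additive offset
raw = [0.20, 0.25, 0.40, 0.35, 0.30]
scaled = [2.0 * v + 0.1 for v in raw]

snv_raw = snv(raw)
snv_scaled = snv(scaled)
print(snv_raw)
print(snv_scaled)
```

Because SNV removes per-spectrum offset and scale, the two transformed spectra coincide. The same property means that any intensity difference between origins that behaves like an overall scale factor is also removed by this pre-treatment.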
The geographical discrimination models that included baseline pre-treatment, maximum normalization, smoothing pre-treatment, and Savitzky–Golay first-order and second-order derivatives yielded a total accuracy of 98.89% (Figure 6). Among these pre-treatment methods, when the Savitzky–Golay first-order and second-order derivatives were applied together, the discrimination model yielded a low optimal factor and a high $R_V^2$. Therefore, this appears to be the most efficient pre-treatment method for this model (Table 5).
According to a previous study [1], the discrimination accuracies of a rice geographical origin model developed using one year of data (2016) were all above 99.99% for the South Korean rice samples. The discrimination accuracies of the PLS models developed in this study, which used three years of data, were relatively low compared to those previous results [1]. This may be because the rice samples harvested over three years included more differences in cultivation environment, such as cultivation temperature, climate, and soil conditions, than the samples harvested in one year.
Considering these differences, the results show that the proposed models with pre-treatment provide excellent performance for geographical origin discrimination. Furthermore, the discrimination accuracies of these models compare favorably with those of another previous study [37], which reported a rice geographical origin discrimination accuracy of 93.5% using an electronic nose combined with ICP-MS, analyzed by PCA and linear discriminant analysis (LDA).
3.4. Results of PLS Image Discrimination of South Korean and Chinese Rice
A PLS image algorithm for geographical origin discrimination was developed. The development process of the image processing algorithm is shown in Figure 7. Binarized images were obtained from the HSI at a wavelength of 489.4 nm to remove the background. The images of all wavelength bands were then masked to extract the features of the rice samples. The regression coefficients of the PLS model were applied to the images of all wavelength bands to obtain the PLS images. Finally, a threshold value was applied to the PLS image, and the South Korean and Chinese rice samples were discriminated from the resulting binary image.
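The processing steps described above (background masking from one band, applying the regression coefficients to every pixel spectrum, and thresholding the resulting PLS image) can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of Figure 7 itself: the tiny 2 × 2 × 3 hypercube, the mask threshold, the coefficients, the zero intercept, the class threshold of 0.60, and the function name `pls_image` are all hypothetical.

```python
def pls_image(cube, mask_band, mask_threshold, coefs, intercept, class_threshold):
    """Build a PLS score image and a binary class image from a hypercube.

    cube[r][c] is the list of band intensities of pixel (r, c).
    Background pixels are None in both output images.
    """
    score_img, class_img = [], []
    for row in cube:
        score_row, class_row = [], []
        for pixel in row:
            if pixel[mask_band] <= mask_threshold:
                # Background pixel: excluded by the binary mask
                score_row.append(None)
                class_row.append(None)
                continue
            # PLS prediction: intercept plus coefficient-weighted band sum
            score = intercept + sum(b * x for b, x in zip(coefs, pixel))
            score_row.append(score)
            class_row.append(1 if score >= class_threshold else 0)
        score_img.append(score_row)
        class_img.append(class_row)
    return score_img, class_img

# Hypothetical hypercube: 2 x 2 pixels, 3 bands each; band 0 is the mask band
cube = [
    [[0.05, 0.0, 0.0], [0.5, 0.5, 0.5]],
    [[0.5, 0.5, 0.25], [0.25, 0.25, 0.5]],
]
scores, classes = pls_image(cube, mask_band=0, mask_threshold=0.1,
                            coefs=[1.0, -1.0, 2.0], intercept=0.0,
                            class_threshold=0.60)
print(scores)
print(classes)
```

Keeping the score image alongside the binary image makes it possible to inspect where in the sample high or low scores occur before the threshold is applied.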
Figure 8 illustrates the PLS image of geographical origin discrimination between the South Korean and Chinese rice samples using the Savitzky–Golay first-order derivative at a pixel dimension of 30 mm × 30 mm. As shown in Figure 8, the PLS image is a potential technology for discriminating the South Korean and Chinese rice origins. The edge of the rice sample, that is, the boundary between the sample cell and the rice sample, shows a higher pixel value. This may be because the edge portion of the rice in the sample cell has less overlap than the inside portion. Moreover, the pixel values at the edge are affected simultaneously by the fluorescence signal of the sample cell and that of the rice. As shown in Figure 8b, the white values in the South Korean sample show the effect of the sample cell; thus, it may be appropriate to exclude the edge values to avoid the effect of the cell on the geographical origin discrimination.