Near-Infrared Hyperspectral Imaging (NIR-HSI) for Nondestructive Prediction of Anthocyanins Content in Black Rice Seeds

Anthocyanins are important micro-components that contribute to the quality and health benefits of black rice. Anthocyanins concentration and composition differ among rice seeds depending on variety, growth conditions, and maturity at harvest. Real-time, non-destructive, and accurate seed inspection based on chemical composition is essential for industry to optimize product cost and quality. Therefore, this research aimed to evaluate the feasibility of near-infrared hyperspectral imaging (NIR-HSI) for predicting the anthocyanins content of black rice seeds, which opens up the possibility of developing a sorting machine based on rice micro-components. Images of thirty-two samples of black rice seeds, harvested in 2019 and 2020, were captured using an NIR-HSI system covering a wavelength range of 895–2504 nm. The spectral data extracted from the images were then paired with the anthocyanins reference values obtained by high-performance liquid chromatography (HPLC). For comparison, the seed samples were ground into powder, which was scanned with the same NIR-HSI system and analyzed using the same method. The partial least squares regression (PLSR) models of the seed samples, developed per harvest year and on mixed data, were consistent, with R² over 0.85 for the calibration datasets. The best prediction models for the 2019, 2020, and mixed data were obtained by applying standard normal variate (SNV) pre-processing, with the highest coefficients of determination (R²) of 0.85, 0.95, and 0.90 and the lowest standard errors of prediction (SEP) of 0.11, 0.17, and 0.16 mg/g, respectively. The R² and SEP values of the seed models were comparable to those of the powder models, 0.92–0.95 and 0.09–0.15 mg/g, respectively. Additionally, the beta coefficients of the developed model were used to generate chemical images for predicting anthocyanins in rice seeds. The root mean square error (RMSE) of the seed prediction was an acceptable 0.21 mg/g. This result shows the potential of NIR-HSI for use in a seed sorting machine based on anthocyanins content.


Introduction
Rice (Oryza sativa L.) is a vital cereal food in Asia and is consumed by almost half of the world's population [1]. The chemical composition of rice is characterized by a high content of carbohydrates, mainly starch (56-74%), a moderate content of protein (8-11%), and a minor content of lipids (2-4%) and minerals (1-3%) [2]. There are many different kinds of rice, including white rice, which is widely consumed, and pigmented rice. Pigmented rice, such as purple and black rice, contains important bioactive compounds beneficial for human health. The bran of black rice is rich in fiber and many kinds of phytochemicals, such as tocopherols, tocotrienols, oryzanols, vitamin B complex, and other phenolic compounds [3,4]. The protein content of black rice was also reported to be higher than white rice. In addition, black rice contains anthocyanins and proanthocyanidin in its aleurone layer, meaning that black rice is considered an essential health product.
Anthocyanins are a water-soluble flavonoid compound that is responsible for the appearance of certain colors in nature and may appear as red, purple, or blue, depending on the pH. This color change is due to the reversible structural transformation of anthocyanins in a certain range of pH values [5,6]. Anthocyanins were reported to be an essential component of traditional herbal medicine [7], and have a wide range of pharmacological applications against various stress conditions and chronic diseases, e.g., inflammation, cognitive decline, neural dysfunction, capillary fragility and permeability, platelet aggregation, cardiovascular complication, liver damage, lipid peroxidation, and cancer tumor growth [8]. Moreover, in vivo tests reported that supplementation of black rice containing anthocyanins in the diet reduces atherosclerotic lesions in hypercholesterolemic rabbits and apolipoprotein-E-deficient mice. These biological activities were related to anthocyanins' potencies as antiradical scavenging and antioxidant activities [9].
The anthocyanins content of black rice is strongly affected by genotype and environmental growth conditions [10] and has been reported to range from 0.52 to 3.47 mg/g. Cyanidin-3-glucoside (C3G) and peonidin-3-glucoside (Pn3G) are the most common anthocyanins found in black rice [11]. Other anthocyanins observed in black rice are cyanidin-3,5-diglucoside, cyanidin-3-rutinoside, cyanidin-3-gentiobioside, malvidin-3-glucoside, and peonidin-3-rutinoside [12,13]. Anthocyanins are distributed in all parts of black rice seeds but accumulate mostly in the bran. The anthocyanins level in bran is more than fifteen times higher than that in the embryo. Furthermore, the C3G and Pn3G contents of bran were reported to be 98% and 93% of their totals in whole black rice, whereas only 2% and 7%, respectively, were found in the embryo [14–16].
Chromatography techniques, including paper chromatography and high-performance liquid chromatography (HPLC), have been used extensively for the qualitative and quantitative analysis of chemical components, including anthocyanins, in agricultural products [17]. These techniques are destructive in sample preparation and require chemical solvents for extraction before analysis, which is time-consuming, costly, and labor-intensive. In addition, they generate chemical waste that is harmful to the environment. Moreover, large-scale determination of anthocyanins in single seeds is quite difficult with these techniques. Thus, chemical-based seed selection for quality control in factories remains challenging. Non-destructive methods based on spectroscopy and image analysis offer a way to overcome these drawbacks.
Near-infrared and mid-infrared spectroscopy methods have been widely used to detect anthocyanins in agricultural products such as soybeans [18], flowering tea [19], and sweet cherries [20]. Fourier transform near-infrared (FT-NIR) and Fourier transform infrared (FT-IR) can also be applied to predict anthocyanins content in black soybean based on the seed's spectra [21]. Many researchers have favored the hyperspectral imaging (HSI) system, which integrates the spectroscopic and imaging techniques due to the capability in acquiring spectral and spatial information simultaneously. In recent years, HSI has been adopted to assess chlorogenic acid content in Flos lonicereae [22], to determine deoxynivalenol in bulk wheat kernel [23], and to monitor the quality of strawberries during storage [24].
The capability of HSI to predict macro-components in grain from single-kernel data, such as starch in corn and protein in wheat, has also been demonstrated [25,26]. For micro-components, particularly anthocyanins, visible hyperspectral imaging (Vis-HSI) has been applied to slices of purple potato [27,28] and lychee pericarp [29]. PLSR models based on the full wavelength range for the lychee pericarp samples gave performance over 0.9 for both calibration and prediction, while pre-processing (SNV, MSC, and SG) slightly improved the performance of the model predicting anthocyanins in purple-fleshed sweet potato slices, from 0.876 to 0.889. Furthermore, the hyperspectral method also successfully visualized the anthocyanins distribution throughout the sample.
HSI in the NIR region also has been used to predict micro-component, including total anthocyanins, flavonoids, and phenolics of dry black goji berries (Lycium ruthenicum Murr.) [30]. The PLS regression model using a full range of spectra (975-1646 nm) resulted in the performance of the calibration model of total anthocyanins and phenolics over 0.90 and 0.85 for total flavonoids. This technique also resulted in high performance of the prediction model of 0.89, 0.84, and 0.85 for anthocyanins, flavonoids, and phenolics, respectively. In addition, the result of the PLS regression method was comparable to the other techniques, including LS-SVM and CNN, that were applied in the study. However, in this sample, anthocyanins were uniformly distributed throughout the sample, and the content was higher than those in black rice.
Anthocyanins are a micro-component in rice, mainly available in its aleurone, and are not distributed uniformly throughout the rice seed. Anthocyanins produce a dark color (purple to black) in the seed surface that may absorb the light emitted from the instrument. However, most of these chemicals are present on the rice seed's surface, which might indicate one of the advantages of measurement using a non-destructive system, which works based on the interaction between light and matter. At present, no research has been done on the application of near-infrared hyperspectral imaging (NIR-HSI) for evaluating anthocyanins content in a single seed of rice. Perhaps, the lack of such studies is due to the low concentration of anthocyanins in pigmented rice, and the market demand for this rice is not high. However, with the gradual increase in functional food demand in the last decade, the research interest related to anthocyanins in rice has risen. Thus, this study aims to evaluate the potential of near-infrared hyperspectral imaging (NIR-HSI) for predicting anthocyanins in individual rice seeds. For comparison, this study also developed a model for the prediction of anthocyanins based on the powder sample of the same rice variety.

Rice Sample
Thirty-two samples of black rice (listed in Table 1), harvested in 2019 and 2020, were obtained from the Rural Development Administration (RDA), Jeollabuk-do, South Korea. Fourteen varieties were grown in both 2019 and 2020, two varieties (Heugjinju Byeo and Cheongpungheugchal) were cultivated only in 2019, and two varieties (Gancheok Byeo and Cheonghaejinmi) were harvested only in 2020. The weight of 100 seeds of rice ranged from 1.46 to 2.44 g, indicating rice size variability based on the variety. Seed color and size also varied within a single variety, so the anthocyanins content may differ among individual seeds. This variability in biochemical composition will in turn produce variation in the spectral data extracted from the HSI images. On the other hand, investigating anthocyanins in a single seed is constrained by the minimum weight requirement of the HPLC procedure. Hence, in this research, twenty grams of rice seeds were collected from each rice variety of each year, and among them, 36 seeds were selected randomly for image acquisition using the NIR-HSI system. After the images were acquired, all of the seeds were ground to a uniform particle size under 250 µm. The powder sample was then placed on the sample plate and scanned using the same NIR-HSI system. After both seed and powder image acquisition, the powder samples were sent for chemical analysis using the HPLC method.

NIR Hyperspectral Imaging System
A laboratory-based line-scan NIR-HSI system (Figure 1) was used to scan and collect the sample data in reflectance mode. The system consists of six 100 W tungsten-halogen line-light sources (Light Bank, Ushio Inc., Tokyo, Japan) connected to the optical fiber used to illuminate the sample during measurement. The sensing module was composed of a line-scan spectrograph (NIR, Headwall Photonics, Fitchburg, MA, USA) that covered a spectral range of 895-2504 nm, a mercury cadmium telluride (MCT) detector (Model: Xeva-2.5-320; Xenics, Heverlee, Belgium) to detect light reflected by the sample, as well as a high-performance camera (Headwall Photonics, Fitchburg, MA, USA) with 320 (spatial) × 256 (spectral) pixel resolution and an objective lens (focal length 25 mm, f/1.4). During data acquisition, the sample plate was placed on the motorized table to move the sample through the camera's capture view at a particular speed controlled by a DC motor. The software for data acquisition was developed using Microsoft Visual Basic (version 6.0) on a Windows platform.

NIR-HSI Data Acquisition and Extraction
The images of the samples ($I_{raw}$) were captured using the NIR-HSI system with the following arrangement. Two rice varieties (36 seeds of each) were arranged on a 10 × 10 grid black square plate, transferred to the moving table controlled by a stepping motor, and scanned line by line using the HSI system. In total, there were 16 scanned plates for 32 rice varieties. The powder was placed into a sample holder plate, with nine subsamples of each variety, and adjusted to 5 mm thickness for HSI scanning. The powder surface was flattened with a rod, without compression. The distance between the sample and the camera was set to 25 cm in order to cover the spatial range of the sample. The spatial and spectral data were obtained as the sample passed through the camera field of view (FOV) and were saved in a 3D hypercube containing two spatial dimensions and one spectral dimension. Dark ($I_{dark}$) and white ($I_{white}$) reference images were acquired to calculate the reflectance value and to correct noise that may come from the environment. The dark reference image (0% reflectance) was collected by covering the camera lens with an opaque cap and turning off the light. The white reference (99% reflectance) was acquired using a white Teflon sheet. The corrected HSI image ($I_c$) was calculated from the raw HSI image ($I_{raw}$) using Equation (1):

$$I_c = \frac{I_{raw} - I_{dark}}{I_{white} - I_{dark}} \qquad (1)$$

The sample spectral information was extracted from the corrected image ($I_c$) after removing the background in a region of interest (ROI) selection step. Both processes, image correction and background removal, were conducted using MATLAB software (R2019a, The MathWorks, Natick, MA, USA). The flow chart of the image correction, spectral extraction, and data analysis strategy is presented in Figure 2.
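The correction in Equation (1) and the ROI averaging step can be sketched in a few lines. This is a minimal illustration in Python with NumPy rather than the MATLAB code used in the study; the function names, the array layout (lines × pixels × bands), the `eps` guard, and the toy values are all assumptions made for the example.

```python
import numpy as np

def correct_hypercube(raw, dark, white, eps=1e-9):
    """Element-wise reflectance correction: (raw - dark) / (white - dark)."""
    raw = np.asarray(raw, dtype=float)
    dark = np.asarray(dark, dtype=float)
    white = np.asarray(white, dtype=float)
    denom = white - dark
    # Assumption: clamp near-zero denominators (dead pixels) to avoid division by zero.
    denom = np.where(np.abs(denom) < eps, eps, denom)
    return (raw - dark) / denom

def mean_roi_spectrum(cube, mask):
    """Average the spectra of all pixels selected by a boolean ROI mask."""
    return cube[mask].mean(axis=0)

# Toy hypercube: 4 scan lines x 5 spatial pixels x 6 bands (hypothetical sizes).
lines, width, bands = 4, 5, 6
dark = np.full((1, width, bands), 100.0)      # dark reference, one line
white = np.full((1, width, bands), 4100.0)    # white reference, one line
raw = np.full((lines, width, bands), 1100.0)  # sample reflecting 25% of the white signal

corrected = correct_hypercube(raw, dark, white)

# Hypothetical ROI covering a 2 x 3 block of pixels.
mask = np.zeros((lines, width), dtype=bool)
mask[1:3, 1:4] = True
spectrum = mean_roi_spectrum(corrected, mask)
```

The single-line references broadcast across all scan lines, which matches the usual practice for line-scan systems where each spatial pixel has its own dark and white response.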

Data Pre-Processing
The spectral data contained a considerable level of noise, possibly generated by the camera and environmental effects. The acquired spectral data were subjected to several pre-processing methods, including mean normalization, multiplicative scatter correction (MSC), standard normal variate (SNV), and the Savitzky-Golay first derivative (Figure 2). Normalization was used to fit the spectral data within a comparable range (0-1) to compensate for inconsistencies due to sample thickness and optical path length. SNV pre-processing was used to correct changes in spectral information caused by scattering and particle size variability [31]. MSC was used to correct the scattering intensity of the spectra [31], while Savitzky-Golay first derivatives were applied to correct the baseline effect and resolve overlapping peaks [32].
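As an illustration of two of these corrections, the sketch below implements SNV and MSC with NumPy. It is not the authors' code: the function names are hypothetical, SNV is assumed to use the sample standard deviation (`ddof=1`), and MSC is assumed to use the mean spectrum as its reference.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row) individually."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference (default: mean spectrum)."""
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, s in enumerate(spectra):
        # Fit s = slope * ref + intercept, then invert that scatter model.
        slope, intercept = np.polyfit(ref, s, 1)
        corrected[i] = (s - intercept) / slope
    return corrected

# Toy data: one base spectrum distorted by additive and multiplicative effects.
base = np.sin(np.linspace(0.0, 3.0, 50)) + 1.5
spectra = np.vstack([base, 2.0 * base + 0.3, 0.5 * base - 0.1])

snv_spectra = snv(spectra)
msc_spectra = msc(spectra)
```

In this toy case the three rows differ only by a gain and an offset, so both methods collapse them onto a single curve, which is the behavior the paragraph above describes for scatter effects.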

High-Performance Liquid Chromatography (HPLC) Test for Reference Analysis
The anthocyanins content of black rice was measured using high-performance liquid chromatography (HPLC) by injecting 20 µL of sample extract into the system. Extraction was carried out by dissolving black rice powder (1 g) in 30 mL of 1% HCl/40% CH3OH at 4 °C for 24 h. Before analysis, all of the samples were filtered through a 0.45 µm membrane filter. Anthocyanins were quantified on an Agilent 1200 series HPLC system (Boeblingen, Germany) comprising a quaternary pump, an Agilent 1200 series diode-array detector (DAD) operating at 530 nm, ChemStation software, a 1200-well-plate autosampler, and a Tosoh ODS-120T column (250 mm × 4.6 mm i.d., Tokyo, Japan) protected with a Nova-Pak C18 guard insert column (Waters, Milford, MA, USA). During the measurement, the column temperature was maintained at 30 °C.
The mobile phase consisted of solvent A (5% formic acid/H2O, v/v) and solvent B (5% formic acid/acetonitrile, v/v), delivered at a flow rate of 0.7 mL/min. The applied gradient program was: 0-20 min, 10-30% B; 20-25 min, 30-60% B; 25-26 min, 60-10% B; and 26-35 min, 10% B. The anthocyanins concentration was obtained by comparing the HPLC peak area to external standard calibration curves. The standard calibration curve (R² ≥ 0.999) was prepared by injecting 0.5-1 µg of anthocyanins in 20 µL of 1% HCl/40% CH3OH. The total anthocyanins content and the individual anthocyanins were analyzed, and the results are presented in mg/g.
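The gradient program can be written as a small lookup that returns the solvent B fraction at any time in the run. The sketch below assumes linear ramps between the stated time points, which the text does not state explicitly; the names `GRADIENT` and `percent_b` are hypothetical.

```python
# Breakpoints (time in min, % solvent B) taken from the gradient program above.
GRADIENT = [(0.0, 10.0), (20.0, 30.0), (25.0, 60.0), (26.0, 10.0), (35.0, 10.0)]

def percent_b(t_min):
    """Return % solvent B at time t_min, assuming linear ramps between breakpoints."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time outside the 0-35 min program")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)

def percent_a(t_min):
    """Solvent A makes up the remainder of the binary mobile phase."""
    return 100.0 - percent_b(t_min)

mid_first_ramp = percent_b(10.0)   # halfway through the 10-30% B ramp
mid_second_ramp = percent_b(22.5)  # halfway through the 30-60% B ramp
re_equilibration = percent_b(30.0) # during the final 10% B hold
```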

Model Development
The partial least squares regression (PLSR) technique was applied to predict the total anthocyanins content of black rice. This multivariate method combines multiple regression with feature extraction in the manner of principal component analysis, and can predict the behavior of dependent variables from large datasets of independent variables. The PLSR model relies on a linear relationship between the X and Y variables, which makes it possible to predict the Y variable from X [33]. The PLS model is expressed as follows:

$$X = T P^{T} + E_X \qquad (2)$$

$$Y = U Q^{T} + E_Y \qquad (3)$$

$$U = T B \qquad (4)$$

where X is the independent variable matrix representing the spectral data, while Y represents the dependent variable related to the anthocyanins content of black rice. T and U are the score matrices, and P and Q are the X and Y loading matrices, respectively. $E_X$ and $E_Y$ denote the error matrices for the X and Y data. Finally, the inner relation between the spectral data and the anthocyanins content was constructed using least squares (Equation (4)), in which B contains the inner regression coefficients. This technique was previously used in our study to predict anthocyanins in intact soybean using FT-NIR and FT-IR methods and is recommended because of its excellent performance.

In this research, 1152 seed spectra were obtained (36 seeds × 32 rice samples), and the 36 seeds of each sample were randomly divided into 9 groups. The spectra of each group were then averaged, which resulted in 288 spectra (9 spectra × 32 rice samples). The averaging was carried out to accommodate differences in the physical characteristics of seeds within a single rice sample. The weight of 100 seeds of rice ranged from 1.46 to 2.44 g, indicating rice size variability based on the variety. Average seed weight, which is closely related to seed size, has been shown to affect the phytochemical content of soybean [34], and is thought to have a similar effect on rice seed. The 288 spectra for the powder samples were obtained from 32 varieties, with 9 spectra from each.

The entire dataset was divided into calibration and validation datasets, which consisted of 70% and 30% of the data, respectively.
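The grouping, averaging, and splitting steps can be illustrated as follows. This sketch makes several assumptions that the text does not confirm: the nine groups are of equal size (four seeds each), the 70/30 split is a simple random split over all 288 spectra, and the calibration count is obtained by rounding. The function names and the random toy spectra are hypothetical.

```python
import numpy as np

def average_in_groups(seed_spectra, n_groups, rng):
    """Randomly assign seeds to equal-sized groups and average the spectra per group."""
    seed_spectra = np.asarray(seed_spectra, dtype=float)
    n_seeds = seed_spectra.shape[0]
    if n_seeds % n_groups:
        raise ValueError("number of seeds must be divisible by n_groups")
    groups = rng.permutation(n_seeds).reshape(n_groups, -1)  # e.g. 9 groups x 4 seeds
    return seed_spectra[groups].mean(axis=1)                 # (n_groups, n_bands)

def split_calibration_validation(n_samples, calibration_fraction, rng):
    """Return sorted index arrays for a random calibration/validation split."""
    order = rng.permutation(n_samples)
    n_cal = int(round(calibration_fraction * n_samples))
    return np.sort(order[:n_cal]), np.sort(order[n_cal:])

rng = np.random.default_rng(42)

# Toy stand-in for the seed data: 32 samples x 36 seeds x 10 bands.
all_seed_spectra = rng.random((32, 36, 10))

averaged = np.vstack([average_in_groups(s, 9, rng) for s in all_seed_spectra])
cal_idx, val_idx = split_calibration_validation(len(averaged), 0.70, rng)
```

With 288 averaged spectra, rounding gives 202 calibration and 86 validation spectra under these assumptions; the study may have partitioned the data differently, for example per variety or per year.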
The wavelength range used to develop the model was 900 to 1800 nm. This waveband was selected because the tungsten-halogen light source of the NIR-HSI system is only effective in that region, even though the sensing module of the instrument covers a spectral range of 895-2504 nm. The reflectance intensity of the white reference image of the Teflon sheet (99% reflectance) above 1800 nm was low and flat, meaning that the data obtained from this region are unreliable or noisy.
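To make the modelling step concrete, the sketch below restricts spectra to the 900-1800 nm window and fits a single-response PLS model with the NIPALS algorithm in NumPy. It is an illustration only: the study does not state which PLS algorithm or how many latent variables were used, the wavelength axis is assumed to be evenly spaced over the 256 spectral pixels, and the data are synthetic with three latent factors so that a three-component model fits exactly.

```python
import numpy as np

def fit_pls1(X, y, n_components):
    """Fit PLS1 (one response) by NIPALS; return (beta, intercept) for y ~ X @ beta + intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean            # centered data, deflated per component
    W = np.zeros((X.shape[1], n_components))   # X weights
    P = np.zeros((X.shape[1], n_components))   # X loadings
    q = np.zeros(n_components)                 # y loadings
    for a in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w                             # X scores for this component
        tt = t @ t
        p = Xr.T @ t / tt
        q[a] = yr @ t / tt
        Xr = Xr - np.outer(t, p)               # deflate X
        yr = yr - q[a] * t                     # deflate y
        W[:, a], P[:, a] = w, p
    beta = W @ np.linalg.solve(P.T @ W, q)
    intercept = y_mean - x_mean @ beta
    return beta, intercept

# Assumption: 256 spectral pixels evenly spaced over 895-2504 nm.
wavelengths = np.linspace(895.0, 2504.0, 256)
band_mask = (wavelengths >= 900.0) & (wavelengths <= 1800.0)

# Synthetic spectra driven by three latent factors (not real reflectance values).
rng = np.random.default_rng(1)
scores = rng.normal(size=(40, 3))
loadings = rng.normal(size=(3, wavelengths.size))
spectra = scores @ loadings
anthocyanins = scores @ np.array([0.5, -0.2, 0.1]) + 1.0

X = spectra[:, band_mask]
beta, intercept = fit_pls1(X, anthocyanins, n_components=3)
predicted = X @ beta + intercept
```

On real spectra the number of components would normally be chosen by cross-validation, and `beta` plays the role of the beta coefficient curve used later to identify influential bands and to build chemical images.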

Image Processing
The capability to create chemical images is one of the benefits of using hyperspectral imaging to predict chemical components in food and agricultural products. It makes it possible to visualize the distribution of the evaluated chemical in the examined sample. The chemical image was developed from the beta coefficients of the created model. The 3D hyperspectral image was first unfolded into a two-dimensional matrix, which was then multiplied by the PLS regression coefficients. The resulting matrix was then converted into a 3D image. The different anthocyanins concentrations in the samples were visualized by summing the corresponding pixels of all band images. The chemical image ($I_{chem}$) is obtained as follows:

$$I_{chem} = \sum_{i=1}^{n} I_i B_i + C$$

where $I_i$ is the hypercube image at band i, $B_i$ is the PLS regression coefficient for that band, n is the number of bands, and C is the constant value.
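The unfold, multiply, and refold procedure amounts to a single matrix product per pixel. The sketch below shows one way to do it with NumPy; the function name, the optional background mask, and the toy coefficients are assumptions made for illustration.

```python
import numpy as np

def chemical_image(cube, beta, intercept, mask=None):
    """Apply regression coefficients to every pixel spectrum of a (rows, cols, bands) cube."""
    cube = np.asarray(cube, dtype=float)
    rows, cols, bands = cube.shape
    flat = cube.reshape(rows * cols, bands)   # unfold: one spectrum per row
    predicted = flat @ beta + intercept       # sum over bands of I_i * B_i, plus C
    image = predicted.reshape(rows, cols)     # refold into the spatial layout
    if mask is not None:
        image = np.where(mask, image, np.nan) # hide background pixels
    return image

# Toy cube with constant reflectance 0.2 in all 5 bands (hypothetical values).
cube = np.full((3, 4, 5), 0.2)
beta = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
intercept = 0.1

image = chemical_image(cube, beta, intercept)

# Hypothetical ROI mask: keep only the central pixels as "seed".
mask = np.zeros((3, 4), dtype=bool)
mask[1, 1:3] = True
masked_image = chemical_image(cube, beta, intercept, mask)
```

If the model was calibrated on pre-processed spectra (for example SNV), the same pre-processing would need to be applied to each pixel spectrum before the multiplication.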

The Spectral Characteristic of Black Rice
The spectra of the seed and powder samples in the 900-1800 nm range showed a similar pattern (Figure 3), characterized by several broad peaks in the same regions. The bands from 1000 to 1200 nm were assigned to the CH second overtone, which may come from aromatic or aliphatic compounds. The range between 1200 and 1600 nm is the region of the OH first overtone, NH first overtone, and CH combination vibrations [35]. The next region (1600-1800 nm) is assigned to the OH combination band of water and the CH first overtone [36]. Figure 3 also shows distinct intensities for the different concentrations of anthocyanins in black rice. In the first range (1140-1350 nm), the CH second overtone band arises. Between 1400 and 1500 nm, the CH combination of aromatic compounds and the OH first overtone stretching occur. Besides containing many CH groups, cyanidin-3-glucoside, the most abundant anthocyanin in rice, is also rich in OH groups [37].

High-Performance Liquid Chromatography (HPLC) Result
The reference values for anthocyanins in this study were determined for thirty-two rice samples grown in different years (2019 and 2020), and the results are presented in Table 2. The sixteen rice varieties harvested in 2019 contained anthocyanins at levels between 0.07 and 1.18 mg/g, while the 2020 samples contained between 0.32 and 2.07 mg/g. Cyanidin-3-glucoside (C3G) was the most abundant anthocyanin in the examined samples and was detected in all rice varieties. Among the 32 samples of black rice, twenty-five contained peonidin-3-glucoside (Pn3G) at low concentrations, ranging from 0.05 to 0.18 mg/g. A previous study by Park et al. (2008) [38] reported that C3G and Pn3G were the major anthocyanins identified in Korean black rice, while Lee (2010) [11] showed C3G to be the most abundant anthocyanin in black rice. C3G and Pn3G were also reported as the most common anthocyanins in Chinese black rice [12]. In this research, the C3G concentration was similar to the total anthocyanins content. Thus, the prediction model in this study was based only on the total anthocyanins content of the rice.

Partial Least Square Regression (PLSR)
The multilinear regression model in this study was constructed using 288 spectra ranging from 900 to 1800 nm for both seed and powder samples. The regression results of the PLSR model for each pre-processing method, developed using all samples across all varieties and harvest years, are presented in Table 3. The seed-sample models showed acceptable performance, indicated by a high coefficient of determination (R²) of more than 0.85 and 0.75 for the calibration and validation datasets, respectively. The powder-sample models performed better than the seed-sample models, with R² above 0.9 for both the calibration and validation datasets for all pre-processed and raw data, except for the Savitzky-Golay (SG) first derivative on the validation dataset. Anthocyanins in rice are not distributed uniformly throughout the seed, which affects sample homogeneity. In addition, the standard error of the models developed from the seed samples was slightly higher than that of the powder models, indicating more scattered data. Furthermore, the performance of the model developed using the mixed 2019 and 2020 samples was similar to that of the models constructed for each harvest year separately. This indicated the consistency of the model and demonstrated the robustness of the PLSR method for predicting anthocyanins in rice using NIR-HSI data. Overall, SNV pre-processing gave the highest performance, with R² of 0.93 for the calibration dataset and 0.90 for the validation dataset, for predicting rice anthocyanins from the seed spectra of mixed samples. This performance is comparable to that of a previous study by Zhang et al. (2020), which predicted anthocyanins in dry goji berries with R² of 0.91-0.95 [30]. The NIR-HSI technique coupled with LS-SVM was also employed efficiently, with R²p of 0.96 and RMSE of 0.146 mg/g, to quantify total anthocyanins in mulberry fruit (Morus sp.) [39]. Another study, conducted by Chen et al. (2015), which also demonstrated the feasibility of NIR-HSI for measuring anthocyanins in wine grapes, obtained the highest accuracy (R²p = 0.94, RMSE = 0.0046 mg/g) by PLS with smoothed spectra [40]. Our results also showed that the powder-based model performed slightly better, with higher coefficients of determination for calibration (R²c) and validation (R²v) (0.94 and 0.93, respectively) and lower standard errors on both the calibration and validation datasets. This result is understandable because powder samples are more homogeneous than seed samples. Hence, the result opens the door to using the NIR-HSI system to separate rice seeds based on their anthocyanins content.
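For readers who want to reproduce the two performance figures quoted throughout this section, the sketch below computes R² and SEP in plain Python. The formulas are assumptions, since the text does not define them: R² is taken as one minus the ratio of residual to total sum of squares, and SEP as the bias-corrected standard deviation of the prediction residuals, a definition common in chemometrics. The four value pairs are invented for the example.

```python
import math

def r_squared(reference, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot

def sep(reference, predicted):
    """Standard error of prediction: bias-corrected standard deviation of residuals."""
    residuals = [p - r for r, p in zip(reference, predicted)]
    bias = sum(residuals) / len(residuals)
    return math.sqrt(sum((e - bias) ** 2 for e in residuals) / (len(residuals) - 1))

# Invented reference (HPLC) and predicted values in mg/g, for illustration only.
reference = [0.2, 0.5, 1.0, 1.5]
predicted = [0.3, 0.4, 1.1, 1.4]

r2_value = r_squared(reference, predicted)
sep_value = sep(reference, predicted)
```

Some software instead reports the squared Pearson correlation as R²; the two agree only when the predictions have no bias or slope error, so the definition should be checked before comparing values across studies.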
The bands that contributed to the model can be identified using the beta coefficient curve (Figure 3C,D). The influential bands of the seed and powder samples were similar: around 1140 to 1350 nm, which can be associated with the CH second overtone, and around 1450 nm, representing the CH combination of the aromatic structure. Both bands were within the range of influential bands used for predicting anthocyanins in Jaboticaba fruit (Myrciaria jaboticaba (Vell.) O. Berg) [41]. Chen et al. [40] also reported three important wavebands similar to our findings (1050, 1250, and 1400 nm), which contributed significantly to the model predicting anthocyanins in wine grapes using NIR hyperspectral imaging. The waveband range between 1415 and 1512 nm has also been reported as essential for flavonoid constituents [42].

Visualization Image Based on Anthocyanins Content in Rice
The effectiveness of HSI images for visualizing the chemical content of powder samples has been demonstrated by several researchers. HSI has been successfully used to detect and quantify apricot in almond powder [43] and to detect peanuts and walnuts in wheat powder [44]. In this study, PLS images were generated to visualize the content of anthocyanins in black rice powder and seeds. The chemical image provided the spatial distribution and the anthocyanins concentration of the sample, which are essential to assist in anthocyanins determination. The anthocyanins content of the powder samples, ranging from 0.08 to 2.07 mg/g, was clearly confirmed by the corresponding color bar shown in Figure 4A, in which the color changes from blue to red as the anthocyanins concentration increases. Meanwhile, the color of seeds within a single variety varied, indicating variability in anthocyanins content (Figure 4B). Because of this within-variety variability, the color gradation could not separate seeds into groups based on anthocyanins content. In this experiment, the anthocyanins content was determined per rice variety. The micro-components of rice could not be evaluated for every single seed because of the minimum weight requirement of the extraction process. Hence, to justify the prediction on single seeds, the average predicted anthocyanins content within each variety was calculated (Figure 5). The average value was compared to the reference value from HPLC, and the root mean square error (RMSE) was calculated to evaluate the prediction accuracy.
The average and the largest errors between the mean prediction and the reference values were 0.16 and 0.49 mg/g, respectively. The RMSE value was 0.21 mg/g, slightly higher than the model's SEP value of 0.16 mg/g. Among the thirty-two varieties, only seven showed an error higher than the RMSE value, and four of them were varieties with a high anthocyanins concentration (more than 1 mg/g). In this experiment, only six varieties contained anthocyanins concentrations of more than 1 mg/g, which possibly affected the model's development. The prediction plot for all varieties is presented in Figure 6.
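The evaluation described above, averaging the single-seed predictions per variety and comparing the means with the HPLC references, can be sketched as follows. The variety labels and all numbers are invented for illustration and are not data from this study; the function names are hypothetical.

```python
import math

def variety_mean_predictions(seed_predictions):
    """Average the single-seed predictions (mg/g) within each variety."""
    return {name: sum(vals) / len(vals) for name, vals in seed_predictions.items()}

def rmse(reference, predicted):
    """Root mean square error over the varieties present in the reference dict."""
    squared = [(predicted[name] - ref) ** 2 for name, ref in reference.items()]
    return math.sqrt(sum(squared) / len(squared))

# Invented single-seed predictions for two hypothetical varieties.
seed_predictions = {
    "variety_A": [0.4, 0.5, 0.6, 0.7],
    "variety_B": [1.0, 1.1, 0.9, 1.0],
}
# Invented HPLC reference values (one bulk value per variety).
hplc_reference = {"variety_A": 0.50, "variety_B": 1.20}

mean_predictions = variety_mean_predictions(seed_predictions)
rmse_value = rmse(hplc_reference, mean_predictions)
```

Because the reference is a bulk value per variety, this RMSE measures how well the averaged seed predictions recover the bulk content, not the accuracy for any individual seed.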

Conclusions
This research reported a rapid and non-destructive technique for predicting anthocyanins, a micro-component of rice, in single seeds. The performance of the seed model was evaluated by comparing it with that of a model based on the powder sample. NIR hyperspectral imaging combined with PLSR resulted in a prediction performance (R²) of 0.75-0.90, with SEP ranging from 0.26 to 0.16 mg/g. SNV pre-processing improved the model performance, giving the highest R² of 0.85-0.95 and the lowest SEP of 0.11-0.16 mg/g. This performance was comparable to that of the model developed from the powder sample, with R² and SEP of 0.92-0.95 and 0.09-0.15 mg/g, respectively. A chemical image clearly confirmed the anthocyanins content of the powder sample. The anthocyanins prediction based on seed samples showed the potential of NIR hyperspectral imaging to predict anthocyanins in a single seed of rice.
The HSI approach provides the potential for rapid analysis of large numbers of seeds. Even though single-seed evaluation is not a substitute for the bulk analysis of anthocyanins, particularly in black rice, it opens up the possibility of developing an online chemical-based sorting machine for seeds. Such a machine has potential applications in seed quality control in the manufacturing industry as well as in breeding programs.