Characterisation of Castor (Ricinus communis L.) Seed Quality Using Fourier Transform Near-Infrared Spectroscopy in Combination with Multivariate Data Analysis

The potential of single-seed near-infrared (NIR) spectroscopy was investigated to characterise castor seeds based on their seed viability and seed oil content. Distinct differences between viable and non-viable seeds were observed in the principal component analysis (PCA). Furthermore, the PCA separated heavy and medium seeds from light seeds, and these groups were comparable to the clusters of viable and non-viable seeds, respectively. Prediction accuracies of 98.7% and 99.6% were obtained with the partial least squares discriminant analysis (PLS-DA) model, with classification error rates of 0.8% and 1.1% for the training set and test set, respectively. The NIR spectral regions having chemical information from the oil in castor seeds were found to be vital for determination of seed viability.


Introduction
Castor (Ricinus communis L.), a member of the Euphorbiaceae family, is an important non-edible oil seed crop used for bioenergy production. The oil in castor seeds accounts for 42 to 58% of the total seed weight [1] and is a prospective candidate for the production of biodiesel. Castor oil contains more than 90% ricinoleic acid ((9Z,12R)-12-hydroxyoctadec-9-enoic acid, C18H34O3), which has a hydroxyl group at position C12 that helps it to dissolve in alcohol at a low temperature (30 °C). This property gives castor oil an advantage over other vegetable oils for the production of biodiesel, because transesterification to reduce the viscosity of the oil requires minimal energy (little or no heating) [2]. Castor oil is therefore considered economically important for biodiesel production.
The castor seeds mature sequentially within and between the racemes, leading to a variation in maturity stages at harvest [3]. Furthermore, seeds from different racemes are reported to vary in seed vigour and weight [4]. Thus, the final castor lot consists of seeds of different size, weight and physiological maturity and consequently seeds of differing quality [3,5]. The variation in the seed lot relates to the germination ability required for initial plant establishment. It also gives an indication of oil recovery from the castor seeds as heavy seeds contain higher oil content than light seeds [6]. Additionally, seed weight is positively correlated with the germination ability of the seeds, i.e., heavier seeds have better performance than light seeds [4].
Near-infrared (NIR) spectroscopy is a non-destructive method commonly used to estimate quality parameters of agricultural products such as protein, water, carbohydrates and fats. The estimation of the quality parameters of a sample by NIR spectroscopy is based on light absorption in the near-infrared region, which is due to overtones and combinations of fundamental mid-infrared vibrational transitions [7]. The NIR spectrum of a sample contains complex information from different chemical bonds and therefore requires multivariate data analysis to obtain qualitative information. In combination with multivariate data analysis, NIR spectroscopy has demonstrated the ability to estimate oil quality and content in sunflower [8], jatropha [9] and castor seeds [10]. Similarly, it has also demonstrated the potential to predict the viability of spinach [11], tomato [12] and cabbage and radish seeds [13]. These capabilities of NIR spectroscopy provide an opportunity to characterise castor seeds with regard to their viability and oil content. This study investigates the capacity of NIR spectroscopy to predict the viability of castor seeds and the subsequent relationship of viability to oil content.

Seed Samples
Seeds of the castor (Ricinus communis L.) ecotypes Ahvaz and Arak, collected from a seed company, were grown under controlled (irrigated) or water-stressed conditions in Esfahan, Iran in 2013. The seeds were stored under dry and temperature-controlled conditions from harvest until further measurements. One subsample of 300 castor seeds, comprising 150 seeds from each ecotype with 75 seeds each from controlled and water-stressed conditions, was used for the viability study (Table 1). The seeds of the other subsample of 1200 seeds (600 seeds from each ecotype) were individually weighed and subsequently used only for the categorisation of the castor seeds into three groups. A histogram of seed weight is shown in Figure 1. The three seed weight groups were light seeds (≤0.1455 g), medium seeds (0.1455 to 0.2348 g) and heavy seeds (≥0.2348 g).
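The weight grouping described above can be sketched as a small function. This is only an illustration: the two thresholds come from the text, but the function names are hypothetical, and assigning a seed that lies exactly on a boundary to the light or heavy group (rather than the medium group) is an assumption, since the stated ranges overlap at their end points.

```python
# Thresholds in grams, taken from the text.
LIGHT_MAX_G = 0.1455  # upper bound of the light group
HEAVY_MIN_G = 0.2348  # lower bound of the heavy group


def weight_group(weight_g: float) -> str:
    """Return 'light', 'medium' or 'heavy' for a single seed weight in grams.

    Assumption: boundary values go to the light/heavy group, not to medium.
    """
    if weight_g <= LIGHT_MAX_G:
        return "light"
    if weight_g >= HEAVY_MIN_G:
        return "heavy"
    return "medium"


def count_groups(weights_g):
    """Count how many seeds fall into each weight group."""
    counts = {"light": 0, "medium": 0, "heavy": 0}
    for w in weights_g:
        counts[weight_group(w)] += 1
    return counts
```

For example, `count_groups([0.10, 0.20, 0.30])` places one seed in each group.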

Acquisition of NIR Spectra
Single-seed NIR diffuse transmission spectra expressed in absorbance and wavelengths were obtained in 2014 using a single-seed Fourier transform near-infrared (FT-NIR) analyser (Q-Interline A/S, QFAflex 600F; Tølløse, Denmark). Individual seeds were placed in a 30-sample carousel tray. To ensure optimal masking and uniform measurements, the dry seed was placed with the middle of the seed covering the hole on the side of the carousel facing the incident light. A cover with 2.5 mm apertures was placed on top of the carousel to avoid light leakage around the seed. Seeds were measured at a resolution setting of 32 cm⁻¹ and each spectrum was obtained as the mean of 64 successive scans at 2-nm intervals from 965 to 1701 nm. A reference (background) spectrum was taken using the built-in reference of the instrument prior to scanning.

Viability Test
Germination was performed between filter paper in accordance with ISTA [15]. In each box, 25 seeds were placed on wet pleated filter paper and germinated at 25 °C with a 14/10 h (light/dark) photoperiod. Seeds were visually inspected for germination twice a day for seven days and finally at day 14. The seeds with radicle protrusion (>2 mm) were considered as 'germinated'. After the completion of the germination test, seeds that did not germinate were stained with tetrazolium. The non-germinated castor seeds were immersed in a 0.1% tetrazolium solution and kept in an oven at 35 °C in the dark for 120 min. The red-stained seeds (indicator of viability) were classified as viable, while seeds that were not red-stained were classified as non-viable as per Gaspar-Oliveira et al. [16]. All seeds were individually given a score of '0' if non-viable or '1' if viable.

Multivariate Data Analysis
The NIR spectra of the single castor seeds were pre-processed using a Savitzky-Golay [17] first derivative with an 11-point window, followed by detrending [18], before mean centring.
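The pre-processing chain can be sketched in plain Python as follows. This is a minimal illustration, not the software used in the study. Only the 11-point window, the first derivative, the detrending step and the mean centring come from the text. The polynomial order of the Savitzky-Golay fit (quadratic or lower, for which the coefficients below hold), the edge handling (repeating the end values), the use of a straight-line detrend and all function names are assumptions.

```python
def savgol_first_derivative(spectrum, window=11):
    """Savitzky-Golay first derivative (per index step) for a fit of order <= 2.

    For such fits the coefficient at offset i is i / sum(i**2).
    Assumption: edges are handled by repeating the first/last value.
    """
    m = window // 2
    norm = sum(i * i for i in range(-m, m + 1))
    n = len(spectrum)
    out = []
    for k in range(n):
        acc = 0.0
        for i in range(-m, m + 1):
            j = min(max(k + i, 0), n - 1)  # clamp index at the edges
            acc += i * spectrum[j]
        out.append(acc / norm)
    return out


def detrend_linear(spectrum):
    """Subtract the least-squares straight line (assumed detrend order 1)."""
    n = len(spectrum)
    x_mean = (n - 1) / 2
    y_mean = sum(spectrum) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(spectrum))
    slope = sxy / sxx
    return [y - (y_mean + slope * (i - x_mean)) for i, y in enumerate(spectrum)]


def mean_centre(spectra):
    """Subtract the column (wavelength) mean across all seeds."""
    n = len(spectra)
    col_means = [sum(col) / n for col in zip(*spectra)]
    return [[v - mu for v, mu in zip(row, col_means)] for row in spectra]


def preprocess(spectra, window=11):
    """Derivative, then detrend each spectrum, then mean-centre the data set."""
    processed = [detrend_linear(savgol_first_derivative(s, window)) for s in spectra]
    return mean_centre(processed)
```

Here `spectra` is a list of equal-length lists, one per seed. In practice a library routine such as SciPy's `savgol_filter` would normally be preferred over a hand-written filter.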
Principal component analysis (PCA) was applied to the pre-processed NIR spectra to obtain an overview of all data and to identify possible extreme outliers. The visual scorings were used to see the effect of the viability of the seeds. Afterwards, the data were divided using the Onion algorithm of the partial least squares (PLS) Toolbox ver. 7.9 (Eigenvector Research, Inc., Wenatchee, WA, USA) into a training set and a test set consisting of 76 and 224 seeds, respectively.
Partial least squares discriminant analysis (PLS-DA) is a linear supervised classification method derived from the standard PLS regression algorithm [19]; it uses class variables instead of numeric variables as the response. PLS is comparable to PCA in that it is based on latent variables (LVs), which are similar to the principal components of PCA; however, the LVs are calculated using the information from Y (response variable) for the decomposition of the main data. The PLS1 and PLS2 algorithms are commonly used depending on the number of classes: PLS1 is used for two-class problems, whereas PLS2 is used when there are more than two classes of samples. In PLS-DA, the dummy variable Y is used as the response variable and is set to 1 if the sample belongs to a given class and 0 if not. For instance, in our work comprising two classes (viable and non-viable), each sample is coded as one of two vectors, [1 0] and [0 1], designating the viable and non-viable classes, respectively. The model seldom predicts exactly 1 or 0, so a cut-off value was determined that yielded the minimum number of false positives and false negatives; a sample above the cut-off is predicted as 1 and a sample below it as 0.
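The dummy coding and the choice of a cut-off can be illustrated with a short sketch. The procedure below, which scans the midpoints between sorted predicted values and keeps the one giving the fewest misclassifications, is an assumed stand-in for whatever rule the software applied; the function names are hypothetical.

```python
def dummy_code(is_viable: bool):
    """Code a seed as [1, 0] (viable) or [0, 1] (non-viable), as in the text."""
    return [1, 0] if is_viable else [0, 1]


def choose_cutoff(y_true, y_pred):
    """Pick the cut-off that minimises false positives plus false negatives.

    y_true: class membership coded 0 or 1.
    y_pred: continuous PLS predictions for the same class.
    Assumption: candidate cut-offs are the midpoints between neighbouring
    distinct predicted values; at least two distinct values are required.
    Returns (cutoff, number_of_misclassified_samples).
    """
    values = sorted(set(y_pred))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_cut, best_err = None, None
    for cut in candidates:
        # A sample above the cut-off is predicted as 1, otherwise as 0.
        errors = sum((p > cut) != bool(t) for t, p in zip(y_true, y_pred))
        if best_err is None or errors < best_err:
            best_cut, best_err = cut, errors
    return best_cut, best_err
```

When several cut-offs give the same number of errors, this sketch keeps the lowest one; a different tie-breaking rule would be equally defensible.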
Variable importance in the projection (VIP) scores for the PLS-DA model were calculated as per Chong and Jun [20] and give an overview of the relative importance of each variable in the model calculation. VIP scores are a weighted sum of squares of the PLS weights considering the amount of explained Y-variance in each PLS component [21]. Variables having VIP scores higher than 1 are considered to be important for PLS-DA model development; however, this does not indicate that variables having low VIP scores are irrelevant to the classification [20,21].
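As an illustration of the weighted sum of squares described above, the commonly used VIP expression, VIP_j = sqrt(p · Σ_a SSY_a (w_ja / ‖w_a‖)² / Σ_a SSY_a), can be sketched as follows. The inputs (a list of PLS weight vectors and the Y sum of squares explained by each component) and the function name are assumptions made for this example; they would normally be taken from a fitted model.

```python
import math


def vip_scores(weights, ssy):
    """VIP score for each of the p variables.

    weights: list of A weight vectors, one per latent variable, each of length p.
    ssy:     list of A values, the Y sum of squares explained by each component.
    """
    p = len(weights[0])
    total = sum(ssy)
    scores = []
    for j in range(p):
        acc = 0.0
        for w, s in zip(weights, ssy):
            norm_sq = sum(v * v for v in w)  # squared length of the weight vector
            acc += s * (w[j] ** 2) / norm_sq
        scores.append(math.sqrt(p * acc / total))
    return scores
```

With this definition the squared VIP scores average to 1 over all variables, which is why 1 is the usual threshold for calling a variable important.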
In this study, one training set was used to develop a PLS-DA calibration model, which was validated on samples from the test set. The optimal number of latent variables (LVs) was chosen on the basis of minimal classification error for calibration and cross-validation of the model. The model was cross-validated by venetian blinds of 10 data splits with 10 samples in each split. The classification performance of the PLS-DA model was evaluated using sensitivity (Sn), specificity (Sp), classification error rate (CER) and classification accuracy, as described in Shrestha et al. [22], and the Matthews correlation coefficient (MCC) [23].
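The performance measures named above can be computed from a two-class confusion matrix, as in the sketch below. The definitions of Sn, Sp, accuracy and MCC are the standard ones. Defining CER as one minus the mean of Sn and Sp is an assumption based on a common convention and should be checked against Shrestha et al. [22]. The function name is hypothetical.

```python
import math


def classification_metrics(y_true, y_pred):
    """Sn, Sp, accuracy, CER and MCC for labels coded 0 (negative) or 1 (positive).

    Both classes must be present in y_true, otherwise Sn or Sp is undefined.
    """
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

    sn = tp / (tp + fn)                      # sensitivity
    sp = tn / (tn + fp)                      # specificity
    accuracy = (tp + tn) / len(y_true)
    cer = 1 - (sn + sp) / 2                  # assumed CER definition
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "accuracy": accuracy, "CER": cer, "MCC": mcc}
```

Because CER averages the error of the two classes, it can differ from one minus the accuracy when the classes are of unequal size.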

Spectral Overview and Chemical Assignment
The pre-processed NIR spectra showed a distinct variation of spectral absorbance at different wavelengths for viable and non-viable seeds (Figure 2a).
The peaks at 1400 and 1410 nm (Figure 2a) correspond to chemical information from the first O-H overtones. For the viable and non-viable seeds, this information is assigned to ricinoleic acid, which contains an OH group at C12 [6,10]. The smaller peak for the seed coat is most probably due to water, as the seed coat is not expected to contain high amounts of oil [6]. The viable, heavy and medium seeds had similar absorption peaks (Figure 2a,b), which suggests that the high oil content in the seeds is important for viability.
The viable seeds exhibited major spectral absorptions in the ranges of 1107-1205 nm, 1210-1270 nm, 1340-1483 nm and 1630-1701 nm, with distinct peaks at 1155, 1185, 1223, 1379, 1400 and 1662 nm (Figure 3c,d). These spectral regions, apart from the region with the peak at 1400 nm, correspond to C-H stretching and are related to the chemical functional groups of fatty acids [7,24]. The influence of the fatty acids on the major spectral absorbance (Figure 2a) is due to the high oil content (42-58%) in castor seeds [1,7,24].

Castor Seed Viability
PCA was used as an exploratory method [25] to investigate the relationship between objects (castor seeds) and variables (NIR wavelengths) and to identify possible outliers and/or extreme samples. The analysis revealed a few outliers in the dataset. However, removal of the outliers did not improve the model, and the outliers were thus kept in the dataset. There was a distinct clustering of the viable and non-viable seeds (Figure 3a), as also indicated by the variation in spectral absorbance (Figure 2a). Surprisingly, it was not possible to show any distinct patterns based on ecotypes or growing conditions.
The variation in samples was a result of a single harvest done at mass maturity of the racemes, and the harvested material then consisted of seeds with varying maturity and different seed weights. Seed abortion before dry matter accumulation in the seeds might be the reason for the non-viable empty seeds observed in the study [5]. Castor seeds require up to 60 days (after pollination) for complete development, and seed filling starts after 20-23 days (after pollination) [26]. The seed filling duration (SFD) of the individual seed is vital for maturity, as the seeds mature sequentially within and between the racemes [3]. Therefore, we suppose that seeds with partially developed or deformed embryos did not get the SFD required for maturity and hence were physiologically immature. During drying operations, these immature seeds become deformed or shrink due to rapid loss of moisture and often undergo desiccation injuries that lead to poor germination ability upon imbibition [27]. The seed reserve accumulation in these physiologically immature seeds is low in comparison to mature seeds [3]. The seed reserves could also be inferred from the spectral similarities of the non-viable seeds with those from the seed coats and viable seeds (Figure 2a). The spectral loadings from the NIR regions at 1155, 1223, 1379, 1424 and 1662 nm, which have a correlation with the oil content in the seeds, proved to be important for discriminating between viable and non-viable seeds (Figure 3c). Therefore, we assume that the difference in the maturity level leads to differences in dry matter accumulation between viable and non-viable seeds. This difference is reflected in the spectral absorbance and the subsequent determination of viable and non-viable seeds.

Relationship between Seed Weight, Oil Content and Seed Viability
The seed weight of castor has been positively associated with germination capacity; heavier seeds have higher potential for germination [5]. Our study supports the results of Severino et al. [5], as most of the non-germinated or non-viable seeds were part of the lightweight seed group, while viable seeds were part of the medium and heavy seed groups (Figure 3a,b). A similar relationship between seed weight and viability can also be anticipated from the comparable patterns of spectral absorption, where medium and heavy castor seeds are assigned to viable seeds and light seeds to non-viable seeds (Figure 2a,b). The major absorbance differences between light seeds and medium and heavy seeds were observed in the NIR regions with peaks/valleys at 1155, 1185, 1223, 1379, 1400 and 1662 nm (Figure 2b). These NIR regions contributed significantly to the determination of viability and seed weight, as depicted by the loadings plot (Figure 3c).
Variation in seed weight might be due to the mixtures of seeds from different racemes [4] or to the differences in maturity level. The aborted seeds do not contain seed reserves [5] and immature seeds have less dry matter accumulation, resulting in a lower weight due to insufficient SFD and consequently less oil content. The NIR spectral absorbance differences between the seed weights were also observed in the NIR regions pertaining to the oil content (1155, 1185, 1223, 1400, 1410 and 1662 nm). Spectral resemblance to light seeds (Figure 2b) indicates that light seeds contained less oil compared to medium and heavy seeds. Severino et al. [6] observed very small variations of oil content (percentage) in seeds weighing between 250 and 450 mg. Thus, we assume there was little variation in oil content between medium and heavy seeds and that oil content in the seeds could be regarded as an indicator of castor seed viability. This was also observed in cotton seeds, where seed oil content has been considered an indicator of seed vigour [28]. However, oil content and viability in the castor seed are not only dependent on the pre-harvest conditions, but are also affected by the post-harvest period and storage conditions. Santoso et al. [29] reported a decrease in seed oil content during storage and correlated this with a decline in seed viability in castor seeds. They recorded a 12.1% decrease in the seed oil content and observed a similar 12.3% decline in seed viability for seeds stored in jute sacks for a period of 12 months [29]. Therefore, characterisation of seeds based on seed viability could also give an indication of seed oil content or vice versa. The results from PCA show that single-seed NIR spectroscopy could be used for characterisation and further prediction of viable and non-viable castor seeds.

Classification of Viable and Non-Viable Seeds
The single-seed NIR spectra were used to develop a PLS-DA model for classifying seeds into two classes of viable and non-viable seeds. The PLS-DA model was developed using five LVs that explained 98% of the total variation in the NIR spectra and was further used to predict the seeds of the test set. The model classified viable and non-viable seeds in the training set at an accuracy of 98.7% and performed similarly on the test set with 99.6% classification accuracy (Table 3). A classification error rate of 0.8% was observed for viable and non-viable castor seeds in the training set, which was consistent with the rate obtained when seeds from the test set were predicted (1.1% CER) (Table 3). The number of misclassified seeds was very low in the training and test sets (Table 3), as can also be observed in Figure 4. Sensitivity and specificity are the two major indicators of the reliability of the model, and values near 1 indicate robustness of the model [30]. The values of sensitivity and specificity were almost equal to 1 for viable and non-viable seeds (Table 3), indicating that the developed model had unbiased capability to identify the viable seeds (i.e., sensitivity) and reject samples (non-viable seeds) not belonging to the class (i.e., specificity) [30]. The robustness of the model could also be observed from the strong correlation between the NIR spectra of the castor seeds and seed viability indicated by the Matthews correlation coefficient (MCC) values, which were 0.98 for both the training and test sets. The NIR wavelengths at 1155, 1185, 1223, 1379, 1424 and 1662 nm were identified as important by the variable importance in the projection (VIP) scores for classifying the viable and non-viable seeds (Figure 3d). These important wavelengths have a chemical correlation with oil content in the seed, as described earlier [7,10,24].
The study demonstrates the use of single-seed NIR spectroscopy for the segregation of viable and non-viable castor seeds. In areas where castor is grown on marginal lands and requires high seed germination, single-seed NIR spectroscopy can be used to increase crop production [31]. The study shows that NIR spectroscopic signatures from the seed relating to the oil content could be used as an indicator of seed viability. Furthermore, previous studies have indicated that seed viability and seed oil content are highly correlated [29]; therefore, NIR spectroscopy could equally be used for assessing the oil content in castor seeds. Similar studies involving single-seed NIR spectroscopy have shown potential for determining the seed oil content in oil-rich crops like sunflower [8] and jatropha [9]. Moreover, an NIR spectroscopy study on the oil quality in castor seeds has also shown potential for identifying seeds with a high content of ricinoleic acid or oleic acid [10]. Thus, NIR spectroscopy also presents an opportunity in a production system for supplying high-quality seeds to the international bioenergy market.

Conclusions
The PCA and PLS-DA models developed in the current study show the use of NIR spectroscopy as a non-destructive tool for assessing the seed quality of castor seeds in relation to seed viability and seed oil content. The study also indicates a positive correlation between seed viability and NIR wavelengths that Fernández-Cuesta et al. [10] found to be related to oil content in castor seeds.
The NIR spectral regions having chemical information from the oil in castor seeds were found to be vital for the determination of seed viability. In conclusion, non-destructive NIR technology has shown its applicability for segregating viable and non-viable seeds and could be an effective strategy for improving castor seed quality from viability to seed oil recovery.

Figure 1. Histogram of individual seed weight.

Figure 2. The average pre-processed absorbance from 965 to 1701 nm for (a) viable vs. non-viable seeds, (b) light, medium and heavy seeds and (c) endosperm vs. seed coat of castor seeds. Near-infrared (NIR) spectra in (a,b) are from single-seed measurements whereas NIR spectra in (c) are from glass vials.

Figure 3. Principal component analysis (PCA) score plots (a,b) for PC1 vs. PC2 on pre-processed NIR spectra, PCA loading plot of PC1 (c) and variable importance in the projection (VIP) plot (d) for viable seeds using the partial least squares discriminant analysis (PLS-DA) model from 965 to 1701 nm.

Table 1. Number of seeds grown under controlled and stressed conditions. The values in parentheses indicate the number of light, medium and heavy castor seeds, respectively.

Table 2. [14] percentage and oil yield in the field experiment. Modified from [14].

Table 3. The details of the PLS-DA classification model on seed viability for the training, cross-validation and test sets (MS: misclassified seeds; TS: total seeds; Sn: sensitivity; Sp: specificity; CER: classification error rate; MCC: Matthews correlation coefficient).
Figure 4. PLS-DA predictions for non-viable seeds. Values greater than the threshold value indicate class membership.