Chemometric Differentiation of Pistachios (Pistacia vera, Greek ‘Aegina’ Variety) from Two Different Harvest Years Using FTIR Spectroscopy and DRIFTS and Disk Techniques

Abstract: Food quality is a topic of utmost importance as more and more emphasis is placed on the quality rather than the quantity of products. Previous studies have pointed out the interaction of quality with the harvest year. In this study, 22 Pistacia vera (Greek ‘Aegina’ variety) samples (11 from 2017 and 11 from 2018) were differentiated using Fourier transform infrared (FTIR) spectroscopy with two sampling techniques: (a) diffuse reflectance infrared Fourier transform spectroscopy (DRIFTS) and (b) the KBr/sample disk technique. In both years, the pistachio trees were grown using standard cultivation methods under similar agronomic conditions. Two chemometric models were developed using partial least squares discriminant analysis (PLS-DA). The DRIFTS model was unable to statistically differentiate the samples (R² = 0.96266, Q² = 0.63152). In contrast, the disk technique completely differentiated the pistachio samples (R² = 0.99705, Q² = 0.97719). The 1720–1800 cm⁻¹ region contributed most to the discrimination. The disk-FTIR chemometric model is fast, robust, economical, and environmentally friendly for determining pistachio matrix quality.


Introduction
Pistachios (Pistacia vera) are valued all over the world and have been described as superfoods for their special organoleptic characteristics (color, intense aroma, and taste), their ability to provide a multitude of beneficial ingredients in small quantities, and the possibility of valorizing the nut as a whole. Pistachio kernels are consumed fresh or dried (in the sun or mechanically) as a snack, as a raw material for various products, or as an ingredient in the confectionery industry [1,2]. Their composition is characterized by low carbohydrate content, protein content above 20% w/w, and lipid content around 50% w/w. They contain a significant number of bioactive antioxidants, such as flavonoids, stilbenes, tocopherols, carotenoids, and chlorophylls [3,4]. In comparison to other nuts, pistachios have higher levels of lutein, zeaxanthin, γ-tocopherol, vitamin K, dietary fibers, phytosterols, and carotenoids [5,6]. Numerous studies demonstrate the beneficial effect of pistachios on human health. They are known for their antioxidant, anti-inflammatory, anti-cancer, and cardioprotective action. In addition, their antimicrobial, anti-ischemic, and immunoregulatory properties have attracted the attention of many researchers [3,7].
In recent years, the pursuit of food quality has been a crucial goal for the food industry and consumers [8]. Several instrumental analytical methodologies have been broadly applied to address quality issues. The majority of these are not suitable for everyday or extensive analyses since they require time-consuming sample preparation and specialized laboratory staff. Researchers have therefore focused on methods that provide a whole molecular fingerprint of the food matrix, such as spectroscopic techniques (infrared (IR), Raman, and nuclear magnetic resonance (NMR)) and chromatographic techniques (gas chromatography-mass spectrometry (GC-MS) and high performance liquid chromatography-mass spectrometry (HPLC-MS)) [9].
In particular, Fourier transform infrared (FTIR) spectroscopy has emerged as a promising analytical tool on both industrial and research scales. Its competitiveness lies in its speed of analysis, low cost, and convenience of sample preparation [9]. FTIR spectroscopy has been applied to nuts, flours, oils, cakes and flakes, meat, and spirit beverages [10]. The IR spectrum carries information about the functional groups of the chemical compounds in complex food matrices, such as pistachios. Diffuse reflectance infrared Fourier transform spectroscopy (DRIFTS) and KBr/sample disk spectroscopy are two FTIR sampling techniques that quickly became popular, since they offer a short analysis time and minor sample preparation [11,12].
Typically, in the DRIFTS technique, the sample is a powder or a rough-surfaced solid that scatters the radiation. Infrared light is directed to a sample cup, where it can undergo a single reflection from the surface (specular reflection) or be reflected multiple times, producing diffusely scattered light over a large area. DRIFTS accessories are designed to reject the specularly reflected radiation and to collect, by means of a mirror, as much of the diffusely scattered light as possible, which is subsequently measured by the detector [11,13]. The KBr/sample disk technique involves dilution of the sample in KBr, transfer of a portion of the mixture into a die, and the use of a hydraulic press to form a disk. With this technique, the diffusely scattered light is increased as the incident light penetrates the sample more deeply [14].
Spectroscopic techniques such as FTIR produce complex multivariate datasets, so multivariate data analysis (chemometrics) is required to investigate them [15]. FTIR spectroscopy combined with different chemometric tools has been frequently applied to dairy products, honey, coffee, olive oil, and wine [9]. One of the most widely used methods for developing multivariate discrimination models is partial least squares discriminant analysis (PLS-DA) [16]. PLS-DA can identify which explanatory variables contribute significantly to the construction of the PLS components and, consequently, have a high explanatory power for the response variable [17].
Food quality and, consequently, the chemical characteristics of food matrices can be affected by the year of harvest [18]. In the present study, DRIFT and KBr disk spectra of pistachio samples of the Greek 'Aegina' variety from two consecutive years of harvest were acquired, aiming to: (a) accurately estimate whether the pistachios could be classified into two classes due to year-to-year variability and (b) determine which spectroscopic technique gives the better PLS-DA model, as judged by the R² and Q² indicators.

Pistachio Samples
A total of 22 Pistacia vera samples of the 'Aegina' variety, obtained in equal numbers from 2 consecutive years of harvest, were provided directly by pistachio producers across Greece (Aegina, Megara, Phthiotis, Trizina). The origin conditions (pistachio farmers, field coordinates, cultivation care, post-harvest drying method) of the 11 samples from the 2017 harvest year were the same as those of the 11 samples from the 2018 harvest year. All samples were unshelled, and the kernels were pulverized with a food processor equipped with a metal cutting blade and a stainless steel container (Izzy, Greece), followed by particle size separation to obtain ground samples (pistachio kernel flours) of 500–800 µm. Prior to further analysis, the samples were stored in re-closable plastic bags in the dark under refrigerated conditions (−20 °C) to avoid degradation and improve oxidation stability.

Moisture Content Measurement
According to the AOAC Official Method 925.40, approximately 2 g of each pistachio kernel flour was placed in a ceramic cup and dried in an oven (102 °C) until constant weight. The moisture content was calculated based on the weight difference before and after drying.
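The gravimetric calculation described above can be sketched as follows. This is a minimal illustration that assumes the result is expressed on a wet basis (the basis used when the moisture results are reported); the function name and the example masses are hypothetical and are not taken from the study.

```python
def moisture_wet_basis(mass_before_g, mass_after_g):
    """Return moisture content in % wet basis from masses before and after drying."""
    if mass_before_g <= 0:
        raise ValueError("mass before drying must be positive")
    if not 0 <= mass_after_g <= mass_before_g:
        raise ValueError("mass after drying must lie between 0 and the initial mass")
    # Water lost on drying, expressed relative to the initial (wet) sample mass.
    return 100.0 * (mass_before_g - mass_after_g) / mass_before_g


# Hypothetical example: a 2.000 g portion that weighs 1.880 g at constant weight.
example_moisture = moisture_wet_basis(2.000, 1.880)
print(f"moisture: {example_moisture:.1f} % w.b.")
```

Expressing the loss relative to the initial mass is what distinguishes the wet basis from the dry basis, where the divisor would be the dried mass.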

DRIFTS Sample Analysis
A randomly selected quantity of pistachio kernel flour was placed on a flat surface onto which infrared radiation was directed to obtain the DRIFT spectrum. For each sample, three spectra were collected. Each spectrum of the triplicate came from a different part of the same sample (sub-sample). A Thermo Nicolet 6700 FTIR spectrophotometer (Thermo Electron Corporation, Madison, WI, USA) was used, equipped with a deuterated triglycine sulfate (DTGS) detector and a DRIFT Spectra Tech microcup accessory (Spectra-Tech Inc., Stamford, CT, USA) (3 mm diameter, 2 mm height). The spectra were collected as absorbance from 100 scans over the wavenumber region of 400–4000 cm⁻¹. The speed of the interferometer moving mirror was 0.3165 mm/s. A background spectrum was collected using FTIR grade KBr as a non-absorbing matrix powder (Sigma-Aldrich, Steinheim, Germany) prior to recording the spectrum of each sample. After each measurement, the microcup was cleaned with acetone and dried.

KBr Disk Sample Analysis
KBr disk preparation involved thorough mixing of 200 mg of dried KBr powder with 5 mg of ground pistachio kernel flour. The mixture was transferred into a 13 mm die and formed a clear disk when pressed under high pressure using a 2- to 8-ton bench top hydraulic press. The spectra were recorded with the spectrophotometer described above, using an accessory appropriate for disk measurements. The measurement conditions (scans, wavenumber region, and mirror speed) were the same as those described for DRIFTS. Background measurements were made against a pure KBr disk, which had no absorptions over the entire measured range. Triplicate KBr/sub-sample disks were made for each sample, and the spectrum of each one was recorded.

FTIR Data Processing
The FTIR data were processed using the Omnic 8.2.0.387 software (Thermo Fisher Scientific Inc., Madison, WI, USA). Both the DRIFT and disk spectra were smoothed using the Savitzky-Golay algorithm (5-point moving second-degree polynomial), and the baseline was corrected using the 'automatic baseline correct function' (second-degree polynomial, 20 iterations). Then, the average spectrum of each sample was calculated and converted into Kubelka-Munk units. The Kubelka-Munk conversion compensated for some of the following undesirable effects: low-intensity bands being enhanced relative to intense bands, and strong bands having broader, rounder peak shapes. Finally, the absorbance scale of the spectra was normalized.
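The smoothing and conversion steps can be illustrated with a short sketch. It is not the Omnic implementation: it uses the standard 5-point quadratic Savitzky-Golay weights (−3, 12, 17, 12, −3)/35 and the textbook Kubelka-Munk function f(R) = (1 − R)² / (2R). Leaving the two end points at each edge unsmoothed and reading the recorded ordinate as log₁₀(1/R) are assumptions made here, and the example spectrum is hypothetical.

```python
# Standard 5-point, second-degree Savitzky-Golay smoothing weights.
SG5_WEIGHTS = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)


def savgol5_quadratic(values):
    """Smooth a sequence with a 5-point quadratic Savitzky-Golay filter.

    The two points at each end are returned unchanged (an assumption;
    the edge handling used by the instrument software is not described).
    """
    smoothed = list(values)
    for i in range(2, len(values) - 2):
        window = values[i - 2:i + 3]
        smoothed[i] = sum(w * v for w, v in zip(SG5_WEIGHTS, window))
    return smoothed


def absorbance_to_reflectance(absorbance):
    """Invert A = log10(1/R); assumes the ordinate is in log(1/R) units."""
    return 10.0 ** (-absorbance)


def kubelka_munk(reflectance):
    """Kubelka-Munk function f(R) = (1 - R)^2 / (2R) for 0 < R <= 1."""
    if not 0.0 < reflectance <= 1.0:
        raise ValueError("reflectance must be in the interval (0, 1]")
    return (1.0 - reflectance) ** 2 / (2.0 * reflectance)


# Hypothetical absorbance-like values around a single band.
spectrum = [0.20, 0.22, 0.27, 0.35, 0.30, 0.24, 0.21]
km_spectrum = [
    kubelka_munk(absorbance_to_reflectance(a))
    for a in savgol5_quadratic(spectrum)
]
print(km_spectrum)
```

A 5-point window is a light smoothing choice: it reduces point-to-point noise while reproducing any locally quadratic band shape exactly, so narrow bands are barely distorted.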

Discriminant Analysis
Classification of the year of harvest was based on the DRIFT and disk spectra of the pistachio kernel flours. The 650–4000 cm⁻¹ spectral region was used. Before the discriminant analysis was developed, the homogeneity of the covariance matrices was considered ensured, since the ratio of the sizes of the two groups (2017, n = 11; 2018, n = 11) was no greater than 1.5 [19]. Two chemometric models were developed using the partial least squares discriminant analysis (PLS-DA) statistical technique, one based on the DRIFTS data and the other on the disk spectra. Each chemometric model was examined using cross-validation and permutation tests. According to Field (2009), the classification ability of the samples is assessed with greater reliability through cross-validation [20]. The statistical analysis was performed using the MetaboAnalyst 5.0 software.

Moisture Content Measurement
The pistachios were dried by the growers either in the sun or mechanically. The moisture content of all samples was between 5 and 7% wet basis (w.b.). Drying nuts to moisture levels below 11% w.b. is important for both safety and taste. A safe moisture level is one that does not support the growth of fungi; for shelled pistachios, this ranges between 2.2 and 8.2% w.b. at 21 °C [21].
Table 1 presents the most characteristic peaks of the pistachio kernel flours' DRIFT and disk spectra, attributed to the respective modes of vibration. Figures 1 and 2 display representative spectra of pistachio kernel flours from the two harvest years obtained with the DRIFTS and disk techniques, respectively. The peaks at 3006, 2928, and 2856 cm⁻¹ and the deformation vibrations in the range 1300–1500 cm⁻¹ generally arose from the lipid content [12,22]. Between the 2017 and 2018 harvests, the relationship between these peaks' heights or areas changed. A recent study found that the oil content is associated with the year of harvest; therefore, the differences in the respective spectral region are related to the different oil contents [23]. Accordingly, the 1500–1700 cm⁻¹ range was assigned to proteins, with peaks at 1660 and 1550 cm⁻¹ attributed to the amide I and amide II vibration modes, respectively. The region of amides I and II is associated with the secondary structure of proteins and the protein content. The protein content depends on the harvest period, the temperature, and the rain rate [24][25][26], a fact which is confirmed in the present study. The spectral region from 1300 to 900 cm⁻¹ was connected to ring vibrations of oligo- and polysaccharides, while absorption bands between 900 and 600 cm⁻¹ were generally caused by aromatic ring vibrations [12,22].
Table 1. Peak interpretation of pistachio kernel flours' (Pistacia vera, variety 'Aegina') diffuse reflectance infrared Fourier transform (DRIFT) and disk spectra [12,22]. [Table body not recovered; surviving fragments: column header "Wavenumbers (cm⁻¹)" and the entry "aromatic ring deformation".]
From the large and complex FTIR datasets, important features of the samples and relevant information for the creation of models could only be extracted with chemometric analysis [9].

Multivariate Statistical Analysis
Using either DRIFTS or the disk technique, due to the impossibility of the optical comparability between the collected spectra from year to year, it was necessary to conduct multivariate statistical analysis to investigate whether samples differed in a statistically significant manner and could be classified according to the year of harvest.
Preprocessing of the FTIR data followed, to diminish the effect of unrelated factors on the intensity of the absorbance peaks [12]. A first filtering was based on the interquartile range (IQR), defined by the 25th and 75th percentiles of the data distribution. Thus, the lowest 25% and the highest 25% of the data were not included in the model construction. Thereafter, as the number of independent variables (FTIR wavenumbers) ranged between 500 and 1000, the 25% of the data that were unlikely to be of use in modeling were filtered out [27]. The remaining 75% of the source data were normalized to bring them to the same baseline, so that data of different scales could be compared. Additionally, normalization smoothed out inconsistencies between data, improving the effectiveness and performance of the algorithms. To this end, the cube root transformation was applied. Subsequently, Pareto scaling was applied: the data were mean centered and divided by the square root of the standard deviation of each independent variable.
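The transformation and scaling steps can be written out as a small sketch. It covers only the cube root transformation and Pareto scaling (not the IQR filtering), assumes the sample standard deviation (n − 1 denominator), and returns zeros for a constant variable; both choices are assumptions made here, not details reported for MetaboAnalyst. The function names and the toy matrix are hypothetical.

```python
import math
import statistics


def cube_root(value):
    """Signed cube root, so negative intensities are handled as well."""
    return math.copysign(abs(value) ** (1.0 / 3.0), value)


def pareto_scale_column(values):
    """Mean center one variable and divide by the square root of its SD."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    if sd == 0:
        return [0.0] * len(values)  # constant variable carries no information
    denominator = math.sqrt(sd)
    return [(v - mean) / denominator for v in values]


def preprocess(matrix):
    """Cube-root transform, then Pareto scale each column.

    `matrix` is a list of rows (samples); each row lists the intensities
    of the same variables (wavenumbers) in the same order.
    """
    transformed = [[cube_root(v) for v in row] for row in matrix]
    columns = [list(col) for col in zip(*transformed)]
    scaled_columns = [pareto_scale_column(col) for col in columns]
    return [list(row) for row in zip(*scaled_columns)]


# Hypothetical 3-sample, 2-variable intensity matrix.
toy = [[1.0, 0.5], [8.0, 0.5], [27.0, 0.5]]
print(preprocess(toy))
```

Dividing by the square root of the standard deviation, rather than by the standard deviation itself, shrinks the dominance of intense bands without inflating low-intensity, noise-dominated wavenumbers to unit variance.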
After data manipulation, PLS modeling was used to transform the preprocessed data into a new set of data by extracting a set of latent variables (latent factors or principal components) that carried the optimal spectral information and thus the best predictive power. The number of principal components was decided automatically, at the point where the predicted residual error sum of squares (PRESS) values reached a minimum or levelled off. Adding more factors resulted in over-fitted calibration models. The resulting variable importance in the projection (VIP) scores of the developed PLS models indicated which wavenumbers were most active in the discrimination, in other words, those whose intensity changed most between the consecutive years [12]. In addition, t-tests showed which independent variables were mainly responsible for the statistically significant difference.
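For readers unfamiliar with VIP scores, the commonly used definition can be sketched as below: for variable j, VIP_j = sqrt(p · Σ_a SS_a (w_ja / ‖w_a‖)² / Σ_a SS_a), where p is the number of variables, w_a is the weight vector of component a, and SS_a is the response sum of squares explained by that component. This is the textbook formula, assumed here rather than taken from the MetaboAnalyst source; the weights and sums of squares in the example are hypothetical.

```python
import math


def vip_scores(weights, ss_explained):
    """Compute VIP scores from PLS weights.

    weights[a][j]   : weight of variable j in component a
    ss_explained[a] : response sum of squares explained by component a
    """
    n_vars = len(weights[0])
    total_ss = sum(ss_explained)
    # Squared norm of each component's weight vector, used for normalization.
    norms_sq = [sum(w * w for w in component) for component in weights]
    scores = []
    for j in range(n_vars):
        weighted = sum(
            ss * component[j] ** 2 / norm_sq
            for component, ss, norm_sq in zip(weights, ss_explained, norms_sq)
        )
        scores.append(math.sqrt(n_vars * weighted / total_ss))
    return scores


# Hypothetical two-component model with three variables.
example_weights = [[0.6, 0.8, 0.0], [0.0, 0.6, 0.8]]
example_ss = [3.0, 1.0]
print(vip_scores(example_weights, example_ss))
```

Because the squared VIP scores average to 1 across all variables under this definition, values above 1 are conventionally read as above-average contributions to the model.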
The PLS models were tested using the 'leave one out' cross-validation method, which shows how well a developed model performs by predicting each calibration sample as if it were a validation sample, with the model recalculated each time without that sample. This method is indicated for datasets with fewer than 100 samples. The performance of the final PLS models was compared in terms of the R² parameter, known as the 'goodness of fit' or explained variation, and the Q² parameter, termed the 'goodness of prediction' or predicted variation. The R² and Q² values, which indicate the model fit and predictability, respectively, range between 0 and 1. The R² index is a measure of the explanatory power of the principal components of the model. A PLS-DA model with a high R² value is regarded as providing a good fit to the data and shows that the selected number of principal components is sufficient for the explanatory power of the model. The Q² index is calculated as a measure of the validation of the model, expressing the cumulative contribution of the selected principal components to the predictive quality of the model. A Q² value from 0.5 to 0.9 indicates good predictability, stability, and reliability of the model, while a value greater than 0.9 is considered to indicate excellent predictability, stability, and reliability. A large discrepancy between the Q² and R² values indicates a non-objective model that depends on the specific dataset that created it [28,29]. Furthermore, model performance was tested by conducting permutation tests. Permutation tests assume that there is no difference between the two groups formed on the basis of the year of harvest, so the labels of the samples are randomly permuted, and a new classification model is calculated for each permutation [30,31].
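The logic of these validation measures can be sketched in a few lines. To stay self-contained, the sketch replaces the PLS step with an ordinary least-squares line on a single hypothetical band intensity, so it illustrates only how R², leave-one-out Q², and a permutation p-value are obtained, not the PLS-DA model itself. All names and data are hypothetical, and using Q² as the permutation test statistic is an assumption.

```python
import random


def explained_fraction(y_true, y_pred):
    """1 - SS_res / SS_tot: R2 for fitted values, Q2 for cross-validated ones."""
    mean_y = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


def fit_line(x, y):
    """Least-squares line; a stand-in for the PLS regression step."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    return slope, mean_y - slope * mean_x


def r2_fit(x, y):
    slope, intercept = fit_line(x, y)
    return explained_fraction(y, [slope * xi + intercept for xi in x])


def q2_loo(x, y):
    """Leave-one-out Q2: each sample is predicted by a model fitted without it."""
    predictions = []
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        predictions.append(slope * x[i] + intercept)
    return explained_fraction(y, predictions)


def permutation_p_value(x, y, n_perm=200, seed=0):
    """Fraction of label permutations whose Q2 reaches the observed Q2."""
    rng = random.Random(seed)
    observed = q2_loo(x, y)
    labels = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if q2_loo(x, labels) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical band intensities for six samples per harvest year (0 and 1).
intensity = [0.10, 0.12, 0.11, 0.13, 0.12, 0.11,
             0.50, 0.52, 0.49, 0.51, 0.50, 0.52]
year = [0.0] * 6 + [1.0] * 6
print(r2_fit(intensity, year), q2_loo(intensity, year))
print(permutation_p_value(intensity, year))
```

A Q² close to R² together with a small permutation p-value is the pattern described above for a reliable model, whereas a high R² with a much lower Q² points to over-fitting.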

DRIFTS Discriminant Analysis
Applying PLS-DA to the entire DRIFT spectrum of the samples indicated that the pistachio kernel flours could not be completely separated between the two years of harvest (p-value > 0.05) (Figure 3). As shown in Figure 4, five principal components resulted in higher values of R² and Q², described in detail in Table 2. To give a measure of the classification's statistical significance (p-value), permutation tests were conducted [32]. The observed p-value in Figure 5 was higher than 0.05, reinforcing the conclusion of Figure 3 that there was no statistically significant difference between the DRIFT spectra from year to year.

Disks Discriminant Analysis
PLS-DA of the total wavenumber range showed a clear sample distinction according to the year of harvest (Figure 6). Five principal components were the optimal number for classification, accounting for as much of the explained variation as possible, while high R² and Q² values were already given by the first three principal components (Figure 7 and Table 3).
The statistical significance of the obtained PLS-DA model was evaluated with permutation testing (Figure 8). A statistically significant separation between the two groups (p-value < 0.05) was evident when the indicator with the red mark was completely to the right [33].
PLS-DA ranked the independent variables using the VIP scores at p = 0.05 (Figure 9). For evaluation of year-to-year variability, data from the two consecutive harvests were compared using a t-test (Figure 10), which showed that almost the entire spectrum played an important role in the identification of differences between the two classes. Wavenumbers that were statistically significant for the discrimination are shown in pink. The absorbance range of 1720–1800 cm⁻¹, including the peak at 1750 cm⁻¹, contained the variables with the highest VIP scores, which contributed most to the discrimination. The colored boxes on the right of Figure 9 indicate the relative intensity of the corresponding wavenumber in each group under study.
AppliedChem 2021, 1

Discussion
Food quality is an issue of increasingly high concern to society and to all stakeholders involved in food production. Pistachios are one of the products for which quality labels are most useful. Food quality is inextricably linked to the chemical profile of food, which can be affected by several factors, including the year of harvest [18]. A recent study on pistachio oils identified statistically significant differences in quality and nutritional value between crops of two consecutive years of harvest. This year-to-year differentiation of pistachio oils was possibly attributed to environmental factors [23].
In light of these results, a further investigation was carried out, not on pistachio oils but on pistachio samples of the Greek 'Aegina' variety collected from four different Greek regions. It is difficult to obtain standard results for pistachios cultivated in different locations and with different agricultural practices [18]. To overcome this problem, each sample of the 2017 harvest year (11 samples in total) was derived from the same farmer and the same field as the corresponding sample of the 2018 harvest year (11 samples in total). FTIR spectroscopy is quick, environmentally friendly, and simple, and it can be applied in routine analysis and official pistachio quality control [9,10]. FTIR spectra were obtained for each sample in triplicate using the DRIFTS and disk techniques. The performance of the PLS-DA models obtained with the two types of FTIR analysis (DRIFTS and disks) was assessed with the R² and Q² coefficients as statistical measures of model fit and predictability. The disk spectra, in combination with chemometric tools such as PLS-DA, showed a clear discrimination (p-value < 0.05) between pistachios from the two different years, with R² = 0.99705 and Q² = 0.97719. From the results of this model, it could be stated that the 1720–1800 cm⁻¹ region had the highest contribution to this classification. However, the PLS-DA model constructed from the DRIFT spectra could not predict the year of harvest from the set of independent variables (p-value > 0.05), with R² = 0.96266 and Q² = 0.63152.
Quantitative DRIFTS spectral analysis is difficult, as the intensity of the scattered light is strongly dependent on the refractive index, particle size, density, and homogeneity of the sample [34]. To obtain a high-quality DRIFT spectrum, the sample must be well homogenized, its particle size must be small and uniform so that a spectrum with narrower bandwidths and more accurate band intensities is obtained, and the sample placed on the flat surface for measurement must not be excessively compacted, to maximize IR beam penetration [35].
On the other hand, in the disk technique, the sample was diluted with KBr, which enhances the contribution of the scattered light and minimizes the specular reflection. Specular reflection causes changes in band intensity and shape and, in some cases, band inversions (Reststrahlen bands); the KBr/sample mixture minimizes these negative effects. KBr is very hygroscopic, and if it is not well dried, it may give rise to bands at 3440 (OH stretch), 1630 (OH bend), and 560 cm⁻¹ (OH wag) that affect the ability to interpret these spectral regions. During disk construction, excessive pressure along with the presence of water can change the hydration state and crystallinity [11]. In the current study, the KBr was completely dried, which made it possible to cope with this issue.

Conclusions
In conclusion, the disk-FTIR is a highly sensitive spectroscopic technique that allowed the collection of spectra from pulverized pistachio samples with minimal sample preparation. Results showed a complete discrimination between pistachios from the two different years of harvest. The proposed disk-FTIR spectroscopic chemometric model is fast, accurate, economical, and environmentally friendly.