1H NMR and Multivariate Analysis for Geographic Characterization of Commercial Extra Virgin Olive Oil: A Possible Correlation with Climate Data

1H Nuclear Magnetic Resonance (NMR) spectroscopy coupled with multivariate analysis has been applied in order to investigate metabolomic profiles of more than 200 extravirgin olive oils (EVOOs) collected in a period of over four years (2009–2012) from different geographic areas. In particular, commercially blended EVOO samples originating from different Italian regions (Tuscany, Sicily and Apulia), as well as European (Spain and Portugal) and non-European (Tunisia, Turkey, Chile and Australia) countries. Multivariate statistical analysis (Principal Component Analisys (PCA) and Orthogonal Partial Least Squares Discriminant Analysis (OPLS-DA)) applied on the NMR data revealed the existence of marked differences between Italian (in particular from Tuscany, Sicily and Apulia regions) and foreign (in particular Tunisian) EVOO samples. A possible correlation with available climate data has been also investigated. These results aim to develop a powerful NMR-based tool able to protect Italian olive oil productions.


Introduction
Olive oil is an ancient food widely spread throughout the Roman Empire. Olive trees are now mainly diffused around the Mediterranean Sea and Italy-located in the center of this region-is one of the most important olive producers. The extra virgin olive oil (EVOO) produced in various Italian regions has an excellent quality and is greatly appreciated worldwide for its attributes, which depend, among other factors, on the geographical origin. The importance of the geographical origin for EVOO has been documented by the European Union since 1992, when the Protected Designation of Origin (PDO) and Protected Geographical Indication (PGI) arrangements were created to protect and support foodstuffs of particular quality [1]. Bianchi et al. (1993) and Angerosa et al. (1999) used Isotopic Ratios Spectrometry (IRMS) to find olive oil components and to characterize the geographical origin [2,3]. The traceability and authenticity of olive oils have been widely investigated by Nuclear Seville-Spain, Lisbon-Portugal, Sydney-Australia, Santiago-Chile, Tunis-Tunisia, Istanbul-Turkey). Average precipitation (monthly cumulative rainfall) and temperature were calculated for each studied area over a four-year period (2009-2012). Table 1. Origins (from Italian regions and/or country), blend type and number of olive oil samples are reported in columns. Average of cumulative rainfall (mm) and temperature ( • C) are reported for each country and calculated over a four-year period (2009)(2010)(2011)(2012)

Nuclear Magnetic Resonance Spectroscopy
For each NMR sample preparation, 20 mg of olive oil was exactly weighed, dissolved in a volume of 0.9 mL of deuterated chloroform (CDCl 3 ), and transferred directly to a 5 mm NMR tube. All the 1 H NMR spectra were recorded on a 499.84 MHz spectrometer, operating at 11.7 T (Varian NMR UNITY INOVA Narrow Bore, workstation UNIX-based Sun Microsystems, Varian NMR Instruments, Palo Alto, CA, USA). Experiments (pulse program s2pul) were run at 298.15 K, using a 12 ms pulse 56 db (90 • flip angle), an acquisition time of 5.82 s (64 k data points) a spectral width of 5500 Hz (11 ppm) and 16 transients. Prior to Fourier transformation, the free induction decays (FIDs) were zero-filled to 128 k and a −0.15 Hz line-broadening factor was applied.

Multivariate Data Processing
The data were Fourier-transformed, and phase and baseline corrected with ACD/NMR software (Advanced Chemistry Development, ACD/Spectrus software, version 2016.1.1, Toronto, ON, Canada). Chemical shifts were expressed in δ values relative to CHCl 3 (δ 7.27 ppm) as internal reference. Spectra were segmented with a variable size intelligent bucketing width of 0.04 ppm and 50% looseness factor. The interval containing the signals of the solvent (in the range 7.60-6.90 ppm) was removed, and the sum of the remaining integrals (buckets) normalized for each spectrum. A total of 221 variables for each 1 H NMR spectrum was obtained and considered for statistical analysis. Since the NMR spectra were dominated by the resonances of functional groups of all the fatty acids, each bucket row represents the entire NMR spectrum, and all the molecules present in the sample. The data table generated by all aligned bucket row reduced spectra was used for multivariate data analysis. The Pareto scaling method, which is performed by dividing the mean-centered data by the square root of the standard deviation, was then applied to the variables. Multivariate statistical analysis and graphics were obtained using SIMCA-P (version 14, Sartorius Stedim Biotech, Umea, Sweden). For multivariate statistical analyses of bucket reduced NMR spectra, different statistical procedures (principal component analysis (PCA) and orthogonal partial least squares discriminant analysis (OPLS-DA)) were used. PCA is used as a preliminary step in multivariate analysis of data. It works by reducing the dimensionality of data and reveals the presence of correlations among the samples. The principal components (PC1, PC2, . . . , PCn) are linear combinations of the original variables (in this case the NMR data) accounting for most of the variation in the data set. Hence, when a significant correlation occurs the number of useful PCs is much less than the number of original variables [23][24][25]. While PLS-DA is one of the most recent supervised MVA techniques used to discriminate samples with different characteristics according to known classification classes (such as cultivars and/or geographical origin) [22,26], we preferred OPLS-DA in our studies. As shown in several studies of metabolomics, OPLS-DA is a modification of the usual PLS-DA method that filters out variation that is not directly related to the focused discriminating response, by separating the portion of the variance useful for predictive purposes from the non-predictive variance (which is made orthogonal). The result is a model with improved interpretability. Furthermore, OPLS-DA condenses the predictive information into one component, facilitating the interpretation of spectral data. The R2(cum) and Q2(cum) are the two parameters used to describe the goodness of the model at the minimum number of components cumulatively required (cum) to optimally give account of the data variability. The R2 explains the total variations in the data, giving a quantitative measure of the goodness of fit. The goodness of prediction was estimated by Q2(cum), according to cross validation (sevenfold cross-validation) [26][27][28].

Chemicals
All chemical reagents for analysis were of analytical grade. Deuterated solvent (CDCl 3 99.8 atom%D) was purchased from was purchased from Sigma-Aldrich S.r.l. (Milan, Italy).

NMR Spectroscopy and MVA
Multivariate statistical analysis was applied to all of the NMR data, focusing on the possible differences existing between Italian (from Tuscany, Sicily and Apulia regions) and foreign (Spain, Portugal, Tunisia, Turkey, Chile and Australia) EVOO samples. The fatty acid composition, together with squalene and β-sitosterol, was calculated by methods presented previously in the literature [29], on the basis of 1 H NMR data. One-way analysis of variance (ANOVA) was performed to assess whether the means were significantly different among groups (see Tables S1 and S2). The obtained data indicated that all the investigated oils had fatty acid composition within the expected range for, in particular with regard to the polyunsaturated fatty acids (linolenic and linoleic fatty acids) and oleic acid. A first level of investigation was performed using the unsupervised exploratory statistical technique (PCA), without considering the climatic data, to look for trends among samples and/or possible outliers, which were excluded from further analyses, and to obtain a general overview of EVOOs ( Figure 1). Of the original 221 variables per spectrum, six PCs were enough to describe 92.5% of the variance of the entire NMR dataset, giving R2X(cum) = 0.925 and Q2(cum) = 0.822 (with PC1, PC2 and PC3 describing 57.9%, 20.8% and 5.6% of the variance, respectively). The first principal component, PC1, gave a clear separation of Tunisian samples from the remaining classes, while all the other oils appeared to overlap considerably in the PC1/PC2 scoreplot ( Figure 1A). Nevertheless, a certain degree of separation was also observed for Chilean oils, in particular on the third component (PC3), while European oils (Spanish, Portuguese, Italian) overlapped considerably in the scoreplot. By examining the loadings of the original variables it was possible to define the molecular components responsible for the observed trend ( Figure 1B). In particular, Tunisian oils were characterized by a high relative content of polyunsaturated fatty acids, such as linolenic acid (1.38 ppm methylene protons of the unsaturated acyl groups, 2.78 ppm diallilyc groups, 5.40 ppm linolenic olefinic protons), while a high relative content of monounsatured fatty acids (1.32 ppm acyl group of oleic acid) was associated with all other oils. PCA analysis was further used considering only the EVOOs with a higher sample size, in order to increase the statistical significance for each class. Three oil groups were therefore excluded (with 4, 14, and 4 samples from Australia, Portugal, and Turkey, respectively) and the PCA was repeated using the remaining four clusters (with 61, 58, 34, and 62 samples from Italy, Tunisia, Chile, Spain, respectively). Analyzing the resulting PCA model (Figure 2, 93.9% of the variance with six PCs, R2X(cum) = 0.939 and Q2(cum) = 0.859), it could be observed again that the Tunisian samples were clearly separated on the first principal component PC1, while Chilean oils appeared clearly distinct from Italian and Spanish EVOOs, in particular when the third component (PC3) was considered. As described above, Tunisian oils were characterized by a high relative content of linolenic acid, while a high relative content of monounsatured fatty acids (1.32 ppm signal corresponding to the acyl group of oleic acid) was associated with the other oils. The most scattered clusters, Italian and Spanish oils, remained considerably overlapped in this PCA model, suggesting that further considerations are needed. PCA analysis was further used considering only the EVOOs with a higher sample size, in order to increase the statistical significance for each class. Three oil groups were therefore excluded (with 4, 14, and 4 samples from Australia, Portugal, and Turkey, respectively) and the PCA was repeated using the remaining four clusters (with 61, 58, 34, and 62 samples from Italy, Tunisia, Chile, Spain, respectively). Analyzing the resulting PCA model (Figure 2, 93.9% of the variance with six PCs, R2X(cum) = 0.939 and Q2(cum) = 0.859), it could be observed again that the Tunisian samples were clearly separated on the first principal component PC1, while Chilean oils appeared clearly distinct from Italian and Spanish EVOOs, in particular when the third component (PC3) was considered. As described above, Tunisian oils were characterized by a high relative content of linolenic acid, while a high relative content of monounsatured fatty acids (1.32 ppm signal corresponding to the acyl group of oleic acid) was associated with the other oils. The most scattered clusters, Italian and Spanish oils, remained considerably overlapped in this PCA model, suggesting that further considerations are needed. PCA analysis was further used considering only the EVOOs with a higher sample size, in order to increase the statistical significance for each class. Three oil groups were therefore excluded (with 4, 14, and 4 samples from Australia, Portugal, and Turkey, respectively) and the PCA was repeated using the remaining four clusters (with 61, 58, 34, and 62 samples from Italy, Tunisia, Chile, Spain, respectively). Analyzing the resulting PCA model (Figure 2, 93.9% of the variance with six PCs, R2X(cum) = 0.939 and Q2(cum) = 0.859), it could be observed again that the Tunisian samples were clearly separated on the first principal component PC1, while Chilean oils appeared clearly distinct from Italian and Spanish EVOOs, in particular when the third component (PC3) was considered. As described above, Tunisian oils were characterized by a high relative content of linolenic acid, while a high relative content of monounsatured fatty acids (1.32 ppm signal corresponding to the acyl group of oleic acid) was associated with the other oils. The most scattered clusters, Italian and Spanish oils, remained considerably overlapped in this PCA model, suggesting that further considerations are needed. In the first place, the unsupervised PCA and supervised OPLS-DA analyses were applied in order to deeply analyze the differences existing between Italian and Tunisian EVOO samples. Indeed, due to the recent introduction of Tunisian product in the EU olive oil market, and the serious impact on especially Italian production [30], differentiation of Tunisian EVOO appears to be a key issue [31]. In particular, taking into account the very different pedoclimatic conditions of the three Italian regions studied (Tuscany, Sicily and Apulia), sub-groups of samples from these regions were considered separately. Analyzing the resulting PCA scoreplots of Tunisian vs. Tuscan and Tunisian vs. Sicilian samples ( Figure 3A,B respectively), a very good separation between the clusters was interestingly found even in the unsupervised analysis. Moreover, also in the case of Apulian vs. Tunisian oils, despite the low number of Apulian samples considered in this work, a good separation between the two clusters was observed, as already reported in other studies [32]. The samples were then analyzed by OPLS-DA in order to accurately analyze the differences observed in the PCA analysis and to investigate the goodness of fit (R2X) and prediction (Q2) for the models. In both cases (Tunisian vs. Tuscan and Tunisian vs. Sicilian samples), good OPLS-DA models were obtained, in which one predictive and two orthogonal components (1 + 2) gave R2X = 0.86, R2Y = 0.91 and Q2 = 0.89 and R2X = 0.85, R2Y = 0.92 and Q2 = 0.905, respectively. By considering the Q2 predictivity parameter for the OPLS-DA models of Figure 4A,B, it should be noted that both the OPLS-DA models showed a very high prediction ability (Q2 = 0.89 and Q2 = 0.905, respectively). Again, Tunisian oils were characterized by a high relative content of polyunsaturated fatty acids, such as linolenic acid (1.38 ppm, methylene protons of the unsaturated acyl groups, 2.78 and 5.40 ppm, linolenic diallilyc and olefinic protons, respectively), while a high relative content of oleic acid (5.34, 2.01, 1.32 ppm) was associated to both Tuscan and Sicilian oils. In the first place, the unsupervised PCA and supervised OPLS-DA analyses were applied in order to deeply analyze the differences existing between Italian and Tunisian EVOO samples. Indeed, due to the recent introduction of Tunisian product in the EU olive oil market, and the serious impact on especially Italian production [30], differentiation of Tunisian EVOO appears to be a key issue [31]. In particular, taking into account the very different pedoclimatic conditions of the three Italian regions studied (Tuscany, Sicily and Apulia), sub-groups of samples from these regions were considered separately. Analyzing the resulting PCA scoreplots of Tunisian vs. Tuscan and Tunisian vs. Sicilian samples ( Figure 3A,B respectively), a very good separation between the clusters was interestingly found even in the unsupervised analysis. Moreover, also in the case of Apulian vs. Tunisian oils, despite the low number of Apulian samples considered in this work, a good separation between the two clusters was observed, as already reported in other studies [32] (data not shown). The samples were then analyzed by OPLS-DA in order to accurately analyze the differences observed in the PCA analysis and to investigate the goodness of fit (R2X) and prediction (Q2) for the models. In both cases (Tunisian vs. Tuscan and Tunisian vs. Sicilian samples), good OPLS-DA models were obtained, in which one predictive and two orthogonal components (1 + 2) gave R2X = 0.86, R2Y = 0.91 and Q2 = 0.89 and R2X = 0.85, R2Y = 0.92 and Q2 = 0.905, respectively. By considering the Q2 predictivity parameter for the OPLS-DA models of Figure 4A,B, it should be noted that both the OPLS-DA models showed a very high prediction ability (Q2 = 0.89 and Q2 = 0.905, respectively). Again, Tunisian oils were characterized by a high relative content of polyunsaturated fatty acids, such as linolenic acid (1.38 ppm, methylene protons of the unsaturated acyl groups, 2.78 and 5.40 ppm, linolenic diallilyc and olefinic protons, respectively), while a high relative content of oleic acid (5.34, 2.01, 1.32 ppm) was associated to both Tuscan and Sicilian oils. Finally, analyzing the resulting OPLS-DA models between Italian (Sicilian, Tuscan) and Spanish EVOO samples, a good separation was obtained for Tuscan (in particular from Arezzo province, Tuscany region) vs. Spanish ( Figure 5A) and for Sicilian vs. Spanish oils ( Figure 5B). Again, also in the case of the Apulian (limited samples) vs. Spanish oils, a reasonable separation between the two clusters was observed, which is in agreement with results already reported in other studies [32].
Further consideration deserves to be given to the comparison of the OPLS-DA discriminating models and the average cumulative rainfall (mm) temperature ( • C) data for the considered classes. A careful analysis of Table 1 data and the quality model descriptors for the OPLS-DA discriminations does not seem to give an indication of a clear correlation between both average rainfall and temperature differences and model discrimination performance (predictivity). Higher differences in average rainfall (Tuscan vs. Tunisian and Tuscan vs. Spanish oils) are generally associated with a more constant discriminating ability of the studied OPLS-DA models. This trend is also observed when considering average temperature differences. On the other hand, average country or regional temperature, although calculated for a wide time span (between the year 2009 and 2012), may not correctly account for the specific pedoclimatic conditions associated with the examined samples. Therefore conclusive correlations could not be obtained by using these simple climate descriptors, which were observed and calculated on a country and/or regional basis and chosen for the ease of their availability. Further studies possibly based on a detailed climate data detection and analysis are required in order to obtain sound correlation with specific EVOOs metabolic profiles. Finally, analyzing the resulting OPLS-DA models between Italian (Sicilian, Tuscan) and Spanish EVOO samples, a good separation was obtained for Tuscan (in particular from Arezzo province, Tuscany region) vs. Spanish ( Figure 5A) and for Sicilian vs. Spanish oils ( Figure 5B). Again, also in the case of the Apulian (limited samples) vs. Spanish oils, a reasonable separation between the two clusters was observed, which is in agreement with results already reported in other studies [32]. Further consideration deserves to be given to the comparison of the OPLS-DA discriminating models and the average cumulative rainfall (mm) temperature (°C) data for the considered classes. A careful analysis of Table 1 data and the quality model descriptors for the OPLS-DA discriminations does not seem to give an indication of a clear correlation between both average rainfall and temperature differences and model discrimination performance (predictivity). Higher differences in average rainfall (Tuscan vs. Tunisian and Tuscan vs. Spanish oils) are generally associated with a Finally, analyzing the resulting OPLS-DA models between Italian (Sicilian, Tuscan) and Spanish EVOO samples, a good separation was obtained for Tuscan (in particular from Arezzo province, Tuscany region) vs. Spanish ( Figure 5A) and for Sicilian vs. Spanish oils ( Figure 5B). Again, also in the case of the Apulian (limited samples) vs. Spanish oils, a reasonable separation between the two clusters was observed, which is in agreement with results already reported in other studies [32]. Further consideration deserves to be given to the comparison of the OPLS-DA discriminating models and the average cumulative rainfall (mm) temperature (°C) data for the considered classes. A careful analysis of Table 1 data and the quality model descriptors for the OPLS-DA discriminations does not seem to give an indication of a clear correlation between both average rainfall and temperature differences and model discrimination performance (predictivity). Higher differences in average rainfall (Tuscan vs. Tunisian and Tuscan vs. Spanish oils) are generally associated with a more constant discriminating ability of the studied OPLS-DA models. This trend is also observed when considering average temperature differences. On the other hand, average country or regional temperature, although calculated for a wide time span (between the year 2009 and 2012), may not correctly account for the specific pedoclimatic conditions associated with the examined samples. Therefore conclusive correlations could not be obtained by using these simple climate descriptors,

Conclusions
NMR spectroscopy, combined with multivariate analysis techniques, successfully allowed characterization of metabolomic profiles of EVOOs, and their linkage with main cultivar composition and/or country of origin. Moreover, the fine composition of olive oil, and therefore its sensory characteristics, besides being strongly dependent on the nature of the cultivar used for its production, is also influenced by several other factors, like climate (temperature, relative humidity of summer months, yearly rainfall) as well as agricultural practices and technological factors (crop year, oil extraction system and storage time). In this work, the 1 H NMR-based MVA approach was used on EVOOs collected over a four-year period (2009-2012), considering the potential influence of the historical meteorological parameter averages. Marked differences existing between Italian and foreign EVOO samples were observed, in particular when the three studied Italian regions (Tuscany, Sicily and Apulia), were considered separately. Higher differences in average rainfall and temperature generally resulted associated with a more constant discriminating ability of the studied OPLS-DA models. However, conclusive correlations could not be obtained by using the simple climate descriptors here considered, and further studies are required in order to obtain sound correlation of detailed climate parameters with specific EVOO metabolic profiles. It should be noted that the simplified approach used which takes into account indicative average weather data (rather than specific climate data variation across each country), was chosen for the ease of average climate data availability for each country. Nevertheless, these results suggest the possible use of NMR-based metabolic profiling for olive oils geographical origin prediction and assessment of the possible correlation with climate data. In fact, among the EVOOs studied here, the Tunisian case appears to be an interesting key issue, due to the recent introduction of Tunisian olive oil into the EU market. In this respect, it should be pointed out that the present oil characterization study is only aimed at clearly indicating possible metabolic profile differences among oils, rather than quality ranking.