Prediction of Antimicrobial and Antioxidant Activities of Mexican Propolis by 1H-NMR Spectroscopy and Chemometrics Data Analysis

A feasibility study to predict the antimicrobial and antioxidant activities of propolis extracts using 700-MHz 1H-NMR spectra and multivariate regression data analysis is presented. The study was conducted with thirty-five propolis samples to develop a rapid and reliable method for the evaluation of their quality. The extracts were evaluated by measuring their phenolic and flavonoid contents, antioxidant activity, and antimicrobial activity. The spectral data were submitted to multivariate calibration (partial least squares (PLS) and orthogonal partial least squares (OPLS)) to correlate the relative intensity and position of the NMR resonance peaks with the metabolite contents and biological activities. The developed PLS and OPLS models were successfully applied to the determination of the target properties as a proof of concept. The OPLS observed vs. predicted plots indicate the absence of systematic errors, with determination coefficients ranging from 0.7207 to 0.9990. Up to 86.1% of the variation in the spectral data and 99.9% of the variation in the measured properties were explained, with a predictive capability of 88.6% in the best case (S. mutans activity) according to the cross-validation procedure. The figures of merit of the developed PLS and OPLS methods were also evaluated and compared.


Introduction
Propolis (bee glue) is a sticky, dark-colored hive product collected by bees from living plant sources [1,2]. It possesses pharmacological activities such as antibacterial, antifungal, antioxidant, antitumoral and anti-inflammatory properties, and it is used extensively as an ingredient of candies, honeys, biopharmaceuticals, cosmetics and beverages in various parts of the world, where it is claimed to improve human health and to prevent diseases such as diabetes and cancer [3,4]. Recently, propolis has been proposed as a chemical preservative in ground meat and as a germicide and insecticide for food packaging [4].
More than 300 compounds have been identified in different propolis samples [5]. This complex mixture contains a variety of chemical compounds such as flavonoid aglycones, phenolic acids and their esters, phenolic aldehydes, alcohols, ketones, sesquiterpenes, coumarins, steroids, amino acids and inorganic compounds [4,6,7,8]. Studies have revealed that the composition of propolis varies with geography and is strongly related to the flora surrounding the hive [1,4].
The main constituents of propolis in North America are flavonoids and phenolic acid esters [9]. Limited research has been conducted on the chemical composition and pharmacological properties of Mexican propolis. Velazquez et al. [10] investigated the antibacterial and free-radical scavenging (FRS) activities of propolis collected from three different areas of Sonora (Mexico). Navarro-Navarro et al. [11] reported the anti-Vibrio activity of propolis collected from three different regions of Sonora. Valencia et al. [3] studied the seasonal effect on the chemical composition and biological activities (antiproliferative and antioxidant) of Sonoran propolis.
The biological effects of propolis can be associated with its antioxidant activity, and in the last few decades new analytical techniques have been proposed to determine it [12,13]. They are based, for example, on the determination of total phenolic and flavonoid contents or on antioxidant activity/capacity assays: 1,1-diphenyl-2-picrylhydrazyl (DPPH), ferric reducing/antioxidant power (FRAP), and generation of the 2,2′-azinobis-(3-ethylbenzothiazoline-6-sulfonic acid) (ABTS) radical cation [14]. It is known that the quantitative evaluation of antioxidant capacity needs more than a single assay method. A range of analytical methods have also been used to profile propolis, including chromatographic techniques linked to spectroscopic detection, resulting in various modern hyphenated techniques, e.g., GC-MS and HPLC-MS [15].
As already mentioned, propolis consists of a wide range of organic compounds of varying polarity and the only technique that can simultaneously examine waxes, terpenoids and phenolics is Nuclear Magnetic Resonance (NMR) spectroscopy [16]. One of the main advantages of this technique is that structural and quantitative information can be obtained for a wide range of chemical species in a single NMR experiment. NMR is frequently applied to samples that can be directly examined as liquids, but very simple extraction or sample preparation procedures may also be used [17,18].
Since the NMR pattern of natural products in propolis is extremely complex, the use of chemometric methods to analyze such complex spectral data sets is mandatory [19]. In the case of propolis, NMR combined with chemometric techniques has been proposed to identify and classify different propolis sources or geographic origins [18,20,21]. However, to the best of our knowledge, no study concerning the prediction of the antioxidant and antibacterial properties of propolis based on multivariate calibration has been reported up to now.
In the present paper, 1H-NMR coupled with multivariate statistical analysis based on partial least squares is employed to quantitatively predict the antibacterial and antioxidant activities of propolis extracts. The net analyte signal concept is used to determine the figures of merit of the developed methods. The study was conducted with 35 propolis samples obtained from different Mexican apiaries and four samples from outside the country (one from Ecuador and three from China) to develop a rapid and reliable method to evaluate their quality.

Extraction, Antioxidant and Antibacterial Activities
In this work, the ethanolic extracts of thirty-five samples of propolis obtained from different Mexican apiaries and four samples from outside the country (one from Ecuador and three from China) were studied. The total phenolic and flavonoid contents were estimated using standard chemical assay procedures (Folin-Ciocalteu and AlCl3 methods). Several biological activities were evaluated, including antioxidant capacity using the free radical scavenging DPPH assay and antimicrobial properties using Streptococcus mutans, Streptococcus oralis and Streptococcus sanguinis as test models. The results of the bioassays of the ethanolic extracts of propolis (EEP) samples are reported in Table 1. The total phenolic and flavonoid contents and the antioxidant activity are in agreement with the literature for poplar propolis [3,6,10].

1H-NMR
The 1H-NMR spectra of the EEP were recorded and, as an example, two selected spectra are shown in Figure 1. The spectrum in Figure 1a belongs to an active extract, whereas the one in Figure 1b corresponds to an inactive extract. In Figure 1a, signals of flavonoid compounds are observed. Antioxidant and antimicrobial activities are well documented for this type of natural product [22,23]. The singlets around δ 12.0 ppm could be attributed to the -OH groups that form intramolecular hydrogen bonds and are frequently present in the A-ring of flavonoids. The aromatic protons of these phenolic constituents are observed between δ 6.0 ppm and 8.0 ppm. The signals between δ 6.0 ppm and 5.0 ppm could correspond to the vinylic protons of the C-ring of flavones present in the extract. The protons of the ABX system of the C-ring of a flavanone are expected between δ 5.0 ppm and 2.5 ppm. The singlet near δ 4.0 ppm could be attributed to the methyl moiety of an aromatic methoxy group, frequently observed in flavonoids. On the other hand, the spectrum in Figure 1b is dominated by signals in the δ 2.0 ppm-0.5 ppm region, which could originate from protons belonging to waxes or linear fatty acids, whose contribution to antimicrobial or antioxidant activities may be considered less relevant.

Multivariate Analysis
The obtained spectral data were submitted to multivariate analysis; first, to study the variations among the sample spectra, and second, to correlate the relative intensity and position of NMR resonance peaks to antioxidant activity determined by DPPH, the total phenolic and flavonoid contents, and the antimicrobial activity.
Principal component analysis (PCA) is a technique used to emphasize variation and bring out strong patterns in a dataset. It is often used to make data easy to explore and visualize. By examining the underlying structure of the variables, a new coordinate system is defined. The original variables are linearly combined into new ones, named principal components, and in this way the dimensionality, i.e., the complexity of the data space, is reduced. The PCA of the 1H-NMR spectra of propolis showed that six components explained 73.3% of the spectral variation (R2X(cum)). This value measures the amount of information contained within the model to explain the dispersion observed when comparing the different sample spectra. The percentage of variation that can be predicted by the model according to a leave-one-out cross-validation procedure reached 48.7% (Q2X(cum)). Cross-validation is used to estimate how accurately a predictive model will perform in practice, and it is employed as an estimator of the prediction behavior in the absence of an independent set of samples for validation. A quick view of the sample distribution according to spectral similarities in the plot of scores t2 vs. t1 (Figure 2), where the scores are the values of the new variables, indicated a natural tendency of the samples of the same apiary or apiaries to lie in close proximity, but no grouping of the samples according to their different origins is in fact observed in the plot. This was confirmed by the tolerance ellipse that defines a 95% confidence interval for a Hotelling T2 test, indicating that all samples can be considered as representative of the same population. It was also observed that, although some samples lay very close to the limits of the ellipse, no outliers were really present in the data.
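As an illustration of the R2X(cum) statistic quoted above, the following minimal sketch extracts principal components one at a time with the NIPALS iteration and reports the cumulative fraction of the sum of squares of the mean-centered data that they explain. It is not the SIMCA implementation: the function names (`center_columns`, `sum_of_squares`, `pca_r2x_cum`), the convergence settings and the choice of mean-centering without further scaling are assumptions made for the example, and the cross-validated Q2X(cum) is not computed here.

```python
def center_columns(X):
    """Subtract the column mean from every entry (mean-centering)."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]


def sum_of_squares(X):
    """Total sum of squared entries of a matrix given as a list of rows."""
    return sum(v * v for row in X for v in row)


def pca_r2x_cum(X, n_components, tol=1e-12, max_iter=500):
    """Cumulative explained fraction of variation after each component.

    X is a list of rows (samples) with at least one non-constant column.
    Components are extracted by NIPALS and removed by deflation.
    """
    E = center_columns(X)
    ss_total = sum_of_squares(E)
    r2x_cum = []
    for _ in range(n_components):
        cols = list(zip(*E))
        # Start the iteration from the residual column with most variation.
        j = max(range(len(cols)), key=lambda k: sum(v * v for v in cols[k]))
        t = list(cols[j])
        if sum(v * v for v in t) <= 1e-20 * ss_total:
            # Nothing left to explain: repeat the current value.
            r2x_cum.append(1.0 - sum_of_squares(E) / ss_total)
            continue
        for _ in range(max_iter):
            tt = sum(v * v for v in t)
            # Loading: projection of every column on the score vector.
            p = [sum(ti * row[k] for ti, row in zip(t, E)) / tt
                 for k in range(len(cols))]
            norm = sum(v * v for v in p) ** 0.5
            p = [v / norm for v in p]
            # Updated scores: projection of every row on the loading.
            t_new = [sum(a * b for a, b in zip(row, p)) for row in E]
            diff = sum((a - b) ** 2 for a, b in zip(t, t_new))
            t = t_new
            if diff <= tol * sum(v * v for v in t):
                break
        # Deflation: remove the part of E described by this component.
        E = [[v - ti * pk for v, pk in zip(row, p)] for ti, row in zip(t, E)]
        r2x_cum.append(1.0 - sum_of_squares(E) / ss_total)
    return r2x_cum
```

Called as `pca_r2x_cum(spectra, 6)`, where `spectra` is a hypothetical list of binned spectra (one row per sample), the last element of the returned list plays the role of the R2X(cum) value reported for six components.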
On the basis of the analysis of the loadings plots (supplementary information), the differences among samples are mainly quantitative rather than qualitative in nature, as the chemical shifts in their NMR spectra cannot be assigned to any particular, unique discriminant feature. In addition, the analysis showed that some samples had distance to the model (DModX) values just slightly above the critical value; however, it was decided to include them in further treatments. DModX is the distance of an observation in the data set to the X model plane or hyperplane, which is proportional to the residual standard deviation (RSD) of the X observation. Interestingly, such values corresponded to samples from outside Mexico City (Puebla) and even from outside the country (China).
The spectra were also treated with PLS regression analysis. PLS is a method for relating two data matrices, X (the 1H-NMR spectra) and Y (the properties, e.g., phenol content), by a linear multivariate model, but it goes beyond traditional regression in that it also models the structure of X and Y, as PCA does. The regression is then performed with the analogues of the principal components, named latent variables, of the X and Y matrices.
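The way PLS builds latent variables and regresses on them can be sketched for a single response with the textbook NIPALS PLS1 algorithm. This is a simplified illustration, not the SIMCA implementation used in the study: the names `dot`, `pls1_fit` and `pls1_predict` are hypothetical, the data are only mean-centered, and a single y-variable is assumed.

```python
def dot(a, b):
    """Inner product of two equally long sequences."""
    return sum(x * y for x, y in zip(a, b))


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model (NIPALS PLS1) on mean-centered data.

    X is a list of rows (samples), y a list of responses.
    Returns the column means of X, the mean of y, and one
    (weights, loadings, y-loading) triple per latent variable.
    """
    n = len(X)
    x_mean = [sum(col) / n for col in zip(*X)]
    y_mean = sum(y) / n
    E = [[v - m for v, m in zip(row, x_mean)] for row in X]  # X residuals
    f = [v - y_mean for v in y]                              # y residuals
    components = []
    for _ in range(n_components):
        # Weights: direction in X with maximal covariance with y.
        w = [dot(col, f) for col in zip(*E)]
        norm = dot(w, w) ** 0.5
        if norm == 0.0:
            break  # y is already fully described
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]               # X scores
        tt = dot(t, t)
        p = [dot(col, t) / tt for col in zip(*E)]    # X loadings
        q = dot(f, t) / tt                           # y loading
        # Deflate both blocks before extracting the next latent variable.
        E = [[v - ti * pk for v, pk in zip(row, p)] for ti, row in zip(t, E)]
        f = [fi - q * ti for fi, ti in zip(f, t)]
        components.append((w, p, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict the response of one new sample by sequential deflation."""
    x_mean, y_mean, components = model
    e = [v - m for v, m in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = dot(e, w)
        y_hat += q * t
        e = [v - t * pk for v, pk in zip(e, p)]
    return y_hat
```

Predicting by sequential deflation avoids forming the regression coefficient vector explicitly, which keeps the sketch free of matrix inversion; when as many latent variables are used as the rank of X allows, the prediction coincides with ordinary least squares.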
In a first step the complete spectral range was employed (0.5 ppm-13.5 ppm). However, from the analysis of the regression coefficients, an improvement in regression parameters was observed when the range was restricted to 0.5 ppm-8.2 ppm, and further processing was done using this interval. Table 2 lists the values of R2X(cum), R2Y(cum), and Q2X(cum) for the different evaluated properties. R2Y(cum) has the same meaning as R2X(cum), but instead of the spectral data it considers the data contained in the Y matrix (responses). Values for the determination coefficient (R2), the Root Mean Square Error of Estimation (RMSEE) and the Root Mean Square Error of Cross-Validation (RMSECV), as well as the number of latent variables used in the models, are also included. RMSEE and RMSECV are descriptive statistics that quantify the accuracy of the model. The numbers of significant latent variables were selected according to the cross-validation rules included in SIMCA for such purposes: (i) Q2 > limit, where limit = 0 for PLS models with more than 100 observations, limit = 0.05 for PLS models with 100 observations or fewer, and limit = 0.01 for OPLS; (ii) Q2V > limit for at least 20% of the y-variables when M ≥ 25, or for sqrt(M) of them when M < 25, where M is the number of y-variables and Q2V is Q2 for individual variables. Overall, good performance is achieved for all properties and no systematic variations are detected based on the slope and intercept values of the regression equations between the defined and predicted values.
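The error statistics named above can be sketched as follows. The helper names are hypothetical, and two conventions are assumed rather than taken from the text: RMSEE divides the residual sum of squares by N - 1 - A (N observations, A latent variables), and RMSECV and Q2 are derived from the leave-one-out prediction error sum of squares (PRESS); both should be checked against the documentation of the software actually used. A straight-line fit stands in for the PLS model so that the block stays self-contained.

```python
import math


def rmsee(y_obs, y_fit, n_components):
    """Root mean square error of estimation (assumed N - 1 - A convention)."""
    dof = len(y_obs) - 1 - n_components
    ss_res = sum((o - f) ** 2 for o, f in zip(y_obs, y_fit))
    return math.sqrt(ss_res / dof)


def loo_press(X, y, fit, predict):
    """Leave-one-out PRESS: each sample is predicted by a model built without it."""
    press = 0.0
    for i in range(len(y)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(model, X[i])) ** 2
    return press


def rmsecv(press, n_samples):
    """Root mean square error of cross-validation."""
    return math.sqrt(press / n_samples)


def q2(press, y):
    """Cross-validated fraction of the response variation that is predicted."""
    mean = sum(y) / len(y)
    ss_total = sum((v - mean) ** 2 for v in y)
    return 1.0 - press / ss_total


def fit_line(X, y):
    """Stand-in model (not PLS): least-squares line on the first column of X."""
    xs = [row[0] for row in X]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(y) / len(y)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (v - y_mean) for x, v in zip(xs, y))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def predict_line(model, x):
    """Prediction of the stand-in line model for one sample."""
    slope, intercept = model
    return intercept + slope * x[0]
```

Any pair of `fit` and `predict` callables with the same signatures can be passed to `loo_press`, so a PLS or OPLS model could replace the line fit without changing the cross-validation loop.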
With the aim of improving the prediction error by eliminating orthogonal variation in X, the OPLS method was tested. This orthogonal variation is due to sources of variation that are not correlated with the measured properties, i.e., it is the non-predictive part of the variation in the X matrix. As observed in Table 2, better performance characteristics are obtained in general, i.e., a reduction in RMSEE and RMSECV and an increase in R2X(cum), R2Y(cum), Q2(cum) and R2, without deterioration in the regression equations, as most of the points fall close to the 45 degree line, with no systematic errors present. Values of R2 ranging from 0.7207 to 0.9990 were observed for the regression lines, indicating strong relationships between the defined and predicted values of total phenol and flavonoid content, DPPH radical scavenging activity, and in vitro antibacterial activity against Streptococcus mutans, Streptococcus oralis and Streptococcus sanguinis. At this point it is also important to mention that the residual plots of the data for both the PLS and OPLS analyses showed no systematic trends and a satisfactory fit to normal probability plots, thus confirming the correct application of the models. To better understand the differences between the PLS and OPLS methods in modeling and predicting the response values, some characteristic examples of the inner relationship plots of the models described in Table 2 are shown in Figures 3 and 4. These plots represent the correlation between the scores of the predictors (t, X-scores) and the scores of the responses (u, Y-scores). A perfect match between the X- and the Y-data is observed when all data points are located on the diagonal line with slope equal to one. Conversely, when there is a weak correlation structure between X and Y, there is a considerable spread of the points around this line.
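The removal of orthogonal variation that distinguishes OPLS from PLS, described at the start of the preceding paragraph, can be sketched for one response and one orthogonal component. The steps follow the commonly published single-y OPLS filter and are given as an assumption-laden illustration, not as the SIMCA implementation: `opls_remove_orthogonal` and `dot` are hypothetical names, and X and y are assumed to be mean-centered already.

```python
def dot(a, b):
    """Inner product of two equally long sequences."""
    return sum(x * y for x, y in zip(a, b))


def opls_remove_orthogonal(X, y):
    """Remove one y-orthogonal component from mean-centered X.

    Returns the filtered X and the orthogonal score vector t_ortho.
    X is a list of rows (samples), y a list of centered responses.
    """
    # Predictive weight vector: covariance of each X column with y.
    w = [dot(col, y) for col in zip(*X)]
    norm_w = dot(w, w) ** 0.5
    w = [v / norm_w for v in w]
    # Predictive scores and the corresponding X loadings.
    t = [dot(row, w) for row in X]
    p = [dot(col, t) / dot(t, t) for col in zip(*X)]
    # Part of the loading that is orthogonal to the predictive weights.
    wp = dot(w, p)
    w_ortho = [pk - wp * wk for pk, wk in zip(p, w)]
    norm_o = dot(w_ortho, w_ortho) ** 0.5
    if norm_o < 1e-12:
        # No y-orthogonal structure found along this loading.
        return [list(row) for row in X], [0.0] * len(X)
    w_ortho = [v / norm_o for v in w_ortho]
    # Orthogonal scores and loadings, then deflation of X.
    t_ortho = [dot(row, w_ortho) for row in X]
    p_ortho = [dot(col, t_ortho) / dot(t_ortho, t_ortho) for col in zip(*X)]
    X_filtered = [[v - ti * pk for v, pk in zip(row, p_ortho)]
                  for ti, row in zip(t_ortho, X)]
    return X_filtered, t_ortho
```

Because the orthogonal weights are perpendicular to the predictive weights, the removed scores have zero covariance with y, so the filtering leaves the covariance between X and y unchanged while discarding variation that does not help prediction.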
The inner relationship plot is also useful to identify curved (non-linear) relationships between the predictors and the responses and to identify outliers in the X- and Y-data and in the relationship between X and Y. As observed in Figure 3, PLS models give moderate correlations between spectra and properties, denoted by both medium values of the determination coefficient (r2 values ranging from 0.3867 to 0.4617) and a significant spread of the samples along the reference line. Some samples from inside Mexico City (CDMX), from outside the city (Oaxaca (OAX), Puebla (PUE) and Tlaxcala (TLAX)) and even from outside the country (China) look like outliers in the relationship between the X- and Y-blocks. In contrast, OPLS modeling produces very strong correlations (Figure 4), as a large reduction in the spread of the samples along the reference line is observed together with a considerable increase in the values of the determination coefficients (r2 values ranging from 0.8298 to 0.999). This time, the outlier samples observed in PLS modeling practically disappear, suggesting that OPLS modeling reduces a particular source of variability in the NMR chemical shifts associated with such samples. Further analysis of the regression coefficients of the PLS and OPLS models will be performed later to identify the chemical shifts responsible for the differences between PLS and OPLS modeling. Similar results were observed for the properties not shown in Figures 3 and 4.
In Figures 5 and 6 the observed vs. predicted plots of the different properties using OPLS modeling are shown. The samples are clearly not homogeneously distributed, as most of the observations are clustered and others are grouped outside the main array. This is especially true for the antibacterial activity, where the inclusion of samples from outside Mexico City (CDMX), especially Puebla (PUE), Oaxaca (OAX) and Tlaxcala (TLAX), allows a more suitable prediction because of the extended range that such samples confer for modeling. This fact is reflected in the RMSEE and RMSECV values, which are lower for phenol and flavonoid contents and DPPH activity than for the MIC assays. The plots also show that, although phenol and flavonoid contents as well as DPPH activity are almost equally spaced between samples, the MIC activities are not. This trend indicates that, although the compounds responsible for the antioxidant properties are present in the samples over an extended range of concentrations, which the measuring method registers as a continuous variable, not all of them have antimicrobial activity. In addition, the observed grouping in the MIC activities is a logical consequence of the nature of the MIC analysis (two-fold serial dilutions), which produces a discrete variable as a result, and of the similarities between samples concerning this parameter. The low antibacterial activity of certain samples, especially those from Puebla (PUE), Oaxaca (OAX) and Tlaxcala (TLAX), is clearly related to their low phenol and flavonoid contents, as expected for the antioxidant capacity of such compounds. The inclusion of new samples with a diversity of origins and further characterization of the propolis samples are recommended to extend the prediction capabilities of the model.
In Table 3 the figures of merit of the PLS and OPLS methods are reported. As observed, both methods perform similarly. The orthogonal signal correction of the OPLS algorithm filters uncorrelated variability in the sample spectra, increasing the selectivities up to their maximum value of 1.00 and thus allowing better prediction capabilities of the model as measured by Q2X(cum). By comparing the PLS and OPLS selectivity results, this uncorrelated variability has an average value of 17%. A comparison of the sum of squares of the regression coefficients for all properties for the PLS and OPLS models (Figure 7) reveals that both models give high importance to the 0.5 ppm-6.0 ppm region for predicting the target properties; however, the OPLS technique gives more relevance to the 1.7 ppm-2.2 ppm and 5 ppm-5.8 ppm regions of the 1H-NMR spectra. According to the discussion above, these chemical shifts were mainly attributed to protons belonging to waxes or linear fatty acids and to the vinylic protons of the C-ring of flavones present in the extract, respectively, whose content seems to be a determinant of the values of total phenol and flavonoid content, DPPH radical scavenging activity, and in vitro antibacterial activity against Streptococcus mutans, Streptococcus oralis and Streptococcus sanguinis.
Further improvement of the developed methods, aimed at the implementation of potential quality control protocols and at more accurate predictions, may be achieved by the inclusion of new samples with a diversity of origins, the determination of flavanones and dihydroflavonols with specific methods, and the addition of the IC50 values of the samples as a target property. Specifically, as the method that involves measurement at 410 nm-430 nm after addition of AlCl3 solution is selective only for flavonols (quercetin, morin, kaempferol and rutin) and the flavone luteolin, complementing the data with a measurement procedure at 510 nm in the presence of NaNO2 in alkaline medium may be a feasible way to evaluate rutin, luteolin and catechins, although it should be considered that phenolic acids exhibit considerable absorbance at this wavelength. With this new information, an improved interpretation of the relationship between polyphenol/flavonoid quantification and antimicrobial activity may be anticipated. This article provides a proof of concept for such purposes.

Samples
Thirty-nine propolis samples were provided by local beekeepers (Federico Palma Valderrama and MVZ Ángel López Ramírez). The propolis samples were collected between 2011 and 2014 (Table 1). These 39 samples were obtained by different harvesting methods: 18 by scraping, one by wooden wedges (3 mm-5 mm thick), and 16 by plastic nets (mesh size = 2 mm).

Extract Preparation
Each crude propolis sample (5 g) was extracted with ethanol (250 mL) at room temperature for 7 days. Each extract was taken to dryness under reduced pressure to afford the ethanolic extract of propolis (EEP). Extracts were stored at −20 °C until analysis.

DPPH Radical Scavenging Assay
DPPH radical scavenging activity was investigated according to the method of Cheng et al. [24]. Briefly, an ethanolic solution of DPPH (0.208 mM, 0.1 mL) was mixed with extract (1 mg/mL, 0.1 mL) or Trolox (positive control, 1 mg/mL). The 96-well plate was incubated in the dark at room temperature for 20 min, and the absorbance was recorded at 540 nm. The percentage inhibition of DPPH by each sample was calculated from the percentage of DPPH remaining in solution after the reaction. All determinations were performed in triplicate. The percentage scavenging effect was calculated as:
$$\text{Scavenging effect (\%)} = \left[1 - \frac{A_1 - A_2}{A_0}\right] \times 100$$
where $A_0$ is the absorbance of the control, $A_1$ the absorbance in the presence of the sample, and $A_2$ the absorbance of the sample without DPPH radical.
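The calculation can be illustrated with a short Python sketch. It assumes the usual blank-corrected expression, scavenging (%) = [1 − (A1 − A2)/A0] × 100, built from the three absorbances defined above; the function name and the example readings are hypothetical and are not data from this study.

```python
def dpph_scavenging_percent(a0, a1, a2=0.0):
    """Percentage DPPH scavenging effect (assumed blank-corrected form).

    a0: absorbance of the control (DPPH without sample)
    a1: absorbance of DPPH in the presence of the sample
    a2: absorbance of the sample without DPPH (colour blank)
    """
    if a0 <= 0:
        raise ValueError("control absorbance must be positive")
    return (1.0 - (a1 - a2) / a0) * 100.0


# Hypothetical triplicate readings (A0, A1, A2) at 540 nm, for illustration only.
triplicate = [(0.80, 0.30, 0.05), (0.82, 0.31, 0.05), (0.79, 0.29, 0.04)]
values = [dpph_scavenging_percent(*reading) for reading in triplicate]
mean_scavenging = sum(values) / len(values)
```

Averaging the three replicate values mirrors the triplicate determinations described in the text.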

Total Phenolic Content
The total phenolic content of propolis was determined as described by Singleton and Rossi [25] and Popova et al. [26]. Briefly, propolis extract (1 mg/mL, 20 µL) and Folin-Ciocalteu reagent (80 µL) were mixed well for 5 min, and 7.5% sodium carbonate solution (80 µL) was added. The plate was covered and incubated in the dark at room temperature for 2 h. The absorbance was measured at 760 nm with a spectrophotometric microplate reader (Benchmark 11130, Bio-Rad, Hercules, CA, USA). Distilled water was used as a blank. The obtained absorbances were interpolated in a calibration curve of gallic acid (y = 4.10x + 0.0324, R2 = 0.9980). The results were expressed as mg equivalents of gallic acid/g of dry extract of propolis (EEP). All determinations were performed in triplicate. The total phenolic content was estimated using gallic acid and quercetin as standards.

Total Flavonoid Content
The concentration of flavonoids was determined using the method described by Marquele et al. [27]. Extract (100 µL) was mixed with aluminum chloride solution (2% in methanol, 100 µL). After incubation for 30 min at room temperature, the absorbance was read at 420 nm. The obtained absorbances were interpolated in a calibration curve of quercetin (y = 16.33x + 0.1032, R2 = 0.9993). The results were expressed as mg equivalents of quercetin/g of dry extract of propolis (EEP).
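Interpolating an absorbance in a linear calibration curve amounts to inverting y = slope·x + intercept. The sketch below uses the slopes and intercepts reported for the gallic acid and quercetin curves; the function name and the example absorbances are hypothetical, and the resulting concentration is in whatever units the calibration standards were prepared in, which the text does not state.

```python
def concentration_from_absorbance(absorbance, slope, intercept):
    """Invert a linear calibration curve y = slope * x + intercept."""
    if slope == 0:
        raise ValueError("slope must be non-zero")
    return (absorbance - intercept) / slope


# Calibration parameters (slope, intercept) as reported in the text.
GALLIC_ACID_CURVE = (4.10, 0.0324)   # total phenolics, 760 nm
QUERCETIN_CURVE = (16.33, 0.1032)    # total flavonoids, 420 nm

# Hypothetical absorbance readings, for illustration only.
phenolics_equiv = concentration_from_absorbance(0.45, *GALLIC_ACID_CURVE)
flavonoids_equiv = concentration_from_absorbance(0.60, *QUERCETIN_CURVE)
```

Expressing the result per gram of dry extract additionally requires the extract concentration in the well, which is left out of this sketch.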

Determination of Minimum Inhibitory Concentration (MIC)
The in vitro antibacterial activity of each EEP was determined using a broth microdilution test as recommended by the Clinical and Laboratory Standards Institute (CLSI) document M7-A4 for bacteria [28]. The MIC was defined as the lowest concentration of the test agent that restricted growth to an absorbance <0.05 at 660 nm after incubation at 37 °C for 16 h-24 h. Growth inhibitory effects of the extracts were tested against Streptococcus mutans (ATCC 10449), Streptococcus oralis (ATCC 35037) and Streptococcus sanguinis (ATCC 10556). The procedures employed were as described previously [29]. Sterile 96-well microtiter plates were used. Each well contained Streptococcus (final concentration of 5 × 10^5 colony forming units (CFU)/mL), serially diluted EEP, and the appropriate growth medium. Each test concentration was run in triplicate. The controls included inoculated growth medium without test compounds. Sample blanks contained uninoculated growth medium only. All plates were incubated at 37 °C under appropriate atmospheric conditions, and growth was estimated spectrophotometrically (absorbance at 660 nm) after 24 h using a microtiter plate reader. Chlorhexidine gluconate (CHX) was used as a positive control.
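The MIC criterion can be sketched in Python as follows. The snippet assumes that the triplicate absorbances at 660 nm are averaged and blank-corrected before being compared with the 0.05 threshold, which is one plausible reading of the procedure; the function name and the plate readings are hypothetical.

```python
def minimum_inhibitory_concentration(readings, blank=0.0, threshold=0.05):
    """Return the lowest concentration that keeps growth below the threshold.

    readings: dict mapping concentration -> list of replicate A660 values.
    The dilution series is walked from the highest concentration downwards
    and stops at the first concentration that shows growth.
    Returns None if even the highest concentration does not inhibit growth.
    """
    mic = None
    for concentration in sorted(readings, reverse=True):
        replicates = readings[concentration]
        mean_a660 = sum(replicates) / len(replicates) - blank
        if mean_a660 >= threshold:
            break
        mic = concentration
    return mic


# Hypothetical two-fold dilution series (concentration units are arbitrary).
plate = {
    512: [0.01, 0.02, 0.01],
    256: [0.03, 0.02, 0.04],
    128: [0.30, 0.28, 0.31],
    64: [0.50, 0.52, 0.49],
}
mic = minimum_inhibitory_concentration(plate)
```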

NMR Experiments
For each measurement, 20 mg of sample was weighed out and dissolved in 0.5 mL of DMSO-d6 containing 0.03% TMS. All 1H-NMR spectra of the propolis extracts were collected at 300 K on an Avance III HD 700 MHz spectrometer (Bruker, Billerica, MA, USA) equipped with a 5-mm z-axis gradient inverse probe. Each spectrum was recorded using the standard single-pulse sequence with a 90° pulse length of 7.76 µs. A total of 128 scans were collected into 32 k data points using a spectral width of 14 kHz, a relaxation delay of 5 s and an acquisition time of 2.3 s. The free induction decays (FIDs) were multiplied by an exponential function with a line-broadening factor of 0.3 Hz before Fourier transformation. The 1H-NMR spectra were manually corrected for phase and baseline distortion using MestReNova software (version 10.0.2, Mestrelab Research, Santiago de Compostela, Spain). The 1H-NMR chemical shifts were referenced to the TMS signal at 0.0 ppm.
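For readers unfamiliar with exponential apodization, the following Python sketch shows the window in its common form, w(t) = exp(−π·LB·t), applied point by point to an FID. The dwell time of 1/SW (complex points) and the function names are assumptions made for illustration; the actual processing was carried out in MestReNova, whose internal conventions may differ.

```python
import math


def exponential_window(n_points, spectral_width_hz, lb_hz):
    """Exponential window exp(-pi * LB * t) sampled at t = k / SW.

    Assumes complex data points, so that the dwell time equals 1 / SW.
    """
    dwell = 1.0 / spectral_width_hz
    return [math.exp(-math.pi * lb_hz * k * dwell) for k in range(n_points)]


def apodize(fid, spectral_width_hz, lb_hz=0.3):
    """Multiply an FID (sequence of real or complex points) by the window."""
    window = exponential_window(len(fid), spectral_width_hz, lb_hz)
    return [point * weight for point, weight in zip(fid, window)]


# Window for a hypothetical 8-point FID at the 14 kHz spectral width and 0.3 Hz LB.
weights = exponential_window(8, 14000.0, 0.3)
```

The first point is left unchanged and later points are progressively attenuated, which trades a small loss of resolution for a better signal-to-noise ratio.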

Data Processing for Multivariate Analysis
Using MestReNova, each one-dimensional 1H-NMR spectrum was sliced into 0.02 ppm sections (bins) between 0.5 ppm and 13.5 ppm. Processed spectra were normalized to the total average sum of integrals. The resulting normalized integrals made up the data matrix that was submitted to multivariate analysis.
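The binning step can be sketched in Python as shown below. The sketch sums intensities inside each 0.02 ppm bin as a stand-in for the integral and divides by the total over all bins; this is one simple reading of the normalization described above, not necessarily the exact MestReNova procedure. Function names and the toy spectrum are hypothetical.

```python
def bin_spectrum(ppm, intensity, lo=0.5, hi=13.5, width=0.02):
    """Sum intensities into equal-width chemical-shift bins between lo and hi."""
    n_bins = round((hi - lo) / width)  # 650 bins for the ranges in the text
    bins = [0.0] * n_bins
    for shift, value in zip(ppm, intensity):
        if shift < lo:
            continue  # below the binned region
        index = int((shift - lo) / width)
        if index < n_bins:
            bins[index] += value
    return bins


def normalize_to_total(bins):
    """Scale the bins so that they sum to one (total-area normalization)."""
    total = sum(bins)
    if total == 0:
        raise ValueError("spectrum has zero total intensity")
    return [b / total for b in bins]


# Toy spectrum: five points, two of which fall outside 0.5 ppm-13.5 ppm.
toy_ppm = [0.51, 0.515, 1.01, 14.0, 0.2]
toy_intensity = [1.0, 2.0, 3.0, 5.0, 7.0]
row = normalize_to_total(bin_spectrum(toy_ppm, toy_intensity))
```

Each normalized spectrum then forms one row of the data matrix.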

Multivariate Analysis
Principal component analysis (PCA), an unsupervised exploratory data analysis technique, was used together with partial least squares regression, also known as projection to latent structures (PLS), and its orthogonal form (OPLS), which are regression models employed to find the fundamental relations between two data matrices. The quality of the models was evaluated with the following diagnostic tools: the cumulative modeled variation in matrix X, R2X (cum); the proportion of the variance of the response variable that is explained by the model, R2Y (cum); and the predictive ability parameter, Q2 (cum).
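As a rough illustration of how R2Y and Q2 differ, the Python sketch below uses the same expression, 1 − SS(residual)/SS(total), for both: fitted values give an R2Y-type statistic, whereas predictions obtained under cross-validation give a Q2-type statistic. The function name and the numbers are hypothetical, and SIMCA's cumulative, per-component bookkeeping is not reproduced here.

```python
def explained_fraction(y_ref, y_pred):
    """Return 1 - SS_residual / SS_total, with SS_total taken about the mean."""
    mean_y = sum(y_ref) / len(y_ref)
    ss_total = sum((y - mean_y) ** 2 for y in y_ref)
    ss_residual = sum((y - p) ** 2 for y, p in zip(y_ref, y_pred))
    if ss_total == 0:
        raise ValueError("reference values have zero variance")
    return 1.0 - ss_residual / ss_total


# Hypothetical reference values, fitted values and cross-validated predictions.
y_measured = [10.0, 12.0, 15.0, 19.0]
y_fitted = [10.2, 11.9, 15.1, 18.8]          # from the model built on all samples
y_cross_validated = [10.8, 11.5, 15.9, 17.9]  # each sample predicted when left out

r2y = explained_fraction(y_measured, y_fitted)
q2 = explained_fraction(y_measured, y_cross_validated)
```

A Q2 value well below R2Y is the usual warning sign of overfitting.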
All statistical data analyses were performed as implemented in the SIMCA 14.1.0.2047 software (MKS Umetrics, Malmö, Sweden) using unit variance (UV) scaling, which was selected after optimization of the scaling method. For the determination of the figures of merit, an in-house MATLAB program was used with the outputs of the SIMCA software.
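Unit variance scaling centres each variable and divides it by its standard deviation, so that every bin enters the model with equal weight. A minimal Python sketch is given below; it uses the sample standard deviation (n − 1 denominator), and the function names are hypothetical rather than part of SIMCA.

```python
import statistics


def uv_scale(column):
    """Mean-centre one variable and scale it to unit variance."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)  # sample standard deviation (n - 1)
    if sd == 0:
        return [0.0] * len(column)  # a constant variable carries no information
    return [(value - mean) / sd for value in column]


def uv_scale_matrix(rows):
    """Apply UV scaling to every column of a samples-by-variables matrix."""
    scaled_columns = [uv_scale(list(column)) for column in zip(*rows)]
    return [list(row) for row in zip(*scaled_columns)]


# Hypothetical data matrix: three samples (rows) by two spectral bins (columns).
scaled = uv_scale_matrix([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
```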

Figures of Merit
A figure of merit is a quantity used to characterize the performance of an analytical method. Figures of merit are well known in univariate calibration (where a single number is measured for each sample), and they can also be defined in a straightforward way for multivariate calibration through the Net Analyte Signal (NAS) concept [30-32].
The NAS concept arises from the fact that a prediction sample spectrum may have varying contributions from other sample components. Hence, the spectrum can be decomposed into two orthogonal parts: a part that can be uniquely assigned to the analyte of interest (the NAS), and the remaining part that contains the contribution from other components. Using the NAS, a multivariate calibration model can be represented in a pseudo-univariate plot. The NAS is evaluated as:
$$\hat{\mathbf{x}}_i^{\mathrm{NAS}} = \mathbf{b}\,(\mathbf{b}^{T}\mathbf{b})^{-1}\,\mathbf{b}^{T}\,\mathbf{x}_i$$
where $\mathbf{x}_i$ is a sample spectrum after preprocessing and $\mathbf{b}$ is a column vector of the PLS regression coefficients.
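The decomposition into two orthogonal parts can be illustrated with a small Python sketch. It assumes the commonly used projection form of the NAS, in which the spectrum is projected onto the direction of the regression vector b; the function names and the two-variable example are hypothetical.

```python
import math


def net_analyte_signal(x, b):
    """Project spectrum x onto regression vector b: b (b^T b)^-1 b^T x."""
    b_dot_b = sum(bi * bi for bi in b)
    if b_dot_b == 0:
        raise ValueError("regression vector must be non-zero")
    scale = sum(xi * bi for xi, bi in zip(x, b)) / b_dot_b
    return [scale * bi for bi in b]


def norm(vector):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(v * v for v in vector))


# Hypothetical two-variable example.
spectrum = [3.0, 4.0]
coefficients = [1.0, 0.0]
nas = net_analyte_signal(spectrum, coefficients)
remainder = [xi - ni for xi, ni in zip(spectrum, nas)]
```

The norm of the NAS vector is the scalar that would be plotted against the reference value in the pseudo-univariate representation.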
Accuracy. It expresses the proximity between the reference value and the value predicted by the model. It can be measured in many ways, among them the Root Mean Square Error of Estimation (RMSEE) and the Root Mean Square Error of Cross Validation (RMSECV):
$$\mathrm{RMSEE} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n}}$$
where $y_i$ and $\hat{y}_i$ are the reference and estimated values, respectively, of the i-th sample and n is the total number of samples. RMSECV is calculated in a similar way by leaving out all permutations of a given number of samples from the training set and computing the total RMSEE value of the procedure by adding the RMSEE value for each calibration. RMSEE measures the fit of the model, while RMSECV measures its predictive power.
Selectivity (sel). It expresses the fraction of the signal that changes when the concentration of the analyte varies by one unit. It can be evaluated through the NAS concept as:
$$\mathrm{sel} = \frac{\lVert \mathbf{s}_k^{*} \rVert}{\lVert \mathbf{s}_k \rVert}$$
where $\lVert \mathbf{s}_k \rVert$ stands for the norm of the sensitivity coefficients of the spectra containing the analyte k at unit concentration and $\lVert \mathbf{s}_k^{*} \rVert$ for that corresponding to its NAS.
Sensitivity (sen). It is a measure of the response change with analyte concentration. In the multivariate context it represents the NAS generated by an analyte concentration equal to unity, and it is evaluated through:
$$\mathrm{sen} = \frac{1}{\lVert \mathbf{b} \rVert}$$
where $\lVert \mathbf{b} \rVert$ is the norm of the vector of regression coefficients of the calibration model.
Analytical sensitivity (γ). It is defined as the ratio between the sensitivity and the instrumental noise, δx:
$$\gamma = \frac{\mathrm{sen}}{\lvert \delta x \rvert}$$
It allows a comparison between methodologies based on very different instrumental measurements, as it is independent of the measured signal. The inverse of this parameter, γ−1, establishes the minimum concentration difference that is discernible by the analytical method when random experimental noise is considered the only source of error.
Limit of detection (LD). It is defined as the minimum detectable value of the net signal (or concentration) for which the probabilities of false negatives (β) and false positives (α) are at most 5%. It is evaluated as:
$$\mathrm{LD} = 3.3\,\delta x\,\frac{1}{\mathrm{sen}}$$
Limit of quantitation (LQ). It determines the net signal or analyte concentration value which can be estimated with a relative error lower than 10%. It is evaluated as:
$$\mathrm{LQ} = 10\,\delta x\,\frac{1}{\mathrm{sen}}$$
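As a minimal sketch of how these figures of merit follow from the calibration model, the Python snippet below computes the RMSE, sen, γ, γ−1, LD and LQ from a regression vector b and a noise estimate δx. The function names and numerical values are hypothetical, the factor of 10 used for LQ is the conventional choice and is assumed here, and the snippet is not the authors' in-house program.

```python
import math


def rmse(y_ref, y_est):
    """Root mean square error between reference and estimated values."""
    n = len(y_ref)
    return math.sqrt(sum((y - e) ** 2 for y, e in zip(y_ref, y_est)) / n)


def figures_of_merit(b, noise):
    """Sensitivity-based figures of merit from regression vector b and noise."""
    norm_b = math.sqrt(sum(v * v for v in b))
    sen = 1.0 / norm_b                 # sensitivity = 1 / ||b||
    gamma = sen / abs(noise)           # analytical sensitivity
    return {
        "sen": sen,
        "gamma": gamma,
        "gamma_inv": 1.0 / gamma,      # minimum discernible concentration difference
        "LD": 3.3 * abs(noise) / sen,  # limit of detection
        "LQ": 10.0 * abs(noise) / sen, # limit of quantitation (assumed factor of 10)
    }


# Hypothetical regression vector and instrumental noise estimate.
merit = figures_of_merit([3.0, 4.0], noise=0.01)
fit_error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Applying rmse to fitted values gives an RMSEE-type figure, and applying it to cross-validated predictions gives an RMSECV-type figure.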

Conclusions
The total phenol and flavonoid contents as well as the antioxidant (DPPH) and in vitro antibacterial activities against Streptococcus mutans, Streptococcus oralis and Streptococcus sanguinis were quantitatively correlated with 1H-NMR spectral data using PLS and OPLS calibration models. A preliminary PCA was performed to characterize the samples and to identify possible outliers. The results indicated a natural tendency of samples from the same apiary or apiaries to lie in close proximity. The PLS and OPLS regression methods gave excellent calibration models, although OPLS performed better in terms of the RMSEE, RMSECV, R2X (cum), R2Y (cum), Q2 (cum) and R2 values, as expected from the separation of the systematic variation into predictive and non-predictive parts. The figures of merit of the developed methods were determined as well, so that the methods were characterized in terms of their limits of detection and quantitation, sensitivity, selectivity and analytical sensitivity values (Table 3). Including new samples with a diversity of origins is recommended to improve the prediction capabilities of the developed models. The study demonstrates for the first time the possibility of developing a rapid and reliable method based on 1H-NMR for evaluating the quality of propolis samples of different origin in terms of their chemical composition and their antioxidant and antibacterial properties.