Quantitative Structure-Property Study on Pyrazines with Bell Pepper Flavor

A quantitative structure-property (QSPR) study on pyrazines with bell pepper aroma is performed by means of different statistical methods, which correlate appropriate molecular descriptors with the biological activity. The different methods lead to consistent results, indicating which of the molecular properties of the compounds under consideration are significant for bell pepper flavor. These results are compared with other models.


INTRODUCTION
Ch. Th. Klein et al.
The relationship between the molecular structure of flavor compounds and the intensity and quality of their aroma impression has attracted increasing interest in recent years. The conformational change induced by binding of the odor or aroma molecule to olfactory receptors activates the adenylate cyclase cascade, which leads to the opening of a nonspecific cation channel by cAMP and thus triggers an action potential [2]. However, it has been shown that only some odor molecules stimulate the adenylate cyclase cascade. Meanwhile, inositol-1,4,5-trisphosphate (IP3) has been found to act as a second messenger in olfactory signal transduction in several species [3]. IP3 is thought to open a specific Ca2+ channel by binding to this membrane protein [3].
On the other hand, the gap between what is known about the primary structure of olfactory receptors and what is known about their three-dimensional geometry is large: while the sequences of some receptors are already known, no detailed structure has been elucidated so far. This is a strong motivation to study the interaction between odor molecules and receptors by molecular modeling approaches.
In the present study, structure-flavor relationships on pyrazine-based flavor molecules with bell pepper aroma are analyzed by means of three different methods: multiple linear regression (MLR), cluster analysis and comparative molecular field analysis (CoMFA) [4].
Pyrazine-based aroma compounds show a broad spectrum of flavor impressions, ranging from earthy, nutty and roasted to bell pepper or woody. Their general structure is presented in Figure 1. Pyrazines were first identified in heated foods such as bread [5], various meats [6], baked potatoes [7] and coffee [8], where they are formed from reducing sugars and amino acids during the Maillard reaction [9]. They also occur, however, in fresh vegetables such as tomatoes, asparagus, beans and spinach [10], and in bell pepper [11].
From the analysis of the regression and CoMFA [4] models obtained, conclusions are drawn on the steric and electronic requirements responsible for the bell pepper flavor.
The molecular structures are optimized with the semiempirical AM1 method [13], as implemented in the MOPAC program package [14].

MATERIALS AND METHODS
For the structures obtained, the following molecular properties are calculated using the TSAR (Tools for Structure-Activity Relationships) software [15]: (i) steric descriptors: molecular volume (V) and surface area (S), molecular refractivity (MR), and the Verloop parameters (L, B1, B2, B3, B4) [16] for the four possible substituents, R1 to R4 (Figure 1). R1 is always taken to be the substituent containing the heteroatom, except for the three compounds whose substituents contain no heteroatom: compound 3 (R1 = methyl), compound 59 (R1 = ethyl) and compound 81 (R1 = ethyl). The Verloop parameter $L^{(i)}$ represents the maximal length of substituent i along the axis defined by the bond that connects the substituent to the heterocycle. $B_j^{(i)}$ ($j = 1, \ldots, 4$) denotes the width of substituent i perpendicular to this axis, and the widths are ordered such that $B_1 < B_2 < B_3 < B_4$. The molecular refractivity, being related to the volume and to the polarizability of a compound, is not only a steric descriptor but also indicates whether dispersion forces are important in the interaction with the receptor.
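To make the Verloop definitions concrete, the following Python sketch computes L and four widths for a single substituent from atomic coordinates and van der Waals radii. It is a simplification and an assumption on our part, not the TSAR implementation: the substituent is assumed to be pre-aligned so that the ring atom lies at the origin and the connecting bond points along +x, and the widths are measured along four fixed perpendicular directions instead of directions rotated to minimize B1 as in Verloop's original procedure. The function name `sterimol` and the coordinates in the usage line are hypothetical.

```python
from typing import List, Tuple

# Each atom: (x, y, z, van der Waals radius), all in angstrom.
Atom = Tuple[float, float, float, float]


def sterimol(atoms: List[Atom]) -> Tuple[float, List[float]]:
    """Simplified Verloop-style L and B1..B4 for one substituent.

    Assumes a pre-aligned frame: ring (ipso) atom at the origin,
    bond to the substituent's first atom along +x.
    """
    # L: maximal extent along the bond axis, including the atomic radius.
    length = max(x + r for x, y, z, r in atoms)
    # Widths along four fixed perpendicular directions (+y, -y, +z, -z).
    directions = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
    widths = [
        max(dy * y + dz * z + r for x, y, z, r in atoms)
        for dy, dz in directions
    ]
    # Order the widths so that B1 <= B2 <= B3 <= B4, as in the text.
    return length, sorted(widths)


# Hypothetical methoxy-like fragment (O, then C); hydrogens omitted.
L, B = sterimol([(1.36, 0.0, 0.0, 1.52), (2.1, 1.2, 0.0, 1.70)])
```

Because only heavy atoms are passed in the example, the resulting widths are smaller than those obtained with explicit hydrogens; a real descriptor calculation would include all atoms.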
(ii) descriptor of lipophilicity: logP, where P is the partition coefficient of the compound between octanol and water; the larger P (and thus logP), the more hydrophobic the compound.
In cluster analysis, a distance matrix is calculated from the molecular properties and then used to classify the samples into clusters of similar members. Cluster analysis is performed with the TSAR [15] program, using Ward clustering [18] with Euclidean distances.
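A minimal pure-Python sketch of agglomerative Ward clustering, assuming each compound is represented by a vector of descriptor values (in practice these would usually be autoscaled first, since the descriptors have different units). It starts from squared Euclidean distances and applies the Lance-Williams update for Ward linkage. How TSAR scales the descriptors or handles ties is not described in the text, so this illustrates the technique rather than reproducing the software; `ward_clusters` is a hypothetical name.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence


def ward_clusters(samples: Sequence[Sequence[float]], n_clusters: int) -> List[List[int]]:
    """Agglomerative Ward clustering; returns sorted lists of sample indices."""
    # Start with every sample in its own cluster.
    clusters: Dict[int, List[int]] = {i: [i] for i in range(len(samples))}
    # Pairwise squared Euclidean distances between the initial singletons.
    dist: Dict[FrozenSet[int], float] = {}
    for i, j in combinations(clusters, 2):
        dist[frozenset((i, j))] = sum(
            (a - b) ** 2 for a, b in zip(samples[i], samples[j])
        )
    next_id = len(samples)
    while len(clusters) > n_clusters:
        # Merge the pair with the smallest Ward distance, which is
        # proportional to the increase in within-cluster sum of squares.
        pair = min(dist, key=dist.get)
        i, j = tuple(pair)
        ni, nj = len(clusters[i]), len(clusters[j])
        d_ij = dist.pop(pair)
        merged = clusters.pop(i) + clusters.pop(j)
        # Lance-Williams update: distance from each remaining cluster k
        # to the newly merged cluster.
        for k in clusters:
            nk = len(clusters[k])
            d_ki = dist.pop(frozenset((k, i)))
            d_kj = dist.pop(frozenset((k, j)))
            dist[frozenset((k, next_id))] = (
                (ni + nk) * d_ki + (nj + nk) * d_kj - nk * d_ij
            ) / (ni + nj + nk)
        clusters[next_id] = merged
        next_id += 1
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical two-descriptor vectors for five compounds.
groups = ward_clusters([[0, 0], [0, 1], [10, 10], [10, 11], [20, 0]], 3)
```

Working with squared distances keeps the Lance-Williams recurrence exact for Ward linkage; the merge order is the same as when Ward distances are reported on the unsquared scale.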
CoMFA [4] is performed with the SYBYL software [19]. The molecules are superimposed by fitting the atoms of the heterocycle and the first atom of substituent 1 (R1).
Grid spacings of 1, 2 and 3 Å and different probe atoms [sp3 C (+1), sp3 O (-1) and H (+1)] are employed to evaluate the molecular field. The electrostatic field is calculated with the same AM1 charges as those used in MLR. The SAMPLS [20] variant of PLS is applied with leave-one-out cross-validation, i.e. each compound is left out in turn. The quality of the models is assessed with the same statistical indicators as in MLR.
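The field evaluation can be sketched as follows: a charged probe atom is placed at every grid point, and a Lennard-Jones 6-12 term (steric) and a Coulomb term (electrostatic) are summed over the atoms of the aligned molecule. This is a simplified illustration, not SYBYL's code. The probe parameters, the combination rules, the distance-dependent dielectric and the 30 kcal/mol energy cap are assumptions chosen to resemble typical CoMFA settings, and `probe_energies` and `field` are hypothetical names.

```python
import math
from typing import Dict, List, Tuple

# Atom: (x, y, z, partial charge in e, LJ radius r0 in angstrom, LJ well depth eps in kcal/mol)
Atom = Tuple[float, float, float, float, float, float]
Point = Tuple[float, float, float]

COULOMB = 332.0  # converts q1*q2/r (e^2/angstrom) to kcal/mol
CUTOFF = 30.0    # assumed energy cap in kcal/mol


def probe_energies(atoms: List[Atom], point: Point, probe_q: float = 1.0,
                   probe_r0: float = 1.7, probe_eps: float = 0.107) -> Tuple[float, float]:
    """Steric (LJ 6-12) and electrostatic energy of one probe at one grid point."""
    steric = 0.0
    elec = 0.0
    for x, y, z, q, r0, eps in atoms:
        r = max(math.dist(point, (x, y, z)), 0.5)  # avoid division by zero at atom centres
        rm = 0.5 * (r0 + probe_r0)      # combined radius (arithmetic mean)
        e = math.sqrt(eps * probe_eps)  # combined well depth (geometric mean)
        steric += e * ((rm / r) ** 12 - 2.0 * (rm / r) ** 6)
        # Distance-dependent dielectric (epsilon = r) gives a 1/r^2 dependence.
        elec += COULOMB * q * probe_q / (r * r)
    # Cap repulsive steric energies and clamp electrostatics to +/- CUTOFF.
    return min(steric, CUTOFF), max(min(elec, CUTOFF), -CUTOFF)


def field(atoms: List[Atom], lo: float, hi: float,
          spacing: float) -> Dict[Point, Tuple[float, float]]:
    """Evaluate both fields on a cubic grid from lo to hi (inclusive)."""
    n = int(round((hi - lo) / spacing)) + 1
    coords = [lo + k * spacing for k in range(n)]
    return {(x, y, z): probe_energies(atoms, (x, y, z))
            for x in coords for y in coords for z in coords}
```

In a CoMFA-style workflow, the steric and electrostatic values at all grid points become the columns of the PLS data matrix, one row per aligned molecule.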

Multiple Linear Regression
The best regression model found by two-way stepping is given by Eqn. (1). As can be seen from Table 1, the variables in Eqn. (1) show no significant correlation with each other ($r^2 < 0.40$ indicates that no correlation exists between two variables). The significance of the individual regression coefficients at the 95% level is also given.
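The pairwise collinearity criterion used here ($r^2 < 0.40$ between any two descriptors) can be checked with a short routine. A minimal sketch, assuming the descriptor values are available as equal-length lists keyed by descriptor name; `pearson_r`, `collinear_pairs` and the descriptor names in the test are hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def collinear_pairs(descriptors: Dict[str, Sequence[float]],
                    threshold: float = 0.40) -> List[Tuple[str, str, float]]:
    """Return descriptor pairs whose squared correlation r^2 reaches the threshold."""
    flagged = []
    for a, b in combinations(sorted(descriptors), 2):
        r2 = pearson_r(descriptors[a], descriptors[b]) ** 2
        if r2 >= threshold:
            flagged.append((a, b, r2))
    return flagged
```

An empty result means that every pair of descriptors satisfies $r^2 < 0.40$, i.e. $|r| < 0.63$.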
The predictive power of the model is high, since $r^2_{cv}$ is high and fairly close to $r^2$ (0.893). The values in parentheses denote the standard errors of the coefficients (for Eqn. (1), these are given in Table 2). As in the previous case, the variables used do not correlate with each other.
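The cross-validated $r^2_{cv}$ compares the predictive residual sum of squares (PRESS), obtained by refitting the model with each compound left out in turn, with the total sum of squares of the activities: $r^2_{cv} = 1 - \mathrm{PRESS}/\sum_i (y_i - \bar{y})^2$. The pure-Python sketch below computes it for a fixed set of descriptors, refitting by ordinary least squares through the normal equations. It does not repeat the two-way stepping descriptor selection inside each cross-validation round, which is a simplifying assumption; all function names are hypothetical.

```python
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_mlr(X: List[List[float]], y: List[float]) -> List[float]:
    """Least-squares coefficients [b0, b1, ..., bk] via the normal equations."""
    rows = [[1.0] + list(x) for x in X]  # prepend the intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef: List[float], x: List[float]) -> float:
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def r2_cv(X: List[List[float]], y: List[float]) -> float:
    """Leave-one-out cross-validated r^2 = 1 - PRESS / SS_total."""
    press = 0.0
    for i in range(len(y)):
        coef = fit_mlr(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, X[i])) ** 2
    mean = sum(y) / len(y)
    return 1.0 - press / sum((t - mean) ** 2 for t in y)
```

Leaving out larger groups instead of single compounds, as in the stability test with different leave-out group sizes, only changes which rows are dropped in each refit.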
The overall regression and the individual regression coefficients are again statistically significant, as judged by the F- and t-values. The predictions of the biological activity by the MLR equations are shown in Table 3.
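For reference, the overall F value of a regression with n compounds, k descriptors and squared correlation coefficient $r^2$ is $F = (r^2/k)\,/\,((1 - r^2)/(n - k - 1))$, and each t value is a regression coefficient divided by its standard error. The tiny sketch below encodes these textbook formulas; the function names are hypothetical, the numbers in the test are arbitrary and do not correspond to Eqns. (1) or (2), and the critical values still have to be taken from F and t tables.

```python
def f_value(r2: float, n: int, k: int) -> float:
    """Overall F statistic of a multiple linear regression with k descriptors."""
    if not 0.0 <= r2 < 1.0 or n <= k + 1:
        raise ValueError("need 0 <= r2 < 1 and n > k + 1")
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


def t_value(coefficient: float, standard_error: float) -> float:
    """t statistic of an individual regression coefficient."""
    return coefficient / standard_error
```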
The high cross-validated $r^2$ ($r^2_{cv}$) values suggest that the remarkable statistical quality of the models does not stem from chance correlation. Nevertheless, to exclude chance correlation explicitly, the effect of randomizing the dependent variable is analyzed: the 32 values of the dependent variable are redistributed among the compounds by a random number generator, and models are then generated as before by F-stepping variable selection. Table 4 shows that randomization causes, in all cases, a loss of correlation and of statistical significance. $r^2 < 0.40$ (i.e. $r < 0.63$) indicates that no significant correlation exists among the independent variables. This is the case in five of the ten trials. However, the other five cases have values only slightly above this threshold.
Table 3. Actual and predicted biological effects. The CoMFA prediction stems from the best model.
Another test that confirms the good statistical quality of the regression models (1) and (2) is the relatively high stability of $r^2_{cv}$ with respect to different sizes of leave-out groups, shown in Table 5.
The results of CoMFA are summarized in Table 6. The models obtained with the three different probes [sp3 C (+1), sp3 O (-1) and H (+1)] and different grid spacings have comparable quality, as reflected by the statistical parameters. The best model (no. 3, Table 6) has the highest predictive $r^2$ ($r^2_{cv}$) and the lowest standard error of prediction (SEP). Favorable and unfavorable steric and electrostatic components of the molecular field are shown in Figure 4.
In Figure 4, light grey indicates unfavorable and dark grey favorable steric regions, i.e. bulky substituents in the light grey zones decrease, and bulky substituents in the dark grey zones increase, the biological activity (bell pepper flavor). A corresponding picture for the electrostatic interactions indicates the region(s) where a stronger negative field (light grey) or a stronger positive field (dark grey) increases the biological activity.
(Fig. 4), while Eqn. (2) is in better agreement with CoMFA regarding the electrostatic situation (Fig. 4).
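The randomization test described above (scrambling the activity values, refitting and checking that the correlation collapses) can be sketched in a few lines. This sketch keeps the descriptor set fixed and simply refits it, whereas the study repeats F-stepping variable selection for every scrambled data set; that simplification, the seed handling and the names `fit_r2` and `y_randomization` are our assumptions.

```python
import random
from typing import List


def fit_r2(X: List[List[float]], y: List[float]) -> float:
    """r^2 of an ordinary least-squares fit of y on X (with intercept)."""
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(k):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    coef = [m[i][k] / m[i][i] for i in range(k)]
    pred = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, pred))
    ss_tot = sum((t - mean) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


def y_randomization(X: List[List[float]], y: List[float],
                    trials: int = 10, seed: int = 0) -> List[float]:
    """Refit the same descriptors against shuffled activities; return the r^2 values."""
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        shuffled = list(y)
        rng.shuffle(shuffled)  # break the structure-activity pairing
        scores.append(fit_r2(X, shuffled))
    return scores
```

If the original model reflects a genuine structure-activity relationship, the scrambled $r^2$ values should fall well below the $r^2$ of the unscrambled fit.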
Three of the four steric regions that appear to be important according to CoMFA (Fig. 4) are reproduced by the MLR equations. Only the unfavorable contribution of bulky groups in the region of substituent %, as predicted by CoMFA, is not reproduced by the MLR equations. Similarly, Eqns. (1) and (2) suggest that increased negative charges on atoms C3 and C6 are advantageous for the bell pepper flavor, because the values of $\delta(\mathrm{C3})$ and $\delta(\mathrm{C6})$ are negative for all compounds and have negative regression coefficients. This is broadly in agreement with the CoMFA picture (Fig. 4). However, the favorable negative electrostatic field in these regions appears to arise from both the ipso atoms and the substituent atoms.
The favorable effect of a positive electrostatic field in the region of substituent R1, resulting from CoMFA, is also in agreement with Eqn. (2): the partial charge $\delta$ on the first atom of R1 can be positive or negative, and the more positive it is, the larger its contribution to the bell pepper flavor. However, one has to keep in mind that the MLR models consider only the first atom of the substituent, whereas CoMFA takes the whole substituent into account. The discrepancies between the two methods, 2D-QSAR and CoMFA, evidently stem from these differences in approach.
A bulky, rather elongated substituent R2 [correlation with $L^{(2)}$] is favorable for bell pepper flavor; this suggests the existence of a binding pocket for this substituent.
An increased electrostatic field in the regions of atoms C3 and C6 (and of the substituents R2 and %, respectively) is advantageous for the bell pepper aroma impression.
The substituent R1 should not be too bulky; substituents larger than the methoxy group appear to be unfavorable for the biological activity.
R2 for bell pepper aroma, suggesting that larger substituents (up to 6-9 C atoms) favor the aroma impression. This is in agreement with both approaches (MLR and CoMFA). Masuda and Mihara [23] propose that, besides the hydrophobic interaction stemming from the alkyl group R2, hydrogen bonding between the nitrogen atoms of the pyrazine nucleus and the heteroatom as donors on the one hand, and acceptors from the receptor pocket on the other hand, should be important for bell pepper flavor. Although we have no direct evidence for hydrogen bonding in our models, a favorable negative electrostatic field in the vicinity of the pyrazine nitrogens and the heteroatom of R1 is consistent with the hydrogen-bonding hypothesis.
A more general model (including pyrazines, pyridines and thiazoles) has been proposed by Rognon and Chastrette [24]. It is presented in Figure 5 and briefly discussed here. The bulky group at C2, with a volume between 34 Å³ and 85 Å³, is R2 in our notation.
In this model, R2 is assumed to consist of two substructures, G1 and G2, with G1 lying in the N-C2-C3 plane. G2 is preferably a branched alkyl group. The sp2 nitrogen is assumed to form a hydrogen bond with a donor from the receptor. The substituent at C3, -X-G3, is R1 in our notation. It is supposed to be smaller than R2 (volume between 13 Å³ and 34 Å³). The dimensions and positions of G1, G2 and G3 are postulated to be the relevant parameters for bell pepper flavor.
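Only the volume part of the Rognon-Chastrette model lends itself to a simple numerical check. The sketch below encodes the two volume windows quoted above (34-85 Å³ for the bulky group R2, 13-34 Å³ for the -X-G3 group R1). Treating the bounds as inclusive and additionally requiring R1 to be strictly smaller than R2 are our assumptions, the model's positional criteria for G1, G2 and G3 and its hydrogen bond are not represented, and the names `PyrazineVolumes` and `meets_volume_criteria` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

R2_WINDOW = (34.0, 85.0)  # cubic angstrom, bulky group at C2 (R2 in our notation)
R1_WINDOW = (13.0, 34.0)  # cubic angstrom, -X-G3 group at C3 (R1 in our notation)


@dataclass
class PyrazineVolumes:
    r1: float  # volume of R1 in cubic angstrom
    r2: float  # volume of R2 in cubic angstrom


def within(value: float, window: Tuple[float, float]) -> bool:
    """Inclusive range check (inclusiveness is an assumption)."""
    low, high = window
    return low <= value <= high


def meets_volume_criteria(v: PyrazineVolumes) -> bool:
    """True if both substituent volumes fall in the model's windows and R1 < R2."""
    return within(v.r1, R1_WINDOW) and within(v.r2, R2_WINDOW) and v.r1 < v.r2
```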
The steric requirements of this model are broadly in agreement with our results (bulky R2, less bulky R1). However, our models cannot identify the substructures G1 and G2 within R2, since such substructures do not occur in the pyrazine derivatives studied.

CONCLUSIONS
The MLR models developed on the basis of 16 pyrazines with bell pepper flavor and 16 pyrazines without bell pepper flavor have high predictive power, as reflected by the cross-validated $r^2$ ($r^2_{cv}$). The independent variables are uncorrelated and thus permit conclusions about the parameters that are important for bell pepper flavor. The results of the MLR models are in good agreement with CoMFA and identify steric and electronic requirements for the bell pepper aroma impression of pyrazine molecules. Moreover, the results are in good agreement with other models.