Identification of (−)(E)-N-[2(S)-Hydroxy-2-(4-hydroxyphenyl)ethyl]ferulamide, a Natural Product Isolated from Croton pullei: Theoretical and Experimental Analysis

Ferulic acid (FA) and its derivatives (FADs) are known for a variety of biological activities, including photoprotective, antioxidant, antiatherogenic and antiplasmodial activities. During the structural elucidation of a FAD isolated from Croton pullei, the existence of several candidate structures belonging to a heterologous series made the assignment difficult. Computational simulations at the DFT level were therefore performed to predict infrared (IR) and nuclear magnetic resonance (NMR) data for three structural possibilities of this heterologous series, and the predicted IR, 13C and 1H NMR data were compared with the experimental data. Linear regression between the theoretical and experimental data gave satisfactory statistical parameters and allowed the most probable structure to be defined.


Introduction
Ferulic acid (FA) and its derivatives (FADs) have been isolated from a vast number of plant species. FA is a phenolic compound found in plant cell wall components [1], especially in wheat, corn, rice, tomatoes, spinach, cabbage and asparagus [2]. It has a variety of biological activities, for example as a photoprotective agent [3] and as an antioxidant [4]. Additionally, some ferulic ester dimers are potential antiatherogenic and antiplasmodial agents [5].
Analysis of pure substances by nuclear magnetic resonance (NMR) spectroscopy is one of the main steps in obtaining information about the chemical nature of known or unknown organic compounds [6]. Because of the versatility of NMR techniques and the amount of information that can be extracted from NMR spectra, this information is extremely important in natural product chemistry [6], an area in which the technique is particularly widely applied.
Complementary to NMR techniques, molecular modeling has emerged as an important set of computational tools for constructing, editing, visualizing and analyzing the structures [7] of large [8,9] and small [10][11][12][13] molecular systems. Several recent papers have compared experimental NMR data with theoretical calculations performed using various computational models; one example is julocrotine [14], also isolated from Croton pullei (Euphorbiaceae) [15]. Besides julocrotine, the chemical investigation of C. pullei led to the identification of the alkaloids crotonimides A and B, together with the terpenoids lupeol, ribenone, sitosterol, kaurenoic acid and stigmasterol, among other compounds [15,16]. When the chemical study of C. pullei was resumed, another substance was isolated and identified as a FAD, but the heteroatom at position 9′ could not be identified by experimental NMR techniques. The NMR data point to three structural possibilities differing in the heteroatom at position 9′: oxygen (an ester) [17], nitrogen (an amide) [18][19][20] or sulfur (a thioester) (Figure 1). Computational methods were therefore used as an auxiliary tool for elucidating the structure of the isolated natural product. All three structural possibilities were submitted to theoretical calculations using Density Functional Theory (DFT) [21], and the theoretical chemical shifts were compared with the experimental data by linear regression in order to define the heteroatom.

Conformational Analysis and Geometric Data
Table S1 (see supplementary material) presents the results of the conformational analysis carried out for the O, S and N structures. The dihedral angle analyzed was C1′-C7′-C8′-X (X = O, S or N) for all three molecular structures. The results indicate that the conformers in each case have similar energies; we therefore selected two conformers of each compound (O, N and S) as starting structures for the DFT calculations.
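The conformational analysis yields one energy per constrained value of the C1′-C7′-C8′-X dihedral, on a 10° grid. As a minimal sketch (not the authors' workflow), the code below builds that grid and picks the lowest periodic local minima from a list of scan energies. The function names are hypothetical, the energies are assumed to come from the constrained HF/STO-3G optimizations, and the torsional profile used in the demo is synthetic.

```python
import math


def scan_grid(step_deg=10.0):
    """Dihedral values from -180 up to (but excluding) +180 degrees."""
    n = int(round(360.0 / step_deg))
    return [-180.0 + i * step_deg for i in range(n)]


def local_minima(angles, energies):
    """Return (angle, energy) pairs lower than both neighbours, lowest first.

    The scan is treated as periodic, so the first and last points are neighbours.
    """
    n = len(energies)
    minima = []
    for i in range(n):
        prev_e = energies[i - 1]          # index -1 wraps to the last point
        next_e = energies[(i + 1) % n]    # wrap from the last point to the first
        if energies[i] < prev_e and energies[i] < next_e:
            minima.append((angles[i], energies[i]))
    return sorted(minima, key=lambda pair: pair[1])


def select_conformers(angles, energies, n_keep=2):
    """Keep the n_keep lowest-energy minima as DFT starting geometries."""
    return local_minima(angles, energies)[:n_keep]


if __name__ == "__main__":
    # Hypothetical, synthetic torsional profile (arbitrary energy units).
    grid = scan_grid(10.0)
    profile = [1.0 + math.cos(math.radians(3 * a)) + 0.2 * math.cos(math.radians(a))
               for a in grid]
    print(select_conformers(grid, profile))
```

Keeping two minima mirrors the choice described above; an energy window around the global minimum would be an equally reasonable selection rule.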
The bond angles, bond lengths and dihedral angles calculated after geometry optimization with the B3LYP/6-31G(d,p) and B3LYP/6-31+G(d,p) methods (Table S2) were analyzed to find the most probable heteroatom. As shown in Table S2, the two methods gave very close bond lengths and bond angles for both conformers of the O, S and N structures; however, as expected, the structural parameters differ among the three possibilities, especially near the O, S or N heteroatom. The dihedral angle C1′-C7′-C8′-X of the three structural possibilities (O, S and N structures) showed a difference of about 110° with both B3LYP/6-31G(d,p) and B3LYP/6-31+G(d,p). While the calculated dihedral angle O7′-C7′-C8′-X varied significantly with the change in geometry, the calculated dihedral angles C8′-X-C9-O9 and X-C9-C8-C7 were quite close. The dihedral angles C9-C8-C7-C1, C2-C3-O3-CMe and O4-C4-C3-O3 showed that the conformers are more sensitive to changes in geometry than to the DFT method applied. Both methods can therefore be used to describe the geometry of these molecules.
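Dihedral angles such as C1′-C7′-C8′-X or C8′-X-C9-O9 are computed from the Cartesian coordinates of four atoms. A minimal sketch of the standard atan2-based formula is shown below (IUPAC sign convention, result in degrees in the range −180° to 180°); the helper names are hypothetical and the coordinates in the checks are made up, not taken from Table S2.

```python
import math


def _sub(u, v):
    return [a - b for a, b in zip(u, v)]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def dihedral(p0, p1, p2, p3):
    """Dihedral angle p0-p1-p2-p3 in degrees, e.g. C1'-C7'-C8'-X."""
    b0 = _sub(p0, p1)                  # from p1 back to p0
    b1 = _sub(p2, p1)                  # central bond p1 -> p2
    b2 = _sub(p3, p2)                  # from p2 to p3
    norm = math.sqrt(_dot(b1, b1))
    b1 = [c / norm for c in b1]        # unit vector along the central bond
    # Components of b0 and b2 perpendicular to the central bond
    v = _sub(b0, [_dot(b0, b1) * c for c in b1])
    w = _sub(b2, [_dot(b2, b1) * c for c in b1])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

A difference between two conformers, such as the one discussed above, is then simply the difference of two `dihedral` values, taken modulo 360°.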

NMR Spectra and Statistical Analysis
The NMR data for FA, the FADs and TMS (internal standard; calculated isotropic shielding constants of 32.1843 ppm for 1H and 186.3296 ppm for 13C) were calculated in the gas phase at the B3PW91/DGDZVP and B3LYP/6-31+G(d,p) levels. The experimental and theoretical 13C and 1H chemical shifts of the six structural possibilities of the FADs (two conformers each of the O, S and N structures) are shown in Tables 1 and 2, respectively, together with the residue (RS, in ppm) for each carbon and hydrogen atom. Tables 1 and 2 show that the values calculated by the DFT methods are close to the experimental values for the O, S and N structures (Figures 1 and S1), which supports the suitability of the computational model for analyzing the candidate structures. Consequently, the residues in the region near the candidate heteroatom indicate which heteroatom is bonded at position 9′ in the studied FAD. The method can therefore be used to identify unknown derivatives by comparing experimental and theoretical spectra. As a test, the same procedure was applied to FA, confirming the effectiveness of the computational methods (see Tables 3 and 4).
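Calculated shifts are obtained by referencing each isotropic shielding to TMS, δ = σ(TMS) − σ, and the residues RS = |δexp − δcalc| can then be averaged over the atoms near position 9′. A minimal sketch, assuming the TMS shielding constants quoted above; the function names, atom labels and numerical values in the checks are hypothetical.

```python
# TMS isotropic shielding constants (ppm) quoted in the text.
SIGMA_TMS = {"1H": 32.1843, "13C": 186.3296}


def chemical_shift(sigma, nucleus):
    """delta = sigma(TMS) - sigma(nucleus), in ppm."""
    return SIGMA_TMS[nucleus] - sigma


def residues(delta_exp, delta_calc):
    """RS = |delta_exp - delta_calc| for each atom label present in both dicts."""
    return {atom: abs(delta_exp[atom] - delta_calc[atom])
            for atom in delta_exp if atom in delta_calc}


def mean_residue(rs, atoms):
    """Average residue over a chosen subset of atoms (e.g. those near position 9')."""
    selected = [rs[a] for a in atoms]
    return sum(selected) / len(selected)
```

Comparing `mean_residue` over the same atom subset for the O, S and N candidates gives a simple numerical version of the residue-based argument above.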

Infrared Spectrum
Frequency calculations of the normal vibration modes of the proposed structures were performed with the B3LYP/6-31G(d,p) and B3LYP/6-31+G(d,p) methods, and the values obtained for the main absorption bands are shown in Table 5.

Polarimetry
The experimental specific rotation, [α]D²⁵ = −16°, indicates that the isolated enantiomer has the S absolute configuration at position 7′. A similar result was reported by Dellagreca et al. [18].

Collection and Extraction
Stems of C. pullei (1.00 kg) were collected in the municipality of Peixe-boi (PA, Brazil) and identified by Ricardo Secco, a botanist at the Museu Paraense Emílio Goeldi (Belém-PA, Brazil). The stems were air-dried, ground and extracted by percolation with hexane (7 days) and methanol (14 days), with filtration every 3 or 4 days. The solutions were concentrated under vacuum in a rotary evaporator, yielding the hexane extract (0.65 g) and the methanolic extract (80.00 g). Part of the methanolic extract (40.00 g) was partitioned with dichloromethane, ethyl acetate and n-butanol, and the resulting solutions were concentrated in a rotary evaporator. The dichloromethane phase (4.80 g) was fractionated by column chromatography (CC) on silica, eluting with hexane, ethyl acetate and methanol mixtures of increasing polarity. The fraction eluted with hexane-AcOEt 60% was submitted to column chromatography on Sephadex LH-20 with methanol as eluent, leading to the isolation of 35 mg of the FAD.
[α]D²⁵ = −16° (c 0.01, CH3OH). IR spectra were recorded in KBr on a Nicolet IS10 FT-IR spectrometer (Thermo Scientific), and 1H and 13C NMR data were obtained at 300 and 75.4 MHz, respectively, in CDCl3, using the solvent peak as the internal standard.

Computational Method
The O, N and S structures were drawn using HyperChem Release 7.5 [23] and submitted to an initial optimization with the semi-empirical PM3 method [24]. A conformational analysis was then carried out to locate the minimum-energy structure of each of the three candidates (O, N and S) through a series of partial optimizations in which the dihedral angle C1′-C7′-C8′-X (X = O, S or N) was constrained and varied step by step over the appropriate range in 10° increments; these calculations were performed at the HF/STO-3G level. The importance of conformational analysis in NMR calculations has been discussed in previous studies of a large number of natural products [25,26]. The molecular structures were optimized with Gaussian® 03W [27] using the hybrid functional B3LYP with the 6-31G(d,p) and 6-31+G(d,p) basis sets. Vibrational analysis was performed in the gas phase with the procedure implemented in the Gaussian® 03W package [27] at the B3LYP/6-31G(d,p) and B3LYP/6-31+G(d,p) levels, which ensured that each gradient optimization had indeed located a true minimum-energy structure (no imaginary frequencies). The normal vibration modes were visualized using HyperChem 7.5 [23].

NMR data (13C and 1H chemical shifts) were calculated in the gas phase with the DFT B3PW91/DGDZVP and B3LYP/6-31+G(d,p) methods. We have previously used the B3PW91/DGDZVP methodology successfully to study the 1H and 13C NMR spectra of cordatin [28] and 8-epicordatin [29]. Because the energies of the conformers are similar, two conformers of each structure (O, N and S) were submitted to DFT calculations with the different methodologies (see Table S3). The electrostatic potential surfaces of the O, S and N structures were calculated with Spartan ′08 [30] at the DFT B3LYP/6-31G(d,p) level.
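The "no imaginary frequencies" criterion can be checked automatically. The sketch below assumes Gaussian's standard text output, in which harmonic frequencies appear on lines beginning with `Frequencies --` and imaginary modes are printed as negative numbers; the function names are hypothetical and the log excerpt used in the demo is synthetic.

```python
import re

# Matches "Frequencies --" followed by whitespace and the values; a line such as
# "Frequencies ---" does not match because whitespace must follow "--".
_FREQ_LINE = re.compile(r"^\s*Frequencies --\s+(.*)$")


def parse_frequencies(log_text):
    """Collect all harmonic frequencies (cm^-1) from Gaussian output text."""
    freqs = []
    for line in log_text.splitlines():
        match = _FREQ_LINE.match(line)
        if match:
            freqs.extend(float(value) for value in match.group(1).split())
    return freqs


def is_true_minimum(freqs):
    """A stationary point is a minimum only if no frequency is imaginary (negative)."""
    return len(freqs) > 0 and all(f > 0.0 for f in freqs)


if __name__ == "__main__":
    sample = (
        " Frequencies --     35.1234    52.7890    61.0001\n"
        " Red. masses --      4.1234     3.9876     5.0012\n"
        " Frequencies --   1650.2000  3400.5000  3500.9000\n"
    )
    freqs = parse_frequencies(sample)
    print(len(freqs), is_true_minimum(freqs))
```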

Statistical Analysis
MINITAB14 [31] software was used for the statistical analysis of the NMR linear regression data. The statistical parameters chosen for this analysis were the squared correlation coefficient (R²), the Fisher value (F) and the standard deviation (s). For each conformer of the molecules studied (O, N and S structures), the parameters a and b of the linear fit $\delta_{calc} = a + b\,\delta_{exp}$ are presented, together with the mean absolute error, $MAE = \sum|\delta_{calc} - \delta_{exp}|/n$, and the corrected mean absolute error, $CMAE = \sum|\delta_{corr} - \delta_{exp}|/n$ [32]. These parameters, calculated from the experimental and theoretical data for these structures, allow analysis of the chemical shifts (ppm) and of the residues, $RS = |\delta_{exp} - \delta_{calc}|$, of the hydrogen and carbon atoms, as well as of the influence of the heteroatom in the region near positions 8 and 9′.
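The regression parameters listed above can be reproduced outside MINITAB. A minimal sketch, assuming an ordinary least-squares fit of δcalc on δexp with one predictor (so F has 1 and n − 2 degrees of freedom and s is the residual standard deviation), and assuming δcorr = (δcalc − a)/b, i.e. the calculated shift corrected by inverting the fitted line; that definition of δcorr and the function names are assumptions, not taken from the source.

```python
import math


def linear_fit(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def nmr_regression_stats(delta_exp, delta_calc):
    """Statistics for delta_calc = a + b*delta_exp (one predictor)."""
    n = len(delta_exp)
    a, b = linear_fit(delta_exp, delta_calc)
    fitted = [a + b * x for x in delta_exp]
    ss_res = sum((y - f) ** 2 for y, f in zip(delta_calc, fitted))
    mean_calc = sum(delta_calc) / n
    ss_tot = sum((y - mean_calc) ** 2 for y in delta_calc)
    r2 = 1.0 - ss_res / ss_tot
    f_value = r2 / ((1.0 - r2) / (n - 2))   # F with 1 and n-2 degrees of freedom
    s = math.sqrt(ss_res / (n - 2))         # residual standard deviation
    mae = sum(abs(c - e) for c, e in zip(delta_calc, delta_exp)) / n
    # Assumed definition: shifts corrected by inverting the fitted line.
    delta_corr = [(c - a) / b for c in delta_calc]
    cmae = sum(abs(d - e) for d, e in zip(delta_corr, delta_exp)) / n
    rs = [abs(e - c) for e, c in zip(delta_exp, delta_calc)]
    return {"a": a, "b": b, "R2": r2, "F": f_value, "s": s,
            "MAE": mae, "CMAE": cmae, "RS": rs}
```

CMAE is typically much smaller than MAE when the calculated shifts carry a systematic offset or scaling error, since the linear correction removes it.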
The equations obtained were tested for their predictive power using a cross-validation procedure, a practical and reliable way of testing significance. This approach, known as "leave-one-out", consists of building a series of models, each with one sample omitted. Once these models are obtained, each omitted value is predicted and the difference between the experimental and predicted values is calculated. The squares of these differences are summed, and the performance of the model (its predictive capacity) is then given by PRESS (predicted residual sum of squares) and by SPRESS (standard deviation of the cross-validation):

$$PRESS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad S_{PRESS} = \sqrt{\frac{PRESS}{n - k - 1}} \qquad (1)$$

where $y_i$ is the experimental value, $\hat{y}_i$ is the predicted value, n is the number of samples used to build the model and k is the number of NMR parameters.
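The predictive capacity of the model [33] was also quantified in terms of Q², defined as:

$$Q^2 = 1 - \frac{PRESS}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

where $\bar{y}$ is the mean of the experimental values.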
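The leave-one-out procedure can be sketched for a single-predictor linear model as follows. This is a minimal illustration assuming k = 1 (one NMR parameter as predictor); with y taken as the experimental shift, as in the definitions above, x is the calculated shift. The function names are hypothetical.

```python
import math


def linear_fit(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def leave_one_out(x, y, k=1):
    """Leave-one-out cross-validation of y = a + b*x.

    Returns PRESS, S_PRESS = sqrt(PRESS / (n - k - 1)) and
    Q2 = 1 - PRESS / sum((y - mean(y))**2).
    """
    n = len(x)
    press = 0.0
    for i in range(n):
        x_train = x[:i] + x[i + 1:]       # omit sample i
        y_train = y[:i] + y[i + 1:]
        a, b = linear_fit(x_train, y_train)
        y_pred = a + b * x[i]             # predict the omitted sample
        press += (y[i] - y_pred) ** 2
    mean_y = sum(y) / n
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    s_press = math.sqrt(press / (n - k - 1))
    q2 = 1.0 - press / ss_tot
    return press, s_press, q2
```

Because each prediction is made by a model that never saw that sample, Q² is always lower than the fitted R², and a large gap between the two signals a fit driven by a few points.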

Conclusions
The DFT calculations performed for the heterologous series gave results in good agreement with experiment. Together with the 13C and 1H NMR and polarimetric analyses, the computational method confirmed that the ferulic acid derivative present in the stems of C. pullei is (−)(E)-N-[2(S)-hydroxy-2-(4-hydroxyphenyl)ethyl]ferulamide; that is, the heteroatom at position 9′ is nitrogen (N). The statistical parameters showed that the B3PW91/DGDZVP and B3LYP/6-31+G(d,p) methodologies tested for the FADs and FA offer good predictive capacity and good significance. The N structure gave the best values of R², F, SPRESS and Q² in the cross-validation of the 13C NMR data.