Quantitative Structure Activity Relationship of Cinnamaldehyde Compounds against Wood-Decaying Fungi

Cinnamaldehyde, a major constituent of the bark of cinnamon trees (genus Cinnamomum), possesses broad-spectrum antimicrobial activity. In this study, we used best multiple linear regression (BMLR) to develop quantitative structure-activity relationship (QSAR) models for cinnamaldehyde derivatives against the wood-decaying fungi Trametes versicolor and Gloeophyllum trabeum. Based on the two optimal QSAR models, we then designed and synthesized two novel cinnamaldehyde compounds. The QSAR models exhibited good correlation coefficients: R²_Tv = 0.910 for Trametes versicolor and R²_Gt = 0.926 for Gloeophyllum trabeum. The small errors between the experimental and calculated values for the two designed compounds indicated that both QSAR models have strong predictability and stability.


Introduction
Wood, an extremely common and multi-purpose material, is susceptible to corrosion and degradation by fungal rot [1]. For practical application, wood is typically treated with preservatives to lengthen its service life. Traditional wood preservatives include Copper Chrome Arsenic (CCA), Copper Chrome Boron (CCB), and Ammoniacal Copper Quat (ACQ). Most of them consist of copper, chromium, or arsenic compounds and their metal salts, which have a serious impact on human health and the environment. Consequently, most European countries have strictly limited the use of chromium- and arsenic-based wood preservatives, especially in children's playground equipment and garden furniture [2,3]. Natural wood preservatives have attracted a great deal of research as an alternative [4]. Some woods and plants, such as cinnamon, can protect themselves against decay caused by fungi and insects. The active substance in cinnamon is cinnamaldehyde, which is extracted from the bark and leaves of cinnamon trees [5,6].
Cinnamaldehyde exhibits broad antimicrobial activity, particularly in inhibiting the growth of fungi and Gram-positive bacteria [7,8]. This activity is largely due to the aldehyde group conjugated with a benzene ring in cinnamaldehyde's structure [9,10]. This aldehyde group is a nucleophilic group that is easily absorbed by the hydrophilic groups on the surfaces of bacteria and, once across the cell wall, begins a process of inhibition and sterilization by destroying the bacteria's polysaccharide structure. Because mammalian cells lack cell walls, cinnamaldehyde is safe for humans and their environment when used as a wood preservative [7,9].
Using cinnamaldehyde as a wood preservative has drawbacks, however. First, because of its poor water solubility, few kinds of solvents can carry it into the wood material [11]; second, it has high volatility and a strong smell, which limits its long-term application [12]. In this study, we endeavored to add to the limited research concerning cinnamaldehyde derivatives by exploring the relationship between their structure and their antifungal activity against Trametes versicolor and Gloeophyllum trabeum. QSAR models were then established to provide a basic theoretical framework for the application of cinnamaldehyde derivatives as wood preservatives. Guided by the QSAR models, two new cinnamaldehyde derivatives with satisfactory antifungal activity against the two wood-decaying fungi were designed and tested, which also served to validate the predictability of the QSAR models.

Determining Optimal QSAR Models against Trametes versicolor and Gloeophyllum trabeum
2.1.1. Establishing Optimal QSAR Models
The "breaking point" method was used to determine the optimal QSAR models of cinnamaldehyde compounds against Trametes versicolor and Gloeophyllum trabeum, as shown in Figure 1. The x-coordinate represents the number of descriptors, and the y-coordinate represents the correlation coefficient R² of the corresponding model. As the trend lines show, R² increased as the number of descriptors increased. When the number of descriptors was less than 4, R² increased sharply; the fitted trend lines in this region had high correlation coefficients of 0.997 and 0.9785. When the number of descriptors exceeded 4, R² increased only slightly; the fitted trend lines in this region also had high correlation coefficients of 0.939 and 0.947. According to this method, the breaking point appeared when the number of descriptors was 4 or higher, as shown in Figure 1. The number of descriptors in the best models should also meet the requirement of multilinear regression that the sample number (n) and the number of descriptors (k) satisfy n ≥ 3(k + 1) [13]. Therefore, the number of descriptors of the optimal QSAR models against Trametes versicolor and Gloeophyllum trabeum is 4. The values of the optimal descriptors are shown in Tables 1 and 2. The optimal models are shown in Tables 3 and 4; these models had the following statistical characteristics: R² = 0.910, F = 35.32, and s² = 0.0093 for Trametes versicolor; R² = 0.926, F = 43.95, and s² = 0.0049 for Gloeophyllum trabeum.

Note: ID: compound number; ESP: electrostatic potential; FNSA-3: fractional atomic charge weighted partial negative surface area; TMSA: total molecular surface area; PNSA-3: total charge weighted partial negatively charged molecular surface area; SAMPOS*RPCG: the partial surface area multiplied by the relative positive charge (* denotes multiplication).
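As a rough consistency check on the statistics quoted above, the F value of a multilinear regression can be recovered from R², the sample number n, and the number of descriptors k through the standard relation F = (R²/k) / ((1 − R²)/(n − k − 1)). The following Python sketch applies that relation, together with the n ≥ 3(k + 1) rule, to the 19 compounds and four descriptors used here. The function names are illustrative and are not part of Codessa.

```python
def f_statistic(r2: float, n: int, k: int) -> float:
    """F value of a k-descriptor linear model fitted to n samples,
    computed from its squared correlation coefficient r2."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


def max_descriptors(n: int) -> int:
    """Largest k that still satisfies the rule n >= 3 * (k + 1)."""
    return n // 3 - 1


n_compounds, n_descriptors = 19, 4

# Upper bound on the number of descriptors allowed by the sample-size rule.
print(max_descriptors(n_compounds))

# F values implied by the reported (rounded) R² values.
print(round(f_statistic(0.910, n_compounds, n_descriptors), 2))
print(round(f_statistic(0.926, n_compounds, n_descriptors), 2))
```

The implied F values are about 35.4 and 43.8, close to the reported 35.32 and 43.95; the small gaps are consistent with R² having been rounded to three decimals. The sample-size rule alone would allow up to five descriptors for 19 compounds, so the choice of four also rests on the breaking point in Figure 1.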
Table 5 shows a comparison between the experimental values (Exp.logAR) and the calculated values (Calc.logAR), and the plot of Exp.logAR versus Calc.logAR is shown in Figure 2. Calc.logAR was calculated according to the optimal QSAR models. The calculated values were close to the experimental values, with average differences of 0.0661 and 0.0465, respectively, as shown in Table 5. This minuscule difference indicated that the optimal QSAR models are capable of accurately describing the relationship between chemical structure and bioactivity.

Validation of Optimal QSAR Models
The internal validation results of the optimal QSAR models against Trametes versicolor and Gloeophyllum trabeum are shown in Table 6. The training set models had the following characteristics: R²(fit) ≥ 0.900, F(fit) ≥ 18.03, and s²(fit) ≤ 0.0057 for Gloeophyllum trabeum; R²(fit) ≥ 0.909, F(fit) ≥ 20.07, and s²(fit) ≤ 0.0078 for Trametes versicolor; the average correlation coefficients were 0.932 and 0.929, respectively. Each test set compound was predicted with the corresponding training set model, and the predicted and experimental values were then compared by linear fitting. The linear fitting gave average correlation coefficients (R²(pred)) of 0.833 and 0.792, respectively. All the internal validation results indicated that the optimal QSAR models are predictive and stable. As described in Section 3.2.2, the optimal QSAR models were also subjected to external validation; the correlation coefficients of the externally validated models were R²_Tv = 0.948 and R²_Gt = 0.926. The left-out (external set) compounds were predicted with these models. In the linear fitting of the predicted and experimental values of these compounds, the correlation coefficients were 0.804 and 0.984, respectively. These results also demonstrated that the optimal QSAR models had good predictability [14].
The external and internal validation tests supported the optimal QSAR models, which can be expressed as mathematical equations. The optimal QSAR models of the cinnamaldehyde derivatives against Trametes versicolor and Gloeophyllum trabeum are described by Equations (1) and (2), respectively.
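The fitted coefficients are reported in Tables 3 and 4 rather than in this text, so only the general shape of the two four-descriptor models can be sketched here. Assuming the usual multilinear form with logAR as the response, and writing $a_i$ and $b_i$ as placeholder symbols (not values from the paper) for the intercepts and coefficients, the two models would read:

$$
\begin{aligned}
\log AR_{Tv} &= a_0 + a_1 d_1 + a_2 d_2 + a_3 d_3 + a_4 d_4 \\
\log AR_{Gt} &= b_0 + b_1 d_1 + b_2 d_5 + b_3 d_6 + b_4 d_7
\end{aligned}
$$

Here d1 to d4 are the descriptors of the Trametes versicolor model and d1, d5, d6, and d7 are those of the Gloeophyllum trabeum model, as discussed in the descriptor analysis.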

Descriptor Analysis in the Optimal QSAR Models
A t-test is typically used to measure the importance of each descriptor in the correlation [15]. According to the t-test values in Table 3, the most statistically significant descriptor is the minimum net atomic charge for an H atom, d1. This is a quantum chemical descriptor that indicates the hydrogen-bonding and electrostatic interactions between negative and positive ions [16]. In Table 3, the positive coefficient of d1 demonstrated that increasing the hydrogen-bonding and electrostatic interactions in cinnamaldehyde derivatives led to an increase in antifungal activity against Trametes versicolor [17].
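To illustrate how such t-test values arise, the sketch below computes the t value of each regression coefficient (coefficient divided by its standard error) for an ordinary least-squares fit. It is a generic NumPy illustration, not Codessa's implementation, and the data are synthetic values generated only for the demonstration, not the descriptor values of Tables 1 and 2.

```python
import numpy as np


def descriptor_t_values(X, y):
    """Return the t value of each descriptor coefficient in an
    ordinary least-squares fit of y on the columns of X (with intercept)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])      # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    dof = A.shape[0] - A.shape[1]                  # n - k - 1
    s2 = float(residuals @ residuals) / dof        # residual variance
    std_err = np.sqrt(s2 * np.diag(np.linalg.inv(A.T @ A)))
    return coef[1:] / std_err[1:]                  # skip the intercept


# Synthetic demonstration data (NOT from the paper): 19 samples,
# two descriptors, of which only the first influences the response.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(19, 2))
y_demo = 2.0 * X_demo[:, 0] + rng.normal(scale=0.1, size=19)

t_demo = descriptor_t_values(X_demo, y_demo)
print(t_demo)
```

In this toy setup the first descriptor receives a much larger absolute t value than the second, which is the sense in which the t-test ranks descriptors by importance.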
The second descriptor was FNSA-3 fractional PNSA (PNSA-3/TMSA), d2 [18], which is the ratio of PNSA-3 to TMSA and can be computed as follows [19]:

$$\text{FNSA-3} = \frac{\text{PNSA-3}}{\text{TMSA}}$$

where TMSA is the total molecular surface area and PNSA-3 is the atomic charge weighted partial negatively charged molecular surface area [20]:

$$\text{PNSA-3} = \sum_{A} q_A S_A, \quad A \in \{q_A < 0\}$$

where $q_A$ is the partial charge of the atom and $S_A$ is the respective atomic negatively charged solvent-accessible surface area. Both $q_A$ and $S_A$ were computed in Codessa. FNSA-3 is a significant factor for polar and hydrogen-bonding interactions. The third descriptor was ESP-RPCS relative charged SA (SAMPOS*RPCG) (Quantum-Chemical PC), d3, which is also a quantum chemical descriptor. This descriptor reflects the total molecular surface area and the properties of the functional groups, and indicates interactions among polar molecules [18].
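As a concrete reading of the FNSA-3 definition in this paragraph, the following Python sketch sums the charge-weighted surface areas of the negatively charged atoms and divides by the total surface area. The function names and the three-atom input are hypothetical, and taking TMSA as the sum of the atomic solvent-accessible areas is an assumption made for the example; in the study these quantities were computed by Codessa.

```python
def pnsa3(charges, areas):
    """Charge-weighted partial negative surface area:
    sum of q_A * S_A over atoms with negative partial charge."""
    return sum(q * s for q, s in zip(charges, areas) if q < 0)


def fnsa3(charges, areas):
    """Fractional PNSA-3: PNSA-3 divided by the total molecular surface
    area (assumed here to be the sum of the atomic areas)."""
    tmsa = sum(areas)
    return pnsa3(charges, areas) / tmsa


# Hypothetical three-atom example (illustrative numbers only).
demo_charges = [-0.5, 0.25, -0.25]   # partial charges q_A
demo_areas = [10.0, 20.0, 4.0]       # atomic surface areas S_A

print(pnsa3(demo_charges, demo_areas))
print(fnsa3(demo_charges, demo_areas))
```

Because only negatively charged atoms contribute, both quantities are negative or zero, and a more negative FNSA-3 corresponds to a larger share of charge-weighted negative surface.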
The fourth descriptor was YZ Shadow/YZ Rectangle, d4, a space property descriptor [21]. The YZ Shadow was calculated by projecting a molecule on the YZ plane, which is related to molecular conformation and molecular orientation. This shape parameter provided a positive indication of the antifungal activity of the cinnamaldehyde derivatives. As the value of descriptor YZ Shadow increased, the antifungal activity of cinnamaldehyde derivatives against Trametes versicolor also increased.
As shown in Table 4, the most statistically significant descriptor in the QSAR model against Gloeophyllum trabeum was the ESP minimum net atomic charge for an H atom, d1. The second most important descriptor was ESP-RPCS Relative positive charged SA (SAMPOS*RPCG) (Quantum-Chemical PC), d5, which is similar to ESP-RPCS Relative charged SA (SAMPOS*RPCG) (Quantum-Chemical PC), d3. It is the result of the partial positively charged surface area multiplied by the relative positive charge [18]. The third and fourth most important descriptors were FNSA-3 (PNSA-3/TMSA) (Quantum-Chemical PC), d6, and FNSA-3 Fractional PNSA (PNSA-3/TMSA), d7. These are quantum chemical descriptors that describe the total molecular surface properties and the functional groups as well as the activity of polar molecules [19].

Designing the New Compound with High Bioactivity, Calculating Its AR
Two cinnamaldehyde amino acid Schiff base compounds with satisfactory predicted activities were selected for synthesis and antifungal testing; their structures are shown in Figure 3. The chemical structures of the new compounds were confirmed by ¹H-NMR, IR, MS, HPLC purity, and melting point. The antifungal activity of the new compounds was tested by the same method described in Section 3.2.1, and the antifungal activity ratios (AR) of the two designed compounds are listed in Table 7.
As shown in Table 7, the designed compounds exhibited better antifungal activity than the 19 cinnamaldehyde compounds listed in Figure 4. The AR_Gt of the new compounds against Gloeophyllum trabeum exceeded their AR_Tv against Trametes versicolor, and the new compounds possessed better antifungal properties than cinnamaldehyde alone. In particular, the antifungal activity of the new compounds against Gloeophyllum trabeum significantly exceeded that of cinnamaldehyde alone.
Concerning the experimental logAR and the logAR calculated from the optimized models, the experimental value was close to the calculated value for both compounds against both fungi. The smallest error was 0.0155 for Compound A against Gloeophyllum trabeum. This suggested that the QSAR model against Gloeophyllum trabeum exhibited stronger predictability and stability, with a higher correlation coefficient (R² = 0.926) and better validation results than the model against Trametes versicolor (R² = 0.910).

Paper Disc Method
The paper disc method was used to determine the antifungal activity of the cinnamaldehyde compounds [23]. Two wood-decaying fungi, Trametes versicolor and Gloeophyllum trabeum, were used as the test microorganisms after cultivation for two days at 30 °C [24]. The concentration of cinnamaldehyde compounds used in the experiment was 0.25 mol/L.
The medium, paper discs (8 mm in diameter), 0.9 wt % normal saline, and petri dishes were sterilized for 30-35 min under high pressure and temperature. All the vessels and instruments were subjected to ultraviolet germicidal irradiation for 20 min. Then, 10 mL of the melted medium was transferred into each petri dish and allowed to solidify. After that, 125 µL of microorganism suspension was spread on the solid medium, and a paper disc impregnated with the 0.25 mol/L cinnamaldehyde derivative solution was placed in the center of each petri dish. Finally, the petri dishes were incubated at a constant temperature of 30 °C for 2-3 days. The antifungal activity was determined by measuring the inhibition zones around the discs: the larger the inhibition zone, the greater the antifungal activity. All tests were performed in triplicate.
Cinnamaldehyde served as the control. The antifungal activity ratio (AR) of the cinnamaldehyde derivatives was described using the following equation [8,24]:

$$AR = \frac{d}{d_0}$$

where $d$ is the average inhibition zone of the cinnamaldehyde derivative and $d_0$ is the average inhibition zone of cinnamaldehyde. The antifungal activity ratios and two-dimensional structures of the 19 cinnamaldehyde derivatives are shown in Figure 4.
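The activity ratio can be illustrated with a short Python sketch. It assumes that AR is the plain ratio of the average inhibition zone of a derivative to that of cinnamaldehyde, and that logAR denotes the base-10 logarithm of AR; both are assumptions made for this example, and the triplicate zone sizes below are hypothetical.

```python
import math


def activity_ratio(zones_derivative, zones_cinnamaldehyde):
    """AR = d / d0, where d and d0 are the average inhibition zones of
    the derivative and of the cinnamaldehyde control."""
    d = sum(zones_derivative) / len(zones_derivative)
    d0 = sum(zones_cinnamaldehyde) / len(zones_cinnamaldehyde)
    return d / d0


def log_activity_ratio(zones_derivative, zones_cinnamaldehyde):
    """logAR, assumed here to be log10(AR)."""
    return math.log10(activity_ratio(zones_derivative, zones_cinnamaldehyde))


# Hypothetical triplicate inhibition zones in mm (not measured data).
demo_derivative = [20.0, 22.0, 24.0]
demo_control = [10.0, 11.0, 12.0]

print(activity_ratio(demo_derivative, demo_control))
print(log_activity_ratio(demo_derivative, demo_control))
```

Under these assumptions cinnamaldehyde itself has AR = 1 and logAR = 0, so a derivative with a positive logAR is more active than the control.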

Establishing QSAR Models
There were three steps for establishing the QSAR models of the cinnamaldehyde derivatives [12].
(1) Molecular structure geometry optimization: The structures of the 19 cinnamaldehyde compounds were drawn in ChemDraw 3D software, and their three-dimensional structures were initially optimized geometrically using the MM2+ function. The initially optimized structures were then input into AMPAC Agui 9.2.1 software for further geometry optimization.
(2) Descriptor calculation: Codessa 2.7.16 software can calculate four kinds of descriptors for a molecule: Molecule, Fragment, Pair, and Atom descriptors. In this paper, the optimized structures of the cinnamaldehyde derivatives were input into Codessa 2.7.16 software to calculate Molecule descriptors. These descriptors were divided into six groups: structural, topological, geometrical, thermodynamic, electrostatic, and quantum-chemical descriptors. All were used in this paper with the exception of the thermodynamic descriptors. These descriptors were the basis for establishing the QSAR models [25].
(3) Establishing the best QSAR model: The Best Multi-Linear Regression equation was built with Codessa 2.7.16 software [26]. After Best Multi-Linear Regression analysis, a series of QSAR models was developed. A general "breaking point" method was used to determine the number of descriptors by searching for the breaking point of the two R² trend lines. The relationship between R² and the number of descriptors is shown in Figure 1 [27]. Two different approaches, internal validation and external validation, were used to validate the best models and to explore their predictability and stability.
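The breaking-point search in step (3) can be approximated by a simple two-line fit. The sketch below is one possible way to automate it and is not the procedure implemented in Codessa: for each candidate number of descriptors it fits one straight line to the points up to that candidate and another to the points from it onward, then keeps the candidate with the smallest total squared error. The R² values are synthetic and serve only to exercise the function.

```python
import numpy as np


def breaking_point(k_values, r2_values):
    """Return the number of descriptors at which two straight trend lines
    (one up to and including k, one from k onward) fit the R² curve best."""
    k = np.asarray(k_values, dtype=float)
    r2 = np.asarray(r2_values, dtype=float)
    best_k, best_sse = None, np.inf
    # Each segment needs at least two points to define a line.
    for i in range(1, len(k) - 1):
        sse = 0.0
        for seg_k, seg_r2 in ((k[: i + 1], r2[: i + 1]), (k[i:], r2[i:])):
            slope, intercept = np.polyfit(seg_k, seg_r2, 1)
            sse += float(np.sum((seg_r2 - (slope * seg_k + intercept)) ** 2))
        if sse < best_sse:
            best_k, best_sse = int(k[i]), sse
    return best_k


# Synthetic R² curve for illustration only (NOT the values of Figure 1):
# a steep rise up to four descriptors followed by a shallow rise.
k_demo = [1, 2, 3, 4, 5, 6, 7]
r2_demo = [0.40, 0.58, 0.76, 0.94, 0.95, 0.96, 0.97]

print(breaking_point(k_demo, r2_demo))
```

A visual reading of the intersection of the two trend lines, as done in Figure 1, remains the reference; an automated search of this kind is mainly useful as a cross-check.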

Validating QSAR Models
Internal validation: The 19 compounds were divided into three groups: A (1, 4, 7, 10, ...), B (2, 5, 8, ...), and C (3, 6, 9, ...). Each pair of groups (A + B, B + C, and A + C) was combined as a training set, and the remaining group (C, A, and B, respectively) served as the test set. Each training set was input into Codessa software to develop a new four-descriptor QSAR model, which was then used to predict the bioactivity of the test set that had been left out. This was done for each pair of groups (A + B, B + C, and A + C). The predicted AR and experimental AR of the compounds in each test set were fitted linearly with a fixed slope in Origin Pro 8.0 software. The resulting R², s², and F values of each training set and test set are listed in Table 6 [28].
External validation was performed using a similar method [29]. Four of the 19 compounds were chosen as the external set, and the other compounds served as the training set. The training set compounds were input into Codessa to establish four-descriptor QSAR models, which were then used to predict the external set.
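The three-group scheme used for internal validation can be written out explicitly. The following Python sketch only reproduces the bookkeeping of the split (which compounds train and which test in each round); the model fitting itself was done in Codessa and is not reproduced here. The function names are illustrative.

```python
def three_group_split(n_compounds):
    """Assign compound IDs 1..n to groups A (1, 4, 7, ...),
    B (2, 5, 8, ...), and C (3, 6, 9, ...)."""
    ids = range(1, n_compounds + 1)
    return {
        "A": [i for i in ids if i % 3 == 1],
        "B": [i for i in ids if i % 3 == 2],
        "C": [i for i in ids if i % 3 == 0],
    }


def training_test_pairs(groups):
    """Build the three rounds: train on two groups, test on the third.
    The order of test groups (C, A, B) matches A + B, B + C, A + C."""
    rounds = []
    for test_name in ("C", "A", "B"):
        train = sorted(
            i for name, members in groups.items()
            if name != test_name
            for i in members
        )
        rounds.append((train, groups[test_name]))
    return rounds


groups = three_group_split(19)
for train_ids, test_ids in training_test_pairs(groups):
    print(len(train_ids), "training compounds; test set:", test_ids)
```

With 19 compounds, group A holds seven compounds and groups B and C hold six each, so every training set contains 12 or 13 compounds, which still satisfies n ≥ 3(k + 1) for a four-descriptor model only narrowly (n ≥ 15 would be required), a point worth keeping in mind when reading the R²(fit) values in Table 6.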

Design of New Compounds
Cinnamaldehyde amino acid Schiff base compounds are novel compounds with good water solubility, very weak odor, and good bioactivity [10,30]. Several kinds of cinnamaldehyde amino acid Schiff base compounds were designed. The structures of the designed compounds were drawn in ChemDraw 3D software and optimized in AMPAC Agui 9.2.1 software. The optimized molecular geometries of the designed compounds were then input into Codessa to calculate the molecular descriptors and to predict logAR with the best QSAR models. The logAR values of the designed compounds were screened, and two designed compounds had higher logAR values than cinnamaldehyde. Finally, the two designed compounds, A and B, were synthesized as shown in Figure 5 [30]. The AR of the two designed compounds was determined as described in Section 3.2.1.

Conclusions
In this study, two optimal QSAR models of cinnamaldehyde derivatives against wood-decaying fungi were established and validated, with the following statistical characteristics: R² = 0.910, F = 35.32, and s² = 0.0093 for Trametes versicolor; R² = 0.926, F = 43.95, and s² = 0.0049 for Gloeophyllum trabeum. Seven main parameters affected the antifungal activity of cinnamaldehyde compounds in the QSAR models: ESP minimum net atomic charge for an H atom, FNSA-3 Fractional PNSA (PNSA-3/TMSA), ESP-RPCS Relative charged SA (SAMPOS*RPCG), YZ Shadow/YZ Rectangle, ESP-RPCS Relative positive charged SA (SAMPOS*RPCG), FNSA-3 (PNSA-3/TMSA), and FNSA-3 Fractional PNSA (PNSA-3/TMSA). Two new cinnamaldehyde amino acid compounds were designed and synthesized on the basis of these QSAR models, with satisfactory results: the experimental logAR was extremely close to the calculated logAR. The errors were smaller (and thus the model more predictive) for Gloeophyllum trabeum than for Trametes versicolor, but taken together, the internal and external validation results reflect a consistent level of predictability in our QSAR models. In summary, this study showed that QSAR models of cinnamaldehyde derivatives can be used to predict the antifungal activity of new cinnamaldehyde compounds against wood-decaying fungi.