QSAR Study of N-Myristoyltransferase Inhibitors of Antimalarial Agents

Malaria is a disease caused by protozoan parasites of the genus Plasmodium that affects millions of people worldwide. In recent years there have been parasite resistances to several drugs, including the first-line antimalarial treatment. With the aim of proposing new drugs candidates for the treatment of disease, Quantitative Structure–Activity Relationship (QSAR) methodology was applied to 83 N-myristoyltransferase inhibitors, synthesized by Leatherbarrow et al. The QSAR models were developed using 63 compounds, the training set, and externally validated using 20 compounds, the test set. Ten different alignments for the two test sets were tested and the models were generated by the technique that combines genetic algorithms and partial least squares. The best model shows r2 = 0.757, q2adjusted = 0.634, R2pred = 0.746, R2m = 0.716, ∆R2m = 0.133, R2p = 0.609, and R2r = 0.110. This work suggested a good correlation with the experimental results and allows the design of new potent N-myristoyltransferase inhibitors.


Results
The GA-PLS analysis using grid cells of 1.0 A generated several models or equations. The statistical parameters of ten alignments studied for Test Set I (compounds 1, 3, 5, 6, 12, 16, 20, 30, 33, 39, 40, 50, 56, 57, 61, 65, 66, 69, 76, and 80) and Test Set II (compounds 3, 6, 9, 13, 20, 21, 27, 28, 31, 32, 40, 56, 57, 58, 64, 70, 73, 76, 78, and 82) are shown in Tables 1 and 2, respectively. All tested alignments showed q 2 values higher than 0.5. This reveals that the model can be a useful tool for predicting affinities of new compounds based on these structures; r 2 greater than 0.7 indicates that the model is correlated and may be considered to represent the training set in the same manner [16]. Alignments 6B and 7B were eliminated from the analysis because it presented a low r 2 value (<0.7).  Evaluating the predictive ability in terms of R 2 pred (means of an external validation), the alignment 5A was eliminated (R 2 pred < 0.5). All R 2 m values were greater than 0.60, and values over 0.5 are acceptable. Analyzing the R 2 p values, alignments B1, B2, B4, B6, and B7 were excluded because this parameter value was less than 0.5 [17]. Alignment 3 from Test Set II (B3) provides the best 4D-QSAR models as judged by the highest q 2 adj , in addition to presenting fewer descriptors. Among the alignments with only seven descriptors, this still has the highest values of r 2 , q 2 adj , R 2 p , and the lowest value of R 2 r . According to these results, we selected Model B3 as the best alignment. We will only present the analysis of the best model derived from B3.

Discussion
GCODs are related to the coordinates of IPE mapped in a common grid. A graphic representation of the descriptors of Model B3 is shown in Figure 2 using Compound 81 as a reference. Light and dark spheres represent GCODs with positive and negative coefficients, respectively, in accordance with Model B3. GCOD-1 (0,−3,−1, hba) ( Figure 3) is the descriptor that most contributes to the increased effectiveness of compounds and presents a coefficient of 4.942. This grid cell represents an acceptor hydrogen bond atom type (IPE) and shows high frequency of occupation for compounds 42, 48, 65, 68, and 69. It is located close to the nitrogen atom of the oxadiazole ring and indicates an amino acid donor hydrogen bond in N-myristoyltransferase. Another different variant of R 2 m metrics was calculated from Model B3 to assess the predictive ability of the test set, ∆R 2 m. The value of ∆R 2 m found was 0.133. It has been suggested that to be considered a predictive model, this value should be less than 0.2 [18]. Model B3 generated seven descriptors, where GCODs (−1,−4,−3, any), (0−0, any), (0,6,2, any), (0,−5,−1, any), (0,3,−3, any), and (0,−3, −1, hba) present positive coefficients (Equation (3)) and correspond to favorable interactions between the molecule substituent and amino acid residues in the active site of NMT. Therefore, substituents in these positions increase the effectiveness of the compounds. The GCOD (−1,−4,−4, np) has negative coefficient and correspond to unfavorable interactions between the molecule substituent and amino acid residues in the active site of NMT. Therefore, the occupation of GCOD (−1,−4,−4, np) decreases the compound potency.

Discussion
GCODs are related to the coordinates of IPE mapped in a common grid. A graphic representation of the descriptors of Model B3 is shown in Figure 2 using Compound 81 as a reference. Light and dark spheres represent GCODs with positive and negative coefficients, respectively, in accordance with Model B3. GCOD-1 (0,−3,−1, hba) ( Figure 3) is the descriptor that most contributes to the increased effectiveness of compounds and presents a coefficient of 4.942. This grid cell represents an acceptor hydrogen bond atom type (IPE) and shows high frequency of occupation for compounds 42, 48, 65, 68, and 69. It is located close to the nitrogen atom of the oxadiazole ring and indicates an amino acid donor hydrogen bond in N-myristoyltransferase. The oxadiazole ring in the ortho position allows the nitrogen atom to occupy this grid cell, exemplified by compound 42 (Figure 3). However, the most active molecules of the training set, 81 and 83, do not have this descriptor that contributes most to the increase in the potency of the compounds. Once in this position there is a methyl group and the oxadiazole group that are displaced. In fact, the oxadiazole ring in these compounds does not occupy this grid cell. The oxadiazole ring in the ortho position allows the nitrogen atom to occupy this grid cell, exemplified by compound 42 (Figure 3). However, the most active molecules of the training set, 81 and 83, do not have this descriptor that contributes most to the increase in the potency of the compounds. Once in this position there is a methyl group and the oxadiazole group that are displaced. In fact, the oxadiazole ring in these compounds does not occupy this grid cell. GCOD-2 (−1,−4,−4, np) ( Figure 4) contributes to decrease compound potency and presents a coefficient of −8.269. This grid cell corresponds to a nonpolar IPE and shows high occupation frequency for Compounds 42, 48, and 55. These molecules present non-polar groups in this local, such as ethyl and methyl groups. Thus, the occupation of this cell is drastically reduced when this position is non-polar substituted, that decrease the activity of these compounds. Meanwhile, if the polar groups, such as OH or N in 81 or 83, respectively, are localized in this grid cell, the GCOD-2 descriptor yields less negative value which does not decrease the predicted pIC50. We can see that compound 42 occupies the descriptor that decreases the activity. However, it also presents occupancy for GCOD-1, which increases the activity. As expected, this compound exhibits an intermediate power. This suggests that the occupation of this cell by acceptor hydrogen bond should be favorable in contrast to nonpolar atoms that are unable to perform hydrogen bond interactions.
GCOD-3 (0,−5,−1, any) is present as a non-specific IPE. The atom type "any" is used when more than one specific atom [IPE] type across the training set is found to satisfy the interaction being captured by a particular GCOD [19]. GCOD (0,−5,−1, any) has a positive coefficient of 2.345 which increases the potency of the compounds ( Figure 5). This grid cell is close to methyl and ethyl groups, toward the left side, and shows greater occupation frequency for compounds 26, 30, and 33. On the other hand, compounds 42 and 45 showed no occupancy for this descriptor, because during the molecular dynamics simulation, they assumed a different conformation, facing right. On the other hand, GCOD (0,−5,−1, any) also provides occupation frequency in the benzene ring of molecules 59, 60, 61, and 62. 269. This grid cell corresponds to a nonpolar IPE and shows high occupation frequency for Compounds 42, 48, and 55. These molecules present non-polar groups in this local, such as ethyl and methyl groups. Thus, the occupation of this cell is drastically reduced when this position is non-polar substituted, that decrease the activity of these compounds. Meanwhile, if the polar groups, such as OH or N in 81 or 83, respectively, are localized in this grid cell, the GCOD-2 descriptor yields less negative value which does not decrease the predicted pIC 50 . We can see that compound 42 occupies the descriptor that decreases the activity. However, it also presents occupancy for GCOD-1, which increases the activity. As expected, this compound exhibits an intermediate power. GCOD-2 (−1,−4,−4, np) ( Figure 4) contributes to decrease compound potency and presents a coefficient of −8.269. This grid cell corresponds to a nonpolar IPE and shows high occupation frequency for Compounds 42, 48, and 55. These molecules present non-polar groups in this local, such as ethyl and methyl groups. Thus, the occupation of this cell is drastically reduced when this position is non-polar substituted, that decrease the activity of these compounds. Meanwhile, if the polar groups, such as OH or N in 81 or 83, respectively, are localized in this grid cell, the GCOD-2 descriptor yields less negative value which does not decrease the predicted pIC50. We can see that compound 42 occupies the descriptor that decreases the activity. However, it also presents occupancy for GCOD-1, which increases the activity. As expected, this compound exhibits an intermediate power. This suggests that the occupation of this cell by acceptor hydrogen bond should be favorable in contrast to nonpolar atoms that are unable to perform hydrogen bond interactions.
GCOD-3 (0,−5,−1, any) is present as a non-specific IPE. The atom type "any" is used when more than one specific atom [IPE] type across the training set is found to satisfy the interaction being captured by a particular GCOD [19]. GCOD (0,−5,−1, any) has a positive coefficient of 2.345 which increases the potency of the compounds ( Figure 5). This grid cell is close to methyl and ethyl groups, toward the left side, and shows greater occupation frequency for compounds 26, 30, and 33. On the other hand, compounds 42 and 45 showed no occupancy for this descriptor, because during the molecular dynamics simulation, they assumed a different conformation, facing right. On the other hand, GCOD (0,−5,−1, any) also provides occupation frequency in the benzene ring of molecules 59, 60, 61, and 62. This suggests that the occupation of this cell by acceptor hydrogen bond should be favorable in contrast to nonpolar atoms that are unable to perform hydrogen bond interactions.
GCOD-3 (0,−5,−1, any) is present as a non-specific IPE. The atom type "any" is used when more than one specific atom [IPE] type across the training set is found to satisfy the interaction being captured by a particular GCOD [19]. GCOD (0,−5,−1, any) has a positive coefficient of 2.345 which increases the potency of the compounds ( Figure 5). This grid cell is close to methyl and ethyl groups, toward the left side, and shows greater occupation frequency for compounds 26, 30, and 33. On the other hand, compounds 42 and 45 showed no occupancy for this descriptor, because during the molecular dynamics simulation, they assumed a different conformation, facing right. On the other hand, GCOD (0,−5,−1, any) also provides occupation frequency in the benzene ring of molecules 59, 60, 61, and 62. GCOD-4 (0,−1,0, any) have a positive coefficient and, thus, also greatly influence the increase in inhibitor potency ( Figure 6). It is located near the methyl group in benzofuran, 2,3-dihidro-3-methyl and represents a non-specific IPE. It shows the highest occupation frequency for the most active compounds in this series, compounds 1, 81, and 83. The GCOD (0,−3,−1, hba) reflects the importance of occupation of this receptor region for the effectiveness of the inhibitors.  GCOD-4 (0,−1,0, any) have a positive coefficient and, thus, also greatly influence the increase in inhibitor potency ( Figure 6). It is located near the methyl group in benzofuran, 2,3-dihidro-3-methyl and represents a non-specific IPE. It shows the highest occupation frequency for the most active compounds in this series, compounds 1, 81, and 83. The GCOD (0,−3,−1, hba) reflects the importance of occupation of this receptor region for the effectiveness of the inhibitors. GCOD-4 (0,−1,0, any) have a positive coefficient and, thus, also greatly influence the increase in inhibitor potency ( Figure 6). It is located near the methyl group in benzofuran, 2,3-dihidro-3-methyl and represents a non-specific IPE. It shows the highest occupation frequency for the most active compounds in this series, compounds 1, 81, and 83. The GCOD (0,−3,−1, hba) reflects the importance of occupation of this receptor region for the effectiveness of the inhibitors.  GCOD-4 (0,−1,0, any) have a positive coefficient and, thus, also greatly influence the increase in inhibitor potency ( Figure 6). It is located near the methyl group in benzofuran, 2,3-dihidro-3-methyl and represents a non-specific IPE. It shows the highest occupation frequency for the most active compounds in this series, compounds 1, 81, and 83. The GCOD (0,−3,−1, hba) reflects the importance of occupation of this receptor region for the effectiveness of the inhibitors.  GCOD-6 (−1,−4,−3, any) represents a non-specific IPE and also has a positive coefficient, indicating an increase in potency of compounds which have high occupation frequency for this descriptor (Figure 8). The molecules 14, 19, and 22 have a high occupation frequency for this GCOD, located near the benzene group. This grid cell suggests a hydrophobic region in the receptor close to the benzene ring, which should be making a π-π staking interaction between the aromatic ring of an inhibitor and one aromatic amino acid residue.
Lastly, GCOD-7 (0,6,2, any) ( Figure 9) has a positive coefficient and shows a non-specific class. Molecules 59-63 have a high occupation frequency for this descriptor. The presence of naphthalene group in this grid cell increase the activity. In fact, it shows that the aromatic substituents in this position should be preferred.
Molecules 2018, 23, x FOR PEER REVIEW 7 of 18 GCOD-6 (−1,−4,−3, any) represents a non-specific IPE and also has a positive coefficient, indicating an increase in potency of compounds which have high occupation frequency for this descriptor (Figure 8). The molecules 14, 19, and 22 have a high occupation frequency for this GCOD, located near the benzene group. This grid cell suggests a hydrophobic region in the receptor close to the benzene ring, which should be making a π-π staking interaction between the aromatic ring of an inhibitor and one aromatic amino acid residue.
Lastly, GCOD-7 (0,6,2, any) ( Figure 9) has a positive coefficient and shows a non-specific class. Molecules 59-63 have a high occupation frequency for this descriptor. The presence of naphthalene group in this grid cell increase the activity. In fact, it shows that the aromatic substituents in this position should be preferred. In order to find new active structures, the information of the descriptors obtained by the Model B3 was used. Modifications to the structure of compounds 60, 65, and 81 are suggested and compounds A-E were proposed.
The proposed compounds C-E, exhibited predicted pIC50 higher than 81 (the best experimental compound). The structure of the five compounds and their predicted pIC50 values are shown in Table  3. The ADME (absorption, distribution, metabolism, and excretion) of a drug is an important property that can determine the utilization of molecules proposed in the therapeutic usage. For the evaluation of pharmacokinetic parameters for molecules A-E we used the Lipinski's Rule of Five, where molecular properties are closely related to the oral bioavailability of a drug [20], wherein compounds should not violate more than one rule. In this rule, the compounds should present logP no more than 5, molecular weight of 500 Daltons, number of hydrogen bond acceptors (nON) of 10, number of hydrogen bond donors (nOHNH) of 5, and number of rotatable bonds (nrotb) no more than 10. The proposed molecules have been designed in the Molinspiration Online Property Calculation Software Toolkit [21] to evaluate the criteria discussed above.  GCOD-6 (−1,−4,−3, any) represents a non-specific IPE and also has a positive coefficient, indicating an increase in potency of compounds which have high occupation frequency for this descriptor (Figure 8). The molecules 14, 19, and 22 have a high occupation frequency for this GCOD, located near the benzene group. This grid cell suggests a hydrophobic region in the receptor close to the benzene ring, which should be making a π-π staking interaction between the aromatic ring of an inhibitor and one aromatic amino acid residue.
Lastly, GCOD-7 (0,6,2, any) ( Figure 9) has a positive coefficient and shows a non-specific class. Molecules 59-63 have a high occupation frequency for this descriptor. The presence of naphthalene group in this grid cell increase the activity. In fact, it shows that the aromatic substituents in this position should be preferred. In order to find new active structures, the information of the descriptors obtained by the Model B3 was used. Modifications to the structure of compounds 60, 65, and 81 are suggested and compounds A-E were proposed.
The proposed compounds C-E, exhibited predicted pIC50 higher than 81 (the best experimental compound). The structure of the five compounds and their predicted pIC50 values are shown in Table  3. The ADME (absorption, distribution, metabolism, and excretion) of a drug is an important property that can determine the utilization of molecules proposed in the therapeutic usage. For the evaluation of pharmacokinetic parameters for molecules A-E we used the Lipinski's Rule of Five, where molecular properties are closely related to the oral bioavailability of a drug [20], wherein compounds should not violate more than one rule. In this rule, the compounds should present logP no more than 5, molecular weight of 500 Daltons, number of hydrogen bond acceptors (nON) of 10, number of hydrogen bond donors (nOHNH) of 5, and number of rotatable bonds (nrotb) no more than 10. The proposed molecules have been designed in the Molinspiration Online Property Calculation Software Toolkit [21] to evaluate the criteria discussed above. In order to find new active structures, the information of the descriptors obtained by the Model B3 was used. Modifications to the structure of compounds 60, 65, and 81 are suggested and compounds A-E were proposed.
The proposed compounds C-E, exhibited predicted pIC 50 higher than 81 (the best experimental compound). The structure of the five compounds and their predicted pIC 50 values are shown in Table 3. The ADME (absorption, distribution, metabolism, and excretion) of a drug is an important property that can determine the utilization of molecules proposed in the therapeutic usage. For the evaluation of pharmacokinetic parameters for molecules A-E we used the Lipinski's Rule of Five, where molecular properties are closely related to the oral bioavailability of a drug [20], wherein compounds should not violate more than one rule. In this rule, the compounds should present logP no more than 5, molecular weight of 500 Daltons, number of hydrogen bond acceptors (nON) of 10, number of hydrogen bond donors (nOHNH) of 5, and number of rotatable bonds (nrotb) no more than 10. The proposed molecules have been designed in the Molinspiration Online Property Calculation Software Toolkit [21] to evaluate the criteria discussed above. The Molinspiration Online Property Calculation Software Toolkit [21] was used to analyze drug likeness (Lipinski's Rule of Five) and the results are shown in Table 4. According to the data in Table  4, no compound violated the Lipinski's Rule of Five.

7.894
The Molinspiration Online Property Calculation Software Toolkit [21] was used to analyze drug likeness (Lipinski's Rule of Five) and the results are shown in Table 4. According to the data in Table 4, no compound violated the Lipinski's Rule of Five.

Biological Data
In order to build QSAR models, 83 Plasmodium falciparum inhibitors were retrieved from Leatherbarrow et al. [15]. Twenty compounds (25%) were randomly selected to compose the test set (external validation). Two test groups were chosen. The first (Test Set I) has the following molecules :  1, 3, 5, 6, 12, 16, 20, 30, 33, 39, 40, 50, 56, 57, 61, 65, 66, 69, 76, and 80; Test Set II has the following molecules: 3, 6, 9, 13, 20, 21, 27, 28, 31, 32, 40, 56, 57, 58, 64, 70, 73, 76, 78, and 82 (Table 5).         The biological activities of these compounds were reported as the negative logarithm of concentration capable of inhibiting 50% of the enzyme activity (IC50), measured using an adapted version of the sensitive fluorescence-based assay based on detection of CoA by 7-diethylamino-3-(4maleimido-phenyl)-4-methylcoumarin [15]. Furthermore, in an effort to eliminate the potential noise that might have been introduced by the pooling of data sets from different sources, all pharmacological data are obtained from the same laboratory. The IC50 (μM) values were converted into molar units and then expressed in negative logarithmic units (pIC50), and are depicted in Table  4. The range of pIC50 values for the training and test set spans at least three orders of magnitude (4.00 to 7.30), and the biological activity values show a regular distribution over the whole range.

Molecular Dynamic Simulation (MDS)
The three-dimensional structure from 83 analogues (Table 5) were optimized in vacuum, without any restriction, and the partial atomic charges assigned using RM1 semiempirical Hamiltonian [22]. The MDS was carried out at 300 K, close to the temperature assays, with a simulation sampling time of 100 ps, and intervals of 0.001 ps. Thus, a total sample of 100,000 conformations of each compound was produced. MDS have been performed using the GROMACS The biological activities of these compounds were reported as the negative logarithm of concentration capable of inhibiting 50% of the enzyme activity (IC50), measured using an adapted version of the sensitive fluorescence-based assay based on detection of CoA by 7-diethylamino-3-(4maleimido-phenyl)-4-methylcoumarin [15]. Furthermore, in an effort to eliminate the potential noise that might have been introduced by the pooling of data sets from different sources, all pharmacological data are obtained from the same laboratory. The IC50 (μM) values were converted into molar units and then expressed in negative logarithmic units (pIC50), and are depicted in Table  4. The range of pIC50 values for the training and test set spans at least three orders of magnitude (4.00 to 7.30), and the biological activity values show a regular distribution over the whole range.

Molecular Dynamic Simulation (MDS)
The three-dimensional structure from 83 analogues (Table 5) were optimized in vacuum, without any restriction, and the partial atomic charges assigned using RM1 semiempirical Hamiltonian [22]. The MDS was carried out at 300 K, close to the temperature assays, with a simulation sampling time of 100 ps, and intervals of 0.001 ps. Thus, a total sample of 100,000 conformations of each compound was produced. MDS have been performed using the GROMACS The biological activities of these compounds were reported as the negative logarithm of concentration capable of inhibiting 50% of the enzyme activity (IC50), measured using an adapted version of the sensitive fluorescence-based assay based on detection of CoA by 7-diethylamino-3-(4maleimido-phenyl)-4-methylcoumarin [15]. Furthermore, in an effort to eliminate the potential noise that might have been introduced by the pooling of data sets from different sources, all pharmacological data are obtained from the same laboratory. The IC50 (μM) values were converted into molar units and then expressed in negative logarithmic units (pIC50), and are depicted in Table  4. The range of pIC50 values for the training and test set spans at least three orders of magnitude (4.00 to 7.30), and the biological activity values show a regular distribution over the whole range.

Molecular Dynamic Simulation (MDS)
The three-dimensional structure from 83 analogues (Table 5) were optimized in vacuum, without any restriction, and the partial atomic charges assigned using RM1 semiempirical Hamiltonian [22]. The MDS was carried out at 300 K, close to the temperature assays, with a simulation sampling time of 100 ps, and intervals of 0.001 ps. Thus, a total sample of 100,000 conformations of each compound was produced. MDS have been performed using the GROMACS The biological activities of these compounds were reported as the negative logarithm of concentration capable of inhibiting 50% of the enzyme activity (IC50), measured using an adapted version of the sensitive fluorescence-based assay based on detection of CoA by 7-diethylamino-3-(4maleimido-phenyl)-4-methylcoumarin [15]. Furthermore, in an effort to eliminate the potential noise that might have been introduced by the pooling of data sets from different sources, all pharmacological data are obtained from the same laboratory. The IC50 (μM) values were converted into molar units and then expressed in negative logarithmic units (pIC50), and are depicted in Table  4. The range of pIC50 values for the training and test set spans at least three orders of magnitude (4.00 to 7.30), and the biological activity values show a regular distribution over the whole range.

Molecular Dynamic Simulation (MDS)
The three-dimensional structure from 83 analogues (Table 5) were optimized in vacuum, without any restriction, and the partial atomic charges assigned using RM1 semiempirical Hamiltonian [22]. The MDS was carried out at 300 K, close to the temperature assays, with a simulation sampling time of 100 ps, and intervals of 0.001 ps. Thus, a total sample of 100,000 conformations of each compound was produced. MDS have been performed using the GROMACS 6 The biological activities of these compounds were reported as the negative logarithm of concentration capable of inhibiting 50% of the enzyme activity (IC50), measured using an adapted version of the sensitive fluorescence-based assay based on detection of CoA by 7-diethylamino-3-(4maleimido-phenyl)-4-methylcoumarin [15]. Furthermore, in an effort to eliminate the potential noise that might have been introduced by the pooling of data sets from different sources, all pharmacological data are obtained from the same laboratory. The IC50 (μM) values were converted into molar units and then expressed in negative logarithmic units (pIC50), and are depicted in Table  4. The range of pIC50 values for the training and test set spans at least three orders of magnitude (4.00 to 7.30), and the biological activity values show a regular distribution over the whole range.

Molecular Dynamic Simulation (MDS)
The three-dimensional structure from 83 analogues (Table 5) were optimized in vacuum, without any restriction, and the partial atomic charges assigned using RM1 semiempirical Hamiltonian [22]. The MDS was carried out at 300 K, close to the temperature assays, with a simulation sampling time of 100 ps, and intervals of 0.001 ps. Thus, a total sample of 100,000 conformations of each compound was produced. MDS have been performed using the GROMACS The biological activities of these compounds were reported as the negative logarithm of concentration capable of inhibiting 50% of the enzyme activity (IC 50 ), measured using an adapted version of the sensitive fluorescence-based assay based on detection of CoA by 7-diethylamino-3-(4-maleimido-phenyl)-4-methylcoumarin [15]. Furthermore, in an effort to eliminate the potential noise that might have been introduced by the pooling of data sets from different sources, all pharmacological data are obtained from the same laboratory. The IC 50 (µM) values were converted into molar units and then expressed in negative logarithmic units (pIC 50 ), and are depicted in Table 4. The range of pIC 50 values for the training and test set spans at least three orders of magnitude (4.00 to 7.30), and the biological activity values show a regular distribution over the whole range.

Molecular Dynamic Simulation (MDS)
The three-dimensional structure from 83 analogues ( Table 5) were optimized in vacuum, without any restriction, and the partial atomic charges assigned using RM1 semiempirical Hamiltonian [22]. The MDS was carried out at 300 K, close to the temperature assays, with a simulation sampling time of 100 ps, and intervals of 0.001 ps. Thus, a total sample of 100,000 conformations of each compound was produced. MDS have been performed using the GROMACS 5.1 package [23].

Alignment Definition
As the compounds are structural analogs, we will assume that all molecules bind to the receptor in a similar mode. In general, the alignments are chosen to span the common framework of the molecules in the training and test sets [24][25][26]. In this work, ten alignments were performed using atoms of the (common) benzene ring. Three-ordered atom trial alignments were selected: (1) a-b-c, (9) d-a-f, and (10) a-b-e ( Figure 10). The order of the three ordered-atoms is important: the first atom specified for a molecule might be expected to occupy a similar location in space as the first atom specified for the second molecule. The conformational ensemble profile (CEP) for each compound obtained after the MDS step was overlaid onto a cubic lattice with grid cell size of 1A.

Interaction Pharmacophore Elements
According to the 4D-QSAR methodology, atoms of each compound are defined by seven types of interaction pharmacophore elements (IPEs). IPEs correspond to the interactions that may occur between ligand and the active site: (i) any type (any); (ii) nonpolar (np); (iii) polar-positive charge density (p+); 9iv) polar-negative charge density (p−); (v) hydrogen bond acceptor (hba); (vi) hydrogen bond donor (hbd); and (vii) aromatic systems (ar). The occupancy of the grid cells by each IPE type are recorded over the conformational assembly profile, and forms the set of grid cell occupancy descriptors (GCOD) to be utilized as the pool of trial descriptors in the model building and optimization process [16]. The idea underlying a 4D-QSAR analysis is that variations in biological responses were related to differences in the Boltzmann average spatial distribution of molecular shape with respect to the IPE. Thus, the normalized grid cell absolute occupancy, defined as the number of times that a cell was occupied by an atom type during the MDS, divided by the size of the CEP (1000 conformations), was used to define the GCODs. 4D-QSAR model calculation. In order to exclude data noise from databases generated by the alignments, partial least-squares (PLS) regression analysis was performed as a data reduction fit between the observed dependent variable measures and the corresponding set of GCOD values. Additionally, PLS identifies the most highly weighted GCODs from data set of local grid cells [16].

Interaction Pharmacophore Elements
According to the 4D-QSAR methodology, atoms of each compound are defined by seven types of interaction pharmacophore elements (IPEs). IPEs correspond to the interactions that may occur between ligand and the active site: (i) any type (any); (ii) nonpolar (np); (iii) polar-positive charge density (p+); 9iv) polar-negative charge density (p−); (v) hydrogen bond acceptor (hba); (vi) hydrogen bond donor (hbd); and (vii) aromatic systems (ar). The occupancy of the grid cells by each IPE type are recorded over the conformational assembly profile, and forms the set of grid cell occupancy descriptors (GCOD) to be utilized as the pool of trial descriptors in the model building and optimization process [16]. The idea underlying a 4D-QSAR analysis is that variations in biological responses were related to differences in the Boltzmann average spatial distribution of molecular shape with respect to the IPE. Thus, the normalized grid cell absolute occupancy, defined as the number of times that a cell was occupied by an atom type during the MDS, divided by the size of the CEP (1000 conformations), was used to define the GCODs. 4D-QSAR model calculation. In order to exclude data noise from databases generated by the alignments, partial least-squares (PLS) regression analysis was performed as a data reduction fit between the observed dependent variable measures and the corresponding set of GCOD values. Additionally, PLS identifies the most highly weighted GCODs from data set of local grid cells [16].
The two hundred GCODs with the highest weight from the data reduction were chosen to form the trial descriptor basis sets for model optimization by genetic function approximation (GFA) analysis [16]. Optimizations were initiated using 100 randomly generated models and 10,000-100,000 crossover operations. Mutation probability over the crossover optimization cycle was set at 10-30%. The smoothing factor, the variable that specifies the number of descriptors in the QSAR models, was varied between 1.0 and 3.0, in order to determine equations with no more than twelve terms. Each alignment was evaluated using the procedure described above.
The best models, resulting from the 4D-QSAR study were based on different criteria [18,[26][27][28]: (1) Coefficient of determination (r 2 ): is a measure of how well the regression line represents the data.
(2) Adjusted cross-validated squared correlation coefficient (q 2 adj ): allows the comparison between models with different number of variables.
(3) Correlation coefficient of external validation set (R 2 pred ): reflects the degree of correlation between the observed (Y Exp(test) )and predicted (Y Pred(test) ) activity data of the test set: where Y Training is average value for the dependent variable for the training set. (4) Modified r 2 (r 2 m(test) ) equation determining the proximity between the observed and predicted values with the zero axis intersection: (5) Y-randomization (R 2 r) consists of the random exchange of the independent variable values. Thus, the R 2 r value must be less than the correlation coefficient of the non-randomized models. (6) R 2 p penalizes the model R 2 for the difference between the squared mean correlation coefficient (R 2 r ) of randomized models and the square correlation coefficient (r 2 ) of the non-randomized model:

Conformational Selection
In the 4D-QSAR method, the conformation of each compound can be postulated as the lowest-energy conformer state from the set sampled for each compound, which predicted the maximum activity using the optimum 4D-QSAR model [16,[29][30][31][32].

Conclusions
In summary, 4D-QSAR models for NMT inhibitors were built and evaluated. Two test groups were evaluated for the ten tested alignments. The best model was obtained from Alignment B3, and generated an equation with seven descriptors, six of which have positive coefficients and one a negative coefficient. Model B3 showed a satisfactory statistical quality and predictive abilities as shown by r 2 = 0.757, q 2 = 0.702, q 2 adjusted = 0.634, R 2 pred = 0.746, R 2 m = 0.716, and R 2 p = 0.609. Furthermore, it showed low values of R 2 r = 0.110, and ∆R 2 m(test) = 0.133. 4D-QSAR analysis indicated an important role of acceptor hydrogen bonding groups and aromatic groups, allowing to propose five structures. These have proved more active than compound 81, in addition to being assessed by the Lipinski's Rule. Accordingly, these molecules may be considered promising prototypes against malaria. Funding: This research was funded by Conselho Nacional de Desenvolvimento Científico e Tecnológico (307837/2014-9).