Hologram QSAR Models of a Series of 6-Arylquinazolin-4-Amine Inhibitors of a New Alzheimer’s Disease Target: Dual Specificity Tyrosine-Phosphorylation-Regulated Kinase-1A Enzyme

Dual specificity tyrosine-phosphorylation-regulated kinase-1A (DYRK1A) is an enzyme directly involved in Alzheimer’s disease, since its increased expression leads to β-amyloidosis, Tau protein aggregation, and subsequent formation of neurofibrillary tangles. Hologram quantitative structure-activity relationship (HQSAR, 2D fragment-based) models were developed for a series of 6-arylquinazolin-4-amine inhibitors (36 training, 10 test) of DYRK1A. The best HQSAR model (q2 = 0.757; SEcv = 0.493; R2 = 0.937; SE = 0.251; R2pred = 0.659) presents high goodness-of-fit (R2 > 0.9), as well as high internal (q2 > 0.7) and external (R2pred > 0.5) predictive power. The fragments that increase and decrease the biological activity values were addressed using the colored atomic contribution maps provided by the method. The HQSAR contribution map of the best model is an important tool to understand the activity profiles of new derivatives and may provide information for further design of novel DYRK1A inhibitors.

disease, which has led some research groups to synthesize and evaluate new compounds as potential inhibitors of this protein [10].
DYRK1A inhibitors fall into different classes: some are natural products or their derivatives, and others are synthetic compounds. Among the natural products, harmine, an alkaloid isolated from the South American plant Banisteriopsis caapi, and epigallocatechin gallate, a polyphenol present in green tea, were the first compounds shown to be potent and relatively selective inhibitors of DYRK1A [11]. Other natural products are quinalizarin [12]; the flavonoids alcalinol A and B [13]; benzocoumarins [14]; and indolocarbazoles, such as staurosporine and rebeccamycin [15]. Among the synthetic compounds are pyrazolidine-3,5-diones [16]; meriolins [17]; meridianins [18]; chromenoindoles [19]; and 6-arylquinazolin-4-amines [20]. All these compounds are still being tested in vitro, and no clinical trials have been conducted so far.
2D and 3D quantitative structure-activity relationship (QSAR) studies are widely employed to develop models that are capable of explaining the biological activity of a series of compounds and of predicting the biological activity of new compounds [21][22][23][24][25][26]. 2D-QSAR methods use 2D fragments and their physicochemical properties to generate predictive quantitative models. Examples of these methods are fragment-based QSAR (FB-QSAR) [27,28] and hologram QSAR (HQSAR) [29].
Like other 2D-QSAR methods, HQSAR is independent of the receptor (e.g., enzyme) structure and uses molecular holograms derived from 2D molecular fragmentation. In this method, each molecule is described by a molecular hologram, an array of bins that is filled from the fragments generated by molecular fragmentation and thus acts as a molecular fingerprint. The descriptors used in HQSAR encode linear, branched, or overlapping topological fragments, and additional 3D information, such as hybridization and chirality, may also be encoded. The main advantage of this 2D-QSAR technique over the current 3D-QSAR methods is that there is no need to generate the so-called "bioactive" conformations and molecular alignments. Only the compound structures and their respective biological activity (or other property) values are required for the application of this method [29].
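The idea of turning fragments into a fixed-length hologram can be illustrated with a minimal Python sketch. It is not the HQSAR program's implementation: the fragment strings are hypothetical placeholders, `zlib.crc32` is only a stand-in for whatever hashing the software uses, and the length of 53 is simply a prime chosen for illustration.

```python
import zlib

def molecular_hologram(fragments, length=53):
    """Hash fragment strings into a fixed-length array of bin counts."""
    bins = [0] * length
    for frag in fragments:
        # crc32 is deterministic across runs, unlike Python's built-in hash()
        index = zlib.crc32(frag.encode("utf-8")) % length
        bins[index] += 1
    return bins

# Hypothetical fragment strings for one molecule (not taken from the paper)
fragments = ["c:c:c:c", "c:c:n", "C-N-c", "c:c:c:c", "c-c:s"]
hologram = molecular_hologram(fragments, length=53)

# Every fragment occurrence lands in exactly one bin
print(sum(hologram))
```

Two different fragments may hash to the same bin (a collision), which is why the choice of hologram length matters when models are compared.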
In general, QSAR models can be classified as local or global [30]. A local model is derived from a small set of similar chemical compounds, while a global model is derived from a large, chemically diverse set [30]. Local models reflect the classical approach to QSAR [31] and are often used for drug design purposes when a common mode of action is known. Global models are often used for toxicity screening of pharmaceuticals for regulatory purposes [32].
Therefore, the main purpose of this work is to develop local HQSAR models for a series of 6-arylquinazolin-4-amine inhibitors of DYRK1A [20,33], which may be used to design novel and potent derivatives as potential drugs for the treatment of AD.

HQSAR Model Development
First, the hologram sizes were set to the prime numbers available in the HQSAR program in order to minimize the probability of bad fragment collisions. Then, keeping the default fragment size (4-7 atoms), the maximum number of components (NC) was set to fifteen, which is smaller than half the number of training set compounds (N = 36). Finally, various fragment distinction (FD) parameters were tested, yielding sixteen different models (Table 1). According to Table 1, all the HQSAR models were acceptable, since the lowest cross-validated correlation coefficient (q2) was 0.640. However, considering only models with q2 values higher than 0.730, there were four best models, i.e., A/B/C/Ch/DA (q2 = 0.743), A/C/Ch/DA (q2 = 0.742), C/H (q2 = 0.740), and A/B (q2 = 0.732), which were used to evaluate the influence of fragment size on model quality.
To improve the previously calculated models, eight new templates were generated for each of the four best models, using fragment-size windows of four consecutive sizes that cover the range from two to twelve atoms (2-5, 3-6, 4-7, 5-8, 6-9, 7-10, 8-11, and 9-12 atoms). Only the statistical indexes obtained for the models using the A/B/C/Ch/DA (Table 2) and A/B (Table 3) parameters are shown, since those obtained for the models using the C/H and A/C/Ch/DA parameters did not show improvement. Varying the fragment size improved the q2 and R2 values and reduced the SE values, resulting in two best models (Tables 2 and 3). The best model for the fragment distinction parameter A/B/C/Ch/DA contains 3-6 atoms per fragment (Table 2), while the best model for the fragment distinction parameter A/B contains 7-10 atoms per fragment (Table 3). It is worth noting that the best model overall is the one containing five fragment distinction parameters (A/B/C/Ch/DA) and a fragment size of 3-6 atoms (Table 2), which suggests that the biological activity of this series of compounds is better explained by a varied set of parameters applied to fragments of reduced size. Thus, removing any of these parameters from the model leads to a significant loss of information.
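The fragment-size scan described above can be enumerated with a short Python sketch. The function name and its arguments are hypothetical; only the window bounds and the four fragment distinction labels come from the text.

```python
def fragment_size_windows(first_min=2, last_max=12, width=4):
    """Return (min_atoms, max_atoms) windows covering `width` consecutive sizes."""
    windows = []
    lo = first_min
    while lo + width - 1 <= last_max:
        windows.append((lo, lo + width - 1))
        lo += 1
    return windows

windows = fragment_size_windows()

# The four best fragment distinction settings at the default fragment size
fragment_distinctions = ["A/B/C/Ch/DA", "A/C/Ch/DA", "C/H", "A/B"]

# One template per (fragment distinction, size window) pair
grid = [(fd, window) for fd in fragment_distinctions for window in windows]

print(windows)
print(len(grid))
```

Each entry of `grid` corresponds to one HQSAR run whose q2, R2, and SE values would then be compared.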
The Y-randomization test was carried out in order to analyze the robustness of the best models obtained (Tables 2 and 3). In this test, the biological activity values were randomized and new HQSAR runs were performed (Table 4). According to Table 4, all models obtained in the Y-randomization test were very poor (the highest q2 value was 0.211). This result reinforces the robustness of the original models, since there is a low probability that the observed correlation occurred by chance.
Table 4. Summary of the HQSAR statistical indexes in the Y-randomization test using the default fragment size (4-7 atoms) for the 6-arylquinazolin-4-amine derivatives (N = 36).
After generation and internal validation of the best model, the external validation was carried out in order to assess its ability to predict the biological activity values of the test set compounds, i.e., those compounds excluded from the training set used for model generation. The predictive ability of the HQSAR model is expressed by the predictive R2 (R2pred) value, which is analogous to the cross-validated R2 (q2) and is calculated using Equation (1).
The experimental (pIC50Exp) and predicted (pIC50Pred) biological activities, and the residuals (pIC50Exp − pIC50Pred), of the 6-arylquinazolin-4-amine derivatives obtained by the best HQSAR models from the fragment distinction parameters A/B/C/Ch/DA and A/B are reported in Tables 5 and 6, respectively. The comparison plots between the pIC50Exp and pIC50Pred values of both training and test sets of the best HQSAR models from the fragment distinction parameters A/B/C/Ch/DA and A/B are shown in Figures 1 and 2, respectively.
A comprehensive analysis also involves the interpretation of the corresponding HQSAR colored diagrams (contribution maps), in which the colors represent positive (yellow-to-green), neutral (white), and negative (orange-to-red) contributions to the biological activity. Figure 3 shows the colored diagrams of the most active (24) and least active (6) compounds for the two best models (A/B/C/Ch/DA and A/B), where the common backbone is colored in cyan.
Considering only the HQSAR contribution maps of compound 24 (most active, Figure 3), both models are able to identify fragments that increase the biological activity, since both show fragments colored in yellow and green. However, in the case of compound 6 (least active, Figure 3), only the A/B/C/Ch/DA model is able to identify fragments that decrease the activity, since only this model shows at least one fragment colored in red. The A/B model of 6 (Figure 3), on the other hand, shows only fragments colored in white (neutral contribution) and cyan (common backbone), i.e., fragments without correlation with the variation in biological activity. Consequently, the A/B/C/Ch/DA model seems better able to distinguish between the most and least active compounds, and thus it is the more useful in the medicinal chemistry context.
Another feature of the A/B/C/Ch/DA model (Figure 3) is the presence of a green-colored fragment that corresponds to the nitrogen atom of the thiazolyl group (R3 substituent, Table 7). Since only this model includes the H-bond donor/acceptor (DA) fragment distinction parameter, this feature highlights the importance of this atom as an H-bond acceptor in a potential H-bonding interaction in the ligand-enzyme complex. Moreover, it also reinforces the A/B/C/Ch/DA model as the best model. Therefore, only this model will be discussed from this point forward.
The contribution map of 24 (Figure 3), according to the best HQSAR model, shows three substituents, namely R1, R2, and R3 (Table 7), which significantly influence the biological activity of this series. The benzodioxol (R1), methyl (R2), and thiazolyl (R3) groups are present in the most active compounds, such as 24, 26, and 27. In fact, all these groups have fragments (at least one atom) colored in green or yellow, highlighting their positive contributions to biological activity.
[33], and 1 to 41 are from [20]; c Compounds 32, 33, and 34 (all from the test set) have one chiral center and their biological activities are from their respective racemic mixture.
The contribution map of 6 (Figure 3), according to the best HQSAR model, shows one atom colored in red located on the ortho-chloro-phenyl group (R1), which is detrimental to the biological activity, probably because the chlorine atom at the ortho position prevents greater co-planarity between the two aromatic groups, a feature that may be important in the ligand-protein interaction. Besides the presence of a fragment colored in red, the lack of green- or yellow-colored fragments also contributes to the low activity of 6, as reflected in the replacement of methyl (R2) by hydrogen and of thiazolyl (R3) by thiophenyl.
Some of these results are in agreement with those presented by Pan et al. [34] in an atom-based 3D-QSAR modeling study, using this same series of 6-arylquinazolin-4-amines. They observed that the inhibitory activity increases when R1 is a phenyl ring substituted with a hydrophilic and electron-withdrawing group, R3 is a heterocyclic ring substituted with a hydrophobic group, and the nitrogen atom of the amine group is substituted with a bulky hydrophobic group. On the other hand, the inhibitory activity decreases when R2 is a hydrogen atom and R1/R3 are hydrophobic groups [34].

Chemical and Biological Data Series
The data set comprises 46 compounds from a series of 6-arylquinazolin-4-amines and their biological activities, i.e., the half-maximal inhibitory concentrations (IC50, nM), which were collected from the literature [20,33]. The IC50 values were expressed on a negative logarithmic scale, i.e., as pIC50 (−log IC50, with IC50 in M). Table 7 shows the chemical structures and pIC50 values of this series.
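As a small worked example of this conversion, the Python sketch below turns an IC50 given in nM into pIC50. The function name is hypothetical, and the input value is illustrative rather than taken from Table 7.

```python
import math

def pic50_from_nm(ic50_nm):
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 M, so -log10(x * 1e-9) = 9 - log10(x)
    return 9.0 - math.log10(ic50_nm)

# An IC50 of 1000 nM (1 µM) corresponds to a pIC50 of 6
print(pic50_from_nm(1000.0))
```

On this scale, a tenfold gain in potency raises pIC50 by one unit, which makes the values more convenient for regression than raw concentrations.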
For the HQSAR analysis, the data set was divided into training (36 compounds, including the most and the least active compounds) and test (10 compounds) sets. The training set is used for model development and internal validation (cross-validation), while the test set is used only in the external validation of the best models. The division was not entirely random because it was necessary to ensure chemical and biological diversity in both sets. Compounds 32, 33, and 34, which contain one chiral center, were included in the test set because their biological activity values were measured for the racemic mixture. Therefore, they were modeled separately as each of the two enantiomeric forms (R and S).

Molecular Modeling
The chemical structures of these 46 derivatives were built up using the commercial Spartan software (version 10, Wavefunction, Inc., Irvine, CA, USA) [35]. All structures were submitted to the default systematic conformational analysis, using the AM1 semi-empirical method, available in Spartan.
The HQSAR models were generated using partial least squares (PLS) analysis, while the internal validation procedure was performed by the leave-one-out (LOO) cross-validation approach. Subsequently, the best HQSAR models were selected based on various statistical parameters, including the squared correlation coefficient (R2) and the LOO cross-validated R2 (q2) values.
In order to evaluate the risk of fortuitous correlation, the Y-randomization (also called Y-scrambling or response randomization) test was performed. In this additional validation procedure, the biological activity values are randomized and the HQSAR analysis is carried out again for the same training set [37].
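The two internal checks described above, LOO cross-validation and Y-randomization, can be sketched in plain Python. This is an illustration under loud assumptions: a one-descriptor least-squares line replaces the PLS regression on hologram bins, the data are hypothetical, and q2 is taken as 1 − PRESS/SS with SS computed around the mean of all activities.

```python
import random

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def loo_q2(x, y):
    """Leave-one-out cross-validated q2 = 1 - PRESS / SS."""
    y_mean = sum(y) / len(y)
    ss = sum((yi - y_mean) ** 2 for yi in y)
    press = 0.0
    for i in range(len(x)):
        # Refit without compound i, then predict the held-out activity
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a + b * x[i])) ** 2
    return 1.0 - press / ss

def y_randomization(x, y, n_trials=20, seed=0):
    """Recompute q2 after shuffling the activities; descriptors stay fixed."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        y_shuffled = y[:]
        rng.shuffle(y_shuffled)
        results.append(loo_q2(x, y_shuffled))
    return results

# Toy data (hypothetical, not from the paper): one descriptor, eight compounds
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [5.0, 5.6, 6.1, 6.4, 7.1, 7.4, 8.1, 8.4]

q2_real = loo_q2(x, y)
q2_scrambled = y_randomization(x, y)
print(f"q2 (real activities): {q2_real:.3f}")
print(f"highest q2 after scrambling: {max(q2_scrambled):.3f}")
```

A large gap between the q2 of the real activities and the q2 values of the scrambled runs is what the test looks for; scrambled q2 values can even be negative when the shuffled model predicts worse than the mean.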
An external validation was carried out using the test set compounds, which were not considered for the HQSAR model development. The predictive capacity of the models was investigated by calculating the predictive R2 (R2pred) values, defined according to Equation (1).
$$R^2_{\mathrm{pred}} = \frac{\mathrm{SD} - \mathrm{PRESS}}{\mathrm{SD}} \qquad (1)$$
In Equation (1), SD is the sum of squared deviations between the biological activity of the test set and the mean activity of the training set molecules, and PRESS is the sum of squared deviations between the observed and the predicted activity values for every molecule in the test set [38].
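Using these definitions of SD and PRESS, the predictive R2 can be computed as in the following Python sketch. It assumes the usual form of this statistic, R2pred = (SD − PRESS)/SD; the function name and the pIC50 values are hypothetical and serve only as an illustration.

```python
def r2_pred(y_test_obs, y_test_pred, y_train_mean):
    """Predictive R2 = (SD - PRESS) / SD, evaluated on the external test set."""
    # SD: squared deviations of test activities from the training-set mean
    sd = sum((obs - y_train_mean) ** 2 for obs in y_test_obs)
    # PRESS: squared deviations between observed and predicted test activities
    press = sum((obs - pred) ** 2 for obs, pred in zip(y_test_obs, y_test_pred))
    return (sd - press) / sd

# Hypothetical pIC50 values, for illustration only
y_train_mean = 6.5
y_obs = [5.5, 6.0, 7.0, 7.5]
y_pred = [5.8, 6.1, 6.8, 7.1]

# Here SD = 2.5 and PRESS = 0.30, so R2pred = 0.88
print(round(r2_pred(y_obs, y_pred, y_train_mean), 3))
```

Note that the reference mean comes from the training set, not the test set, so the statistic measures improvement over simply predicting the training-set average for every new compound.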
Importantly, these models are based on a receptor-independent QSAR method, i.e., the enzyme structure was not considered. Information about the binding site of the target enzyme is nevertheless available online in the Protein Data Bank (http://www.rcsb.org/), since there are crystal structures of some inhibitors bound to the same binding site of human DYRK1A [39][40][41][42]. In addition, it is important to emphasize that the user-friendly and publicly accessible web servers pointed out in [43] are useful tools for sharing information with the scientific community. However, all the software used in the current work is commercial and patent-protected, and thus could not be provided through a web server.

Conclusions
HQSAR (2D fragment-based) models were developed for 46 6-arylquinazolin-4-amines (N training = 36; N test = 10), a series of inhibitors of DYRK1A, an enzyme associated with Alzheimer's disease. The best model, namely A/B/C/Ch/DA (q2 = 0.757; SEcv = 0.493; R2 = 0.937; SE = 0.251; R2pred = 0.659), contains 3-6 atoms per fragment and encodes atoms, bonds, connectivity, chirality, and donor/acceptor atoms as fragment distinctions. It presents high goodness-of-fit (R2 > 0.9), as well as high internal (q2 > 0.7) and external (R2pred > 0.5) predictive power, which indicates the reliability of the constructed model. According to the Y-randomization test (q2 ≤ 0.211), the observed correlation is not due to chance. The HQSAR colored diagrams display the contributions of the fragments to the increase or decrease of the biological activity of the compounds. The positive and negative contributions of the fragments revealed by these diagrams are in accordance with a previously performed 3D-QSAR characterization and thus may be helpful in designing new 6-arylquinazolin-4-amine derivatives with enhanced DYRK1A inhibitory activity.