Nucleic Acid Quadratic Indices of the “Macromolecular Graph’s Nucleotides Adjacency Matrix”. Modeling of Footprints after the Interaction of Paromomycin with the HIV-1 Ψ-RNA Packaging Region

This report describes a new set of macromolecular descriptors of relevance to nucleic acid QSAR/QSPR studies: the nucleic acid quadratic indices. These descriptors are calculated from the macromolecular graph's nucleotide adjacency matrix. A study of the interaction of the antibiotic Paromomycin with the packaging region of the RNA present in type-1 HIV illustrates this approach. A linear discriminant function correctly classified 90.10% (91/101) and 81.82% (9/11) of the interacting/noninteracting nucleotide sites in the training and test sets, respectively. The LOO cross-validation procedure was used to assess the stability and predictability of the model. Using this approach, the classification model showed a LOO global good classification of 91.09%. In addition, the model's overall predictability varied from 89.11% to 87.13% when n varied from 2 to 3 in the leave-n-out jackknife method, and it stabilized around 88.12% when n was > 3. On the other hand, a linear regression model predicted the local binding affinity constants [log K (10⁻⁴ M⁻¹)] between a specific nucleotide and the aforementioned antibiotic. The linear model explains almost 92% of the variance of the experimental log K (R = 0.96 and s = 0.07), and the LOO press statistics evidenced its predictive ability (q² = 0.85 and s_cv = 0.09). These models also permit the interpretation of the driving forces of the interaction process. In this sense, the developed equations involve short-reaching (k ≤ 3), middle-reaching (4 < k < 9) and far-reaching (k = 10 or greater) nucleotide quadratic indices. This points to control of the stability profile of Paromomycin-RNA complexes by electronic and topologic interactions of the nucleotide backbone. Consequently, the present approach represents a novel and rather promising way forward for chemoinformatics and bioinformatics research.


Introduction
High-throughput genome sequencing projects are producing an enormous amount of raw sequence data. All these data call for methods that are able to synthesize the information into biological knowledge [1]. Public databases such as GenBank are growing in size at an exponential rate [2]. A significant proportion of the data corresponds to genomic sequences containing the structures not only of many genes but also of RNA.
The amount of new genome data has dramatically increased in recent years, and it has once again brought to the forefront the question of protein and nucleic acid functions [3]. In this respect, the use of footprinting techniques has proven to be an important experimental method for the discovery of significant processes in molecular biology and the field of genomics [4][5][6][7][8]. These experimental techniques permit the quantitative analysis of D(R)Nase footprinting data for drugs interacting with D(R)NA, yielding apparent binding constants from the spot intensities that appear on the footprinting autoradiogram [9]. The study of the interactions of drugs with biomolecules is currently a very active topic in bioinformatics. This kind of study constitutes a significant step towards rational drug design.
The interactions between aminoglycosides and the packaging region of type-1 HIV (Human Immunodeficiency Virus) appear to represent a promising route for antiviral discoveries [10]. Aminoglycoside drugs are cationic natural products that interact with RNA [11]. The bactericidal effects inherent in these compounds stem from their ability to block protein synthesis by binding to the A-site on ribosomal RNA [12]. In fact, aminoglycoside analogues can be used to treat certain diseases. For example, the genetic information in human immunodeficiency virus and various tumour viruses is in the form of RNA [13]. Since the genomes of these viruses are likely to have unique structures, it may be possible to design agents that selectively block virus proliferation by targeting a specific site on RNA [14].
One of the present authors has recently introduced the novel computer-aided molecular design scheme TOMOCOMD (acronym of TOpological MOlecular COMputer Design). It calculates several new 2D/3D families of total and local (atom and atom-type) topologic and stochastic molecular descriptors, such as quadratic and linear indices, defined by analogy with the quadratic and linear mathematical maps [15,16]. This approach was very recently applied with success to the prediction of physical properties of organic compounds and of the Caco-2 permeability of drugs [15][16][17][18]. Interestingly, molecular quadratic indices can be generalized to allow the codification of 3D-structural features [19].
Therefore, the main aim of this paper is to describe an extended TOMOCOMD-CANAR approach that accounts for RNA structure. In the present study, we propose a total and local definition of nucleic acid quadratic indices of the "macromolecular graph's nucleotides adjacency matrix". The other objective of the present work is to derive quantitative structure-property relationships that predict the probability and the affinity with which Paromomycin binds to the HIV-1 Ψ-RNA packaging region.

Computational Methods
A nucleic acid is a long, unbranched polynucleotide, that is, a polymer consisting of nucleotides. Each nucleotide has the following three components: 1) a cyclic five-carbon sugar, 2) a purine or pyrimidine base attached to the 1'-carbon atom of the sugar by an N-glycosidic bond, and 3) a phosphate attached to the 5'-carbon of the sugar by a phosphoester linkage. The nucleotides in nucleic acids are covalently linked by a second phosphoester bond that joins the 5'-phosphate of one nucleotide and the 3'-OH group of the adjacent nucleotide. The purine and pyrimidine bases are not engaged in any covalent bonds to each other. Thus, a polynucleotide consists of an alternating sugar-phosphate backbone, and each nucleotide is characterized by the base attached to it, which can be adenine (A), cytosine (C), guanine (G) or thymine (T) [the RNA molecule contains the base uracil (U) instead of T]. Consequently, an RNA molecule is uniquely determined by the sequence of bases along its chain, and it has a definite orientation [20][21][22][23].
In particular, a typical RNA is a single-stranded polyribonucleotide. This macromolecule has a folded 3D conformation that is held together in part by noncovalent base-pairing interactions like those that hold together the two strands of the DNA helix. In the single-stranded RNA molecule, however, the complementary base pairs form between nucleotide residues in the same chain, which causes the RNA molecule to fold up in a unique way that is important for its biochemical activity. In this sense, the RNA structure contains several sets of unpaired nucleotide residues. Most of the weak interactions (hydrogen bonds) form between Watson-Crick complementary bases (between pairs of nonconsecutive bases), i.e., between A and U and between C and G, but a far from negligible number of bonds also form between other pairs of bases, for example the G·U wobble pairs [20][21][22][23].
On the other hand, the general principles of the molecular quadratic indices of the "molecular pseudograph's atom adjacency matrix" for small-to-medium sized organic compounds have been explained in some detail elsewhere [15][16][17][18][19]. However, this work gives an extended overview of this approach.
First, in analogy to the molecular vector X used to represent organic molecules, we introduce here the macromolecular vector ($X_m$). The components of this vector are numeric values that represent a certain property of the nucleotide residues (DNA-RNA bases). These properties characterize each kind of nucleotide (purine and pyrimidine bases) within the nucleic acid, because the bases are the only part in which the nucleotides differ. Such properties can be the experimental molar absorption coefficient ε260 at 260 nm and pH = 7.0, the first (∆E1) and second (∆E2) singlet excitation energies in eV, the first (f1) and second (f2) oscillator strength values (of the first singlet excitation energies) of the DNA-RNA bases, and so on [24]. For instance, the f1(B) property of the DNA-RNA bases B takes the values f1(A) = 0.28 for adenine, f1(G) = 0.20 for guanine, f1(U) = 0.18 for uracil and so on [24]. Table 1 lists these descriptor properties for the DNA-RNA bases.

Purine and pyrimidine bases (RNA/DNA)
For a given nucleic acid composed of n nucleotides (a vector of $\Re^n$), the "macromolecular vector" ($X_m$) is constructed, and the k-th total nucleic acid quadratic indices, $q_k(x_m)$, are calculated as quadratic forms, as shown in Eq. 1:

$$q_k(x_m) = \sum_{i=1}^{n}\sum_{j=1}^{n} {}^{k}a_{ij}\,{}^{m}X_i\,{}^{m}X_j \qquad (1)$$

where ${}^{k}a_{ij} = {}^{k}a_{ji}$ (symmetric square matrix), n is the number of nucleotides of the nucleic acid, and ${}^{m}X_1, \ldots, {}^{m}X_n$ are the coordinates or components of the macromolecular vector ($X_m$) in a system of canonical basis vectors of $\Re^n$. In this case, the canonical ('natural') basis of $\Re^n$, $\{e_1, \ldots, e_n\}$, is used as the basis of the form. The coordinates of any vector $X_m$ therefore coincide with the components of this vector. For that reason, such coordinates can be considered as weights of the vertices (DNA-RNA bases) of the graph of the nucleic acid's backbone. The coefficients ${}^{k}a_{ij}$ are the elements of the k-th power of the macromolecular matrix $M(G_m)$ of the nucleic acid's graph ($G_m$). Here, $M(G_m) = [a_{ij}]$ is an n × n matrix, where n is the number of bases (nucleotides) in the sugar-phosphate backbone. The elements $a_{ij}$ are defined as follows:

$$a_{ij} = \begin{cases} P_{ij} & \text{if } i \neq j \text{ and } \exists\, e_k \in E(G_m) \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$

where $E(G_m)$ represents the set of edges of $G_m$ and $P_{ij}$ is the number of edges between the vertices (nucleotides) $v_i$ and $v_j$. In this adjacency matrix $M(G_m)$, row i and column i correspond to vertex $v_i$ of $G_m$. The element $a_{ij}$ of this matrix represents a bond between a nucleotide i and another nucleotide j. Here, we consider only covalent interactions (phosphodiester bonds) and hydrogen-bond interactions (between complementary bases). As a first approximation, we considered both interactions equivalent. The matrix $M^k(G_m)$ provides the number of walks of length k linking the nucleotides i and j. Equation (1) for $q_k(x_m)$ can be written as the single matrix equation:

$$q_k(x_m) = [{}^{m}X]^{t}\, M^{k}(G_m)\, [{}^{m}X] \qquad (3)$$

where $[{}^{m}X]$ is a column vector (an n × 1 matrix), $[{}^{m}X]^{t}$ is the transpose of $[{}^{m}X]$ (a 1 × n matrix) and $M^k(G_m)$ is the k-th power of the matrix $M(G_m)$ of the macromolecular pseudograph $G_m$ (the matrix of the mathematical quadratic form). Table 2 exemplifies the calculation of $q_k(x_m)$ for a secondary structure RNA fragment.

In addition to total quadratic indices, computed for the whole macromolecule, local-fragment (nucleotide and nucleotide-type) formalisms can be developed. These descriptors are termed local nucleic acid quadratic indices, $q_{kL}(x_m)$. The definition of these descriptors is as follows:

$$q_{kL}(x_m) = \sum_{i=1}^{m}\sum_{j=1}^{m} {}^{k}a_{ijL}\,{}^{m}X_i\,{}^{m}X_j$$

where m is the number of nucleotides of the fragment of interest and ${}^{k}a_{ijL}$ is the element of row i and column j of the matrix $M^k_L(G_m)$. This matrix is extracted from $M^k(G_m)$ and contains information referring to the vertices of the specific nucleic acid fragment ($F_R$) and also to the molecular environment. The matrix $M^k_L(G_m) = [{}^{k}a_{ijL}]$ with elements ${}^{k}a_{ijL}$ is defined as follows:

$${}^{k}a_{ijL} = \begin{cases} {}^{k}a_{ij} & \text{if both } v_i \text{ and } v_j \text{ are vertices contained in } F_R \\ \tfrac{1}{2}\,{}^{k}a_{ij} & \text{if } v_i \text{ or } v_j \text{ is contained in } F_R \text{, but not both} \\ 0 & \text{otherwise} \end{cases}$$

where the ${}^{k}a_{ij}$ are the elements of the k-th power of $M(G_m)$. These local analogues can also be expressed in matrix form by the expression:

$$q_{kL}(x_m) = [{}^{m}X]^{t}\, M^{k}_{L}(G_m)\, [{}^{m}X]$$

Note that for any partition of a nucleic acid into Z macromolecular fragments there will be Z local macromolecular-fragment matrices. That is to say, if a nucleic acid is partitioned into Z macromolecular fragments, the matrix $M^k(G_m)$ can be partitioned into Z local matrices $M^k_L(G_m)$, L = 1, ..., Z. The k-th power of the matrix $M(G_m)$ is exactly the sum of the k-th powers of the Z local matrices, and the total nucleic acid quadratic indices are the sum of the quadratic indices of the Z macromolecular fragments (see Table 2):

$$q_k(x_m) = \sum_{L=1}^{Z} q_{kL}(x_m)$$

Any local nucleic acid quadratic index has a particular meaning, especially for the first values of k, where the information about the structure of the fragment $F_R$ is contained. Higher values of k relate to the environment of the fragment $F_R$ within the macromolecular graph ($G_m$). In any case, a complete series of indices provides a specific characterization of the chemical structure. The generalization of the matrices and descriptors to "superior analogues" is necessary for situations where a single descriptor is unable to provide a good structural characterization [25]. The local macromolecular indices can also be used together with total ones as variables for QSAR/QSPR (Quantitative Structure-Activity/Structure-Property Relationship) modeling of properties or activities that depend more on a region or a fragment than on the macromolecule as a whole.
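The definitions above can be illustrated with a minimal Python sketch. It is not the TOMOCOMD-CANAR implementation. The six-nucleotide hairpin, its base pairs, and all function names are hypothetical, and the weight f1(C) = 0.13 is an assumption inferred from the vector shown in Table 2 (only the A, G and U values are quoted in the text). The local matrix uses the convention of full weight when both nucleotides belong to the fragment and half weight when only one does, so the sum runs over the whole n × n matrix.

```python
from typing import Dict, List, Sequence, Set, Tuple

Matrix = List[List[float]]

# First oscillator strengths f1 used as nucleotide weights. A, G and U are quoted
# in the text; C = 0.13 is inferred from the Table 2 vector (assumption).
F1: Dict[str, float] = {"A": 0.28, "G": 0.20, "U": 0.18, "C": 0.13}


def adjacency(n: int, edges: Sequence[Tuple[int, int]]) -> Matrix:
    """Build M(G_m): a_ij counts the edges (backbone or base pair) between i and j."""
    m = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        m[i][j] += 1.0
        m[j][i] += 1.0
    return m


def matrix_power(m: Matrix, k: int) -> Matrix:
    """Return M^k by repeated multiplication (M^0 is the identity)."""
    n = len(m)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = [[sum(result[i][t] * m[t][j] for t in range(n)) for j in range(n)]
                  for i in range(n)]
    return result


def local_matrix(mk: Matrix, fragment: Set[int]) -> Matrix:
    """Keep an element in full if both i and j are in the fragment, halve it if one is."""
    n = len(mk)
    share = {0: 0.0, 1: 0.5, 2: 1.0}
    return [[mk[i][j] * share[(i in fragment) + (j in fragment)] for j in range(n)]
            for i in range(n)]


def quadratic_form(mk: Matrix, x: Sequence[float]) -> float:
    """Evaluate [X]^t M^k [X]."""
    n = len(x)
    return sum(mk[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def total_index(seq: str, edges: Sequence[Tuple[int, int]], k: int) -> float:
    """Total quadratic index q_k(x_m) of the whole fragment."""
    x = [F1[base] for base in seq]
    return quadratic_form(matrix_power(adjacency(len(seq), edges), k), x)


def local_index(seq: str, edges: Sequence[Tuple[int, int]], k: int,
                fragment: Set[int]) -> float:
    """Local quadratic index q_kL(x_m) of the nucleotides listed in `fragment`."""
    x = [F1[base] for base in seq]
    mk = matrix_power(adjacency(len(seq), edges), k)
    return quadratic_form(local_matrix(mk, set(fragment)), x)


# Hypothetical hairpin: backbone 0-1-2-3-4-5 plus base pairs G0-C5 and C1-G4.
SEQ = "GCAUGC"
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5), (1, 4)]

if __name__ == "__main__":
    for k in range(4):
        total = total_index(SEQ, EDGES, k)
        per_nucleotide = [local_index(SEQ, EDGES, k, {i}) for i in range(len(SEQ))]
        print(k, round(total, 6), [round(v, 6) for v in per_nucleotide])
```

With this convention, the per-nucleotide local indices add up to the total index for every k, which mirrors the partition property stated above.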

Footprinting Data
The data set of footprinted and binding nucleotides was extracted from the literature [9]. Figure 1 depicts the secondary structure of the HIV-1 Ψ-RNA packaging region as well as the binding sites of Paromomycin. A representation of the Ψ-RNA appears along with a summary of binding/enhancement information for Paromomycin. The RNA consists of the 'main stem', positions 213-238 and 361-388; SL-1, which contains the dimer initiation site; SL-2, which has the 5' splice donor site; SL-3; and SL-4, which contains the start codon (AUG) for the gag gene.

TOMOCOMD-CANAR Software
TOMOCOMD is an interactive program for molecular design and bioinformatics research [26]. In this paper we outline the salient features of only one of its subprograms: CANAR. This subprogram follows a user-friendly philosophy and requires no prior programming skills. The calculation of total and local macromolecular quadratic indices for any nucleic acid was implemented in the TOMOCOMD-CANAR software [26]. The following list briefly summarizes the main steps for the application of this method in QSAR/QSPR:
1. Draw the macromolecular graphs ($G_m$) for each RNA/DNA of the data set, using the software's drawing mode. This is done by selecting the active nucleotide symbol. Here, we consider only covalent interactions (phosphodiester bonds) and hydrogen-bond interactions (between complementary bases).
2. Use appropriate purine and pyrimidine base weights in order to differentiate the residues in each nucleotide. This work uses five properties of the DNA-RNA bases as nucleotide weights (see Table 1) [24]. This parametrization is done using the properties of U, T, A, G, and C only, because the bases are the only part in which the nucleotides differ.
3. Compute the nucleic acid quadratic indices of the "macromolecular graph's nucleotides adjacency matrix". This is done in the software's calculation mode, in which the DNA-RNA base properties and the descriptor family are selected before the macromolecular indices are calculated. The software generates a table in which the rows and columns correspond to the compounds and the $q_k(x_m)$, respectively.
4. Find a QSPR/QSAR equation by using statistical techniques, such as multilinear regression analysis (MRA), Neural Networks (NN), Linear Discriminant Analysis (LDA), and so on. That is to say, we can find a quantitative relation between a property P and the $q_k(x_m)$ having, for instance, the following appearance:

$$P = a_0\,q_0(x_m) + a_1\,q_1(x_m) + a_2\,q_2(x_m) + \dots + a_k\,q_k(x_m) + c \qquad (10)$$

where P is the measurement of the property, $q_k(x_m)$ [or $q_{kL}(x_m)$] is the k-th total [or local] macromolecular quadratic index, and the $a_k$'s are the coefficients obtained by the statistical analysis.
5. Test the robustness and predictive power of the QSPR/QSAR equation by using internal and external cross-validation techniques.
6. Develop a structural interpretation of the obtained QSAR/QSPR model using macromolecular quadratic indices as molecular descriptors.

Statistical Analysis
Based on the discussion above, two simple linear models were proposed: one to discriminate between footprinted and interacting (binding) nucleotides, and one to predict drug-nucleotide affinity. Linear Discriminant Analysis (LDA) and Linear Multiple Regression (LMR), respectively, were used to obtain these quantitative models. These statistical analyses were carried out with the STATISTICA software package [27]. For both statistical procedures, the TOMOCOMD-CANAR model used the first 10 $q_{kL}(x_m)$ [from $q_{0L}(x_m)$ to $q_{9L}(x_m)$] of each nucleotide in the RNA.
Forward stepwise selection was fixed as the strategy for variable selection. The tolerance parameter (the proportion of variance that is unique to the respective variable) was set to the default value for minimum acceptable tolerance, which is 0.01.
LDA was used to generate the classifier function because of the simplicity of the method [28]. To test the quality of the discriminant functions derived, we used Wilks' λ and the Mahalanobis distance. The Wilks' λ statistic for overall discrimination can take values in the range of 0 (perfect discrimination) to 1 (no discrimination). The Mahalanobis distance indicates the separation of the respective groups. It shows whether the model possesses an appropriate discriminatory power for differentiating between the two groups. The classification of cases was performed by means of the posterior classification probabilities, that is, the probability that the respective case belongs to a particular group, i.e., footprinted or interacting (binding) nucleotides (see Figure 1). In developing this classification function, the values of -1 and 1 were assigned to these groups, respectively. The quality of the LDA model was also determined by examining the percentage of good classification and the proportion between the cases and variables in the equation. The discriminant function was validated by means of leave-n-out cross-validation procedures.
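As a hedged illustration of how posterior classification probabilities can be obtained from Mahalanobis distances, the sketch below uses the textbook two-group expression, in which each posterior is proportional to the prior times exp(-D²/2). This assumes normally distributed groups with a common covariance matrix; it is not claimed to reproduce the STATISTICA output. The function names, the distances, and the equal priors are hypothetical. The ∆P% quantity follows the definition given in the table notes, ∆P% = [P(interaction) - P(non-interaction)] × 100.

```python
import math
from typing import Dict


def posterior_probabilities(d2: Dict[str, float],
                            priors: Dict[str, float]) -> Dict[str, float]:
    """Posterior P(group | case) from squared Mahalanobis distances to each centroid.

    Assumes normal groups sharing one covariance matrix, so that
    P(group | case) is proportional to prior * exp(-d2 / 2).
    """
    weights = {g: priors[g] * math.exp(-0.5 * d2[g]) for g in d2}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


def delta_p_percent(posterior: Dict[str, float],
                    interacting: str = "binding",
                    non_interacting: str = "footprinted") -> float:
    """Delta-P% = [P(interaction) - P(non-interaction)] x 100."""
    return 100.0 * (posterior[interacting] - posterior[non_interacting])


if __name__ == "__main__":
    # Hypothetical squared distances of one nucleotide to the two group centroids.
    distances = {"binding": 1.0, "footprinted": 5.0}
    equal_priors = {"binding": 0.5, "footprinted": 0.5}
    post = posterior_probabilities(distances, equal_priors)
    print(post, delta_p_percent(post))
```

A case closer to the "binding" centroid receives a posterior above 0.5 for that group and therefore a positive ∆P%.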
In addition, external prediction (test) sets were used to assess the robustness and predictive power of the model found. This type of model validation is very important, considering that the predictive ability of a QSAR model can only be estimated using an external test set of compounds that was not used for building the model [29,30]. The quality of the LMR model was determined by examining the statistical parameters of the multivariable regression and of the cross-validation procedures. In this sense, the quality of the models was determined by examining the regression coefficients (R), determination coefficients (R²), Fisher ratio's p-level [p(F)], standard deviations of the regression (s) and the leave-one-out (LOO) press statistics (q², s_cv) [30]. In recent years, the LOO press statistics (e.g., q²) have been used as a means of indicating predictive ability. Many authors consider high q² values (for instance, q² > 0.5) as an indicator, or even as the ultimate proof, of the high predictive power of a QSAR model.
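The LOO press statistic can be sketched as follows for a regression on a single descriptor. This is a minimal illustration under the usual definition q² = 1 - PRESS/SS_tot, where PRESS is the sum of squared leave-one-out prediction errors. The function names and the toy data are hypothetical, and the real model uses several descriptors rather than one.

```python
from typing import List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def loo_q2(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (q2, PRESS) from leave-one-out cross-validation."""
    xs: List[float] = list(x)
    ys: List[float] = list(y)
    n = len(xs)
    press = 0.0
    for i in range(n):
        # Refit the model without case i, then predict the removed case.
        intercept, slope = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (intercept + slope * xs[i])) ** 2
    mean_y = sum(ys) / n
    ss_tot = sum((yi - mean_y) ** 2 for yi in ys)
    return 1.0 - press / ss_tot, press


if __name__ == "__main__":
    # Hypothetical descriptor values and responses, for illustration only.
    q2, press = loo_q2([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
    print(round(q2, 4), round(press, 4))
```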

Development of the Discrimination Function: Local (Nucleotide) quadratic indices and the probability of footprinting after RNA-Paromomycin interaction.
The best equation found to discriminate between footprinted and binding nucleotides was:

$$\text{Binding} = 1.10836 + 93.6133\,{}^{f_1}q_{0L}(x_m) - 5.4682\,{}^{f_1}q_{3L}(x_m) + 0.1356\,{}^{f_1}q_{5L}(x_m) \qquad (11)$$

N = 101, λ = 0.43, D² = 6.0, F(3, 97) = 43.342, ρ = 10.1, p < 0.001

where N is the number of nucleotides, λ is Wilks' statistic, D² is the squared Mahalanobis distance, F is the Fisher ratio and p is the p-level (probability of error). The coefficient ρ was used to control the ratio between the number of cases and the number of adjustable parameters in the model [31]. These statistics indicate that model (11) is appropriate for the discrimination of the footprinted and binding nucleotides studied here. It correctly classifies 95.52% (64/67) of the footprinted nucleotides and 79.41% (27/34) of the binding nucleotides in the training set, for a global good classification of 90.10% (91/101). Table 3 gives the classification of the nucleotides in the training set together with their posterior probabilities calculated from the Mahalanobis distance.
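For illustration, the discriminant function of Eq. 11 can be evaluated directly once the three local indices of a nucleotide are known. In the sketch below the coefficients are those of Eq. 11, while the index values passed in are hypothetical. Assigning a nucleotide to the binding group when the score is positive is an assumption of this sketch, suggested by the -1/1 coding of the two groups; the paper itself classifies cases through posterior probabilities.

```python
def binding_score(q0: float, q3: float, q5: float) -> float:
    """Evaluate Eq. 11 for the f1-weighted local indices q_0L, q_3L and q_5L."""
    return 1.10836 + 93.6133 * q0 - 5.4682 * q3 + 0.1356 * q5


def classify(score: float) -> str:
    """Zero cut-off (assumption): positive scores are read as 'binding'."""
    return "binding" if score > 0 else "footprinted"


if __name__ == "__main__":
    # Hypothetical local index values for a single nucleotide.
    score = binding_score(q0=0.04, q3=1.5, q5=2.0)
    print(round(score, 6), classify(score))
```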
The LOO cross-validation procedure was used to assess the predictability of the model obtained by LDA. This methodology systematically removes one data point at a time from the data set. A QSAR model is then constructed on the basis of this reduced data set and subsequently used to predict the removed data point. This procedure is repeated until a complete set of predicted values is obtained. Using this approach, model (11) showed a LOO global good classification of 91.09%.
Secondly, to assess the predictability of the classification model (11), a leave-n-out cross-validation was performed. The model showed 89.11% and 87.13% global good classification when n was 2 and 3, respectively, in the leave-n-out cross-validation procedures. The model stabilizes around 88.12% when n was > 3 (see Figure 2).
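The leave-n-out procedure can be sketched generically as follows. This is only an illustration of the resampling scheme: the blocks of n cases are taken in consecutive order (an assumption; the grouping used by the statistical package is not described in the text), and the `fit` and `predict` callables stand for any classifier and are hypothetical placeholders.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def leave_n_out_splits(n_cases: int, n_out: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (training, held_out) index lists; each case is held out exactly once."""
    for start in range(0, n_cases, n_out):
        held_out = list(range(start, min(start + n_out, n_cases)))
        held = set(held_out)
        training = [i for i in range(n_cases) if i not in held]
        yield training, held_out


def cross_validated_accuracy(x: Sequence, y: Sequence,
                             fit: Callable, predict: Callable, n_out: int) -> float:
    """Global percentage of good classification under leave-n-out cross-validation."""
    correct = 0
    for training, held_out in leave_n_out_splits(len(x), n_out):
        model = fit([x[i] for i in training], [y[i] for i in training])
        correct += sum(predict(model, x[i]) == y[i] for i in held_out)
    return 100.0 * correct / len(x)


if __name__ == "__main__":
    # With 101 cases and n = 3 there are 34 blocks (33 of size 3 and one of size 2).
    print(sum(1 for _ in leave_n_out_splits(101, 3)))
```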
The most important criterion for accepting or rejecting a discriminant model such as model (11) is based on the statistics for the test set. Equation 11 correctly classifies 81.82% (9/11) of both the drug-interacting nucleotides and the footprinted ones. Table 4 gives the classification of the nucleotides in the test set. If the training set and the test set are considered together (full set), the percentage of good classification was 88.62% (109/123).

RNA
A model such as equation (11) may prove to be very useful in predicting the probability of the occurrence of an interaction between a drug and a specific site on the RNA chain. However, any picture of the drug-RNA interaction is not complete unless the strength of each interaction is also known. To address this issue, a quantitative linear model was developed in order to predict the interaction constants, when they occur. The local affinity constant values [log K (10⁻⁴ M⁻¹)] were obtained from the same source as the binding/footprinting data [9].
Notes to Tables 3 and 4: (a) nucleotide-Paromomycin interaction predicted by model (11); ∆P% = [P(interaction) - P(non-interaction)] × 100, where P is the probability with which the nucleotide is predicted as non-footprinted or footprinted in each group. (b) Percentage of probability with which the nucleotide is predicted as footprinted or non-footprinted in each group using LOO cross-validation procedures.
In the development of the quantitative model for the description of log K in the calibration data set, one nucleotide (A276) stood out as a statistical outlier. Outlier detection was performed using the following standard statistical tests: residuals, standardized residuals, Studentized residuals and Cook's distance.
Two of the present authors reported a similar equation (Eq. 13) using MARCH-INSIDE descriptors [32]. They additionally made use of a dummy variable RNAse, which takes the value RNAse = 1 for experiments carried out in the presence of RNAse I and RNAse = -1 for RNAse T1 [32]. Both equations have very similar statistical parameters. The statistical parameters of Eq. 12 suggest a high quality of the model found. The correlation coefficient R is 0.96 and the standard deviation is only 0.07 × 10⁻⁴ M⁻¹. The squared correlation coefficient (R²) was 0.92 for Eq. 12, so this model explains more than 92% of the variance of the experimental affinity constant of Paromomycin for HIV-1 RNA.
The predictability and stability of model (12) with respect to data variation was tested by means of LOO cross-validation. The model shows a cross-validation standard error of only 0.09. Table 5 lists the observed, predicted and predicted (after LOO cross-validation procedures) values of log K obtained from Eq. 12 and Eq. 13. One of the main problems concerning the application of topological indices (TIs) to QSPR/QSAR studies is that many descriptors are collinear. Therefore, there will be much redundancy of information. Problems with redundancy of information and collinearity have been illustrated with the use of TIs such as the molecular connectivities [33,34].
For a better statistical interpretation of the QSPR/QSAR models (in order to understand which effects cannot be separated), where inter-related indices are considered (such as topologic or topographic indices based on the same graph-theoretical invariant), the inclusion of strongly interrelated variables in the model should be avoided. This criterion matters because an interrelation among different descriptors produces highly unstable regression coefficients and makes it difficult to know the real contribution of each variable included in the model [35]. To solve this problem, Randić proposed a procedure for the orthogonalization of molecular descriptors that has been applied with much success to QSPR and QSAR studies [36,37]. In the present paper, to alleviate the collinearity between variables in the investigated data set, an interrelation study among the nucleic acid quadratic indices was performed using correlation matrices. The level of collinearity that is acceptable is a rather subjective issue. In this sense, reports of acceptable correlation coefficients between variables have ranged from less than 0.4 to 0.9 in the literature. In the view of Cronin and Schultz [34], the collinearity of the variables should be as low as possible, and must be significantly lower than the statistical fit of the QSPR/QSAR itself. In Table 6, the correlation matrix for this equation shows that there is low collinearity among these variables.
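A collinearity screen of this kind can be sketched as a squared Pearson correlation matrix over the descriptor columns. The column names, the toy values, and the r² cut-off below are hypothetical; as noted above, the acceptable threshold is a judgment call.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson_r(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient between two descriptor columns."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    sab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    saa = sum((u - mean_a) ** 2 for u in a)
    sbb = sum((v - mean_b) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def squared_correlations(columns: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """r2 for every pair of descriptors (upper triangle of the matrix)."""
    names = list(columns)
    return {(p, q): pearson_r(columns[p], columns[q]) ** 2
            for i, p in enumerate(names) for q in names[i + 1:]}


def collinear_pairs(columns: Dict[str, Sequence[float]],
                    r2_max: float) -> List[Tuple[str, str]]:
    """Descriptor pairs whose r2 exceeds the chosen cut-off."""
    return [pair for pair, r2 in squared_correlations(columns).items() if r2 > r2_max]


if __name__ == "__main__":
    # Hypothetical descriptor values for four nucleotides.
    toy = {"q0L": [1, 2, 3, 4], "q3L": [2, 4, 6, 8], "q5L": [1, -1, -1, 1]}
    print(squared_correlations(toy))
    print(collinear_pairs(toy, r2_max=0.8))
```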

Table 6. The squared correlation matrix showing covariance (r²) among the macromolecular topological descriptors [local (nucleotide) nucleic acid quadratic indices] used in the regression analysis.
In both models these indices make a positive contribution [${}^{f_1}q_{0L}(x_m)$ and ${}^{\Delta E_1}q_{0L}(x_m)$ in models (11) and (12), respectively]. This is a logical result, because these indices take high values for purine nucleotides, which have a higher probability of drug interaction than pyrimidine ones. This means that the probability of binding increases with the electron density of the RNA bases, owing to the possibility of hydrogen-bond and/or electrostatic interactions of amino groups/protonated amine groups with sites on the RNA.
Three RNA quadratic indices of the third order (k = 3) are involved in the early stages of the Paromomycin-nucleotide interaction. Such behavior may be explained by considering that the electronic and/or topologic changes in the nucleotide backbone that are necessary for the drug-nucleotide interaction are the more marked structural changes in the ±3 vicinity of the nucleotide. Consequently, two of these indices had a negative contribution in the LDA [${}^{f_1}q_{3L}(x_m)$] and LMR [${}^{\varepsilon_{260}}q_{3L}(x_m)$] models. The contribution of the middle-to-high reaching indices, that is, the ±5 and ±10 vicinities of the nucleotide, in both equations shows that the interaction between Paromomycin and a nucleotide of RNA depends on the electro-topologic environment of this nucleotide. These results are related to the factors that control binding specificity in aminoglycoside interactions. In general, Paromomycin prefers to bind bulged or other non-Watson-Crick secondary RNA elements, since this drug is too large to fit into the grooves of the regular A-form RNA structure [9].

Concluding Remarks
This study presents a new set of macromolecular descriptors relevant to nucleic acid QSAR/QSPR studies. These descriptors are calculated from the macromolecular graph's nucleotide adjacency matrix. Their derivation is straightforward, and it is easy to interpret the QSARs/QSPRs that include them. The local (nucleotide) quadratic indices, LDA, and LMR have been used to predict the probability and the affinity of Paromomycin binding to the HIV-1 packaging region. The resulting quantitative models are significant from a statistical point of view. A LOO cross-validation procedure (internal validation) and an external prediction series (external validation) revealed that the QSAR models had good predictability.
The models found to describe the interaction profile include nucleotide quadratic indices accounting for electronic and topologic features of each nucleotide in the RNA molecule. These models are not only good enough to predict the interaction parameters, but also permit the interpretation of the driving forces of such interaction processes. In this sense, the developed equations involve short-reaching (k ≤ 3), middle-reaching (4 < k ≤ 9) and far-reaching (k = 10 or greater) nucleotide quadratic indices. This indicates that the interaction between Paromomycin and a nucleotide of RNA depends on the electro-topologic environment of the nucleotide.
The approach described here represents a novel and rather promising way forward for chemoinformatics and bioinformatics research. We would expect computational nucleic acid science to have an effect on the search for new vaccines, receptors, drugs, and so on similar to the one that molecular modeling and QSAR have had on the search for new drugs.

Table 2. A closer look at the mathematical definition of total (RNA fragment) and local (nucleotide) nucleic acid quadratic indices of the "macromolecular graph's nucleotide adjacency matrix" of an RNA fragment.
In the definition of $X_m$ as a macromolecular vector, the symbol of each base is used to indicate the corresponding DNA-RNA base property, for instance, f1. That is, if we write A it means f1(A), the first oscillator strength value of adenine, or some other base property that characterizes each nucleotide in the nucleic acid molecule. So, if we use the canonical basis of $\Re^{13}$, the coordinates of any macromolecular vector $X_m$ coincide with the components of that macromolecular vector.
$[X_m]^t$ = [0.20 0.28 0.13 0.18 0.20 0.20 0.18 0.20 0.28 0.20 0.18 0.28 0.13]
$[X_m]^t$: transpose of $[X_m]$, i.e., the vector of the coordinates of $X_m$ in the canonical basis of $\Re^{13}$ (a row matrix).
$[X_m]$: vector of the coordinates of $X_m$ in the canonical basis of $\Re^{13}$ (a column matrix).
The program is composed of four subprograms, each of which deals with drawing structures (drawing mode) and calculating 2D and 3D molecular descriptors (calculation mode). The modules are named CARDD (Computer-Aided 'Rational' Drug Design), CAMPS (Computer-Aided Modeling in Protein Science), CANAR (Computer-Aided Nucleic Acid Research) and CABPD (Computer-Aided Bio-Polymers Docking).

Figure 1. HIV-1 Ψ-RNA packaging region represented on the TOMOCOMD-CANAR interface. Nucleotides involved in binding and enhancement (structural changes) for RNAse I are shown as filled circles and triangles, respectively (open symbols indicate the use of RNAse T1).

Figure 2. Behavior of the global or total percentage of good classification in different n-fold cross-validation analyses.

Table 3. Training set classification results.
a Nucleotide-Paromomycin interaction

Table 4. Test set classification results.
This is very important information for the study of the mechanism of action of potential drugs with RNA as the target.

Table 5. Observed, predicted and predicted (after LOO cross-validation procedures) values of log K obtained from Eq. 12 and Eq. 13.
NUC: Nucleotide. The values are: a, observed; b and d, predicted; and c and f, predicted by LOO procedures, for log K (10⁻⁴ M⁻¹) (affinity constant of Paromomycin for RNA), by Eq. 12 and Eq. 13, respectively.