Introducing Catastrophe-QSAR. Application on Modeling Molecular Mechanisms of Pyridinone Derivative-Type HIV Non-Nucleoside Reverse Transcriptase Inhibitors

The classical method of quantitative structure-activity relationships (QSAR) is enriched using non-linear models, as Thom’s polynomials allow either uni- or bi-variate structural parameters. In this context, catastrophe-QSAR algorithms are applied to the anti-HIV-1 activity of pyridinone derivatives. This requires calculation of the so-called relative statistical power and of its minimum principle in various QSAR models. The relative statistical power is a new index, constructed as a Euclidean measure that combines the ratio of the Pearson correlation to the algebraic correlation with the normalized Student's t and Fisher tests. First and second order inter-model paths are considered for mono-variate catastrophes, whereas for bi-variate catastrophes the direct minimum path is provided, allowing the QSAR models to be tested for predictive purposes. At this stage, the max-to-min hierarchies of the tested models allow the interaction mechanism to be identified using structural parameter succession and the typical catastrophes involved. Minimized differences between these catastrophe models in the common structurally influential domains that span both the trial and tested compounds identify the “optimal molecular structural domains” and the molecules with the best output with respect to the modeled activity, which in this case is human immunodeficiency virus type 1 (HIV-1) inhibition. The best molecules are characterized by hydrophobic interactions with the HIV-1 p66 subunit protein, and they concur with those identified in other 3D-QSAR analyses. Moreover, the importance of aromatic ring stacking interactions for increasing the binding affinity of the inhibitor-reverse transcriptase ligand-substrate complex is highlighted.


Introduction
Among the mathematical theories that model open-system dynamics, Thom's theory of catastrophes has acquired much popularity for its simple yet valuable description of the system-environment interaction that includes phenomena such as steady state equilibrium and life cycles [1]. In particular, biological systems come first under catastrophe modeling because they display a causal action-reaction response to various natural or imposed constraining limits. As an example, the reactions of organisms to vital toxicological threats were developed into the survival attractor concept by employing butterfly bifurcation phenomenology, which is closely related to the cusp catastrophe, thus revealing the close connection with the turning points around singularity points of the fundamental central field laws of attraction [2]. The cusp catastrophe was further implemented in the physiological processes of predation and generation, thus giving mathematical support to Heidegger's philosophical concept of entity and having the major consequence of translating the ontological entities into computer language [3]. Following this line of application, Jungian psychology entered the topological approach phase through modeling personal unconscious and conscious states using the swallowtail catastrophe [4]. As a consequence, neuro-self-organization was advanced by reduction to cusp synergetics as an archetypal precursor of epileptic seizures [5]. Nevertheless, in chemistry the catastrophe approach enters through the need to unitarily characterize elementary processes such as chemical bonding, leading to the so-called bonding evolution theory and reformulation of the electronic localization functions [6,7]. In the last decade, catastrophe theory was successfully grounded on Hilbert space modeling with the density matrix and non-linear evolution as specific tools for the non-commutative (quantum) systems [8]. 
At this point, the interesting connection with the linear superposition of quantum states may be generalized in a non-linear manner with direct correspondence for widespread quantitative structure-activity relationship (QSAR) treatments of the "birth and death of an organism".
In this context, the present contribution provides in silico assistance to clinical efforts in current antiretroviral therapy by contributing to the development of a given class of current anti-HIV-1 compounds and by identifying their viral inhibitory mechanisms and influential structural factors. Continuous efforts are made, both in theory and in clinical practice, to provide new and valid data for the management of HIV infection. Although acquired immunodeficiency syndrome (AIDS) was first recognized in 1981, only 25 compounds have been approved for use in HIV-infected patients, and they are distributed among several classes of antiretroviral drugs [9,10]: nucleoside reverse transcriptase inhibitors (NRTIs); nucleotide reverse transcriptase inhibitors (NtRTIs); non-nucleoside reverse transcriptase inhibitors (NNRTIs); protease inhibitors (PIs); cell entry (or fusion) inhibitors (FIs); co-receptor inhibitors (CRIs); and integrase inhibitors (INIs). Among these, it is well known that most NNRTIs have a low genetic barrier to resistance, i.e., high viral resistance may be induced by a single mutation at the NNRTI binding site [11]. It is this particular feature that makes NNRTIs so well suited to a comprehensive catastrophe theory application. NNRTIs remain an open battlefield for research, being highly active in naïve and drug-resistant HIV-infected patients [12], and QSAR methods are cost-effective approaches to developing new and potent molecules with increased anti-HIV activity [13-23]. As a viable alternative to the available 3D-QSARs, the present endeavor makes the first steps toward generalizing multi-linear QSAR to non-linear catastrophe-QSAR analysis and toward providing a conceptual-computational framework that accounts both for the interactions occurring between the pyridinone derivatives and the NNRTI binding site and for the structural domains influential for HIV-1 RT inhibitory activity [24].

QSAR Phenomenology
The fundamental problem of structure-activity analysis may be described as follows: given a congeneric set of N compounds (molecules) with measured (observed) activity A, one searches for its best correlation with the structural (intrinsic, internal) molecular information quantified by M properties (such as hydrophobicity, polarizability, total energy), classically presented in multi-linear form [25-31]:

$$Y = b_0 + b_1 X_1 + \dots + b_M X_M \qquad (1)$$

Equation (1) has some basic features, namely:
- Y stands for the computed activity, not the observed activity, owing to the statistical character of the approach; thus Equation (1) should be validated on another (preferably external or testing) set of compounds, with which its predictive power is tested.
- Because the right side of Equation (1) unfolds as a linear summation of the structural characteristics, it corresponds in fact to the quantum superposition principle, which provides a global eigen-solution for a quantum system from its particular realizations in orthogonal or projective sub-spaces; hence the need for the structural indices $X_1, \dots, X_M$ to be either linearly independent or orthogonal in the algebraic space built from their associated vectors, presented in Table 1.
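The multi-linear form of Equation (1) can be illustrated with an ordinary least-squares fit. The following is a minimal sketch, not the authors' procedure: the function names and the synthetic descriptor values are hypothetical, and NumPy's `lstsq` is used as one convenient solver.

```python
import numpy as np

def fit_multilinear_qsar(X, y):
    """Least-squares fit of y ~ b0 + b1*X1 + ... + bM*XM.

    X: (N, M) array of structural descriptors, one row per compound.
    y: (N,) array of observed activities.
    Returns the coefficient vector [b0, b1, ..., bM].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so that b0 acts as the free term.
    design = np.column_stack([np.ones(len(y)), X])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs

def predict_activity(coeffs, X):
    """Computed activity Y for new compounds, given fitted coefficients."""
    X = np.asarray(X, dtype=float)
    return coeffs[0] + X @ coeffs[1:]

# Synthetic, purely illustrative data (not taken from Table 4).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(12, 3))
y_demo = 1.5 + X_demo @ np.array([0.8, -0.3, 0.1])
b_demo = fit_multilinear_qsar(X_demo, y_demo)
```

Because the synthetic activities are generated without noise, the fit recovers the generating coefficients; with real data, the predictive power would still have to be checked on a separate test set, as the first feature above requires.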

Table 1 (column headings): Observed Activity | Structural Predictor Variables
However, for the chemical structure to be correlated with bio-, eco-, or pharmacological activity in an analytical manner (whence the name Quantitative Structure-Activity Relationship) that is meaningful for the ligand-receptor interaction under study, the Organization for Economic Cooperation and Development (OECD) developed the so-called QSAR-OECD principles, which have already been adopted by the EU Parliament as the official guidelines for further regulation of compounds in the European Union. They are, in short [32]:
- QSAR-1: a defined endpoint
- QSAR-2: an unambiguous algorithm
- QSAR-3: a defined domain of applicability
- QSAR-4: appropriate measures of goodness-of-fit, robustness and predictivity
- QSAR-5: a mechanistic interpretation, if possible
Put differently, they express the essence of the chemical modeling of biological effects while relaying (Husserl-Russell) knowledge phenomenology in a more general manner [33].
Therefore, although the backbone of QSAR modeling is based on Equation (1), one should be aware that it represents, despite the innumerable extant studies, only one type of model, the multi-linear type. It is therefore worth refreshing QSAR studies by exploring other ways of combining the structural parameters that cause the observed biological activity. Although it is clear that non-linear QSAR is the next generation of correlations, one should not search arbitrarily or randomly when a well-designed theory of non-linear modeling of natural phenomena is at hand: Thom's catastrophe theory, the basic assumptions and main working tools of which are presented next.

Thom's Catastrophe Theory
René Thom's catastrophe theory basically describes how, for a given system, continuous action on the control space ($C^k$), parameterized by the $c_k$'s, produces a sudden change in its behavior space ($I^m$), described by the $x_m$ variables, through stable singularities of the smooth map [34,35]

$$\eta: C^k \times I^m \to \mathbb{R} \qquad (2)$$

with $\eta(c_k, x_m)$ called the generic potential of the system. Therefore, catastrophes are given by the set of critical points $(c_k, x_m)$ for which the field gradient of the generic potential vanishes or, more rigorously: a catastrophe is a singularity of the map $M^{k \times m} \to C^k$ that projects this set of critical points onto the control space.
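For orientation, the standard textbook formulation behind this paragraph can be written as below. This is a sketch in common notation rather than the paper's own equations; the symbols $M$, $\Sigma$ and $B$ and the use of the Hessian determinant are assumptions taken from the usual presentation of Thom's theory.

```latex
% Critical (equilibrium) set: the gradient with respect to the
% behavior variables vanishes
M = \{ (c, x) \in C^k \times I^m \;:\; \nabla_x \eta(c, x) = 0 \}

% Singularity set: critical points that are also degenerate
\Sigma = \{ (c, x) \in M \;:\;
  \det \left[ \frac{\partial^2 \eta}{\partial x_i \, \partial x_j} \right] = 0 \}

% Bifurcation set: projection of the singularity set onto the control space
B = \{ c \in C^k \;:\; (c, x) \in \Sigma \ \text{for some}\ x \in I^m \}
```

For a single behavior variable the Hessian determinant is simply the second derivative of the potential, so the degeneracy condition reduces to $\partial^2 \eta / \partial x^2 = 0$.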
Next, depending on the number of parameters in space $C^k$ (also called the co-dimension, k) and on the number of variables in space $I^m$ (also called the co-rank, m), Thom classified the generic potentials (or maps) given by Equation (2) into seven elementary (in the sense of universal) unfolding catastrophes, i.e., the multi-variable (co-rank up to two) and multi-parametric (co-dimension up to four) polynomials listed in Table 2. Going to the higher derivatives of the generic potential (the fields), the control parameter $c_k^*$ for which the Laplacian of the generic potential vanishes gives the bifurcation point. Consequently, the set of control parameters $c^\#$ for which the Laplacian at a critical point is non-zero defines the domain of stability of that critical point. It is now clear that small perturbations of $\eta(c^*, x)$ bring the system from one domain of stability to another; otherwise, the system remains within a domain of structural stability. Remarkably, the cases described above correspond to the equilibrium limit of the dynamical (non-equilibrium) evolution of an open system, where the behavior space is further parameterized by the temporal paths $x_m(c_k, t)$. The connection with equilibrium is recovered through the stationary time regime imposed on the critical points. In this way, the set of points giving a critical point in the stationary $t \to +\infty$ regime (the so-called ω-limit) corresponds to an attractor, and it forms a basin, whereas the stationary regime $t \to -\infty$ (the so-called α-limit) describes a repellor. The catastrophe polynomials may thus be regarded either as an asymptotic solution of a dynamical evolutionary system or as a steady state solution that allows the quasi-equilibrium of the ligand-receptor or inhibitor-organism interactions to be described.
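The notions of critical point, stability domain and bifurcation can be made concrete with the cusp catastrophe. The sketch below assumes the common normal form V(x) = x^4 + u x^2 + v x, which may be normalized differently in Table 2; the function names are hypothetical. For one behavior variable, the Laplacian used in the text is just the second derivative.

```python
import numpy as np

def cusp_potential(x, u, v):
    """Cusp potential in an assumed normal form: V = x^4 + u*x^2 + v*x."""
    return x**4 + u * x**2 + v * x

def cusp_critical_points(u, v, tol=1e-9):
    """Real solutions of dV/dx = 4x^3 + 2ux + v = 0, each labeled by the
    sign of the second derivative d2V/dx2 = 12x^2 + 2u."""
    roots = np.roots([4.0, 0.0, 2.0 * u, v])
    real_roots = sorted(r.real for r in roots if abs(r.imag) < tol)
    labeled = []
    for x in real_roots:
        curvature = 12.0 * x**2 + 2.0 * u
        if curvature > tol:
            kind = "minimum"      # stable critical point
        elif curvature < -tol:
            kind = "maximum"      # unstable critical point
        else:
            kind = "degenerate"   # second derivative vanishes: bifurcation
        labeled.append((x, kind))
    return labeled

def inside_cusp_region(u, v):
    """True when dV/dx has three distinct real roots, i.e. when the
    control point lies inside the bifurcation set 8u^3 + 27v^2 = 0."""
    return 8.0 * u**3 + 27.0 * v**2 < 0.0

two_wells = cusp_critical_points(-2.0, 0.0)   # inside the cusp region
one_well = cusp_critical_points(1.0, 0.0)     # outside the cusp region
```

Crossing the curve 8u^3 + 27v^2 = 0 in the control plane changes the number of minima from two to one, which is the sudden change of behavior under continuous control that the theory describes.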
However, in complex binding systems with multiple evolutionary phases, e.g., the HIV-1 life cycle, the possibility of "linking" the various classes of catastrophes themselves may provide a striking analytical approach to the dynamics and mutational sensitivity of the studied interaction that starts with the actual catastrophe-QSAR method.

Catastrophe-QSAR Method
Aiming to construct a QSAR rationale from the elementary catastrophes, the following steps are implemented:
i. Assuming the vectorial form of the activities and of the associated QSARs according to Table 2, the catastrophe-QSAR models of Table 3 are formed.
ii. Determine the norms for each model.
iii. Calculate the algebraic correlation factor for each model [31].
iv. Calculate the so-called "statistical relative power" index for each model with each set of descriptors; its components are the relative index of correlation together with the normalized Student's t and Fisher indices.
v. Determine the generalized Euclidean distances between corresponding type-I and type-II models employing different descriptors, and establish formal matrices of the models' differences for single descriptors, Equation (14), and for pair descriptors, Equation (15).
vi. Identify all minimum paths across all differences. The combination of descriptors that fulfills this system provides the molecular mechanism of the interaction. The correlation models involved are ordered according to their relative statistical power within the same molecular mechanism, thereby providing the best models. Because pair descriptors are primarily involved in the present analysis, one can consider the first two such "waves" and their best correlation models up to the second order minimum paths, as in Equation (16).
vii. The selected correlation models, in either structure-driven or molecular mechanistic "waves", are employed to compute the predicted activities for the test molecules and to provide the statistics with respect to the observed activity. If the relative statistical power obtained is close to that characteristic of the trial set of molecules, then these models may be validated for the specific eco-, bio-, or pharmacological problem. Further insight is provided by analyzing and discussing the catastrophe shape of the models involved.
Further catastrophe theory insights, and their natural consequences for the behavior of the statistical (Pearson) correlation, may be found in the Appendix.
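The model-building part of the algorithm above can be sketched for single-descriptor models as a set of polynomial least-squares fits. The powers assigned to each model below follow the standard unfoldings of Thom's one-variable catastrophes and are an assumption about the content of Table 3; the names `CATASTROPHE_POWERS` and `fit_catastrophe_model` are hypothetical, and only the Pearson correlation is computed, not the relative statistical power index of step iv.

```python
import numpy as np

# Assumed powers of the descriptor X entering each one-variable model.
CATASTROPHE_POWERS = {
    "QSAR-I": [1],
    "fold": [1, 3],
    "cusp": [1, 2, 4],
    "swallowtail": [1, 2, 3, 5],
    "butterfly": [1, 2, 3, 4, 6],
}

def fit_catastrophe_model(x, y, powers):
    """Fit y ~ a0 + sum_j a_j * x**powers[j] by least squares.

    Returns (coefficients, Pearson r between observed and fitted values).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones_like(x)] + [x**p for p in powers])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coeffs
    r = float(np.corrcoef(y, fitted)[0, 1])
    return coeffs, r

# Synthetic illustration: activities generated from a cusp-type polynomial.
x_demo = np.linspace(-1.0, 1.0, 15)
y_demo = 2.0 + 0.5 * x_demo - 1.0 * x_demo**2 + 0.3 * x_demo**4
results = {name: fit_catastrophe_model(x_demo, y_demo, powers)
           for name, powers in CATASTROPHE_POWERS.items()}
```

Comparing the fits in `results` across models for the same descriptor mirrors the comparison of Group-I models; the bi-variate (umbilic) models would need an analogous design matrix in two descriptors.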

Input Data
As a working molecular series, the pyridinone derivatives in Table 4 are employed here [24] because of their potential for improving and complementing the four NNRTIs currently approved by the U.S. FDA for HIV/AIDS treatment (Nevirapine-Viramune®, Delavirdine-Rescriptor®, Efavirenz-Sustiva®, Etravirine-Intelence®), all of which bind to the hydrophobic pocket of HIV-1 reverse transcriptase [38]. The pyridinone derivatives were divided into a training set of 23 compounds and a test set of 9 compounds according to the normal/Gaussian (G) and non-normal/non-Gaussian (NG) fitting of their activities [39-41] (Figure 1).

Figure 1. [...] Table 4 grouped into trial and test congener series.

Table 4. Working reverse transcriptase pyridinone inhibitors grouped into Gaussian (G) and non-Gaussian (NG) molecular congeneric sets, with their structural information (hydrophobicity, Log P; molecular polarizability, POL [Å^3]; and total optimized energy of formation, H [kcal/mol]) computed with the semi-empirical PM3 method [42], along with their observed activity A = Log(1/IC50) [24].

Results and Discussion
The catastrophe-QSAR algorithm of Section 3 was applied to the molecules of Table 4, and the trial results are presented in Tables 5-9.

Table 5. Correlation equations for the Group-I models of Table 3 and the molecular structures and data of Table 4. Footnotes: [...] (7); (c) computed from Equation (9); (d) computed from Equation (10).

Table 9. Single-structure matrices of the Euclidean distances (type II) of the QSAR and catastrophe models' relative statistics of Table 6, employing Equation (12); for the degenerate models of Table 6, the one displaying the higher relative statistical power is employed.

For the trial set of molecules from Figure 1 and Table 4, the results in Tables 5 and 6 can be interpreted as follows:
- First, it is clear that consideration of the catastrophe (polynomial) correlations is an improvement over the old multi-linear QSAR statistics (see also Appendix-A2).
- The hydrophobicity indicator generally gives low correlations with any polynomial (linear, multilinear or catastrophe) approach. It is a quite irrelevant linear QSAR descriptor (Table 5), but its influence improves by up to a factor of two within the swallowtail and butterfly phenomenologies once its fifth and sixth powers are included. This is a sign of the value of catastrophe-QSAR for achieving a deeper understanding of the molecular mechanics of specific interactions when normal multi-linear QSAR does not assign much predictive power to transport descriptors.
- The relative statistical power, as defined by Equation (8), does not always parallel the Pearson coefficient or the relative correlation factors, as is evident from Tables 5 and 6. However, because it includes more statistical information, a model is considered relevant when it yields a higher value of this newly introduced statistical index.
In particular, neither the linear nor the multilinear QSAR framework provides a good fit between the statistical correlation and the relative statistical power using the structural parameter combinations considered. Instead, parabolic catastrophe correlations, the cusp and butterfly models, are revealed to be quite relevant, in particular regarding the formation energy (H) for which they show the highest Pearson correlation and relative statistical power values in comparison with the other descriptors plugged into these models. Unfortunately, for the two-variable descriptor models of Table 6, no consistency was found between the highest Pearson value and the relative statistical power apart from a few degenerate cases of descriptors for the parabolic models where the highest relative statistical power value corresponds with the highest Pearson correlation. Note that for the degenerate cases of Table 6, when two mixed descriptors can be combined in two distinct ways, the working model is considered to have maximum relative statistical power.
However, because the two-fold aim of the present research is to find the best predictive model and the molecular mechanism of action for the given set of molecules, the statistical indices of Tables 5 and 6 are employed to compute the first- and second-order differences (or distances) in relative statistical power, as described by Equations (12)-(15) of Section 3. They correspond to the inter-descriptor/inter-model paths of molecular action, whose minimum values are identified according to the prescription of Equation (16).
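The distance step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's Equations (12)-(16): each model is represented by a hypothetical vector of statistical indices, distances are plain Euclidean norms of the differences, and "minimum paths" are approximated by ranking the pairwise distances in ascending order. All names and numbers are invented for the example.

```python
import numpy as np
from itertools import combinations

def pairwise_distances(model_stats):
    """Euclidean distance between the statistic vectors of every model pair.

    model_stats: dict mapping a model name to a sequence of statistical
    indices (for example correlation, t-based and F-based indices).
    Returns a dict mapping (name_a, name_b) to the distance.
    """
    distances = {}
    for a, b in combinations(model_stats, 2):
        diff = (np.asarray(model_stats[a], dtype=float)
                - np.asarray(model_stats[b], dtype=float))
        distances[(a, b)] = float(np.linalg.norm(diff))
    return distances

def rank_paths(distances):
    """Model pairs sorted from the shortest to the longest distance."""
    return sorted(distances.items(), key=lambda item: item[1])

# Hypothetical statistic vectors, for illustration only.
demo_stats = {
    "QSAR-I": (0.3, 0.2, 0.1),
    "cusp": (0.6, 0.5, 0.4),
    "butterfly": (0.6, 0.5, 0.5),
}
ranked = rank_paths(pairwise_distances(demo_stats))
```

In this toy case the cusp and butterfly models are the closest pair, the kind of proximity between models that the minimum-path analysis is designed to detect.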
Through this minimal relative statistical power path recipe, once the models and descriptors predicted to be on the forefront of the structure-action interaction are selected, they are then further filtered with the testing set to finally identify the best predictive model and reveal the mechanism of action by means of the structural descriptors considered.
In the present case of the HIV inhibitors in Table 4, the data computed from Tables 5 and 6 provide the results of Tables 7-9, discussed in turn:
- Table 7: At the individual descriptor level, the cusp and butterfly models are very close to each other for Log P and the formation energy H. This is more relevant for the hydrophobicity, because for the formation energy Table 5 shows that the butterfly model practically reduces to the cusp model, the sixth-order contribution virtually vanishing. However, for the structural influence of polarizability (POL), the butterfly and swallowtail are the closest models. When one considers the hierarchy of the individual descriptors according to their QSAR-I models in Table 5, in terms of the reduction in relative statistical power obtained by combining it with the catastrophes involved in Table 7, one correspondingly obtains the descriptor hierarchy, Equation (17a), and the evolution cycle of the models, Equation (17b).
- Table 8: The second-order distance differences between the individual inter-model paths of Table 7 can be read as further variations of those paths. Here the QSAR-I and the fold (F) catastrophe models also intervene, changing the influence on specific interactions from POL to H. Counting the minimum hierarchy of these paths gives the distance ordering of Equation (18a), which, remarkably, confirms the descriptors' cycles of influence in accordance with the first-order prescription of Equation (17a). However, a more detailed succession, Equation (18b), is recorded for the inter-model evolution. Comparing cycle (18b) with (17b), it seems that the QSAR-I and fold models appear in (18b) in a second cycle, after the first one has been performed according to the prescription of (17b). For this reason, the direct second-order inter-descriptor/inter-model analysis is also undertaken; the results are reported in Table 9 and discussed next.
- Table 9: Interestingly, in terms of two structural descriptors, the QSAR model is present even though its individual statistics are not the highest in Table 6. Judging by the ordering of the minimum paths recorded, the hierarchy of coupled descriptors is established as Equation (19a), which is associated with the models' evolution of Equation (19b).

One should make "contact" between the descriptor hierarchies [(17a), (18a), (19a)] and the models' cycles [(17b), (18b), (19b)] by means of the predictive power of the models along the minimum paths identified in Tables 7 and 9, with single and double descriptors respectively, for the non-Gaussian (NG) molecules of Table 4 and Figure 1. The results are presented in Tables 10 and 11.

Table 10. Predicted activity as computed for the non-Gaussian molecules of Table 4 with the models of Table 5 found along the minimum paths of Table 7; for each predicted model, its correlation with the observed activity is indicated at the bottom of the [...]

Table 11. Predicted activity as computed for the non-Gaussian molecules of Table 4 with the models of Table 6 found along the minimum paths of Table 9; for each predicted model, its correlation with the observed activity is indicated at the bottom of the [...]

The results of the correlation tests in Table 10 indicate the structure index-model activity hierarchy of Equation (20). Here the influences of POL and H are reversed relative to the prescription of the trial succession of Equation (17a), while hydrophobicity is revealed as the main influential factor. However, because the predicted activities for POL in Table 10 are all in the "opposite evolution direction" with respect to the activities recorded in Table 4, i.e., they are all negative, the uni-parametric tests and their associated hierarchy (20) are discarded, and one turns to the second class of QSAR and catastrophe algorithms.
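The external test reported in Tables 10 and 11 amounts to correlating each model's predicted activities with the observed ones and ordering the models by that correlation. A minimal sketch is given below; the function names and all numbers are hypothetical and do not reproduce the values of either table.

```python
import numpy as np

def pearson_r(observed, predicted):
    """Pearson correlation between observed and predicted activities."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    o_centered = o - o.mean()
    p_centered = p - p.mean()
    denominator = np.sqrt((o_centered @ o_centered) * (p_centered @ p_centered))
    return float((o_centered @ p_centered) / denominator)

def rank_models_by_test_correlation(observed, predictions):
    """Order models from the highest to the lowest test-set correlation.

    predictions: dict mapping a model name to its predicted activities
    for the same test compounds, in the same order as `observed`.
    """
    scores = {name: pearson_r(observed, values)
              for name, values in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical test-set values, for illustration only.
observed_demo = [5.1, 6.0, 6.8, 7.5]
predictions_demo = {
    "model_a": [5.0, 6.2, 6.9, 7.4],   # follows the observed trend
    "model_b": [7.0, 6.5, 6.6, 5.9],   # runs against the observed trend
}
hierarchy = rank_models_by_test_correlation(observed_demo, predictions_demo)
```

A model whose predictions run against the observed trend ends up at the bottom of such a ranking with a negative correlation, which is one simple way to flag it for exclusion.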
Instead, the test correlations of Table 11 provide the structure-activity ordering for the bi-parameter models, Equation (21). Remarkably, the hierarchy (21) starts with the QSAR model, which is revealed to be at the top of the validated catastrophe models, with statistical performance even higher than that obtained with the predicted equation of Table 6 on the trial set of Table 4. Moreover, the QSAR-II model, involving the parameters (Log P & H), is followed by the hyperbolic umbilic (HU) model in terms of the (Log P & POL) parameters, in this way recovering the original mono-structural influences anticipated by Equations (17a) and (18a). Thus, the series of models in Equation (21) is validated, and it is further employed to establish the models' succession and the molecular structural pattern of inhibiting anti-HIV-1 drug resistance. To this end, apart from the first and last models of Equation (21), which are associated with the maximum (0.778) and minimum (0.057) test performance, the middle catastrophe models provide closely related performance, in the range (0.431, 0.468). Their 3D graphical representations over the parametric domains Log P: (−1.50, 2.72), POL: (27.87, 38.48) and H: (−63.299, 17.808) of all (trial and test) structures in Table 4 are displayed in Figure 2. Next, it is apparent that they can be coupled according to the same spanned domains, thus forming the activity models' differences of Equation (22), from which one may reach the following important conceptual-computational conclusions:
- The HIV-1 inhibitory activity is triggered by a hydrophobic interaction, followed by energetic stabilization of the ligand/substrate (pyridinone derivative/viral protein) interaction, here modeled by the heat of molecular formation, and eventually completed by the ionic field influence, here represented by the polarizability descriptor.
- Although the QSAR multi-linear model should not be excluded from the molecular modeling of complex bio-chemical interactions, it should be complemented with other polynomial correlational catastrophe-type models that produce significant results comparable to those of other 3D-modeling procedures such as docking-based comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA) [24].

However, the issue remains of establishing the molecular structure most suitable for HIV-1 inhibitory activity among the pool of pyridinone derivatives in Table 4. To this end, the representations in Figure 3 are jointly employed to identify the molecular structural domains that optimally promote binding of the pyridinone derivative to the hydrophobic pocket in the p66 subunit of HIV-1, by searching for the joint fulfillment of the following structural parameter and inter-model evolutionary generic principles:
- Log P: For positive values, the compound behaves hydrophobically and requires dissolution in an organic solvent; by contrast, for negative values the compound is hydrophilic and can be dissolved directly in an aqueous buffer. For Log P equal to 0, the compound partitions at a 1:1 organic-to-aqueous phase ratio, meaning that it is likely soluble in both organic and aqueous solvents and in cellular environments; thus, values of Log P equal to or greater than zero are selected to achieve hydrophobicity and suitability for the cellular environment [43,44], while characterizing the stacking bonding of aromatic rings [45];
- H: Because the formation of a compound from its elements is usually an exothermic process, most heats of formation are negative, and this is also a characteristic of the dynamic equilibrium of ligand-substrate interactions [46]. The advantage of using the heat of formation as a QSAR descriptor is that it relates thermodynamically to the free energy $\Delta G = -RT \ln K_{eq}$ through the equilibrium constant $K_{eq}$, which parallels the recorded activity at the thermodynamic level [24]; it nevertheless extends the Gibbs free energy from hydrogen to covalent bonding strength [45];
- POL: It is expected that "the natural direction of evolution of any system is towards a state of minimum polarizability" [47], while accounting for the dipolar interaction [45];
- Activity models: They represent the same chemical-biological process provided that their differences with respect to the structural domains are minimized to zero.

Figure 2. [...] Table 11 in the range of the structural indicators (Log P, POL, H) as abstracted from Table 4.

Figure 3. Determination of the structural domains of pyridinone-derivative type non-nucleoside reverse transcriptase inhibitors in the same range of structural descriptors by employing the principles of hydrophobicity, minimum polarizability, binding energy, and the minimum difference between the polynomial activity models of Figure 2; the hydrophobic pocket was identified in the p66 subunit of HIV-1-rt of specific transferase R221239 [48,49].
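The thermodynamic relation invoked for the H descriptor, ΔG = −RT ln K_eq, can be evaluated numerically as in the short sketch below. The function name, the default temperature of 298.15 K and the example equilibrium constants are assumptions chosen for illustration; they are not values from the paper.

```python
import math

R_GAS = 8.314462618  # molar gas constant in J/(mol*K)

def gibbs_free_energy(k_eq, temperature=298.15):
    """Standard free energy change in J/mol from an equilibrium constant,
    using Delta G = -R * T * ln(K_eq)."""
    if k_eq <= 0:
        raise ValueError("the equilibrium constant must be positive")
    return -R_GAS * temperature * math.log(k_eq)

# K_eq = 1 gives no net driving force; K_eq > 1 gives a negative Delta G.
dg_neutral = gibbs_free_energy(1.0)
dg_favorable = gibbs_free_energy(1.0e6)
```

With these assumed inputs, an equilibrium constant of 10^6 corresponds to a free energy change of roughly −34 kJ/mol, illustrating why a larger binding constant parallels a more negative free energy.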
These principles are applied to the activity models' differences at the top of Figure 3, and they lead to the identification of the structural domain (and even points) characteristic of the pyridinone derivatives best adapted to inhibiting the HIV-1 life cycle. The graphical results in Figure 3 suggest the ordering of the structural indicators given by system (23), whose "solution" gives the actual molecules in Table 4 [...]. Most impressively, these molecules were also predicted by the much more sophisticated methods of CoMFA and CoMSIA as having increased binding affinity between the aromatic ring (or wing 2 of the pyridinone derivative) and amino acid Tyr181 for the first molecule and Tyr188 for the last two. These two amino acids are very important in the inhibition of RT by NNRTIs because the most common mutations are Tyr181Cys and Tyr188Cys, which are responsible for the emergence of viruses resistant to pyridinone derivatives. Therefore, designing pyridinone compounds that allow aromatic ring stacking interactions with Tyr181 and Tyr188 may prevent these mutations and increase the activity of these anti-HIV drugs.
Overall, the QSAR presented here, combined with catastrophe polynomial structure-activity relationships, provides a reliable conceptual and computational tool for identifying the mechanisms underlying ligand-substrate interactions and the structural domains best able to promote them. Consequently, this method should be further integrated into automated data processing and tested on other complex open systems with bio- or eco-toxicological relevance, especially where evolutionary life-cycles are present.

Conclusions
One of the most challenging battlefields in metabolic virology focuses on the complete and sustained inhibition of the HIV life cycle at its various levels. Thus: "an ideal anti-HIV agent should stop the virus' progress and also the infection of healthy host cells, with no toxicity against normal cell physiology" [50]. Moreover, the ideal anti-HIV agent should avoid the drug-resistance phenomenon of HIV mutant variants. QSAR techniques are cost-effective computer-assisted drug design methods that can be used to obtain potential anti-HIV compounds with powerful biological effects and the lowest possible levels of side-effects and toxicity.
As the predictive roles of modeling and quantitative structure-activity relationships (QSAR) in medicinal chemistry and drug synthesis are now recognized [51,52], thereby corroborating recent intriguing reports on the modest performance of direct statistical multilinear correlations in genotoxic carcinogenesis modeling of covalent drug binding to DNA followed by mutagenesis [53], the present study advances the idea of non-linear polynomial fits of the observed (experimentally available) activity, $\text{Activity} = f(X_1, X_2)$, with $X_1$, $X_2$ being structural physicochemical parameters (usually hydrophobicity, polarizability and/or heat of formation, in accordance with the basic recommendation of Hansch [54]), under the seven polynomial forms inspired by Thom's catastrophe theory [1] (see Table 3).
As an application of the emerging catastrophe-QSAR analysis to a recently reported set of pyridinone derivatives with non-nucleoside reverse transcriptase inhibitor activity [24], all the modeling stages required by the OECD-QSAR principles [32] are implemented here in a synergistic manner, namely:
(i) A defined endpoint: The hydrophobic binding of the inhibitor in the pocket of the p66 subunit of reverse transcriptase was confirmed herein through the identification of hydrophobicity as the major influence among all the mono-nonlinear catastrophes employed; see Equation (17).
(ii) An unambiguous algorithm: The Spectral-SAR minimum path principle [31,55-57] is generalized here to include a relevant combination of statistical information (e.g., the correlation factor R, Student's t-test, Fisher's F-test) so as to provide an equal-footing multi-dimensional Euclidean distance [see Equations (8)-(16)], thus avoiding the previously identified discrepancy in judging mid-range performance in terms of correlation or other statistical factors [56].
(iii) A defined domain of applicability: By performing linear vs. non-linear QSARs, the present strategy allows the recommended applicable structural domains to be identified by setting their difference to zero via inter-model activity minimization, which is equivalent to assuring the "smoothness" of the inhibitor-protein binding evolution towards the final steric inhibition output.
(iv) Appropriate measures of goodness-of-fit, robustness and predictivity: The trial results were evaluated by external validation employing a testing set, which was selected by means of Gaussian vs. non-Gaussian distributions of the compounds' activities, an improvement over the earlier arbitrariness of sampling the compounds only within a certain activity range. For instance, for linear QSAR the predicted correlation was superior to the tested correlation, thus confirming the reliability of this validation technique.
(v) A mechanistic interpretation: The selected succession of catastrophe-QSARs indicates that the inhibitor-HIV protein binding mutations that are involved in "birth and death" processes are associated with "waves" of induced activity in certain structural domain variants (see Figure 2). Moreover, the flat QSAR hypersurface should be complemented with catastrophe analysis to determine the specific structural domains for optimum interactions (see Figure 3) and for the associated molecular structure design of NNRT inhibitors.
Because the catastrophe-QSAR approach was found to successfully identify the molecular compounds with the most anti-HIV-1 potency as predicted by other 3D-QSAR methods, these results encourage further applications and implementations of Thom's non-linear correlations with the goal of analytically modeling complex dynamic ligand-receptor interactions, especially on the molecular fragment or structural alert level [41], on a chemometric basis.