Review

The Interplay between QSAR/QSPR Studies and Partial Order Ranking and Formal Concept Analyses

1
Awareness Center, Hyldeholm 4, Veddelev, DK-4000 Roskilde, Denmark
2
Department of General and Applied Chemistry, Kazakh-British Technological University, Тоlе Bi str. 59, Almaty 050000, Kazakhstan
Int. J. Mol. Sci. 2009, 10(4), 1628-1657; https://doi.org/10.3390/ijms10041628
Submission received: 17 February 2009 / Revised: 10 April 2009 / Accepted: 14 April 2009 / Published: 17 April 2009
(This article belongs to the Special Issue Recent Advances in QSAR/QSPR Theory)

Abstract

The often observed scarcity of physical-chemical as well as toxicological data hampers the assessment of potentially hazardous chemicals released to the environment. In such cases Quantitative Structure-Activity Relationships/Quantitative Structure-Property Relationships (QSAR/QSPR) constitute an obvious alternative for rapidly, effectively and inexpensively generating missing experimental values. However, further treatment of the data typically appears necessary, e.g., to elucidate the possible relations between the single compounds as well as implications and associations between the various parameters used for the combined characterization of the compounds under investigation. In the present paper the application of QSAR/QSPR in combination with Partial Order Ranking (POR) methodologies will be reviewed and new aspects using Formal Concept Analysis (FCA) will be introduced. Where POR constitutes an attractive method for, e.g., prioritizing a series of chemical substances based on a simultaneous inclusion of a range of parameters, FCA gives important information on the implications and associations between the parameters. The combined approach thus constitutes an attractive method for a preliminary assessment of the impact on environmental and human health by primary pollutants, or possibly by a primary pollutant as well as a possible suite of subsequent transformation products that may be persistent, bioaccumulating and toxic. The present review focuses on the environmental and human health impact of residuals of the rocket fuel 1,1-dimethylhydrazine (heptyl) and its transformation products as an illustrative example.

1. Introduction

In recent years there has been an increasing focus on the possible negative effects on the environment and on human health of xenobiotics accidentally or deliberately released into our environment. Consequently, the assessment and regulation of chemicals has over the years developed into a major issue in relation to safeguarding human health as well as protecting our environment. However, due to an apparent significant lack, or unavailability, of both physico-chemical and toxicological data, the vast majority of the chemicals available on the market today have not been properly assessed and regulated. Further, a comprehensive assessment may in many cases be hampered by the fact that only the primary pollutant is assessed, whereas the possible multitude of potentially hazardous transformation products escapes the assessment simply due to the lack of data. For a discussion of data availability see, e.g., [1–3].
Deriving data based on Quantitative Structure-Activity Relationships/Quantitative Structure-Property Relationships (QSAR/QSPR) constitutes an attractive supplement or even alternative to experimental data generation, the latter being both time consuming and costly.
In the present review the application of QSAR/QSPR methodologies to investigating the environmental and human health impact of residual rocket fuel, 1,1-dimethylhydrazine (1), as well as a series of its transformation products will be used as an illustrative example [4–6]. This example further illustrates the above-mentioned problem associated with primary and secondary pollutants.
Although applying a suite of appropriate QSAR/QSPR models will lead to the required data for the single substances, a further analysis of the mutual relations between the substances under investigation may be appropriate. Partial order ranking methodologies are in this connection a highly attractive point of departure, as they allow mutual ranking of, e.g., a series of chemical substances based on a simultaneous inclusion of several parameters such as persistence, bioaccumulation and toxicity [7,8]. In the present paper the mutual ranking of 1,1-dimethylhydrazine (1) and its transformation products, simultaneously based on calculated probabilities for being carcinogenic, mutagenic, teratogenic and embryotoxic, will illustrate the principle.
In a proper assessment of a chemical substance, not only the physico-chemical and toxicological characteristics, such as the PBT characteristics, should be taken into account. A series of additional factors may also advantageously be considered. Thus, parameters like production tonnage [9], specific release scenarios [9,10], and geographical and site-specific factors, in addition to various substance-dependent parameters, should be taken into account. Further, socio-economic factors may be taken into consideration, as illustrated in a series of previous papers [11–15]. The more elaborate hierarchical partial order ranking (HPOR) [16], where a larger variety of parameters, e.g., originating from various sources and subsequently combined, are taken into account, has been applied to give a more comprehensive picture of the human health impact originating from a possible exposure to residual rocket fuel and its transformation products.
To further uncover possible linkages among objects and the describing parameters, and thus disclose possible synergisms or antagonisms of the parameters, formal concept analysis (FCA) [17,18] appears to be the appropriate method. The methodology is closely linked to partial order theory and will in this review be illustrated in a further study on the environmental and human toxicological effects of rocket fuel transformation products.

2. Results and Discussion

The obvious lack of data for the assessment, and eventually the regulation, of compounds hazardous to the environment and/or to man unequivocally constitutes an incentive to look for alternative and more rapid ways to obtain the required data. A further incentive to look for alternatives to the conventional experimental methods is the possibility of reducing the consumption of experimental animals. Classification of chemical compounds based on tests involving experimental animals typically requires a significant number of animals for each compound. A reduction in the use of experimental animals is strongly desirable.
The main problem to be faced is apparently the dilemma that decisions must be made while the data necessary to do so are lacking. Theoretically based methods turn up as an obvious possibility. Thus, an attractive alternative appears to be the application of Quantitative Structure-Activity and Quantitative Structure-Property Relationship (QSAR/QSPR) models for deriving data that may substitute for the lacking experimental data, the basic concept being that molecules that are structurally closely related will display similar properties. This is expressed as the ‘Similar Property Principle’, stating that ‘Structurally similar molecules will exhibit similar physicochemical and biological properties’ [19].
Since as early as around 1860 a number of researchers [20–23] have applied the inherent notions of the QSAR concept. However, the fatherhood of the QSAR concept as applied today can be ascribed to Hansch [24] through his epoch-making work since the beginning of the 1950s.
Today a major field of application of QSARs is drug design [25–28]. The application of QSAR techniques enables researchers to screen a significant number of potential drug candidates within a rather short time. Thus, the economic benefits are overwhelming.
Within the last 15–20 years the application of QSAR/QSPR in environmental science has increased [29–33]. Thus, a wide variety of QSAR/QSPR models has been developed to predict environmentally crucial physico-chemical parameters such as solubility, distribution, partition, sorption and bioaccumulation as well as ecotoxicological properties (endpoints) [33]. Modeling associated with human health, i.e., toxicological endpoints, has also been further developed, and QSAR/QSPR models with a high predictive power in these areas are now available [33].
Hence, today a wide variety of QSAR/QSPR models are available [33], the vast majority of these being available on a commercial basis only. However, free models of high quality are available. The QSAR/QSPR data derived for the studies covered by the present review were obtained using such models: the EPI Suite from the USEPA [34] for the prediction of physico-chemical parameters as well as ecotoxicological data, the PASS software from the Academy of Medical Sciences, Moscow [35], and the ADME/Tox WEB from Pharma Algorithms [36], the latter being a free web version of the commercially available ADME Boxes and ToxBoxes. In the following, data derived from our recent studies on residual rocket fuel, 1,1-dimethylhydrazine (1), and a series of its transformation products serve as an illustrative example [4–6].

2.1. Residual Rocket Fuel and its Transformation Products

The Baikonur Cosmodrome in Kazakhstan has over the years been an important site for rocket launching with more than two thousand launches of different rocket-carriers up to now. Today heavy equipment to the International Space Station (ISS) is transported by ‘Proton’ carriers, the propellant used for these rockets being unsymmetrical 1,1-dimethylhydrazine (1), also known as “heptyl”.
The area northeast of the Cosmodrome functions as a dropping zone for burned-out rocket fuel containers of the first rocket stage, separated at a height of 50 and 100 km (Proton carriers). The fuel containers at this point still contain approx. 0.6 to 4 tons of unburned 1 and about 4 tons of nitrogen oxidants [37]. Significant amounts of residual rocket fuel reach the ground, the actual amount being dependent on the season, and are subsequently spread over the surface, where the fuel either evaporates and/or penetrates into the soil [37,38]. Hence, it has been estimated that significant amounts of unburned fuel are being spread over several square kilometers of land.
In addition to the pollution with the primary pollutant 1, a series of so-called secondary pollutants formed in soil samples polluted by 1 has recently been disclosed [5,6]. This group of compounds comprises both transformation products that are formed directly from 1 and compounds that are formed in various consecutive and possibly surface-catalyzed processes. In Figure 1 the major transformation products disclosed are summarized.

2.2. Environmental Behavior of Rocket Fuel and its Transformation Products

In recent years there has been a special focus on compounds that are persistent, bioaccumulating and toxic (PBTs) or very persistent and very bioaccumulating (vPvBs) [39], as such compounds obviously are of major environmental concern. Further, hazard properties for bulk chemicals are typically linked to physical-chemical properties such as molecular weight, aqueous solubility, Henry’s Law constant, vapor pressure and octanol-water partition constant, and to the biodegradation probability [40]. In Table 1 a selection of EPI Suite derived physico-chemical data for 1 and its transformation products is given together with experimental data when available. Good agreement between calculated and experimentally obtained values was noted [5].
Rather high water solubilities, log SW, and correspondingly low octanol-water partition coefficients, log KOW, were found, and, not surprisingly, low to very low Henry’s Law constants, log HLC, for all substances. A high migration potential for these substances was further substantiated by low water-organic carbon partition coefficients, log KOC [5].
The majority of the compounds possess acid-base characteristics that may cause a strong affinity to mineral soil particles and thus make them less susceptible to biodegradation. Thus, Adushkin [41] found 1 to be very persistent in dry soils, suggesting a self-remediation period of certain soils from 1 of about 34 years.
The relatively high vapor pressures, log VP, found [4,5] were associated with only limited evaporation from an aqueous phase, whereas evaporation from top layers of dry soils could be significant, thus reducing a possible terrestrial pollution. In addition, biodegradation should be taken into account. From Table 2 it is seen that all compounds apparently are rapidly degraded, the ultimate biodegradation half-lives being within weeks, apart from 5 and 13. Furthermore, half of the compounds were predicted to be anaerobically degradable. Table 2 further collects the calculated residence times in “standard” rivers and lakes [5] (cf. Section 3.1.2).
A deeper discussion of the implications of the above figures is outside the scope of the present review and the reader is advised to consult the original papers by Carlsen et al. [4,5]. However, for completeness it should be mentioned that none of the compounds possesses any significant bioaccumulation potential.

2.3. Ecotoxicology of Rocket Fuel and its Transformation Products

The environmental toxicity of the compounds was derived [4,5] applying the ECOSAR module of the EPI Suite, leading to non-polar baseline toxicity and polar acute toxicities towards fish, daphnids and green algae as summarized in Table 3. Further, the chronic toxicities and in certain cases the toxicities towards earthworms were predicted (data not shown here) [5].
From the figures given in Table 3 Carlsen et al. [5] concluded that, apart from the primary pollutant, 1, and the compounds 7–10, the investigated compounds apparently will not exhibit any significant toxicity towards either aquatic or terrestrial organisms.
To further analyze the above data for acute toxicity a formal concept analysis was conducted in order to reveal possible synergisms or antagonisms within the group of compounds [42]. Figure 2 displays the line lattice diagram for ecotoxicological effects of 1 and its transformation products as derived by ECOSAR, the underlying context table being given as Appendix 1.
Obviously the diagram contains a significant amount of trivial information, e.g., if the toxicity towards fish for a given compound is < 1 mg/L it is also < 10, 100 and 1000 mg/L, respectively. However, in addition to such information a series of implication sets and association rules pointed to the fact that for several of the compounds in the study toxicological effects on several species prevailed. Thus, from the FCA it was concluded [42] that the 7 compounds displaying acute toxicities towards fish at concentrations below 100 mg/L (F < 100) also displayed toxicities to daphnids below 100 mg/L (D < 100). Likewise, implications were disclosed that for 5 compounds with F < 10 mg/L then D < 10 and A < 1 mg/L (A denoting green algae), for 4 compounds with F < 5 mg/L then D < 10 and A < 1 mg/L, and for 5 compounds with D < 10 mg/L then F < 10 and A < 1 mg/L, respectively.
Further it was disclosed that for six out of seven compounds (86%) with D < 100 mg/L then A < 1 mg/L, for five out of six compounds (83%) with A < 1 mg/L then F < 10 and D < 10 mg/L and for four out of five compounds (80%) with A < 1, F < 10, and D < 10 mg/L then F < 5 mg/L, respectively [42].
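The two kinds of rules quoted above can be illustrated with a minimal Python sketch. The context table used here is hypothetical (it is not the table of Appendix 1), and the helper names `extent`, `confidence` and `is_implication` are introduced for illustration only. An implication holds when every compound carrying the premise attributes also carries the conclusion attributes; an association rule is reported with its confidence, i.e., the fraction of compounds with the premise that also show the conclusion.

```python
# Hypothetical formal context: compound -> set of threshold attributes.
# The entries are illustrative only, NOT the data of Appendix 1.
context = {
    "a": {"F<100", "D<100", "A<1"},
    "b": {"F<100", "D<100", "A<1"},
    "c": {"F<100", "D<100"},
    "d": {"D<100", "A<1"},
    "e": set(),
}

def extent(attributes):
    """Return the compounds that carry every attribute in `attributes`."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}

def confidence(premise, conclusion):
    """Fraction of compounds with `premise` that also have `conclusion`."""
    support = extent(premise)
    if not support:
        return None  # vacuous rule: no compound satisfies the premise
    return len(extent(premise | conclusion)) / len(support)

def is_implication(premise, conclusion):
    """An implication is an association rule with confidence 1."""
    return confidence(premise, conclusion) == 1.0
```

In this toy context `is_implication({"F<100"}, {"D<100"})` holds, whereas the rule from `{"D<100"}` to `{"A<1"}` is only an association rule, since one of the four compounds with D < 100 lacks A < 1.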

2.4. Human Health Impact by Rocket Fuel and its Transformation Products

In a further study Carlsen et al. [6] investigated the possible human health impact of 1 and its transformation products (cf. Figure 1). Thus, the probabilities for the substances to be carcinogenic, mutagenic, teratogenic and/or embryotoxic were elucidated using the QSAR/QSPR software PASS (Prediction of Activity Spectra for Substances) [35], whereas absorption, distribution, metabolism and excretion (ADME) characteristics and toxicology, e.g., the probabilities for adverse organ specific health effects, were disclosed using the ADME Boxes and ToxBoxes [36].
In Table 4 the results of the ADME calculations are shown. It should be noted that neither an active absorption nor any significant first-pass metabolism was noted for the compounds apart from 13 [6]. For a detailed discussion of the data the original study by Carlsen et al. [6] should be consulted, as it is outside the scope of the present review.
In the study by Carlsen et al. [6] predicted acute toxicities of the rocket fuel and its transformation products (cf. Figure 1) were also calculated and compared to available experimental data (data not shown; the original reference should be consulted [6]).
Carlsen et al. [6] found that in some cases, e.g., in the case of 6, the predicted acute toxicities are significantly overestimated, whereas in other cases, like 1 and 15, ToxBoxes apparently underestimates the toxicities. In other cases the agreement was found to be acceptable. For a more elaborate discussion the original reference [6] should be consulted.
Based on the above ADME results it was concluded [6] that the compounds apparently would move freely throughout the body and thus, travelling in and out of tissues, may exert their biological effects. Based on calculations applying ToxBoxes (Pharma Algorithms) the probabilities for adverse organ specific health effects (the blood, the cardiovascular and gastrointestinal systems, the kidneys, the liver and the lungs) were elucidated (Table 5). Based on these data it was concluded [6] that the most likely adverse effects are typically predicted to be in the gastrointestinal system.
Based on the above data (Table 5) the overall assessment of the adverse organ specific health effects immediately turns into a multicriteria problem, as several parameters simultaneously have to be taken into account. Hence, Carlsen et al. [6] advantageously applied partial order ranking [11–13,44] for the subsequent data analyses, as this method allows simultaneous inclusion of several parameters. Figure 3 shows the Hasse diagram constructed from the predicted adverse organ specific health effects, as derived from the ToxBoxes [36], including the gastrointestinal system (GAS), the liver (LIV) and the lungs (LUN), respectively, the more hazardous compounds being located in the top of the diagram. Thus, on a cumulative basis it was concluded [6] that compounds 4, 5 and 8 were those of major concern, followed by the compounds (level 2) 1, 2, 9, 10, 12 and 13. The less hazardous compounds, 11 and 18, are found in the bottom of the diagram.
Carlsen et al. [6] further screened the 18 compounds (cf. Figure 1) for possible adverse biological effects applying the web version of the PASS software, with the specific focus on carcinogenicity, mutagenicity, teratogenicity and embryotoxicity. In Table 6 the predicted probabilities for the studied substances being carcinogenic, mutagenic, teratogenic and embryotoxic, respectively, are summarized. Only probabilities higher than 0.5 were considered.
Analogously to the above ranking of the compounds based on the adverse organ specific effects, the compounds were subsequently ranked according to their probabilities of being carcinogenic (CAR), mutagenic (MUT), teratogenic (TER) and embryotoxic (EMB), respectively (Figure 4).
Comparing the two figures (Figures 3 and 4), some differences obviously prevail, although a series of the same compounds appear in the top levels of the diagrams. Thus, based on the PASS predictions Carlsen et al. [6] found that 5 appeared as the most dangerous substance, followed by the compounds (level 2) 1, 4, 6, 8 and 10, respectively. The compounds 13 and 15–18 are found in the bottom of the diagram as equivalent elements, in agreement with the fact that these compounds all displayed probabilities less than 0.5 for the parameters studied (cf. Table 6).
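The comparison underlying a Hasse diagram of this kind can be sketched in a few lines of Python. The probabilities below are hypothetical placeholders, not the values of Table 5, and the names `dominates`, `incomparable` and `maximal_elements` are illustrative. A compound is ranked above another only if it is at least as high on every parameter and strictly higher on at least one; otherwise the two are either equivalent or incomparable.

```python
# Hypothetical (GAS, LIV, LUN) probabilities; NOT the values of Table 5.
profiles = {
    "X": (0.9, 0.8, 0.7),
    "Y": (0.6, 0.8, 0.5),
    "Z": (0.7, 0.3, 0.9),
    "W": (0.2, 0.1, 0.4),
}

def dominates(a, b):
    """True if profile a is >= b on every parameter and > b on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def incomparable(a, b):
    """True if the profiles differ and neither dominates the other."""
    return a != b and not dominates(a, b) and not dominates(b, a)

def maximal_elements(items):
    """Compounds that no other compound dominates (top level of the diagram)."""
    return sorted(
        name for name, p in items.items()
        if not any(dominates(q, p) for other, q in items.items() if other != name)
    )
```

With these placeholder values, X dominates Y and W, while X and Z are incomparable because each exceeds the other on one parameter; both therefore sit in the top level.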
Since the compounds at the same level in the diagram cannot immediately be compared Carlsen et al. [6] calculated the averaged rank of the suite of compounds studied using eqn. 4 (see Section 3) resulting in a linear rank of all compounds. In Table 7 the calculated averaged rank of the 17 compounds based on a) GAS, LIV and LUN and b) CAR, MUT, TER and EMB, respectively are given. Obviously, compounds located in the top level (level 1) in the Hasse diagrams (Figures 3 and 4) are calculated to have the top averaged ranks followed by the compounds found at the subsequent levels in the diagrams.
Subsequently an overall assessment of the human health impact of the rocket fuel 1 and its transformation products was made applying the Hierarchical Partial Order Ranking (HPOR) approach [6,16]. Hence, the averaged ranks given in Table 7 were adopted as so-called metaparameters (meta-descriptors) [16], denoting the predicted impact according to the ToxBoxes and the PASS calculations, respectively. A further Hasse diagram using these meta-descriptors was then constructed (Figure 5); the eventual averaged ranks, elucidating the overall assessment of the 17 compounds with respect to their adverse human health effects, are displayed in Table 8.
From Figure 5 and Table 8 Carlsen et al. [6] concluded that, in addition to compounds 5 and 4, the major risk apparently would be associated with the hydrazines and the hydrazine derivatives, 1, 8, 9, 10 and 12. This conclusion appeared to parallel the one drawn for the possible environmental impact (vide supra), apart from the fact that the tetrazene, 4, apparently does not exhibit major risk in relation to environmental impact [5].
The results presented here (Figures 4 and 5 and Tables 7 and 8) nicely illustrate the usefulness of partial ordering methodologies in attempts to carry out assessments of, e.g., a group of xenobiotics or, as studied by Carlsen et al. [6], of a group of substances consisting of a primary pollutant and a series of transformation products. Hence, through this assessment it was clearly demonstrated that some of the transformation products could lead to adverse health effects at the same or even higher level than the primary pollutant.
To further analyze the above data for adverse human health effects a formal concept analysis was conducted in order to reveal possible synergisms or antagonisms within the group of compounds [42]. Figure 6 displays the line lattice diagram for the probabilities of 1 and its transformation products being CAR, MUT, TER and EMB, respectively, as derived by PASS, the underlying context table being given as Appendix 2.
As in the case of the ecotoxicological data, the diagram here also displays a series of trivial information, e.g., for compound 5 the probability of being carcinogenic is > 90% (C > 90) and is of course also higher than 80 (C > 80), 70 (C > 70), 60 (C > 60) and 50% (C > 50), respectively.
However, in addition to this trivial information a series of implication sets and association rules pointed to the fact that for several of the compounds in the study a multitude of adverse human health effects prevail. In Tables 9 and 10 selected implication sets and association rules are summarized [42]. A notation like M > 70 denotes that the probability of the compounds being mutagenic is higher than 70%. C, M, T and E denote carcinogenicity, mutagenicity, teratogenicity and embryotoxicity, respectively.
Although the FCA studies presented above include only a limited number of substances, they nicely illustrate the possibilities of combining QSAR/QSPR generated data with formal concept analyses and thus retrieving important comprehensive information concerning the possible multitude of effects of a group of compounds.

3. Methodology

The basic methodology applied for assessing chemical substances is partial order ranking and formal concept analyses based on QSAR/QSPR generated data. Thus, in the following a description of the applied QSAR/QSPR models will be given. The basic concepts of partial order ranking (POR), including deriving linear extensions (LE), ranking probability and averaged ranks, are summarized. Further, the more elaborate partial order ranking methodologies, i.e., hierarchical partial order ranking (HPOR) and accumulating partial order ranking (APOR), are described, as are the principles and ideas of formal concept analyses (FCA).

3.1. Quantitative Structure-Activity/Property Relationships (QSAR/QSPR)

QSAR/QSPR modeling can in the simplest form be expressed as the development of correlations between a given physico-chemical property or biological activity (endpoint), P, and a set of parameters (descriptors), Di, that are inherent characteristics for the compounds under investigation
$$P = f(D_i) \tag{1}$$
The properties (endpoints), P, that have been subjected to QSAR/QSPR modeling comprise physico-chemical properties and biological activities in the environment as well as in human beings.
In general models that describe/calculate key properties of chemical compounds take into account three types of inherent characteristics of the molecule, i.e., structural, electronic and hydrophobic characteristics. Depending on the actual model few or many of these descriptors may be taken into account. Thus, eqn. 1 can be rewritten as
$$P = f(D_{\mathrm{structural}}, D_{\mathrm{electronic}}, D_{\mathrm{hydrophobic}}, D_x) + e \tag{2}$$
The descriptors reflecting structural characteristics may, e.g., be elements of the actual composition and 3-dimensional configuration of the molecule, whereas descriptors reflecting the electronic characteristics may, e.g., be HOMO/LUMO energies, charge densities, dipole moment etc. The descriptors reflecting the hydrophobic characteristics are related to the distribution of the compound between a biological, hydrophobic phase and an aqueous phase. A further, fourth type of characteristics, Dx (cf. eqn. 2), accounts for possible underlying characteristics that may be known or unknown, such as environmental or experimental parameters, e.g., temperature, salt content etc. The data may often be associated with a certain amount of systematic and non-quantifiable variability in combination with uncertainties. These unknown variations are expressed as “noise”. Thus, the parameter e accounts for possible noise in the system, i.e., the variation in the property that cannot be explained by the model.
In the studies presented in the present review paper a series of freely available QSAR/QSPR models has been applied. Thus, physico-chemical data, environmental persistence and environmental toxicities have been obtained applying the EPI Suite [32]. The interaction with the human organism has been elucidated through absorption, distribution, metabolism and excretion data derived by ADME Boxes [36] and the human toxicological effects by ToxBoxes [36] and by PASS (Prediction of Activity Spectra for Substances) [35].

3.1.1. Physico-chemical data

The EPI Suite has been applied as the primary tool for generating physico-chemical endpoints [34]. This software package includes a variety of submodules to estimate, e.g., water solubility (log SW) calculated by the submodule WSKOW, octanol-water partition (log KOW) calculated by the submodule KOWWIN, vapor pressure (log VP) calculated by the submodule MPBPWIN, and Henry’s Law constants (log HLC) calculated by the submodule HENRY. Sorption to organic carbon was calculated using the submodule PCKOCWIN. The log KOW values generated in this way are subsequently used to generate bioconcentration factors (log BCF) [43], calculated by the BCF submodule. Substances with log BCF < 3.0 were regarded as non-bioaccumulating. Substances exhibiting log BCF values > 3.0 but < 3.70 were assigned a medium bioconcentration potential, whereas substances with log BCF > 3.70 were assigned a high bioconcentration potential [34].
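The bioconcentration cut-offs can be written as a small Python function. This is a sketch only: the function name is hypothetical, and since the text leaves open how values exactly equal to 3.0 or 3.70 are treated, the sketch assumes that they fall into the higher class.

```python
def bioconcentration_class(log_bcf):
    """Classify bioconcentration potential from log BCF using the cut-offs above."""
    # Assumption: values exactly at 3.0 or 3.70 go to the higher class,
    # because the text does not specify the boundary cases.
    if log_bcf < 3.0:
        return "non-bioaccumulating"
    if log_bcf < 3.70:
        return "medium"
    return "high"
```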

3.1.2. Environmental persistence

Through the BioWin module [34] persistence predictions were obtained. The submodule BDP3 provides estimates of a substance’s environmental biodegradation rate by calculating the degradation probabilities. The lower the probability the higher the persistence. Eventually BDP3 returns the biodegradation potential as hours, hours to days, days, days to weeks, weeks, weeks to months and months, respectively, depending on the approximate amount of time needed for a “complete” biodegradation [34,45].
BDP3 output        Predicted half-life (days)
Hours              0.17
Hours to Days      1.25
Days               2.33
Days to Weeks      8.67
Weeks              15
Weeks to Months    37.5
Months             60
Recalcitrant       180
Substances with half-lives > 180 days are assigned high persistence potential, the corresponding BDP3 value being < 1.75, whereas substances with a half-life in the predominant compartment of ≥ 60 and ≤ 180 days are assigned medium persistence potential, the corresponding BDP3 value being > 1.75 and < 2.0 [45].
The fate in the aquatic media is, in addition to the biodegradation, estimated as the potential for volatilization from water. In the present study volatilization from rivers (water depth 1 m, wind velocity 5 m/s and current velocity 1 m/s) and from lakes (water depth 1 m, wind velocity 0.5 m/s and current velocity 0.05 m/s) was calculated using the WVOLWin module in EPI Suite [34].
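The tabulated half-lives and the persistence thresholds can be combined in a short Python sketch. The dictionary simply restates the tabulated values; the names `BDP3_HALF_LIFE_DAYS` and `persistence_class` are hypothetical, and the label used for half-lives below 60 days is an assumption, as the text names no class for that range.

```python
# Half-lives (days) assigned to the BDP3 output categories, as tabulated above.
BDP3_HALF_LIFE_DAYS = {
    "hours": 0.17,
    "hours to days": 1.25,
    "days": 2.33,
    "days to weeks": 8.67,
    "weeks": 15,
    "weeks to months": 37.5,
    "months": 60,
    "recalcitrant": 180,
}

def persistence_class(half_life_days):
    """Persistence potential from a half-life, using the thresholds in the text."""
    if half_life_days > 180:
        return "high"
    if half_life_days >= 60:
        return "medium"
    return "not flagged"  # assumed label; the text names no lower class
```

For example, `persistence_class(BDP3_HALF_LIFE_DAYS["months"])` falls in the medium class, since 60 days lies within the ≥ 60 and ≤ 180 day range.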

3.1.3. Environmental toxicity

Toxicities of the investigated substances have been obtained using ECOSAR [46], which calculates the toxicity of chemicals discharged into water. Both acute (short-term) toxicities and chronic (long-term or delayed) toxicities are calculated by ECOSAR, the calculations being based on the octanol-water partition (log KOW). ECOSAR can run independently or as an integrated part of the EPI Suite.
ECOSAR returns the acute as well as chronic toxicities of the substance under investigation to fish (both fresh and saltwater), water fleas (daphnids) and green algae. In some cases other effects, e.g., toxicity to earthworms, are also returned. The acute toxicities are calculated as LC50 values.

3.1.4. Absorption, Distribution, Metabolism and Excretion (ADME)

Predictions for the absorption, distribution, metabolism and excretion (ADME) and toxicology are obtained using freely and commercially available in silico expert systems, i.e., the web version of the ADME Boxes software [36] based on ADME Boxes ver. 3.5. ADME Boxes is modular software that allows calculation of selected physico-chemical data, oral bioavailability (human), human intestinal absorption, transport, and distribution, including volume of distribution and plasma bound fraction, based on the chemical structure. The software modules are based on exacting data analyses and expert models for calculating the vital properties.
Calculations on the concentration of the single compounds in the plasma as a function of time are generated using the ADME Boxes ver. 4.1 [47] as this feature is currently not implemented in the free web version.

3.1.5. ToxBoxes

Acute toxicity towards mouse and rat, the probability of adverse organ specific health effects affecting the blood, the cardiovascular and gastrointestinal systems, the kidneys, the liver and the lungs, respectively, and the probability of a positive response in an Ames test are derived using the web version of the ToxBoxes software [36] based on ToxBoxes ver. 2.0. ToxBoxes is modular software that allows calculation of toxic effects of molecules solely from the chemical structure in combination with expertise in organic chemistry and toxicology.
The validation of the ADME Boxes and ToxBoxes software has been carried out as a validation of the single modules. Overall it can be stated that the accuracy of the ADME Boxes and the ToxBoxes is high. Thus, in the case of the Ames test the accuracy was found to be in the order of 95% based on a validation set of ca. 1,700 substances [48]. Typical values for the various modules, comparing experimental and predicted values for a series of compounds not involved in the model development (validation set), were R2 higher than 0.8.

3.1.6. Prediction of Activity Spectra for Substances (PASS)

The computer program PASS (Prediction of Activity Spectra for Substances) developed by the Academy of Medical Sciences, Moscow, predicts the biological activity for a compound on the basis of its structural formula [35].
The freely available internet version of PASS allows the prediction of 2,468 pharmacological effects as well as mechanisms of action [49]. For the studies referred to in this review PASS has been used to derive probabilities for the investigated compounds to be carcinogenic, mutagenic, teratogenic and embryotoxic. In the case of carcinogenicity the highest value predicted (male/female mice, male/female rats) was applied. The PASS training set includes approx. 46,000 biologically active compounds, comprising about 16,000 already launched drugs and 30,000 drug candidates currently under clinical or advanced preclinical testing [50]. The accuracy of the PASS predictions has been reported to be approx. 86% [51,52]. The maximum error of prediction has been estimated to be approx. 15, 13, 21 and 20% for prediction of carcinogenicity, mutagenicity, teratogenicity and embryotoxicity, respectively [51]. For all compounds referred to in the present review, i.e., the rocket fuel and its transformation products, the number of new descriptors is 0, 1 or, at a maximum, 2, thus complying with the limitations of the method [53].

3.2. Partial Order Ranking (POR)

The theory of partial order ranking is presented elsewhere [44,54]. In brief, Partial Order Ranking is a simple principle, which a priori includes “≤” as the only mathematical relation. If a system is considered, which can be described by a series of descriptors pi, a given site A, characterized by the descriptors pi(A) can be compared to another site B, characterized by the descriptors pi(B), through comparison of the single descriptors, respectively. Thus, site A will be ranked higher than site B, i.e., B ≤ A, if at least one descriptor for A is higher than the corresponding descriptor for B and no descriptor for A is lower than the corresponding descriptor for B. If, on the other hand, pi(A) > pi(B) for descriptor i and pj(A) < pj(B) for descriptor j, A and B will be denoted incomparable. Obviously, if all descriptors for A are equal to the corresponding descriptors for B, i.e., pi(B) = pi(A) for all i, the two sites will have identical rank and will be considered as equivalent, i.e., A = B. In mathematical terms this can be expressed as
$$B \leq A \;\Leftrightarrow\; p_i(B) \leq p_i(A) \quad \text{for all } i$$
It further follows that if A ≥ B and B ≥ C then A ≥ C. If no rank can be established between A and B these sites are denoted as incomparable, i.e., they cannot be assigned a mutual order. Therefore POR is an ideal tool to handle incommensurable attributes.
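The comparison rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the WHasse implementation; the function name `compare` and the descriptor values are hypothetical, and all descriptors are assumed to be oriented in the same direction.

```python
from typing import Sequence


def compare(a: Sequence[float], b: Sequence[float]) -> str:
    """Compare two sites A and B described by equally oriented descriptors.

    Returns "A > B", "A < B", "A = B" or "incomparable".
    """
    if len(a) != len(b):
        raise ValueError("both sites need the same number of descriptors")
    a_higher = any(x > y for x, y in zip(a, b))  # some descriptor favours A
    b_higher = any(x < y for x, y in zip(a, b))  # some descriptor favours B
    if a_higher and b_higher:
        return "incomparable"
    if a_higher:
        return "A > B"
    if b_higher:
        return "A < B"
    return "A = B"


# Hypothetical descriptor vectors (illustrative values only)
site_a = (3.0, 2.0, 5.0)
site_b = (1.0, 2.0, 4.0)
site_c = (4.0, 1.0, 5.0)

print(compare(site_a, site_b))  # A > B
print(compare(site_a, site_c))  # incomparable
```

Sites A and C are incomparable because the first descriptor favours C whereas the second favours A.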
In partial order ranking – in contrast to standard multidimensional statistical analysis – neither any assumptions about linearity nor any assumptions about distribution properties are made. In this way the partial order ranking can be considered as a non-parametric method. Thus, there is no preference among the descriptors. However, due to the simple mathematics outlined above, it must be emphasized that the method a priori is rather sensitive to noise, since even minor fluctuations in the descriptor values may lead to non-comparability or reversed ordering.
A main point is that all descriptors must have identical orientations, i.e., “high” and “low” must have the same meaning for every descriptor. As a consequence, it may be necessary to multiply some descriptors by −1 in order to achieve identical directions. As an example, bioaccumulation and toxicity can be mentioned. In the case of bioaccumulation, the higher the number the more a chemical substance tends to bioaccumulate and thus the more problematic the substance, whereas in the case of toxicity, the lower the figure the more toxic the substance. Thus, in order to secure identical directions of the two descriptors, one of them, e.g., the toxicity figures, has to be multiplied by −1. Consequently, both in the case of bioaccumulation and in the case of toxicity, higher figures will now correspond to more problematic sites.
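As a small illustration of this orientation step, the sketch below flips the sign of a descriptor for which low values are the problematic ones. The helper name `orient` and all numbers are hypothetical.

```python
def orient(values, higher_is_worse):
    """Return the values with a common orientation (higher = more problematic)."""
    return [v if higher_is_worse else -v for v in values]


# Hypothetical values for three substances
bioaccumulation = [1.2, 3.5, 0.4]   # higher already means more problematic
lc50_mg_per_l = [5.9, 290.0, 0.53]  # lower means more toxic, so the sign is flipped

oriented_bioaccumulation = orient(bioaccumulation, higher_is_worse=True)
oriented_toxicity = orient(lc50_mg_per_l, higher_is_worse=False)

print(oriented_toxicity)  # [-5.9, -290.0, -0.53]
```

After the sign change the third substance, which has the lowest LC50, carries the highest oriented toxicity value.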
The graphical representation of the partial ordering is often given in a so-called Hasse diagram [55–58]. In practice the partial order rankings are performed using the WHasse software [58]. Alternatives to the WHasse software are DART (Decision Analysis by Ranking Techniques), which comprises different kinds of order ranking methods, roughly classified as total and partial order ranking methods [59], and the PyHasse software currently being developed by R. Brüggemann [60].

3.2.1. Linear extensions and ranking probabilities

The number of incomparable elements in the partial ordering constitutes a limitation in the attempt to rank, e.g., a series of chemical substances based on their potential environmental or human health hazard. To some extent this problem can be remedied through the application of the so-called linear extensions of the partial order ranking [61,62]. A linear extension is a total order in which all comparabilities of the partial order are reproduced [54,55]. Due to the incomparabilities in the partial order ranking, a number of possible linear extensions correspond to one partial order. If all possible linear extensions are found, a ranking probability can be calculated, i.e., based on the linear extensions the probability that a certain element has a certain absolute rank can be derived. It is then also possible to calculate the averaged ranks of the single elements in a partially ordered set [63,64].
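For very small partial orders the linear extensions can be enumerated by brute force, which makes the notions of ranking probability and averaged rank concrete. The sketch below is illustrative only: the function names and the four-element example are hypothetical, position 1 is taken as the lowest position (an assumption of this sketch), and the factorial cost of testing every permutation makes the approach unusable for larger sets.

```python
from collections import Counter
from itertools import permutations


def linear_extensions(elements, below):
    """Yield every total order (lowest element first) that respects the partial order.

    `below` is a set of pairs (x, y) meaning that x is ranked below y.
    """
    for order in permutations(elements):
        position = {e: i for i, e in enumerate(order)}
        if all(position[x] < position[y] for x, y in below):
            yield order


def rank_statistics(elements, below):
    """Return (ranking probabilities, averaged ranks); position 1 is the lowest."""
    counts = {e: Counter() for e in elements}
    total = 0
    for order in linear_extensions(elements, below):
        total += 1
        for rank, e in enumerate(order, start=1):
            counts[e][rank] += 1
    probabilities = {
        e: {rank: n / total for rank, n in sorted(counts[e].items())}
        for e in elements
    }
    averaged = {
        e: sum(rank * p for rank, p in probabilities[e].items()) for e in elements
    }
    return probabilities, averaged


# Hypothetical poset: a below b and c, both below d; b and c are incomparable
elements = ["a", "b", "c", "d"]
below = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")}

probabilities, averaged = rank_statistics(elements, below)
print(probabilities["b"])  # {2: 0.5, 3: 0.5}
print(averaged)            # {'a': 1.0, 'b': 2.5, 'c': 2.5, 'd': 4.0}
```

The example has exactly two linear extensions (a, b, c, d and a, c, b, d), so the incomparable elements b and c each occupy positions 2 and 3 with probability 0.5.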

3.2.2. Averaged ranks

Based on the linear extensions the averaged rank of the single elements can be established. The averaged rank is simply the average of the ranks in all the linear extensions. On this basis the most probable rank for each element can be obtained, leading to the most probable linear rank of the elements studied.
The averaged rank of the single elements in the Hasse diagram can be obtained by deriving a large number of randomly generated linear extensions [65–67]. In addition to the averaged ranks, the random linear extension approach also allows determination of the ranking probability distribution of the single elements (cf. [14,15]).
Alternatively, the averaged rank of the single elements in the Hasse diagram can be obtained by applying the simple relation reported by Brüggemann et al. [68]. For a specific element, ci, this relation gives the averaged rank as
$$Rk_{av}(c_i) = \left(N(c_i)+1\right) - \left(S(c_i)+1\right) \times \frac{N(c_i)+1}{N(c_i)+1-U(c_i)} \qquad (4)$$
where N(ci) is the number of elements in the diagram, S(ci) the number of successors to ci, i.e., comparable elements located below ci, and U(ci) the number of elements being incomparable to ci [68]. It is immediately seen that in the ranking according to eqn. 4 the lower the number the higher the level. Thus, the highest level will be “1”. This is reversed compared to the original approach [68].
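The relation can be evaluated directly once N, S and U are known for an element. The short sketch below assumes the reversed form described in the text, i.e., Rkav = (N + 1) − (S + 1) × (N + 1)/(N + 1 − U), with rank 1 as the top level; the function name and the four-element example are hypothetical.

```python
def averaged_rank(n_elements: int, successors: int, incomparables: int) -> float:
    """Averaged rank of one element, with rank 1 as the top level.

    Implements Rkav = (N + 1) - (S + 1) * (N + 1) / (N + 1 - U).
    """
    n1 = n_elements + 1
    return n1 - (successors + 1) * n1 / (n1 - incomparables)


# Hypothetical diagram with N = 4: one top element, two mutually
# incomparable middle elements and one bottom element.
top = averaged_rank(4, successors=3, incomparables=0)
middle = averaged_rank(4, successors=1, incomparables=1)
bottom = averaged_rank(4, successors=0, incomparables=0)

print(top, middle, bottom)  # 1.0 2.5 4.0
```

For this small diagram the values coincide with the exact averages over its two linear extensions; in general the relation is an approximation based on local information only [68].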

3.2.3. Hierarchical POR

Based on the linear extensions the averaged rank of the single elements can be established. The averaged rank is simply the average of the ranks in all the linear extensions. On this basis the most probable rank for each element can be obtained, leading to the most probable linear rank of the elements studied. These linear ranks can be regarded as meta-descriptors. If a series of such meta-descriptors is generated from a set of partial order rankings, they may subsequently constitute the basis for further ranking in a second stage, i.e., a consecutive POR.
By this process the number of descriptors is significantly reduced, and the ranking based on meta-descriptors may, in contrast to a simultaneous inclusion of all original descriptors, lead to the development of a robust model [69] that in principle will contain all information based on the original set of descriptors [16].
Since the meta-descriptors, as the descriptors, are ordered with the highest rank being denoted “1”, the meta-descriptors must all be multiplied by −1 in order to make sure that the elements with the highest rank, i.e., with the lowest attributed number, will be ranked in the top of the Hasse diagram as a result of the ranking based on the meta-descriptors. In Figure 7 a graphical representation of the HPOR approach is depicted.
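The two-stage procedure can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in [16]: the averaged-rank relation of Section 3.2.2 is used as a stand-in for ranks derived from linear extensions, and the descriptor groups, their values and all function names are hypothetical.

```python
def local_counts(rows):
    """Per object, count successors (objects ranked below) and incomparable objects.

    Each row holds equally oriented descriptors (higher = more problematic).
    """
    counts = []
    for i, a in enumerate(rows):
        successors = incomparables = 0
        for j, b in enumerate(rows):
            if i == j:
                continue
            a_ge_b = all(x >= y for x, y in zip(a, b))
            b_ge_a = all(y >= x for x, y in zip(a, b))
            if a_ge_b and not b_ge_a:
                successors += 1
            elif not a_ge_b and not b_ge_a:
                incomparables += 1
        counts.append((successors, incomparables))
    return counts


def averaged_ranks(rows):
    """Averaged rank per object (1 = top), used here as a meta-descriptor."""
    n1 = len(rows) + 1
    return [n1 - (s + 1) * n1 / (n1 - u) for s, u in local_counts(rows)]


# Stage 1: one ranking per descriptor group (hypothetical values, three objects)
group_1 = [(0.9, 0.8), (0.5, 0.9), (0.1, 0.2)]
group_2 = [(0.7,), (0.3,), (0.2,)]
meta_1 = averaged_ranks(group_1)
meta_2 = averaged_ranks(group_2)

# Stage 2: multiply the meta-descriptors by -1 so that rank 1 ends up on top
second_stage = [(-m1, -m2) for m1, m2 in zip(meta_1, meta_2)]
final_ranks = averaged_ranks(second_stage)

print(final_ranks)  # [1.0, 2.0, 3.0]
```

In the first stage objects 1 and 2 are incomparable with respect to group 1; the second stage resolves this because object 1 ranks higher on the meta-descriptor from group 2.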

3.3. Formal Concept Analysis (FCA)

Formal concept analysis (FCA) is a methodology to derive linkages between a set of objects, e.g., chemicals, and a set of associated parameters, e.g., the properties of these chemicals [17,18]. In short, FCA can be regarded as a system consisting of three parts, a context, or a triple (C,P,L), where C is the set of objects (chemicals), P the set of parameters and L the relation between the two sets C and P. Thus, if a chemical, c, belonging to the set C has a parameter, p, belonging to the set P, (c,p) is said to belong to L.
The set of parameters that are associated with a given object (chemical) can be regarded as a set of binary, i.e., on/off, statements. Either the chemical has a given parameter, e.g., being carcinogenic, or not.
Typically a context will be seen as arranged in matrix form with the single objects as rows and the associated parameters as columns. Hence, an “X” in this table will indicate that a given object has the given parameter (on-status) whereas an empty space indicates that this parameter is not associated with the given object (off-status). Examples of contexts are given in Table XX and YY (vide supra).
For the studies referred to in this review the software ConExp [70] was applied to generate the lattice line diagrams as well as the implication sets and association rules.
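Independently of ConExp, the basic FCA operations can be illustrated with a small Python sketch. The three-chemical context, the parameter names and all function names below are hypothetical, and the concepts are found by brute force over all object subsets, which is only feasible for very small contexts.

```python
from itertools import combinations

# Hypothetical context: each chemical mapped to the parameters it possesses
context = {
    "c1": {"carcinogenic", "mutagenic"},
    "c2": {"carcinogenic"},
    "c3": {"carcinogenic", "mutagenic", "teratogenic"},
}


def intent(objects, context):
    """Parameters shared by all given objects (all parameters for no objects)."""
    shared = set().union(*context.values())
    for o in objects:
        shared &= context[o]
    return shared


def extent(params, context):
    """Objects possessing all the given parameters."""
    return {o for o, ps in context.items() if set(params) <= ps}


def concepts(context):
    """All formal concepts (extent, intent), obtained by closing every object subset."""
    found = set()
    objects = sorted(context)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            shared = intent(subset, context)
            found.add((frozenset(extent(shared, context)), frozenset(shared)))
    return found


def implication_holds(premise, conclusion, context):
    """True if every object with all `premise` parameters also has all `conclusion` parameters."""
    return extent(premise, context) <= extent(conclusion, context)


print(len(concepts(context)))  # 3
print(implication_holds({"mutagenic"}, {"carcinogenic"}, context))  # True
print(implication_holds({"carcinogenic"}, {"mutagenic"}, context))  # False
```

In this toy context every mutagenic chemical is also carcinogenic, so the implication mutagenic → carcinogenic holds, whereas the reverse does not.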

3.3.1. Line diagrams

The lattice line diagram consists of circles, lines and the names of all objects/chemicals (given in white boxes) and parameters of the context (given in grey boxes), where the circles represent the concepts. A blue-filled upper semicircle indicates that there is an attribute attached to the concept. A black-filled lower semicircle indicates that there is an object attached to the concept.
From the diagram the information of a context can be read as: a chemical (object), c, has a parameter (characteristic), p, attached only if there is an upward line from the circles with the label c to a circle with the label p.

4. Conclusions

In the present study the interplay between QSAR/QSPR and partial order ranking and formal concept analyses has been reviewed. It has been demonstrated that QSAR/QSPR models can advantageously be used to generate physico-chemical and ecotoxicological data (EPI Suite) as well as data to elucidate possible adverse human health effects (ADME/Tox Boxes and PASS). It has further been demonstrated, using residual rocket fuel, 1,1-dimethylhydrazine, and a series of its transformation products as an illustrative example, that further data treatment can advantageously be carried out by applying partial order ranking (POR) methodologies as well as formal concept analysis (FCA). Whereas the partial order ranking methodologies lead to a prioritization of the studied chemicals, simultaneously taking a multitude of parameters into account, the formal concept analysis leads to valuable information on possible links between the studied chemicals and the associated parameters. As such, the combination QSAR/QSPR – POR – FCA constitutes a highly effective decision support tool.

Acknowledgments

The author is grateful to Dr. Rainer Brüggemann, Leibniz Inst. Freshwater Ecology and Fisheries, Berlin, Germany and to Drs. Bulat N. Kenessov and Svetlana Ye. Batyrbekova, Center of Physico-Chemical Methods of Investigations and Analysis of al-Farabi Kazakh National University, Almaty, Kazakhstan, for valuable discussions in relation to previously published joint papers referred to in the present study.

Appendix

Appendix 1. Context table for ecotoxicological effects by 1,1-dimethyl hydrazine and its transformation products as derived by EcoSARa [46].
No. F1000 F100 F10 F5 F1 D1000 D100 D10 D5 D1 A1000 A100 A10 A5 A1
1XXXXXXXXXXX
2XXXXX
3XXXXX
4X
5XXX
6
7XXXXXXXXXXXX
8XXXXXXXXXXXXX
9XXXXXXXXXXXXX
10XXXXXXXXXXXX
11XXXX
12XXXXXXXXX
13XXXXX
14XXXX
15
16
17
18X
a F1000, F100, F10, F5 and F1 denote EcoSAR derived toxicities towards fish being higher than 1,000, 100, 10, 5 and 1 mg/L, respectively. Analogously for D (daphnids) and A (algae).
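The threshold attributes of such a context can be derived from numerical toxicity values by a simple scaling step. The sketch below is illustrative only and rests on an assumption: “toxicity higher than t mg/L” is read as an LC50 or EC50 value below t, with a strict inequality. The function name is hypothetical, and the three input values are those read for compound 1 from Table 3.

```python
THRESHOLDS_MG_PER_L = (1000, 100, 10, 5, 1)


def scale(prefix, value_mg_per_l):
    """Threshold attributes (e.g. F1000, F100, ...) that apply to one toxicity value.

    Assumption: an attribute is set when the value lies strictly below the threshold.
    """
    return [f"{prefix}{t}" for t in THRESHOLDS_MG_PER_L if value_mg_per_l < t]


# Fish, daphnid and algae values (mg/L) as read for compound 1
row = scale("F", 5.9) + scale("D", 6.2) + scale("A", 0.53)

print(row)
# ['F1000', 'F100', 'F10', 'D1000', 'D100', 'D10', 'A1000', 'A100', 'A10', 'A5', 'A1']
```

Under this reading compound 1 receives 11 attributes, which agrees with the number of entries in its row of the context table.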

Appendix 2. Context table for carcinogenic, mutagenic, teratogenic and embryotoxic action by 1,1-dimethylhydrazine and its transformation products as derived by PASSa [35].
No. C5 C6 C7 C8 C9 M5 M6 M7 M8 M9 T5 T6 T7 T8 T9 E5 E6 E7 E8 E9
1XXXXXXXXXXXX
2XXXXXXXXXXXXXX
3X
4XXXXXXXXXXXXXXXX
5XXXXXXXXXXXXXXXXXXX
6XXXXXXXXXX
7XXXXXXXXX
8XXXXX
9XX
10XXXXXXXXXXXXX
11XX
12XXXXXX
13
15
16
17
18
a C5, C6, C7, C8 and C9 denote PASS predicted probabilities for the compound being carcinogenic higher than 50, 60, 70, 80 and 90%, respectively. Analogously for M (mutagenic), T (teratogenic) and E (embryotoxic).

References and Notes

  1. Voigt, K; Brüggemann, R; Pudenz, S. Chemical databases evaluated by order theoretical tools. Anal. Bioanal. Chem 2004, 380, 467–474.
  2. Voigt, K; Brüggemann, R; Pudenz, S. A multi-criteria evaluation of environmental databases using the Hasse Diagram Technique (ProRank) software. Environ. Modell. Softw 2006b, 21, 1587–1597.
  3. Voigt, K; Brüggemann, R; Pudenz, S. Information quality of environmental and chemical databases exemplified by high production volume chemicals and pharmaceuticals. Online Inf. Rev 2006, 30, 8–23.
  4. Carlsen, L; Kenesova, OA; Batyrbekova, SE. A preliminary assessment of the potential environmental and human health impact of unsymmetrical dimethylhydrazine as a result of space activities. Chemosphere 2007, 67, 1108–1116.
  5. Carlsen, L; Kenessov, BN; Batyrbekova, SYe. A QSAR/QSTR study on the environmental health impact by the rocket fuel heptyl and its transformation products. Environ Health Insights. 2008, 1, pp. 11–20.
  6. Carlsen, L; Kenessov, BN; Batyrbekova, SYe. A QSAR/QSTR study on the human health impact of the rocket fuel 1,1-dimethyl hydrazine and its transformation products. Multicriteria hazard ranking based on partial order methodologies. Environ. Toxicol. Pharmacol 2009, 27, 415–423.
  7. Ivanciuc, T; Ivanciuc, O; Klein, DJ. Posetic Quantitative Superstructure/Activity Relationships (QSSARs) for Chlorobenzenes. J. Chem. Inf. Model 2005, 45, 870–879.
  8. Klein, DJ; Ivanciuc, T; Ryzhov, A; Ivanciuc, O. Combinatorics of Reaction-Network Posets. Comb. Chem. High Throughput Scr 2008, 11, 723–733.
  9. Verdonck, FAM; Boeije, G; Vandenberghe, V; Comber, M; de Wolf, W; Feijtel, T; Holt, M; Koch, K; Lecloux, A; Siebel-Sauer, A; Vanrolleghem, PA. A rule-based screening environmental risk assessment tool derived from EUSES. Chemosphere 2005, 58, 1169–1176.
  10. Sørensen, PB; Brüggemann, R; Carlsen, L; Mogensen, BB; Kreuger, J; Pudenz, S. Analysis of monitoring data of pesticide residues in surface waters using partial order ranking theory. Environ. Toxicol. Chem 2003, 22, 661–670.
  11. Carlsen, L. Giving molecules an identity. On the interplay between QSARs and Partial Order Ranking. Molecules. 2004, 9, pp. 1010–1018.
  12. Carlsen, L. A QSAR Approach to physico-chemical data for organophosphates with special focus on known and potential nerve agents. Internet Electron J Mol Des. 2005, 4, pp. 355–366.
  13. Carlsen, L. Partial order ranking of organophosphates with special emphasis on nerve agents, MATCH-Commun. Math. Comput. Chem 2005, 54, 519–534.
  14. Carlsen, L. Interpolation schemes in QSAR. In Partial Order in Environmental Sciences and Chemistry; Brüggemann, R, Carlsen, L, Eds.; Springer: Berlin, 2006; pp. 163–180.
  15. Carlsen, L. A combined QSAR and partial order ranking approach to risk assessment. SAR QSAR Environ. Res 2006, 17, 133–146.
  16. Carlsen, L. Hierarchical partial order ranking. Environ. Pollut 2008, 155, 247–253.
  17. Burmeister, P. Formal concept analysis with ConImp: Introduction to the basic features.
  18. Formal Concept Analysis: Foundations and Applications; Ganter, B; Stumme, G; Wille, R (Eds.) Springer: Berlin, Germany, 2005.
  19. Johnson, MH; Maggiora, GM. Concepts and Applications of Molecular Similarity; Wiley: New York, USA, 1990.
  20. Crum-Brown, A; Frazer, T. On the connection between chemical constitution and physiological action. Codeia, Morphia and Nicotia Trans Royal Soc. Edinburgh 1868/1869, 25, 257–274.
  21. Meyer, H. Zur Theorie der Alkoholnarkose, welche Eigenschaft die Anästhetica bedingt ihre narkotische Wirkung. Arch Exp Pathol Pharmakol 1899, 42, 109–118.
  22. Overton, CE. Studien über die Narkose. In Zugleich ein Beitrag zur Allgemeine Pharmakologie; Gustav Fischer Verlag: Jena, Germany, 1901.
  23. Ferguson, J. The use of chemical potentials as indicators for toxicity. Proc. Royal Soc. B 1939, 127, 387.
  24. Hansch, C; Leo, A. Exploring QSAR Fundamentals and Applications in Chemistry and Biology; ACS: Washington DC, USA, 1995.
  25. Bevan, DR. QSAR and Drug Design.
  26. 3D-QSAR in drug design. In Theory Methods and Applications; Kubinyi, H (Ed.) Kluwer/Escom: Dordrecht, The Netherlands, 1998.
  27. Neural Networks in QSAR and Drug Design; Devilliers, J (Ed.) Academic Press: London, UK, 1996.
  28. Kurup, A; Garg, R; Carini, DJ; Hansch, C. Comparative QSAR: Angiotensin II antagonists. Chem. Rev 2001, 101, 2727–2750.
  29. Jaworska, J; Comber, M; Auer, C; Leeuwen, CJ. Summary of a workshop on regulatory acceptance of (Q)SARs for human health and environmental endpoints. Env. Health Perspect 2003, 111, 1358–1360.
  30. Eriksson, L; Jaworska, J; Worth, AP; Cronin, MTD; McDowell, RM; Gramatica, P. Methods for reliability and uncertainty assessment, and for applicability evaluations of classification- and regression-based QSARs. Env. Health Perspect 2003, 111, 1361–1375.
  31. Cronin, MTD; Walker, JD; Jaworska, JS; Comber, MHI; Watts, CD; Worth, AP. Use of QSARs in international decisions-making frameworks to predict ecological effects and environmental fate of chemical substances. Env. Health Perspect 2003, 111, 1376–1390.
  32. Cronin, MTD; Walker, JD; Jaworska, JS; Comber, MHI; Watts, CD; Worth, AP. Use of QSARs in international decisions-making frameworks to predict health effects of chemical substances. Env. Health Perspect 2003, 111, 1391–1401.
  33. Worth, A. QSC QSAR and Combinatorial. Science 2008, 27, 1–132.
  34. EPA Estimation Program Interface (EPI) Suite.
  35. PASS1. Prediction of Activity Spectra for Substances.
  36. Pharma Algorithms 1. ADME/Tox WEB.
  37. Nauryzbaev, MK; Batyrbekova, SE; Tassibekov, KhS; Kenesov, BN; Vorozheikin, AP; Proskuryakov, YuV. Ecological problems of central Asia resulting from space rocket debris. In History and Society in Central and Inner Asia, Toronto Studies in Central and Inner Asia; No 7, Asian Institute: University of Toronto, Toronto, 2005; pp. 327–349.
  38. System analysis of environmental objects in the territories of Kazakhstan, which suffered negative influence through Baikonur space port activity, , Final technical Report of ISTC K451.2, Center of Physical-Chemical Methods of Analysis, al-Farabi Kazakh National University in Almaty, Kazakhstan, 2006.
  39. European Commission. Regulation (EC) No 1907/2006 of the European Parliament and of the Council of 18 December 2006 concerning the Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH), establishing a European Chemicals Agency, amending Directive 1999/45/EC and repealing Council Regulation (EEC) No 793/93 and Commission Regulation (EC) No 1488/94 as well as Council Directive 76/769/EEC and Commission Directives 91/155/EEC, 93/67/EEC, 93/105/EC and 2000/21/EC.
  40. European Commission. Technical Guidance Document on Risk Assessment in support of Commission Directive 93/67/EEC on Risk Assessment for new notified substances Commission Regulation (EC) No 1488/94 on Risk Assessment for existing substances Directive 98/8/EC of the European Parliament and of the Council concerning the placing of biocidal products on the market Part I EUR 20418 EN/1. 2003. European Communities. 2003.
  41. Adushkin, VV; Kozlov, CI; Petrov, AV. Ecological Problems and Risks of Rocket-Space Techniques Impacts on Environment; Ankil: Moscow, Russia, 2000.
  42. Carlsen, L. Formal concept analyses of the ecotoxicological and health effects of residual rocket fuel and its transformation products. 2009, in press.
  43. Ivanciuc, O; Ivanciuc; Klein, DJ. Modeling the Bioconcentration Factors and Bioaccumulation Factors of Polychlorinated Biphenyls with Posetic Quantitative Super-Structure/Activity Relationships (QSSAR). Mol. Diversity 2006, 10, 133–145.
  44. Partial Order in Environmental Sciences and Chemistry; Brüggemann, R; Carlsen, L (Eds.) Springer: Berlin, Germany, 2006.
  45. Walker, JD; Carlsen, L. QSARs for Identifying and Prioritizing Substances with Persistence and Bioconcentration Potential. SAR QSAR Environ. Res 2002, 13, 713–726.
  46. .
  47. .
  48. Sazonovas, A. Descriptions, incl. validation of the single ADME Boxes and ToxBoxes modules. Personal communication from A. 2009.
  49. PASS2. List of predicted activities.
  50. PASS3. Quality of prediction.
  51. PASS4. Basic elements of PASS.
  52. Poroikov, VV; Filimonov, DA; Ihlenfeldt, W-D; Gloriozova, TA; Lagunin, AA; Borodina, YuV; Stepanchikova, AV; Nicklaus, MC. PASS Biological activity spectrum predictions in the enhanced open NCI database browser. J. Chem. Inform. Comput. Sci 2003, 43, 228–236.
  53. PASS5. Limitations.
  54. Davey, BA; Priestley, HA. Introduction to lattices and Order; Cambridge University Press: Cambridge, UK, 1990.
  55. Brüggemann, R; Halfon, E; Welzl, G; Voigt, K; Steinberg, CEW. Applying the concept of partially ordered sets on the ranking of near-shore sediments by a battery of tests. J. Chem. Inf. Comput. Sci 2001, 41, 918–925.
  56. Halfon, E; Reggiani, MG. On the ranking of chemicals for environmental hazard. Environ. Sci. Technol 1986, 20, 1173–1179.
  57. Hasse, H. Über die Klassenzahl abelscher Zahlkörper; Akademie Verlag: Berlin, Germany, 1952.
  58. Brüggemann, R; Halfon, E; Bücherl, C. Theoretical base of the program “Hasse”, GSF-Bericht 20/95, Neuherberg, Germany, 1995. The software may be obtained by contacting Dr. R. Brüggemann, Institute of Freshwater Ecology and Inland Fisheries, Berlin ([email protected]).
  59. European Chemical Bureau. DART. Decision Analysis by Ranking Techniques.
  60. Brüggemann, R; Restrepo, G; Voigt, K. Towards a New and Advanced Partial Order Program: PyHasse. In Multicriteria Ordering and Ranking: Partial Orders, Ambiguities and Applied Issues; Systems Research Institute Polish Academy of Sciences: Warsaw; pp. 11–33, The software may be obtained by contacting Dr. R. Brüggemann, Institute of Freshwater Ecology and Inland Fisheries, Berlin ([email protected]).
  61. Fishburn, PC. On the family of linear extensions of a partial order. J. Combinat. Theory 1974, 17, 240–243.
  62. Graham, RL. Linear Extensions of Partial Orders and the FKG Inequality. In Ordered Sets; Rival, I, Ed.; Reidel Publishing Company: Dordrecht, The Netherlands, 1982; pp. 213–236.
  63. Winkler, PM. Average height in a partially ordered set. Discrete Math 1982, 39, 337–341.
  64. Winkler, PM. Correlation among partial orders. Siam. J. Alg. Disc. Meth 1983, 4, 1–7.
  65. Sørensen, PB; Lerche, DB; Carlsen, L; Brüggemann, R. Statistically approach for estimating the total set of linear orders. A possible way for analysing larger partial order sets. In Order Theoretical Tools in Environmental Science and Decision Systems; Brüggemann, R, Pudenz, S, Lühr, H-P, Eds.; Berichte des IGB, Leibniz-Institut of Freshwater Ecology and Inland Fisheries: Berlin, Heft 14, Sonderheft IV, 2001; pp. 87–97.
  66. Lerche, D; Brüggemann, R; Sørensen, P; Carlsen, L; Nielsen, OJ. A comparison of partial order technique with three methods of multi-criteria analysis for ranking of chemical substances. J. Chem. Inf. Comput. Sci 2002, 42, 1086–1098.
  67. Lerche, D; Sørensen, PB; Brüggemann, R. Improved estimation of ranking probabilities in partial orders using random linear extensions by approximation of the mutual ranking probability. J. Chem. Inf. Comput. Sci 2003, 43, 1471–1480.
  68. Brüggemann, R; Lerche, D; Sørensen, PB; Carlsen, L. Estimation of average ranks by a local partial order model. J. Chem. Inf. Comput. Sci 2004, 44, 618–625.
  69. Sørensen, PB; Mogensen, BB; Carlsen, L; Thomsen, M. The influence of partial order ranking from input parameter uncertainty. Definition of a robustness parameter. Chemosphere 2000, 41, 595–601.
  70. Yevtushenko, SA. System of data analysis “Concept Explorer”. Proceedings of the 7th National Conference on Artificial Intelligence KII-2000, Russia; 2000; pp. 127–134.
Figure 1. Transformation of 1,1-dimethylhydrazine in soil and water [5,6].
Figure 2. Lattice line diagram for ecotoxicological effects by 1,1-dimethylhydrazine and its transformation products as derived by EcoSAR [46].
Figure 3. Hasse diagram constructed based on the parameters GAS, LIV and LUN [6].
Figure 4. Hasse diagram constructed based on the parameters CAR, MUT, TER and EMB [6]. For calculation purposes probabilities < 0.5 (denoted NE in Table 6) are for ranking purposes arbitrarily set to 0.25 [6].
Figure 5. Hasse diagram constructed based on the meta descriptors originating from the ToxBoxes and the PASS calculations, respectively, cf. Table 7 [6].
Figure 6. Lattice line diagram for human health effects by 1,1-dimethyl hydrazine and its transformation products as derived by PASS [35].
Figure 7. Graphical representation of the hierarchical partial order ranking [16].
Table 1. Calculated and experimentally determined physico-chemical parameters for the investigated substancesa [5].
| No | Log SW (mg/L) | Log KOW | Log KOC | Log HLC (atm m³ mole⁻¹) | Log VP (mmHg) |
|---|---|---|---|---|---|
| 1 | 1×10⁶ (1×10⁶) | −1.19 | 1.29 | 6.95×10⁻⁸ | 1.68×10² (1.57×10²) |
| 2 | 1×10⁶ (8.9×10⁵) | 0.04 (0.16) | 1.17 | 1.28×10⁻⁴ (1.04×10⁻⁴) | 1.69×10³ (1.61×10³) |
| 3 | 1×10⁶ (1.7×10⁶) | −0.17 (−0.38) | 1.12 | 1.81×10⁻⁵ (1.77×10⁻⁵) | 1.52×10³ (1.47×10³) |
| 4 | 1×10⁶ | 0.69 | 1.03 | 1.96×10⁸ | 21.3 |
| 5 | 9.6×10⁵ (1×10⁶) | −0.64 (−0.57) | 1.58 | 2.02×10⁻⁶ (1.82×10⁻⁶) | 4.3 (2.70) |
| 6 | 1×10⁶ (1×10⁶) | −0.93 (−1.01) | 0.38 | 7.38×10⁻⁸ (7.39×10⁻⁸) | 3.49 (3.87) |
| 7 | 1×10⁶ | −0.52 | 1.53 | 7.39×10⁻⁷ | 1.31×10² |
| 8 | 4.5×10⁵ | 0.40 | 1.85 | 5.91×10⁻⁵ | 80.3 |
| 9 | 7.78×10⁵ | 0.68 | 1.58 | 4.45×10⁻⁵ | 3.30×10² |
| 10 | 1×10⁶ | −0.73 | 1.45 | 1.53×10⁻⁷ | 1.45×10² |
| 11 | 4.77×10⁵ (1×10⁶) | −0.17 (−0.34) | 0.18 | 6.78×10⁻⁵ (6.67×10⁻⁵) | 9.10×10² (9.02×10²) |
| 12 | 1×10⁶ | −1.70 | 0.65 | 3.08×10⁻¹⁰ | 0.14 |
| 13 | 1×10⁶ | −0.44 | 1.00 | 1.52×10⁻⁸ | 7.12 |
| 14 | 3.02×10⁴ (4.8×10⁵) | 0.23 (−1.38) | 1.16 | 3.45×10⁻⁶ | 35.2 (7.51×10³) |
| 15 | 3.1×10⁵ (1×10⁶) | −0.69 (−0.25) | 0.43 | 2.42×10⁻² (1.33×10⁻⁴) | 7.32×10² (7.42×10²) |
| 16 | 2.0×10⁵ | 0.33 | 2.37 | 3.60×10⁻⁵ | 3.78 |
| 17 | 5.7×10⁵ | −0.21 | 2.16 | 3.26×10⁻⁵ | 10.5 |
| 18 | 7.3×10⁴ | 0.61 (0.23) | 1.20 | 7.88×10⁻⁵ | 11.5 |
a Values given in parentheses are experimental values as provided by the database associated with the EPI Suite.
Table 2. Calculated persistence of the investigated structures in the environment [5].
| No. | BDP3ᵃ | Ultimate biodegradation half-life within | Fast anaerobic biodegradation? | Residence half-life in riversᵇ | Residence half-life in lakesᵇ |
|---|---|---|---|---|---|
| 1 | 3.0664 | Weeks | Yes | 272 d | 8.1 y |
| 2 | 2.8137 | Weeks | No | 5.1 h | 5.0 d |
| 3 | 3.1240 | Weeks | Yes | 22.9 h | 12.8 d |
| 4 | 2.9425 | Weeks | Yes | 3.67 y | 40.1 y |
| 5 | 2.6503 | Weeks to Months | Yes | 11.6 d | 130 d |
| 6 | 2.9834 | Weeks | No | 282 d | 8.4 y |
| 7 | 3.0044 | Weeks | Yes | 31.0 d | 340 d |
| 8 | 3.0088 | Weeks | No | 10.1 h | 7.9 d |
| 9 | 3.0398 | Weeks | Yes | 12.0 h | 8.4 d |
| 10 | 3.0354 | Weeks | Yes | 137 d | 4.1 y |
| 11 | 3.1241 | Weeks | Yes | 6.5 h | 5.3 d |
| 12 | 3.0045 | Weeks | Yes | 204 y | 2220 y |
| 13 | 2.6761 | Weeks to Months | No | 4.0 y | 44.0 y |
| 14 | 3.1615 | Weeks | No | 6.7 d | 6.7 d |
| 15 | 3.1394 | Weeks | Yes | 2.8 h | 3.1 d |
| 16 | 2.9097 | Weeks | No | 17.0 h | 11.2 d |
| 17 | 3.0155 | Weeks | Yes | 17.3 h | 11.1 d |
| 18 | 3.0177 | Weeks | No | 7.7 h | 6.6 d |

a BDP3: Biodegradation potential for ultimate biodegradation [34].
b h: hours, d: days, y: years. Biodegradation not taken into account.
Table 3. ECOSAR derived baseline and acute toxicity of the investigated compounds (values above 100 are rounded) [5].
| No. | Fishᵃ LC50 (mg/L) | Fishᵇ LC50 (mg/L) | Daphnidsᶜ LC50 (mg/L) | Green algae EC50 (mg/L) |
|---|---|---|---|---|
| 1 | 48500 | 5.9 | 6.2 | 0.53ᵈ |
| 2 | 4050 | 290 | 16.8 | 16.1ᵇ |
| 3 | 4700 | 300 | 17.0 | 15.1ᵇ |
| 4 | 2160 | 1470 | 1450 | 830ᵇ |
| 5 | 19800 | 1000 | 5200 | 39.8ᵇ |
| 6 | 35000 | 30800 | 27000 | 14200ᵇ |
| 7 | 18500 | 4.4 | 6.1 | 0.67ᵈ |
| 8 | 2850 | 1.7 | 3.5 | 0.53ᵈ |
| 9 | 1350 | 1.1 | 2.5 | 0.42ᵈ |
| 10 | 23800 | 4.6 | 5.8 | 0.59ᵈ |
| 11 | 4600 | 17.8 | 48.8 | 1820ᵇ |
| 12 | 200000 | 14.4 | 12.2 | 0.88ᵈ |
| 13 | 15100 | 850 | 45.5 | 37.0ᵇ |
| 14 | 800 | 580 | 550 | 310ᵇ |
| 15 | 8000 | 6775 | 6025 | 3225ᵇ |
| 16 | 3700 | 2675 | 2550 | 1450ᵇ |
| 17 | 9400 | 7350 | 6775 | 3725ᵇ |
| 18 | 1800 | 1225 | 1200 | 690ᵇ |
a Baseline (non polar) toxicity (14 day’s test);
b polar toxicity 96 hrs;
c polar toxicity 48 hrs;
d polar toxicity 144 hrs
Table 4. ADME results (n/a: calculations not available) [6].
| No | Passive absorption (human intestinal)ᵃ | Absorption rate constant (min⁻¹) | PPB%ᵇ | Binding constant log Ka HSA | Vdᶜ (L/kg) | P-Glycoprotein inhibitor | P-Glycoprotein substrate |
|---|---|---|---|---|---|---|---|
| 1 | 94 (10/90) | 0.012 | 14.71 | 1.70 | 0.94 | 0.003 | 0.031 |
| 2 | 99 (53/47) | 0.019 | 13.96 | 1.80 | 2.00 | 0.002 | 0.002 |
| 3 | 99 (18/82) | 0.018 | 11.74 | 1.65 | 2.29 | 0.004 | 0.014 |
| 4 | 100 (98/2) | 0.044 | 17.27 | 2.38 | 1.13 | 0.003 | 0.008 |
| 5 | 100 (88/12) | 0.022 | 7.25 | 2.19 | 1.01 | 0.003 | 0.006 |
| 6 | 94 (76/24) | 0.012 | 3.84 | 2.00 | 0.96 | 0.004 | 0.007 |
| 7 | 100 (95/5) | 0.031 | 17.49 | 2.24 | 1.22 | 0.003 | 0.010 |
| 8 | 100 (98/2) | 0.050 | 26.73 | 2.41 | 1.24 | 0.009 | 0.006 |
| 9 | 100 (93/7) | 0.031 | 19.85 | 2.22 | 1.14 | 0.005 | 0.005 |
| 10 | 93 (34/66) | 0.011 | 20.17 | 1.81 | 1.22 | 0.004 | 0.039 |
| 11 | 100 (74/26) | 0.022 | 5.67 | 2.10 | 1.03 | 0.004 | 0.006 |
| 12 | 89 (76/24) | 0.009 | 7.33 | 2.05 | 0.94 | 0.004 | 0.011 |
| 13 | 99 (91/9) | 0.019 | 22.82 | 2.14 | 1.02 | 0.009 | 0.006 |
| 14 | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| 15 | 99 (50/50) | 0.021 | 3.87 | 2.02 | 0.99 | 0.005 | 0.005 |
| 16 | 100 (94/6) | 0.027 | 12.51 | 2.47 | 1.04 | 0.005 | 0.009 |
| 17 | 99 (85/15) | 0.017 | 8.90 | 2.33 | 1.01 | 0.003 | 0.008 |
| 18 | 100 (94/6) | 0.032 | 12.64 | 2.49 | 1.14 | 0.005 | 0.009 |
a Values correspond to maximum passive absorption. Values in parentheses denote the respective transcellular/paracellular contributions
b Plasma Protein Bound fraction
c Volume of distribution
Table 5. Predicted probabilities for the compounds to exhibit adverse organ specific health effects (n/a denotes that calculated values are not available) [6].

Probability for adverse health effects (a):

| No | Blood | Cardiovascular | Gastrointestinal | Kidney | Liver | Lungs |
|---|---|---|---|---|---|---|
| 1 | 0.57 | 0.40 | 0.65 T | 0.28 | 0.48 T | 0.34 T |
| 2 | 0.44 | 0.34 | 0.80 | 0.20 | 0.18 | 0.27 |
| 3 | 0.20 | 0.31 | 0.26 | 0.11 | 0.20 T | 0.20 T |
| 4 | 0.79 | 0.07 | 0.92 | 0.57 | 0.85 | 0.74 |
| 5 | 0.76 | 0.06 | 0.97 | 0.75 T | 0.93 T | 0.71 T |
| 6 | 0.27 | 0.12 | 0.65 | 0.14 | 0.05 | 0.40 |
| 7 | 0.52 | 0.33 | 0.83 | 0.19 | 0.10 | 0.17 T |
| 8 | 0.63 | 0.06 | 0.84 | 0.31 | 0.05 | 0.75 |
| 9 | 0.32 | 0.08 | 0.90 | 0.42 | 0.07 | 0.72 |
| 10 | 0.53 | 0.64 | 0.66 | 0.14 | 0.29 T | 0.29 T |
| 11 | 0.19 | 0.08 | 0.25 | 0.09 | 0.04 | 0.04 T |
| 12 | 0.48 | 0.14 | 0.71 | 0.15 | 0.28 | 0.42 |
| 13 | 0.47 | 0.21 | 0.89 | 0.18 | 0.12 | 0.47 |
| 14 | n/a | n/a | n/a | n/a | n/a | n/a |
| 15 | 0.10 | 0.08 | 0.81 | 0.09 | 0.05 | 0.27 |
| 16 | 0.14 | 0.02 | 0.46 | 0.03 | 0.06 | 0.04 |
| 17 | 0.12 | 0.02 | 0.46 | 0.07 | 0.02 | 0.05 |
| 18 | 0.08 | 0.02 | 0.36 | 0.04 | 0.02 | 0.05 |
a T denotes that tumors have been found in experimental studies
Table 6. PASS predictions of selected biological activities (a) [6].

| No | Carcinogenic | Mutagenic | Teratogenic | Embryotoxic |
|---|---|---|---|---|
| 1 | 0.955 (0.002) | 0.762 (0.006) | 0.689 (0.031) | 0.672 (0.016) |
| 2 | 0.619 (0.001) | NE | NE | 0.527 (0.043) |
| 3 | NE | NE | 0.563 (0.062) | NE |
| 4 | 0.894 (0.003) | 0.792 (0.005) | 0.946 (0.006) | 0.816 (0.007) |
| 5 | 0.980 (0.001) | 0.969 (0.002) | 0.952 (0.005) | 0.866 (0.005) |
| 6 | 0.951 (0.002) | NE | 0.614 (0.048) | 0.795 (0.009) |
| 7 | 0.827 (0.006) | 0.539 (0.010) | 0.698 (0.030) | 0.604 (0.026) |
| 8 | 0.980 (0.002) | NE | NE | NE |
| 9 | 0.683 (0.012) | NE | NE | NE |
| 10 | 0.923 (0.006) | 0.619 (0.007) | 0.811 (0.012) | 0.681 (0.015) |
| 11 | 0.628 (0.011) | NE | NE | NE |
| 12 | 0.897 (0.003) | 0.524 (0.011) | 0.530 (0.072) | NE |
| 13 | NE | NE | NE | NE |
| 14 | n/a | n/a | n/a | n/a |
| 15 | NE | NE | NE | NE |
| 16 | NE | NE | NE | NE |
| 17 | NE | NE | NE | NE |
| 18 | NE | NE | NE | NE |
a Values given are the calculated probabilities for the compounds to exhibit the effect (only values above 0.5 are given). Values in parentheses are the calculated probabilities of the compounds not exhibiting the effect. NE indicates that, if the compound exhibits the effect, the probability is below 0.5.
b n/a: PASS results not available for this compound
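
The PASS probabilities above are the kind of multi-parameter input that a partial order ranking works on: one compound is placed above another only if it is at least as high on every indicator at once, and otherwise the two are left incomparable. The following is a minimal Python sketch of that comparison, restricted to the five compounds in Table 6 that have no NE entries. The function names `dominates`, `comparabilities` and `incomparable_pairs` are illustrative assumptions and not part of the source, and the sketch does not reproduce the averaged ranks of eqn. 4.

```python
from itertools import combinations, permutations

# PASS probabilities (carcinogenic, mutagenic, teratogenic, embryotoxic),
# copied from Table 6 for the compounds without NE entries.
pass_scores = {
    1: (0.955, 0.762, 0.689, 0.672),
    4: (0.894, 0.792, 0.946, 0.816),
    5: (0.980, 0.969, 0.952, 0.866),
    7: (0.827, 0.539, 0.698, 0.604),
    10: (0.923, 0.619, 0.811, 0.681),
}


def dominates(x, y):
    """True if x >= y on every indicator and x > y on at least one."""
    return all(a >= b for a, b in zip(x, y)) and any(a > b for a, b in zip(x, y))


def comparabilities(scores):
    """Sorted (upper, lower) pairs in which `upper` dominates `lower`."""
    return sorted(
        (i, j)
        for i, j in permutations(scores, 2)
        if dominates(scores[i], scores[j])
    )


def incomparable_pairs(scores):
    """Pairs of compounds where neither dominates the other."""
    return [
        (i, j)
        for i, j in combinations(scores, 2)
        if not dominates(scores[i], scores[j])
        and not dominates(scores[j], scores[i])
    ]


if __name__ == "__main__":
    print("comparable:", comparabilities(pass_scores))
    print("incomparable:", incomparable_pairs(pass_scores))
```

In this subset compound 5 lies above all others, whereas compound 1 is incomparable with compounds 4, 7 and 10 because it scores higher on some indicators and lower on others; such incomparabilities are what a single weighted score would hide.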
Table 7. Averaged rank calculated according to eqn. 4 (na: calculations not available) [6].

| No | Rkav according to ToxBoxes (a) | Rkav according to PASS (b) |
|---|---|---|
| 1 | 6.0 | 2.8 |
| 2 | 6.8 | 9.7 |
| 3 | 13.5 | 9.7 |
| 4 | 1.1 | 2.8 |
| 5 | 1.2 | 1.0 |
| 6 | 11.5 | 3.0 |
| 7 | 8.0 | 5.1 |
| 8 | 2.6 | 3.6 |
| 9 | 4.0 | 10.1 |
| 10 | 6.0 | 2.6 |
| 11 | 16.9 | 11.3 |
| 12 | 5.4 | 6.0 |
| 13 | 4.9 | 17.0 |
| 14 | na | na |
| 15 | 10.8 | 17.0 |
| 16 | 15.0 | 17.0 |
| 17 | 15.6 | 17.0 |
| 18 | 16.8 | 17.0 |
a Rkav based on GAS, LIV, LUN
b Rkav based on CAR, MUT, TER, EMB
Table 8. Averaged rank calculated according to eqn. 4 based on ToxBoxes and PASS (HPOR approach) (n/a: calculations not available) [6].

| No | Rkav |
|---|---|
| 1 | 5.1 |
| 2 | 9.0 |
| 3 | 12.0 |
| 4 | 1.1 |
| 5 | 1.1 |
| 6 | 8.2 |
| 7 | 8.3 |
| 8 | 3.6 |
| 9 | 6.5 |
| 10 | 2.8 |
| 11 | 16.6 |
| 12 | 6.0 |
| 13 | 9.0 |
| 14 | n/a |
| 15 | 13.2 |
| 16 | 14.8 |
| 17 | 15.9 |
| 18 | 16.9 |
Table 9. Selected implication sets from the formal concept analysis of human health effects by 1,1-dimethyl hydrazine and its transformation products as derived by PASS [35].

| No of compounds | If | Then |
|---|---|---|
| 7 | M > 50 | C > 60, T > 50 |
| 5 | M > 60 | C > 60, T > 60, E > 60 |
| 1 | M > 90 | C > 90, T > 90, E > 80 |
| 7 | T > 60 | C > 60, E > 60 |
| 4 | T > 80 | C > 60, M > 60, E > 60 |
| 3 | T > 90 | C > 60, E > 80 |
| 4 | E > 70 | C > 60, T > 60 |
| 3 | E > 80 | C > 60, M > 70, T > 90 |
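
An implication of the form "if M > 50 then C > 60 and T > 50" can be checked mechanically against a table of PASS probabilities. The following is a minimal Python sketch, assuming the Table 6 values (converted to percent) as input and treating an NE entry as failing every threshold, which is only valid for thresholds of 50 or more. The names `satisfies` and `check_implication` and the data layout are illustrative assumptions, not taken from the source. Because Table 9 is derived in [35], the compound counts obtained from this input need not coincide with those listed there.

```python
# PASS probabilities from Table 6, in percent, ordered (C, M, T, E).
# None marks an NE entry, i.e. a probability below 50 %.
# Compounds 13 and 15-18 (all NE) and 14 (n/a) are omitted because they
# cannot satisfy any premise with a threshold of 50 or more.
TABLE6 = {
    1: (95.5, 76.2, 68.9, 67.2),
    2: (61.9, None, None, 52.7),
    3: (None, None, 56.3, None),
    4: (89.4, 79.2, 94.6, 81.6),
    5: (98.0, 96.9, 95.2, 86.6),
    6: (95.1, None, 61.4, 79.5),
    7: (82.7, 53.9, 69.8, 60.4),
    8: (98.0, None, None, None),
    9: (68.3, None, None, None),
    10: (92.3, 61.9, 81.1, 68.1),
    11: (62.8, None, None, None),
    12: (89.7, 52.4, 53.0, None),
}
ATTRS = ("C", "M", "T", "E")


def satisfies(row, conditions):
    """True if every 'attribute > threshold' condition holds for one compound."""
    for attr, threshold in conditions.items():
        value = row[ATTRS.index(attr)]
        if value is None or value <= threshold:
            return False
    return True


def check_implication(data, premise, conclusion):
    """Return (holds, support).

    holds:   every compound meeting the premise also meets the conclusion
    support: number of compounds meeting the premise
    """
    matching = [row for row in data.values() if satisfies(row, premise)]
    holds = all(satisfies(row, conclusion) for row in matching)
    return holds, len(matching)


if __name__ == "__main__":
    print(check_implication(TABLE6, {"M": 50}, {"C": 60, "T": 50}))
    print(check_implication(TABLE6, {"C": 60}, {"T": 50}))
```

The first call tests an implication listed in Table 9; the second tests the reverse direction, which fails as soon as one compound with C > 60 has a teratogenicity entry of NE.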
Table 10. Selected association rules from the formal concept analysis of human health effects by 1,1-dimethyl hydrazine and its transformation products as derived by PASS [35].

| No of compounds | Pct | If | Then |
|---|---|---|---|
| 7 / 8 | 88 | C > 60, T > 50 | T > 60, E > 60 |
| 7 / 8 | 88 | C > 60, T > 50 | M > 50 |
| 7 / 8 | 88 | C > 80 | T > 50 |
| 6 / 7 | 86 | C > 60, T > 60, E > 60 | M > 50 |
| 6 / 7 | 86 | C > 60, T > 60, E > 60 | C > 80 |
| 6 / 7 | 86 | C > 60, M > 50, T > 50 | C > 80 |
| 5 / 6 | 83 | C > 60, M > 50, T > 60, E > 60 | M > 60 |
| 5 / 6 | 83 | C > 60, M > 50, T > 60, E > 60 | C > 80 |
| 4 / 5 | 80 | C > 60, M > 60, T > 60, E > 60 | T > 80 |
| 4 / 5 | 80 | C > 60, M > 50, T > 60, E > 60 | M > 70 |
| 4 / 5 | 80 | C > 60, M > 50, T > 60, E > 60 | C > 80 |
| 4 / 5 | 80 | C > 90 | T > 60, E > 60 |
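
The Pct column of Table 10 is consistent with reading each "n / m" entry as: of the m compounds that satisfy the If part, n also satisfy the Then part. The following is a minimal Python sketch of that arithmetic, under the assumption that the percentage is the ratio n/m rounded to the nearest whole number; the function name `confidence_pct` is illustrative and not taken from the source.

```python
import math


def confidence_pct(n_rule, n_premise):
    """Share of premise-matching compounds that also meet the conclusion,
    as a whole percentage (halves are rounded up)."""
    if n_premise <= 0 or not 0 <= n_rule <= n_premise:
        raise ValueError("expected 0 <= n_rule <= n_premise and n_premise > 0")
    return math.floor(100 * n_rule / n_premise + 0.5)


if __name__ == "__main__":
    # The four distinct "n / m" entries appearing in Table 10.
    for n_rule, n_premise in [(7, 8), (6, 7), (5, 6), (4, 5)]:
        print(f"{n_rule} / {n_premise}: {confidence_pct(n_rule, n_premise)} %")
```

An implication as in Table 9 corresponds to the limiting case n = m, i.e. 100 %, while the association rules in Table 10 tolerate a single exception each.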

Citation: Carlsen, L. The Interplay between QSAR/QSPR Studies and Partial Order Ranking and Formal Concept Analyses. Int. J. Mol. Sci. 2009, 10, 1628-1657. https://doi.org/10.3390/ijms10041628
