Affinity of Compounds for Phosphatydylcholine-Based Immobilized Artificial Membrane—A Measure of Their Bioconcentration in Aquatic Organisms

The BCF (bioconcentration factor) of solutes in aquatic organisms is an important parameter because many undesired chemicals enter the ecosystem and affect the wildlife. Chromatographic retention factor log kwIAM obtained from immobilized artificial membrane (IAM) HPLC chromatography with buffered, aqueous mobile phases and calculated molecular descriptors obtained for a group of 120 structurally unrelated compounds were used to generate useful models of log BCF. It was established that log kwIAM obtained in the conditions described in this study is not sufficient as a sole predictor of bioconcentration. Simple, potentially useful models based on log kwIAM and a selection of readily available, calculated descriptors and accounting for over 88% of total variability were generated using multiple linear regression (MLR), partial least squares (PLS) regression and artificial neural networks (ANN). The models proposed in the study were tested on an external group of 120 compounds and on a group of 40 compounds with known experimental log BCF values. It was established that a relatively simple MLR model containing four independent variables leads to satisfying BCF predictions and is more intuitive than PLS or ANN models.


Introduction
Immobilized artificial membrane (IAM) chromatography is a valuable technique used to predict the behavior of compounds towards biological membranes. IAM stationary phases based on phosphatidylcholine (PC) covalently linked to aminopropyl silica are able to mimic the natural membrane bilayer [1]. Thanks to this ability, they have become widely recognized tools for modeling drug distribution in vitro, with applications in medicinal chemistry including estimation of lipophilicity (a key feature characterizing the biological distribution of compounds), prediction of the ability of compounds to cross biological membranes (skin absorption, blood-brain barrier permeability, oral/human intestinal absorption) and estimation of other biomimetic properties (e.g., volume of distribution or Caco-2 permeability) [2,3]. More recently, immobilized artificial membrane chromatography has attracted the attention of environmental chemists, who used IAM chromatography to study the bioconcentration of pharmaceuticals [4], ecotoxicity of pesticides (expressed as LC 50 ) [5] and mobility of substances in soil [6]. Applications of IAM chromatography and other phospholipid-based in vitro techniques (liposome partitioning and chromatography on unbound phosphatidylcholine stationary phases) in the studies of drug-biomembrane interactions are presented in reviews [2,3].
Anthropogenic compounds enter the aquatic environment via a number of routes, pose a threat to aquatic organisms, accumulate in their tissues and affect their fertility. The risks associated with the exposure of aquatic organisms to chemical compounds released to the environment by humans have been studied extensively, e.g., for organic sunscreens [7-11], per- and polyfluoroalkyl compounds [12], polycyclic aromatic hydrocarbons [13] or antibiotics [14].
There is a need to identify compounds that are potentially hazardous, i.e., bioaccumulative, persistent and toxic in the environment. The fish bioconcentration factor (BCF) is the ratio of the chemical concentration in the organism (C B ) to that in water (C W ), accounting for absorption via the respiratory route (e.g., gills) and skin. It is commonly used to screen chemicals for their bioaccumulation potential [15], especially in the absence of the bioaccumulation factor (BAF), which accounts for dietary, dermal and respiratory exposures. When neither BAF nor BCF data are available, lipophilicity expressed as the octanol-water partition coefficient K ow is used as a surrogate measure of compounds' ability to bioaccumulate. The criteria of bioaccumulation differ depending on the regulatory agency; compounds are generally regarded as bioaccumulative when BCF > 5000 or BCF > 2000 [16]. If no BCF or BAF data are available, it may be assumed that bioaccumulative compounds are those with log K ow > 5 [16,17], > 4.5 [18] or > 3.3 [19]. Measured and evaluated bioaccumulation data are also used to assign chemicals to three bioaccumulation categories: not significantly bioaccumulative (BCF or BAF < 1000), bioaccumulative (BCF or BAF between 1000 and 5000) and highly bioaccumulative (BCF or BAF > 5000) [20].
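The definition of BCF and the three-category scheme quoted above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and treating the boundary values 1000 and 5000 as belonging to the middle category is an assumption, since the text does not say how exact boundary values are assigned.

```python
def bcf(c_organism: float, c_water: float) -> float:
    """Bioconcentration factor: concentration in the organism divided by
    the concentration in water (both in mutually consistent units)."""
    if c_water <= 0:
        raise ValueError("water concentration must be positive")
    return c_organism / c_water


def bioaccumulation_category(value: float) -> str:
    """Three-category scheme for a BCF or BAF value.

    Assumption: the boundaries 1000 and 5000 are counted as 'bioaccumulative'.
    """
    if value < 1000:
        return "not significantly bioaccumulative"
    if value <= 5000:
        return "bioaccumulative"
    return "highly bioaccumulative"


# Hypothetical concentrations, chosen only to exercise the functions
example_bcf = bcf(5000.0, 2.0)
example_category = bioaccumulation_category(example_bcf)
```

Note that regulatory criteria differ between agencies, so the thresholds would need to be adapted to the scheme actually in use.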
Experimental toxicity data exist for just a fraction of relevant compounds, and in vivo measurements of such data require a lot of time and effort. According to Weisbrod et al., the collection of environmental toxicity data for 1240 potentially bioaccumulative compounds from the Canadian Domestic Substance List would take 82 years [15], and, as estimated in 2013, the average cost of experimental BCF determination is EUR 35,000 per compound, with more than 100 fish being sacrificed during tests lasting at least one month [21]. With the difficulties related to experimental BCF determination in mind, attention has turned to in vitro or in silico BCF models. Log BCF can be predicted using descriptors related to the partitioning of molecules between water and lipids, e.g., aqueous solubility [22,23]. However, in the majority of computational BCF models, the key descriptor governing the ability of compounds to bioconcentrate is the octanol-water partition coefficient log K ow (Equations (1)-(6)) [22,24-28].
Other authors studied the influence of molecular size descriptors on the bioconcentration and bioaccumulation processes. In their opinion, molecular weight or molar volume should be incorporated in the BCF models along with log K ow to account for the reduced uptake of both large and highly lipophilic molecules (Equation (10) [34]):

$$\log \mathrm{BCF} = 3.036 \log K_{ow} - 0.197 (\log K_{ow})^2 - 0.808\, V_M \qquad (n = 28,\ R^2 = 0.817) \tag{10}$$

where V M is the molar volume. Dimitrov reported that the threshold value of 1.5 nm for the maximal cross-section diameter discriminates between compounds with log BCF > 3.3 and those with log BCF < 3.3 [35]. Further research by Dimitrov was concerned with the influence of chemicals' metabolism in fish liver on their ability to bioconcentrate (BCF was calculated using K ow as the most important descriptor, with molecular size and ionization taken into account and a simulator for fish liver used to reproduce the fish metabolism) [36].
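Because Equation (10) is quadratic in log K ow, the lipophilicity at which the predicted log BCF reaches its maximum can be read off directly. The short worked step below holds V M fixed, which is an assumption made only for illustration (in real compounds, molar volume and lipophilicity tend to change together):

```latex
\frac{\partial \log \mathrm{BCF}}{\partial \log K_{ow}}
  = 3.036 - 2 \times 0.197 \, \log K_{ow} = 0
\quad\Longrightarrow\quad
\log K_{ow} = \frac{3.036}{0.394} \approx 7.7
```

Under this assumption, the equation predicts that log BCF decreases with further increases in log K ow beyond roughly 7.7, which is how a quadratic term can represent the reduced uptake of highly lipophilic molecules.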
In the search for models capable of addressing the hydrophobicity cutoff problem observed for highly lipophilic molecules, QSAR BCF studies were reported by several authors [18,21,37-40]. The most widely recognized models accounting for this phenomenon are:
The objective of this study was to develop useful and easy-to-use predictive models of the bioconcentration factor of structurally diverse solutes based on their affinity for phosphatidylcholine-based artificial membranes. Novel models proposed in this study were generated using multiple linear regression (MLR), partial least squares (PLS) and artificial neural network (ANN) techniques. It is the first report on PLS and ANN approaches to bioconcentration studies involving chromatographic and calculated physico-chemical data.

Compounds, IAM Chromatographic Data, Reference BCF Values
The first stage of this study was intended to involve 175 compounds, whose IAM chromatographic retention factors obtained for purely aqueous mobile phases (log k w IAM ) were compiled by Sprunger et al. [52]. Because of the lack of experimental BCF data for the whole group of 175 compounds, log BCF (denoted later as log BCF EPI ) was calculated using the commonly accepted computational approach (EPI Suite TM , BCFBAF module v. 3.02) [42] based on Meylan's model [41]. A large number of compounds considered at this stage of the study, however, were molecules with arbitrarily assigned log BCF = 0.50. The majority of such compounds were excluded from the training set because it was suspected that their theoretical log BCF value may not truly reflect their ability to bioconcentrate [4], and the models generated in this study were finally based upon a solute set containing 120 compounds from different chemical families (1 to 120). The excluded compounds were later combined with solutes whose log k w IAM values were reported by other authors [53,54] to form an external test set also containing 120 compounds (121 to 240) with and without known experimental values of log BCF (log BCF vivo ). Reliable reference log BCF values (log BCF EPI ) were available for compounds 1 to 187, and, for the compounds 188 to 240, log BCF was calculated de novo. The external set of compounds included more lipophilic molecules, whose log k w IAM could not be measured directly by chromatography with 100% aqueous mobile phase and could only be calculated by extrapolation of the log k IAM vs. ϕ plots obtained for a series of chromatographic experiments with mobile phases containing different concentrations ϕ of a water-miscible organic solvent, usually according to the linear Soczewiński-Wachtmeister Equation (19) [55]:

$$\log k_{\mathrm{IAM}} = \log k_{w}^{\mathrm{IAM}} - S\phi \tag{19}$$

where −S is the slope of the plot. The values of log BCF vivo were taken from the literature sources and the EPISuite TM database [4,29,36,42].
The reference log BCF EPI and the experimental log BCF vivo values for compounds 1 to 240 (where available) are given in Table S1 (Supplementary Materials); the IAM chromatographic retention factors are given in Table S2 (Supplementary Materials).
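The extrapolation of log k IAM to ϕ = 0 described above can be sketched as an ordinary least-squares straight-line fit, assuming the usual linear form log k = log k w − Sϕ. The function name and the retention data below are hypothetical and are not taken from the study.

```python
def extrapolate_log_kw(phi, log_k):
    """Fit log k = log_kw - S * phi by least squares; return (log_kw, S)."""
    n = len(phi)
    if n != len(log_k) or n < 2:
        raise ValueError("need at least two (phi, log k) pairs of equal length")
    mean_x = sum(phi) / n
    mean_y = sum(log_k) / n
    sxx = sum((x - mean_x) ** 2 for x in phi)
    if sxx == 0:
        raise ValueError("phi values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(phi, log_k))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # value of log k at phi = 0
    return intercept, -slope


# Made-up, perfectly linear data: volume fractions of modifier and log k IAM
phi = [0.2, 0.3, 0.4, 0.5]
log_k = [2.6, 2.1, 1.6, 1.1]
log_kw, s = extrapolate_log_kw(phi, log_k)  # approximately 3.6 and 5.0
```

In practice, the reliability of the extrapolated value depends on how far the measured ϕ range lies from zero and on how linear the plot actually is.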

Partial Least Squares Approach
Multiple linear regression (MLR) is a common approach used in QSAR studies. It is based on the assumption that the effect of a molecule's properties on its activity is additive and that the properties are (almost) independent. The conditions that must be satisfied to generate reliable MLR models are strict: standard regression techniques based on least squares estimation give unstable and unreliable results when independent variables are colinear, and the number of cases must exceed the number of variables (ideally, it should be at least five times greater). In order to overcome the colinearity problem, partial least squares (PLS) regression was developed. PLS replaces the original variables with "components", i.e., linear combinations of the variables based on the correlation between the dependent variable and the independent variable(s) [59,60].
Regression models based on PLS estimation must be optimized in terms of the number of components: if too many are used, a model is over-fitted (it perfectly fits the training dataset, but it gives poor prediction results for new cases); if too few components are used, the model is under-fitted (it is not sufficiently large to capture the important data variability). Models based on the same number of components can be compared using RSS (residual sum of squares) or R 2 , but these parameters are unsuitable for models with different numbers of components. PLS models are often evaluated using RMSEP calculated for a separate test set and/or using cross-validation. RMSEP usually decreases as more components are added to a small model, then it stabilizes around the optimum number of components, and it increases when the model becomes over-fitted [61].
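The typical RMSEP behaviour described above (decrease, plateau, increase) suggests a simple selection rule, sketched here under stated assumptions: the function name, the tolerance parameter and the RMSEP values are all hypothetical, and the rule "smallest model within a tolerance of the minimum" is one common convention rather than the procedure used in this study.

```python
def choose_n_components(rmsep_by_n, tolerance=0.0):
    """Return the smallest number of components whose RMSEP is within
    `tolerance` of the lowest RMSEP observed."""
    if not rmsep_by_n:
        raise ValueError("no RMSEP values supplied")
    best = min(rmsep_by_n.values())
    return min(n for n, err in rmsep_by_n.items() if err <= best + tolerance)


# Hypothetical RMSEP values for models with 1 to 8 components
rmsep = {1: 0.62, 2: 0.51, 3: 0.44, 4: 0.41, 5: 0.40, 6: 0.40, 7: 0.43, 8: 0.47}

n_strict = choose_n_components(rmsep)           # lowest RMSEP, fewest components
n_parsimonious = choose_n_components(rmsep, 0.02)  # accept a slightly worse, smaller model
```

A non-zero tolerance favours smaller models, which reduces the risk of over-fitting at the cost of a marginally higher RMSEP.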

Statistical Tools
Multiple linear regression (MLR) models were generated using Statistica v. 13 (StatSoft Polska, Kraków, Poland) in the stepwise forward regression mode. Partial least squares (PLS) models were generated using Statistica v. 13 (NIPALS algorithm with auto-scaling). Multilayer perceptron (MLP) artificial neural networks (ANNs), with the number of inputs equal to the number of variables, a varying number of hidden units and one output unit, were generated using Statistica v. 13 (regression mode, Automated Network Search (ANS) module, 1000 networks to train, 50 networks to retain). The neuron activation functions were selected from the following group: identity, logistic, hyperbolic tangent and exponential. The BFGS (Broyden-Fletcher-Goldfarb-Shanno) algorithm was used to train the network together with the sum of squares (SOS) error function.
The models considered in this study were evaluated using the following procedures and statistical parameters:
• K-fold cross-validation, with n compounds from the initial training set split into k even subsets, (k − 1) of which were used to train a new model and the remaining one to test it; the procedure was repeated k times, each time using a different subset for testing. The root mean squared error of prediction (RMSEP) for a test subset of N compounds is calculated according to Equation (20):

$$\mathrm{RMSEP} = \sqrt{\frac{\sum_{i=1}^{N} \left(y_i^{\mathrm{pred}} - y_i^{\mathrm{ref}}\right)^2}{N}} \tag{20}$$

The overall root mean squared error of k-fold cross-validation (RMSECV) is calculated over all k test subsets. In this study, n = 120, k = 5 and N = 24; y i pred and y i ref are log BCF pred and log BCF EPI , respectively.
• Comparison of the predicted log BCF pred values (computed for the external test set of 67 compounds 121 to 187 that were not used to build models) with the reference values log BCF EPI , using the root mean squared error of prediction (RMSEP ext ), calculated according to Equation (20);
• Comparison of the predicted log BCF pred values (calculated for 40 compounds whose experimental log BCF vivo data are available) with these data, using the squared coefficient of determination (R 2 vivo ) and the root mean squared error of prediction (RMSEP vivo ), calculated according to Equation (20).
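The k-fold procedure listed above can be sketched as follows for n = 120 and k = 5 (24 compounds per subset). This is a minimal illustration under several assumptions: a one-variable straight-line model stands in for the actual MLR, PLS or ANN models, the data are synthetic, and the overall RMSECV is obtained by pooling the squared errors of all held-out compounds, which is one common convention and may differ from the formula used in the study.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def rmse(pred, ref):
    """Root mean squared error between predicted and reference values."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def k_fold_rmsecv(xs, ys, k=5, seed=0):
    """Return (pooled RMSECV, list of per-fold RMSEP values)."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k subsets of equal size when k divides n
    squared_errors, fold_rmsep = [], []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        pred = [slope * xs[i] + intercept for i in fold]
        ref = [ys[i] for i in fold]
        fold_rmsep.append(rmse(pred, ref))
        squared_errors.extend((p - r) ** 2 for p, r in zip(pred, ref))
    return math.sqrt(sum(squared_errors) / len(squared_errors)), fold_rmsep


# Synthetic data: 120 "compounds" with a noisy linear relationship
xs = [i / 40 for i in range(120)]
ys = [0.8 * x + 0.3 + 0.1 * math.sin(i) for i, x in enumerate(xs)]
rmsecv, fold_rmsep = k_fold_rmsecv(xs, ys, k=5)
```

With subsets of equal size, the pooled RMSECV equals the square root of the mean of the squared per-fold RMSEP values, so the two ways of combining fold errors coincide in that case.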

Multiple Linear Regression (MLR) Models
The values of log BCF EPI , calculated for compounds 1 to 120 using EPISuite TM software [42], were plotted against the IAM retention factors obtained for aqueous mobile phases log k w IAM and compiled by Sprunger [52]. The linear relationship between log BCF EPI and log k w IAM (Equation (22), MLR1, Figure 1) accounted for 80% of total log BCF EPI variability. The results of log BCF EPI modeling using a single chromatographic descriptor (log k w IAM ) obtained in this study (Equation (22), MLR1) are similar to those reported by Tsopelas [4] (R 2 = 0.74, n = 77). Log k w IAM accounted for ca. 74% of variability of log BCF vivo data (which proves the importance of the chromatographic parameter), but it was hoped that the model could be improved by incorporating some additional independent variables expected to influence the ability of compounds to be absorbed by aquatic animals from the surrounding water via the respiratory route and skin. It is likely that, similarly to pharmacokinetic processes of compound absorption and distribution in humans, the key features responsible for the ability of molecules to bioconcentrate in aquatic organisms are their lipophilicity (which, indeed, is the main parameter in the majority of BCF in silico models), ability to form hydrogen bonds, and molecular flexibility and size. Apart from log k w IAM , which is strongly related to solutes' lipophilicity, several molecular descriptors calculated using SwissADME software were investigated. The improved Equation (23) (MLR2, Figure 2) was generated using forward stepwise regression:

Equation (23), MLR2: training set (n = 120), external test set (n = 67).
The additional independent variables incorporated into Equation (23) (MLR2) were statistically significant and accounted for ca. 7% of total variability. They were introduced in the following order: TPSA, F Csp3 and HD, which confirmed the relationship between TPSA and the phenomenon of bioconcentration reported earlier by Tsopelas [4] (who also demonstrated the contribution of a biodegradation estimate, BioWin5, calculated using EPISuite TM software). Polar surface area is an important parameter that defines the polar part of a molecule. It is strongly related to the passive transport of molecules through membranes, and it is known to influence the ADME processes in humans (e.g., blood-brain barrier permeability, transdermal or intestinal absorption [62-64]). Other BCF predictors incorporated in Equation (23) (MLR2, Figure 2) are the fraction of sp 3 carbons F Csp3 (which, in simple terms, can be considered a measure of a molecule's flexibility and is positively correlated with log BCF) and the count of H-bond donors HD. The coefficients for both HD and TPSA in Equation (23) (MLR2) are negative: a high polar surface area and a strong tendency of the molecule to form hydrogen bonds reduce its uptake by aquatic organisms.
Further attempts to improve the MLR models by incorporating other parameters expected to influence the compounds' ability to bioconcentrate were not very successful. Equation (24) (MLR3, Figure 3), obtained using six variables selected by forward stepwise regression, had slightly better cross-validation parameters than model MLR2 (Equation (23)), but this gain did not justify the risk of over-fitting related to the incorporation of two more parameters (FRB and DipH) that, although both statistically significant, together accounted for only slightly over 1% of total variability. The ability of Equation (24) to predict log BCF for new cases (the external test set) and the relationship between log BCF values predicted using this model and the experimental values were comparable to those reported for Equation (23).

Partial Least Square (PLS) Models
In this study, the following PLS models were investigated (details to be found in Supplementary Materials):
• Models PLS1 based on 16 independent variables, including those involved in MLR analysis and some other descriptors that were not included in MLR to avoid colinearity problems;
• Model PLS2 based on a reduced set of independent variables.
PLS1 models based on the set of 16 independent variables and involving between 4 and 12 components were compared using RMSEP ext , RMSEP vivo and RMSECV values (Supplementary Materials). At a later step, multiple linear forward stepwise regression was also performed on the X-scores of all the possible 16 PLS components. Using these two approaches, it was established that the optimum number of components is six (Figure 4): it led to a model that fitted the training dataset reasonably well, the model's predictive potential was satisfying (i.e., the model was neither over-fitted nor under-fitted) and all six PLS components selected by MLR were statistically significant.

The importance of descriptors used in PLS models can be evaluated manually based on their variable importance in the projection (VIP) values calculated for the particular number of components (descriptors with VIP < 1 in a PLS model are excluded from the next one) [65] (Table 2). This procedure was applied to PLS1, and it was established that only two variables, log k w IAM and MR (a descriptor connected with the polarizability of molecules, not selected in MLR), had a strong influence on log BCF (model PLS2, Figure 5). Surprisingly, the descriptors selected by stepwise multiple regression (apart from log k w IAM , which is of utmost importance in all the models developed in this study) were of lesser importance in the PLS regression. Model PLS2, however, seemed excessively simplified, and its performance, evaluated using RMSECV, RMSEP ext and RMSEP vivo , was slightly worse than that of PLS1 (Table 3).


Artificial Neural Networks
Artificial neural networks are widely used to predict drugs' bioavailability [66] or properties such as affinity for phospholipids using IAM chromatography and calculated descriptors [67]. The great advantages of neural networks compared to MLR are the possibility of utilizing both linear and non-linear relationships between input data and a predicted parameter and the ability of ANNs to learn these relationships directly from the data being modeled.
In this study, the ANN models were built for the same group of compounds (1 to 120) that was used as the training set in the MLR and PLS analyses. This group of compounds was randomly assigned to three subgroups: train (70%), test (15%) and validation (15%); the latter two groups were needed to optimize the ANNs as they were being created. Similarly to the MLR and PLS analyses presented in this study, the compounds 121 to 240 were used as an additional, external test set. At this point, 1000 networks were generated, and the 50 with the smallest error were retained for further examination in search of those that give results in the closest agreement with the reference data (log BCF EPI ) for compounds 121 to 187 (RMSEP ext ) and with the experimental data (log BCF vivo ) for a subgroup of 40 cases whose experimental log BCF values were available (R 2 vivo , RMSEP vivo ). The selection of the best exemplary networks generated in this study (ANN14, ANN43 and ANN44, Figures 6-8) was based on their ability to predict new cases (RMSEP ext ) and to obtain results in the closest possible agreement with the experimental data (R 2 vivo , RMSEP vivo ) rather than on their ability to fit the training data (Supplementary Materials).
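The random 70/15/15 assignment of the 120 training compounds described above can be sketched as follows. The function name and the seed are hypothetical, and the split is an illustration rather than the partitioning produced by the software used in the study.

```python
import random


def split_compounds(ids, fractions=(0.70, 0.15, 0.15), seed=42):
    """Randomly assign compound ids to train, test and validation subgroups."""
    ids = list(ids)
    random.Random(seed).shuffle(ids)
    n_train = round(fractions[0] * len(ids))
    n_test = round(fractions[1] * len(ids))
    train = ids[:n_train]
    test = ids[n_train:n_train + n_test]
    validation = ids[n_train + n_test:]  # whatever remains after train and test
    return train, test, validation


# Compounds 1 to 120 form the set that is split; 121 to 240 stay external
train, test, validation = split_compounds(range(1, 121))
```

For 120 compounds, these fractions correspond to 84, 18 and 18 compounds, so the subgroups used to tune the networks are small, which is one reason the external set is valuable as an independent check.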
ANNs make it possible to process a large number of descriptors that can be easily obtained using readily available software. The selection of ANN input data is an important step because, if the number of parameters is excessive considering the number of cases, models are over-fitted. The importance of independent variables can be evaluated using a tool known as global sensitivity analysis (GSA), which rates the importance of each input variable by comparing the sum of squared residuals of the model with the respective predictor eliminated to that of the full model. When an input variable scores 1 or less in GSA, it means that this particular network is likely to perform better without this variable; however, in the networks generated in this study, the majority of GSA scores were at least slightly above this threshold.
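The sensitivity score described above can be sketched as an error ratio. How a predictor is "eliminated" is not specified in the text; the sketch assumes that it is replaced by its mean value over the dataset, which is a common choice but an assumption here. The function names, the toy model and the data are hypothetical.

```python
def sse(model, rows, targets):
    """Sum of squared residuals of `model` over the dataset."""
    return sum((model(row) - y) ** 2 for row, y in zip(rows, targets))


def sensitivity_ratios(model, rows, targets):
    """For each input, SSE with that input neutralised divided by the full-model SSE.

    Assumption: an input is neutralised by replacing it with its mean.
    """
    full = sse(model, rows, targets)
    if full == 0:
        raise ValueError("full-model error is zero; the ratio is undefined")
    ratios = []
    for j in range(len(rows[0])):
        mean_j = sum(row[j] for row in rows) / len(rows)
        neutral = [row[:j] + [mean_j] + row[j + 1:] for row in rows]
        ratios.append(sse(model, neutral, targets) / full)
    return ratios


def toy_model(row):
    """Toy stand-in for a trained network: uses input 0 and ignores input 1."""
    return 2.0 * row[0]


rows = [[0.0, 5.0], [1.0, 3.0], [2.0, 4.0], [3.0, 1.0]]
targets = [0.1, 1.9, 4.1, 5.9]
ratios = sensitivity_ratios(toy_model, rows, targets)
```

In this toy case, the ratio for the first input is far above 1 (removing it degrades the fit badly), while the ratio for the ignored second input is 1, illustrating why scores at or below 1 flag variables that contribute nothing.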
Log k w IAM is an important predictor accounting for 80% of log BCF variability. It encodes the molecule's properties responsible for its ability to cross biological membranes, namely lipophilicity and size (molecular weight, heavy atom count) (Table 4), and, when additional descriptors are incorporated, it leads to efficient BCF models. In this study, the models were generated using log k w IAM values obtained directly for aqueous mobile phases. Using the external test group of solutes, it was demonstrated, however, that log k w IAM values obtained by extrapolation of log k IAM values to zero concentration of organic modifiers in the mobile phase were sufficient to give reasonable predictions, although, since log k w IAM is the most important descriptor in all the models, imperfections of this variable in the external test dataset always had some influence on the RMSEP ext values.
Models MLR2, PLS1 and ANN43 were finally compared (Figure 9) by plotting the predicted log BCF values against the experimental ones (log BCF vivo ), and it was confirmed that their ability to model the experimental log BCF data was similar.


Conclusions
The ability of compounds to bioconcentrate in aquatic organisms is strongly related to their affinity for phosphatidylcholine-based immobilized artificial membranes (IAM), and other physico-chemical parameters of a molecule are less important in this process. QSAR models of log BCF involving the IAM chromatographic retention factor and other descriptors were built using multiple linear regression, partial least squares regression and artificial neural networks. The MLR approach is a powerful technique with the great advantage of simplicity: models generated using this technique usually involve a relatively small number of independent variables (parameters), whose physical meaning and contribution towards a dependent variable can be easily understood. In this study, the selected MLR, PLS and ANN models gave fairly comparable results in terms of their ability to predict new cases (log BCF ext ), and the results obtained using these models were in similar agreement with experimental data (log BCF vivo ) (surprisingly, simple MLR equations based on a relatively small number of independent variables seemed to perform slightly better than more complex ANN or PLS models). Generally speaking, PLS regression deals with the colinearity of independent variables, and the ANN approach is especially useful in the case of non-linear relationships, but, in this study, linear equations (especially Equation (23), MLR2) gave satisfying prediction results, and they were more intuitive. All the models reported above can be easily applied during the early steps of the drug discovery process concurrently with IAM chromatographic pharmacokinetic studies and, as described earlier, in the studies of compounds' mobility in the soil-water compartment [6]. In lieu of log k w IAM obtained directly using aqueous mobile phases, extrapolated values can be used, although, in such situations, the quality of BCF predictions is slightly impaired.
The models proposed in this study are applicable to compounds over a relatively wide range of lipophilicity, with the exception of very lipophilic molecules (log K ow > ca. 7), whose retention times on the IAM chromatographic support are very long and whose log k w IAM cannot be conveniently measured. This limitation of the applicability domain of the models presented in this study, however, is not a major drawback: very lipophilic compounds, as demonstrated by some authors, do not bioconcentrate or bioaccumulate easily [16,27,28,34], which is either a direct result of their hydrophobicity or, indirectly, an effect of the larger molecular size of highly lipophilic molecules [32]. Above a certain lipophilicity threshold (log K ow > ca. 7), the bioconcentration factor becomes inversely proportional to lipophilicity and decreases rapidly. On the other hand, a large proportion of compounds released to the environment by agriculture or the pharmaceutical industry (e.g., pesticides or drugs) meets the criteria of optimum intestinal, transdermal or lung absorption [19,68-72]. Such compounds are usually moderately lipophilic (log K ow rarely higher than 7 and, in the majority of cases, between 0 and 5), so quantitative studies of their bioconcentration using the models discussed above are feasible.