# Comparison of Different Approaches to Define the Applicability Domain of QSAR Models


## Abstract


## 1. Introduction

## 2. Applicability Domain Methods

#### 2.1. Range-Based and Geometric Methods

#### 2.1.1. Bounding Box

#### 2.1.2. PCA Bounding Box

#### 2.1.3. Convex Hull

#### 2.2. Distance-Based Methods

Leverage values are obtained from the model matrix **X** through the leverage matrix (**H**), given by the equation below:

$$\mathbf{H} = \mathbf{X}\left(\mathbf{X}^{T}\mathbf{X}\right)^{-1}\mathbf{X}^{T}$$

where **X** is the model matrix and **X**^{T} is its transpose.

The diagonal values of the **H** matrix represent the leverage values for the different points in a given dataset. Compounds far from the centroid are associated with higher leverage and are considered influential in model building. Leverage is proportional to Hotelling's T^{2} statistic and to the Mahalanobis distance, but it can be applied only to regression models. The approach can be associated with a warning leverage, generally set to three times the average leverage, which corresponds to p/n, where p is the number of model parameters and n is the number of training compounds. A query chemical with leverage higher than the warning leverage can be associated with unreliable predictions. Such chemicals lie outside the descriptor space and are thus considered outside the AD [1,2,5]. In this study, the corresponding Mahalanobis measures were used.
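As a minimal sketch of the leverage check described above, the following Python example computes the leverage of query compounds as h = x^{T}(X^{T}X)^{-1}x (the standard hat-matrix definition) and compares it with a warning leverage of 3p/n. The function names, the toy model matrix, and the small hand-written linear solver are illustrative assumptions, not the implementation used in the study; in practice a linear algebra library would normally replace `solve`, and a singular X^{T}X is not handled here.

```python
def solve(a, b):
    """Solve a @ z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def leverages(x_train, x_query):
    """Leverage h = x^T (X^T X)^-1 x of each query row w.r.t. the training matrix."""
    p = len(x_train[0])
    xtx = [[sum(row[i] * row[j] for row in x_train) for j in range(p)]
           for i in range(p)]
    return [sum(xi * zi for xi, zi in zip(x, solve(xtx, x))) for x in x_query]


def warning_leverage(x_train):
    """Warning threshold 3 * p / n (p model parameters, n training compounds)."""
    return 3.0 * len(x_train[0]) / len(x_train)


# Toy model matrix: an intercept column plus one descriptor (illustrative data).
X_TRAIN = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0],
           [1.0, 4.0], [1.0, 5.0], [1.0, 6.0], [1.0, 7.0]]
X_QUERY = [[1.0, 3.5], [1.0, 20.0]]

h_star = warning_leverage(X_TRAIN)
for h in leverages(X_TRAIN, X_QUERY):
    print(f"h = {h:.3f}, outside AD: {h > h_star}")
```

The first query sits at the centroid of the toy training data and receives a low leverage, while the second lies far from it and exceeds the warning leverage.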

#### k-Nearest Neighbors Approach

#### 2.3. Probability Density Distribution-Based Method

x_{i} and x_{j}, it can be determined as below:

x_{j} by x_{i}, and the width of the curve is defined by the smoothing parameter s. The cut-off value associated with Gaussian potential functions, namely f_{p}, can be calculated by methods based on sample percentile [18]:
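The idea of a Gaussian potential with a percentile-based cut-off can be sketched as follows. This is a hedged illustration only: the isotropic multivariate Gaussian kernel, the linear-interpolation percentile, the smoothing value `S`, and the percentile `P` are assumptions made for the example and may differ from the formulas used in the study.

```python
import math


def gaussian_density(query, train, s):
    """Mean isotropic Gaussian potential induced on `query` by all training points."""
    d = len(query)
    norm = (2.0 * math.pi) ** (d / 2.0) * s ** d
    total = 0.0
    for x in train:
        sq_dist = sum((q - v) ** 2 for q, v in zip(query, x))
        total += math.exp(-sq_dist / (2.0 * s * s))
    return total / (len(train) * norm)


def percentile(values, p):
    """p-th percentile by linear interpolation between order statistics."""
    ordered = sorted(values)
    rank = p / 100.0 * (len(ordered) - 1)
    low = int(math.floor(rank))
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (rank - low) * (ordered[high] - ordered[low])


# Illustrative two-descriptor training set.
TRAIN = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3],
         [0.1, 0.4], [0.4, 0.2], [0.25, 0.15], [0.15, 0.3]]
S = 0.2   # smoothing parameter s (hypothetical value)
P = 5.0   # percentile used for the cut-off f_p (hypothetical value)

train_density = [gaussian_density(x, TRAIN, S) for x in TRAIN]
f_p = percentile(train_density, P)


def inside_ad(query):
    """A query is inside the AD when its density is at least the cut-off f_p."""
    return gaussian_density(query, TRAIN, S) >= f_p


print(inside_ad([0.2, 0.2]), inside_ad([2.0, 2.0]))
```

A smaller smoothing parameter makes the density surface follow the training points more tightly, which tends to place more query compounds outside the AD.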

#### 2.4. Other AD Approaches

#### 2.4.1. Decision Trees and Decision Forests Approach

#### 2.4.2. Stepwise Approach to Determine Model’s AD

#### 2.5. Models and Test Sets

#### 2.5.1. CAESAR Models

Model | Training set R^{2} ^{(a)} | Training set RMSE ^{(b)} | Test set Q^{2} ^{(c)} | Test set RMSEP ^{(d)} |
---|---|---|---|---|
1) Model 2 | 0.804 | 0.591 | 0.797 | 0.600 |
2) Model 5 | 0.810 | 0.581 | 0.774 | 0.634 |

^{(a)} Determination coefficient R^{2}; ^{(b)} Root-mean-square error RMSE; ^{(c)} Predictive squared correlation coefficient Q^{2}; ^{(d)} Root-mean-square error of prediction RMSEP.

#### 2.5.2. CAESAR and EPI Suite Test Sets

Q^{2} and RMSEP values for the test sets of CAESAR Model 2 and Model 5 are reported in Table 1.

## 3. Results and Discussion

- i) Number of test compounds considered outside the domain of applicability;

n_{TR} is the number of compounds in the training set and n_{EXT} the number in the test set; ȳ_{TR} is the mean response of the training set. Moreover, in order to quantify the role of the compounds considered inside and outside the AD, ΔRMSEP was defined by the following equation:

$$\Delta\mathrm{RMSEP} = \mathrm{RMSEP}_{OUT} - \mathrm{RMSEP}_{IN}$$

where RMSEP_{OUT} is the root mean square error in prediction for the test compounds outside the AD, while RMSEP_{IN} is the root mean square error in prediction for the test compounds inside the AD. Negative values of ΔRMSEP indicate that the compounds detected outside the AD are predicted better than the compounds inside the AD, thus highlighting possible drawbacks in the definition of the interpolation space. On the contrary, positive values of ΔRMSEP indicate a reliable partition of the compounds detected as inside and outside the AD.
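The ΔRMSEP comparison can be illustrated with a short Python sketch. It assumes ΔRMSEP is the difference RMSEP_{OUT} minus RMSEP_{IN}, which matches the interpretation of negative and positive values given above; the function names and the toy values are hypothetical.

```python
import math


def rmsep(observed, predicted):
    """Root mean square error in prediction."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def delta_rmsep(observed, predicted, outside):
    """RMSEP_OUT - RMSEP_IN; None when either group is empty."""
    pairs = list(zip(observed, predicted, outside))
    out = [(o, p) for o, p, flag in pairs if flag]
    ins = [(o, p) for o, p, flag in pairs if not flag]
    if not out or not ins:
        return None  # undefined when no compound falls on one side of the AD
    return rmsep(*zip(*out)) - rmsep(*zip(*ins))


# Toy test set: the last compound is flagged as outside the AD.
OBSERVED = [1.0, 2.0, 3.0, 4.0]
PREDICTED = [1.1, 1.9, 3.0, 5.0]
OUTSIDE = [False, False, False, True]

print(delta_rmsep(OBSERVED, PREDICTED, OUTSIDE))
```

Returning `None` for an empty group mirrors the "-" entries that appear in the tables when no test compound is outside the AD.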

#### 3.1. Defining Thresholds for Distance-Based AD Approaches

**Table 2.** Statistics for CAESAR Model 2 implementing distance-based approaches with different thresholds. For the acronyms maxdist, d, p95, dsz, and ΔRMSEP, refer to text.

Approach | Threshold | Outside AD, CAESAR: n out of 95 (%) | Outside AD, EPI Suite: n out of 108 (%) | Q^{2}, CAESAR | Q^{2}, EPI Suite | ΔRMSEP, CAESAR | ΔRMSEP, EPI Suite |
---|---|---|---|---|---|---|---|
Euclidean (maxdist) | 0.942 | 0 (0.0) | 4 (3.7) | 0.797 | 0.703 | - | 1.436 |
Euclidean (3*d) | 1.018 | 0 (0.0) | 1 (0.9) | 0.797 | 0.676 | - | 0 |
Euclidean (2*d) | 0.679 | 7 (7.4) | 12 (11.1) | 0.802 | 0.718 | 0.146 | 0.753 |
Euclidean (p95) | 0.663 | 7 (7.4) | 12 (11.1) | 0.802 | 0.718 | 0.146 | 0.753 |
Euclidean (dsz) | 0.423 | 15 (15.8) | 36 (33.3) | 0.791 | 0.741 | −0.064 | 0.381 |
CityBlock (maxdist) | 1.472 | 0 (0.0) | 1 (0.9) | 0.797 | 0.676 | - | 2.713 |
CityBlock (3*d) | 1.863 | 0 (0.0) | 0 (0.0) | 0.797 | 0.616 | - | - |
CityBlock (2*d) | 1.242 | 3 (3.1) | 6 (5.5) | 0.804 | 0.699 | 0.267 | −1.049 |
CityBlock (p95) | 1.084 | 8 (8.4) | 11 (10.1) | 0.801 | 0.705 | 0.068 | 0.717 |
CityBlock (dsz) | 0.748 | 18 (18.9) | 38 (35.1) | 0.786 | 0.739 | −0.093 | 0.361 |
Mahalanobis (maxdist) | 6.614 | 0 (0.0) | 0 (0.0) | 0.797 | 0.616 | - | - |
Mahalanobis (3*d) | 6.027 | 0 (0.0) | 0 (0.0) | 0.797 | 0.616 | - | - |
Mahalanobis (2*d) | 4.018 | 6 (6.3) | 5 (4.6) | 0.791 | 0.624 | −0.174 | 0.162 |
Mahalanobis (p95) | 4.034 | 6 (6.3) | 5 (4.6) | 0.791 | 0.624 | −0.174 | 0.162 |
Mahalanobis (dsz) | 2.497 | 21 (22.1) | 27 (25.0) | 0.778 | 0.706 | −0.138 | 0.354 |

**Table 3.** Statistics for CAESAR Model 5 implementing distance-based approaches with different thresholds. Maxdist: Maximum distance between training compounds and centroid of the training set; d: Average distance of training compounds from their mean; ΔRMSEP: Difference between RMSEP for compounds outside and inside the AD.

Approach | Threshold | Outside AD, CAESAR: n out of 95 (%) | Outside AD, EPI Suite: n out of 108 (%) | Q^{2}, CAESAR | Q^{2}, EPI Suite | ΔRMSEP, CAESAR | ΔRMSEP, EPI Suite |
---|---|---|---|---|---|---|---|
Euclidean (maxdist) | 0.942 | 0 (0.0) | 2 (1.8) | 0.774 | 0.647 | - | 0.598 |
Euclidean (3*d) | 0.958 | 0 (0.0) | 2 (1.8) | 0.774 | 0.647 | - | 0.598 |
Euclidean (2*d) | 0.639 | 3 (3.1) | 9 (8.3) | 0.783 | 0.665 | 0.329 | 0.354 |
Euclidean (p95) | 0.614 | 4 (4.2) | 11 (10.1) | 0.783 | 0.673 | 0.266 | 0.367 |
Euclidean (dsz) | 0.393 | 23 (24.2) | 32 (29.6) | 0.753 | 0.646 | −0.128 | 0.044 |
CityBlock (maxdist) | 1.472 | 0 (0.0) | 2 (1.8) | 0.774 | 0.647 | - | 0.598 |
CityBlock (3*d) | 1.791 | 0 (0.0) | 1 (0.9) | 0.774 | 0.634 | - | 0.037 |
CityBlock (2*d) | 1.194 | 1 (1.0) | 5 (4.6) | 0.772 | 0.657 | −0.417 | 0.457 |
CityBlock (p95) | 1.085 | 4 (4.2) | 11 (10.1) | 0.767 | 0.665 | 0.309 | 0.308 |
CityBlock (dsz) | 0.723 | 21 (22.1) | 32 (29.6) | 0.751 | 0.639 | −0.156 | 0.022 |
Mahalanobis (maxdist) | 6.957 | 0 (0.0) | 0 (0.0) | 0.774 | 0.633 | - | - |
Mahalanobis (3*d) | 6.121 | 0 (0.0) | 0 (0.0) | 0.774 | 0.633 | - | - |
Mahalanobis (2*d) | 4.081 | 3 (3.1) | 6 (5.5) | 0.767 | 0.621 | −0.445 | −0.275 |
Mahalanobis (p95) | 3.859 | 5 (5.2) | 6 (5.5) | 0.764 | 0.621 | −0.327 | −0.275 |
Mahalanobis (dsz) | 2.495 | 23 (24.2) | 18 (16.6) | 0.760 | 0.637 | −0.081 | 0.035 |
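The centroid-based thresholds named in the Table 3 caption can be sketched in Python as below. The example covers only maxdist and multiples of the average distance d, uses the Euclidean distance alone, and relies on hypothetical function names and toy data; the p95 and dsz thresholds are omitted because their definitions are not given in the caption.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Column-wise mean of the training descriptors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def thresholds(train):
    """maxdist and multiples of d, both measured from the training centroid."""
    c = centroid(train)
    dists = [euclidean(x, c) for x in train]
    d_mean = sum(dists) / len(dists)
    return {"maxdist": max(dists), "3*d": 3 * d_mean, "2*d": 2 * d_mean}


def outside_ad(query, train, threshold):
    """A query is outside the AD when it is farther from the centroid than the threshold."""
    return euclidean(query, centroid(train)) > threshold


# Illustrative two-descriptor training set with its centroid at (1, 1).
TRAIN = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0], [1.0, 1.0]]

limits = thresholds(TRAIN)
for name, value in limits.items():
    print(name, round(value, 3), outside_ad([3.0, 3.0], TRAIN, value))
```

On this toy set the query at (3, 3) is outside the AD under maxdist and 2*d but inside under 3*d, which shows how strongly the outcome depends on the chosen threshold.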

**Table 4.** Statistics for CAESAR Model 2 implementing different 5NN-based threshold strategies. For the acronyms D, p95, DSZ, and ΔRMSEP, refer to text.

Approach | Threshold | Outside AD, CAESAR: n out of 95 (%) | Outside AD, EPI Suite: n out of 108 (%) | Q^{2}, CAESAR | Q^{2}, EPI Suite | ΔRMSEP, CAESAR | ΔRMSEP, EPI Suite |
---|---|---|---|---|---|---|---|
Euclidean (3*D) | 1.522 | 2 (2.1) | 1 (0.9) | 0.804 | 0.676 | 0.394 | 2.713 |
Euclidean (2*D) | 1.015 | 9 (9.5) | 16 (14.8) | 0.795 | 0.750 | −0.037 | 0.765 |
Euclidean (p95) | 1.164 | 8 (8.4) | 13 (12.0) | 0.797 | 0.745 | 0.859 | 1.342 |
Euclidean (DSZ) | 0.693 | 14 (14.7) | 31 (28.7) | 0.787 | 0.767 | −0.113 | 0.517 |
CityBlock (3*D) | 2.371 | 4 (4.2) | 5 (4.6) | 0.803 | 0.679 | 0.187 | 0.968 |
CityBlock (2*D) | 1.581 | 10 (10.5) | 18 (16.7) | 0.794 | 0.742 | −0.042 | 0.664 |
CityBlock (p95) | 1.918 | 7 (7.4) | 11 (10.2) | 0.799 | 0.741 | 0.034 | 0.944 |
CityBlock (DSZ) | 1.083 | 16 (16.8) | 27 (25.0) | 0.801 | 0.731 | 0.037 | 0.446 |
Mahalanobis (3*D) | 1.718 | 3 (3.2) | 4 (3.7) | 0.803 | 0.628 | 0.221 | 0.295 |
Mahalanobis (2*D) | 1.145 | 9 (9.5) | 18 (16.7) | 0.794 | 0.748 | −0.045 | 0.691 |
Mahalanobis (p95) | 1.388 | 6 (6.3) | 11 (10.2) | 0.801 | 0.735 | 0.908 | 1.183 |
Mahalanobis (DSZ) | 0.786 | 19 (20.0) | 29 (26.9) | 0.795 | 0.745 | −0.019 | 0.470 |

**Table 5.** Statistics for CAESAR Model 5 implementing different 5NN-based threshold strategies. D: The gross average distance of training set compounds from their 5NN; ΔRMSEP: Difference between RMSEP for compounds outside and inside the AD.

Approach | Threshold | Outside AD, CAESAR: n out of 95 (%) | Outside AD, EPI Suite: n out of 108 (%) | Q^{2}, CAESAR | Q^{2}, EPI Suite | ΔRMSEP, CAESAR | ΔRMSEP, EPI Suite |
---|---|---|---|---|---|---|---|
Euclidean (3*D) | 1.681 | 0 (0.0) | 2 (2.8) | 0.774 | 0.644 | - | 0.364 |
Euclidean (2*D) | 1.121 | 7 (7.4) | 13 (12.0) | 0.781 | 0.690 | 0.130 | 0.437 |
Euclidean (p95) | 1.331 | 1 (1.0) | 7 (6.5) | 0.772 | 0.656 | −0.331 | 0.126 |
Euclidean (DSZ) | 0.782 | 18 (18.9) | 22 (20.4) | 0.784 | 0.743 | 0.072 | 0.512 |
CityBlock (3*D) | 2.684 | 1 (1.1) | 5 (4.6) | 0.772 | 0.648 | −0.456 | 0.307 |
CityBlock (2*D) | 1.789 | 9 (9.5) | 12 (11.1) | 0.788 | 0.690 | 0.190 | 0.462 |
CityBlock (p95) | 2.302 | 2 (2.1) | 8 (7.4) | 0.785 | 0.657 | 0.529 | 0.310 |
CityBlock (DSZ) | 1.232 | 19 (20.0) | 30 (27.8) | 0.782 | 0.753 | 0.055 | 0.433 |
Mahalanobis (3*D) | 2.006 | 0 (0.0) | 4 (3.7) | 0.774 | 0.624 | −0.326 | −0.149 |
Mahalanobis (2*D) | 1.337 | 6 (6.3) | 10 (9.3) | 0.779 | 0.683 | 0.115 | 0.482 |
Mahalanobis (p95) | 1.668 | 2 (2.1) | 6 (5.6) | 0.771 | 0.631 | −0.193 | −0.043 |
Mahalanobis (DSZ) | 0.933 | 21 (22.1) | 24 (22.2) | 0.792 | 0.713 | 0.110 | 0.356 |
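The 5NN threshold D from the Table 5 caption can be illustrated with the following Python sketch. It assumes that a query compound is judged by its own mean distance to its five nearest training compounds, which is one plausible reading rather than a confirmed detail of the study; the function names and the one-descriptor toy data are hypothetical, and only the Euclidean distance is shown.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_knn_distance(point, others, k):
    """Mean distance from `point` to its k nearest neighbours among `others`."""
    nearest = sorted(euclidean(point, o) for o in others)[:k]
    return sum(nearest) / len(nearest)


def gross_average_distance(train, k):
    """D: average over training compounds of their mean k-NN distance."""
    per_compound = [
        mean_knn_distance(x, train[:i] + train[i + 1:], k)  # exclude the compound itself
        for i, x in enumerate(train)
    ]
    return sum(per_compound) / len(per_compound)


def outside_ad(query, train, k, threshold):
    """Assumed rule: outside the AD when the query's mean k-NN distance exceeds the threshold."""
    return mean_knn_distance(query, train, k) > threshold


# Ten training compounds evenly spaced along a single descriptor.
TRAIN = [[float(i)] for i in range(10)]
K = 5

D = gross_average_distance(TRAIN, K)
print(outside_ad([4.5], TRAIN, K, 2 * D), outside_ad([15.0], TRAIN, K, 2 * D))
```

Unlike the centroid-based thresholds, this check depends on the local density of training compounds around the query, so it can flag compounds that sit in sparse regions even when they are close to the centroid.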

#### 3.2. Overall Comparisons

Q^{2} decreased slightly for Convex Hull, which placed several test compounds outside the AD. On the other hand, model statistics improved for the Probability Density Distribution approach, which was associated with the maximum number of test compounds outside the AD (42.6%). As a general remark, the model statistics improved for several approaches as the number of test compounds considered outside the AD increased. Since the CAESAR test set comprised compounds more similar to the training set, not many test compounds fell outside the AD; however, the EPI Suite test set is comparatively different from the training data, and thus considerably more compounds were placed outside the AD by the different approaches. ΔRMSEP remained positive for most of the AD approaches. A similar pattern for compounds outside the AD was obtained for CAESAR Model 5, and the corresponding results are reported in Table 7.

**Table 6.** Statistics for CAESAR Model 2 applied to CAESAR and EPI Suite test sets for different AD approaches.

Approach | Outside AD, CAESAR: n out of 95 (%) | Outside AD, EPI Suite: n out of 108 (%) | Q^{2}, CAESAR | Q^{2}, EPI Suite | ΔRMSEP, CAESAR | ΔRMSEP, EPI Suite |
---|---|---|---|---|---|---|
Euclidean Dist. (p95) | 7 (7.4) | 12 (11.1) | 0.802 | 0.718 | 0.146 | 0.753 |
City Block Dist. (p95) | 8 (8.4) | 11 (10.1) | 0.801 | 0.705 | 0.068 | 0.717 |
Mahalanobis Dist. (p95) | 6 (6.3) | 5 (4.6) | 0.791 | 0.624 | −0.174 | 0.162 |
5NN-Euclidean Dist. (p95) | 8 (8.4) | 13 (12.0) | 0.797 | 0.745 | 0.859 | 1.342 |
5NN-CityBlock Dist. (p95) | 7 (7.4) | 11 (10.2) | 0.799 | 0.741 | 0.034 | 0.944 |
5NN-Mahalanobis Dist. (p95) | 6 (6.3) | 11 (10.2) | 0.801 | 0.735 | 0.908 | 1.183 |
Bounding Box | 0 (0.0) | 2 (1.8) | 0.797 | 0.678 | - | 1.798 |
PCA Bounding Box | 2 (2.1) | 3 (2.8) | 0.804 | 0.688 | 0.371 | 1.533 |
Convex Hull | 22 (23.2) | 31 (28.7) | 0.789 | 0.721 | −0.052 | 0.368 |
Potential Function | 29 (30.5) | 46 (42.6) | 0.831 | 0.766 | 0.156 | 0.374 |

**Table 7.** Statistics for CAESAR Model 5 applied to CAESAR and EPI Suite test sets for different AD approaches.

Approach | Outside AD, CAESAR: n out of 95 (%) | Outside AD, EPI Suite: n out of 108 (%) | Q^{2}, CAESAR | Q^{2}, EPI Suite | ΔRMSEP, CAESAR | ΔRMSEP, EPI Suite |
---|---|---|---|---|---|---|
Euclidean Dist. (p95) | 4 (4.2) | 11 (10.1) | 0.783 | 0.673 | 0.266 | 0.367 |
City Block Dist. (p95) | 4 (4.2) | 11 (10.1) | 0.767 | 0.665 | 0.309 | 0.308 |
Mahalanobis Dist. (p95) | 5 (5.2) | 6 (5.5) | 0.764 | 0.621 | −0.327 | −0.275 |
5NN-Euclidean Dist. (p95) | 1 (1.0) | 7 (6.5) | 0.772 | 0.656 | −0.331 | 0.126 |
5NN-CityBlock Dist. (p95) | 2 (2.1) | 8 (7.4) | 0.785 | 0.657 | 0.529 | 0.310 |
5NN-Mahalanobis Dist. (p95) | 2 (2.1) | 6 (5.6) | 0.771 | 0.631 | −0.193 | −0.043 |
Bounding Box | 0 (0.0) | 1 (0.9) | 0.774 | 0.634 | - | 0.037 |
PCA Bounding Box | 0 (0.0) | 2 (1.8) | 0.774 | 0.634 | - | 0.021 |
Convex Hull | 16 (16.8) | 21 (19.4) | 0.780 | 0.643 | 0.049 | 0.051 |
Potential Function | 28 (29.5) | 47 (43.5) | 0.787 | 0.813 | 0.062 | 0.455 |

**Figure 1.** CAESAR test set (**a**) and EPI Suite test set (**b**) projected in the training space of Model 2. Training set (+); test set; compounds outside the AD with different approaches: distance-based p95, distance-based 5NN, Bounding Box and PCA Bounding Box, Convex Hull (○), Potential Function.

**Figure 2.** CAESAR test set (**a**) and EPI Suite test set (**b**) projected in the training space of Model 5. Training set (+); test set; compounds outside the AD with different approaches: distance-based p95, distance-based 5NN, Bounding Box and PCA Bounding Box, Convex Hull (○), Potential Function.

**Figure 3.** Predicted vs. observed log BCF values for CAESAR test set (**a**) and EPI Suite test set (**b**) with Model 2. Training set (+); test set; compounds outside the AD with different approaches: distance-based p95, distance-based 5NN, Bounding Box and PCA Bounding Box, Convex Hull (○), Potential Function.

**Figure 4.** Predicted vs. observed log BCF values for CAESAR test set (**a**) and EPI Suite test set (**b**) with Model 5. Training set (+); test set; compounds outside the AD with different approaches: distance-based p95, distance-based 5NN, Bounding Box and PCA Bounding Box, Convex Hull (○), Potential Function.

## 4. Conclusions

## Acknowledgments

## References and Notes

1. Netzeva, T.I.; Worth, A.; Aldenberg, T.; Benigni, R.; Cronin, M.T.D.; Gramatica, P.; Jaworska, J.S.; Kahn, S.; Klopman, G.; Marchant, C.A.; et al. Current status of methods for defining the applicability domain of (quantitative) structure-activity relationships. The report and recommendations of ECVAM Workshop 52. Altern. Lab. Anim. **2005**, 33, 155–173.
2. Jaworska, J.; Nikolova-Jeliazkova, N.; Aldenberg, T. QSAR applicabilty domain estimation by projection of the training set descriptor space: A review. Altern. Lab. Anim. **2005**, 33, 445–459.
3. Dimitrov, S.; Dimitrova, G.; Pavlov, T.; Dimitrova, N.; Patlewicz, G.; Niemela, J.; Mekenyan, O.A. Stepwise approach for defining the applicability domain of SAR and QSAR models. J. Chem. Inf. Model. **2005**, 45, 839–849.
4. REACH. European Community Regulation on chemicals and their safe use. Available online: http://ec.europa.eu/environment/chemicals/reach/reach_intro.htm (accessed on 3 February 2012).
5. Worth, A.P.; Bassan, A.; Gallegos, A.; Netzeva, T.I.; Patlewicz, G.; Pavan, M.; Tsakovska, I.; Vracko, M. The Characterisation of (Quantitative) Structure-Activity Relationships: Preliminary Guidance; ECB Report EUR 21866 EN, European Commission, Joint Research Centre: Ispra, Italy, 2005; p. 95.
6. OECD. Quantitative Structure-Activity Relationships Project [(Q)SARs]. Available online: http://www.oecd.org/document/23/0,3746,en_2649_34377_33957015_1_1_1_1,00.html (accessed on 3 February 2012).
7. Worth, A.P.; van Leeuwen, C.J.; Hartung, T. The prospects for using (Q)SARs in a changing political environment: high expectations and a key role for the Commission’s Joint Research Centre. SAR QSAR Environ. Res. **2004**, 15, 331–343.
8. Nikolova-Jeliazkova, N.; Jaworska, J. An approach to determining applicability domains for QSAR group contribution models: an analysis of SRC KOWWIN. Altern. Lab. Anim. **2005**, 33, 461–470.
9. Sheridan, R.; Feuston, R.P.; Maiorov, V.N.; Kearsley, S. Similarity to molecules in the training set is a good discriminator for prediction accuracy in QSAR. J. Chem. Inf. Comp. Sci. **2004**, 44, 1912–1928.
10. Zhao, C.; Boriani, E.; Chana, A.; Roncaglioni, A.; Benfenati, E. A new hybrid QSAR model for predicting bioconcentration factor (BCF). Chemosphere **2008**, 73, 1701–1707.
11. Lombardo, A.; Roncaglioni, A.; Boriani, E.; Milan, C.; Benfenati, E. Assessment and validation of the CAESAR predictive model for bioconcentration factor (BCF) in fish. Chem. Cent. J. **2010**, 4 (Suppl 1).
12. Meylan, W.M.; Howard, P.H.; Aronson, D.; Printup, H.; Gouchie, S. Improved Method for Estimating Bioconcentration Factor (BCF) from Octanol-Water Partition Coefficient, 2nd Update; SRC TR-97-006; Syracuse Research Corp., Environmental Science Center: North Syracuse, NY, USA, 1997; Prepared for: Robert S. Boethling, EPA-OPPT.
13. Meylan, W.M.; Howard, P.H.; Boethling, R.S.; Aronson, D.; Printup, H.; Gouchie, S. Improved method for estimating bioconcentration/bioaccumulation factor from octanol/water partition coefficient. Environ. Toxicol. Chem. **1999**, 18, 664–672.
14. MATLAB. The Language of Technical Computing. Available online: http://www.mathworks.com/products/matlab/ (accessed on 3 February 2012).
15. Wold, S.; Esbensen, K.; Geladi, P. Principal component analysis. Chemometr. Intell. Lab. **1987**, 2, 37–52.
16. Preparata, F.P.; Shamos, M.I. Convex Hulls: Basic Algorithms. In Computational Geometry: An Introduction; Preparata, F.P., Shamos, M.I., Eds.; Springer-Verlag: New York, NY, USA, 1991; pp. 95–148.
17. Tropsha, A.; Gramatica, P.; Gombar, V. The importance of being earnest: validation is the absolute essential for successful application and interpretation of QSPR Models. QSAR Comb. Sci. **2003**, 22, 69–77.
18. Jouan-Rimbaud, D.; Bouveresse, E.; Massart, D.L.; de Noord, O.E. Detection of prediction outliers and inliers in multivariate calibration. Anal. Chim. Acta **1999**, 388, 283–301.
19. Forina, M.; Armanino, C.; Leardi, R.; Drava, G. A class-modelling technique based on potential functions. J. Chemometr. **1991**, 5, 435–453.
20. Tong, W.; Hong, H.; Fang, H.; Xie, Q.; Perkins, R. Decision forest: Combining the predictions of multiple independent decision tree models. J. Chem. Inf. Comput. Sci. **2003**, 43, 525–531.
21. Tong, W.; Hong, H.; Xie, Q.; Xie, L.; Fang, H.; Perkins, R. Assessing QSAR limitations: A regulatory perspective. Curr. Comput. Aid. Drug Des. **2004**, 1, 65–72.
22. Wan, C.; Harrington, P.B. Self-configuring radial basis function neural networks for chemical pattern recognition. J. Chem. Inf. Comput. Sci. **1999**, 39, 1049–1056.
23. DRAGON (Software for Molecular Descriptor Calculations). Talete srl: Milano, Italy. Available online: http://www.talete.mi.it (accessed on 3 February 2012).
24. Consonni, V.; Ballabio, D.; Todeschini, R. Comments on the definition of the Q^{2} parameter for QSAR validation. J. Chem. Inf. Model. **2009**, 49, 1669–1678.
25. Consonni, V.; Ballabio, D.; Todeschini, R. Evaluation of model predictive ability by external validation techniques. J. Chemometr. **2010**, 24, 104–201.
26. Tetko, I.V.; Sushko, I.; Pandey, A.K.; Zhu, H.; Tropsha, A.; Papa, E.; Oberg, T.; Todeschini, R.; Fourches, D.; Varnek, A. Critical assessment of QSAR models of environmental toxicity against Tetrahymena pyriformis: Focusing on applicability domain and overfitting by variable selection. J. Chem. Inf. Model. **2008**, 48, 1733–1746.
27. Weaver, S.; Gleeson, M.P. The importance of the domain of applicability in QSAR modeling. J. Mol. Graph. Model. **2008**, 26, 1315–1326.

- Sample Availability: The CAESAR data sets used in this study can be requested directly from the CAESAR project (http://www.caesar-project.eu).

© 2012 by the authors; licensee MDPI, Basel, Switzerland. This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

## Share and Cite

**MDPI and ACS Style**

Sahigara, F.; Mansouri, K.; Ballabio, D.; Mauri, A.; Consonni, V.; Todeschini, R.
Comparison of Different Approaches to Define the Applicability Domain of QSAR Models. *Molecules* **2012**, *17*, 4791-4810.
https://doi.org/10.3390/molecules17054791
