1. Introduction
Preclinical toxicology studies are an important step in drug development, providing an early safety assessment prior to clinical trials. A key aspect of such studies is the assessment of cytotoxicity in cell lines, which allows the identification of adverse effects of compounds on cell viability and the estimation of possible general, organ-specific, or tissue toxicity. Cell lines are populations of cells capable of long-term cultivation under artificial conditions. Human cell lines are diverse tools covering dozens of tissue types and hundreds of pathologies, making them indispensable in research. They are widely used to study disease mechanisms, evaluate drug-like compounds, and assess their toxicity. In preclinical toxicology studies, cell lines are used for the analysis of both general and organ-specific toxicity of substances. For example, HepG2 cells are used to assess hepatotoxicity [1], BEAS-2B cells are used to describe the effects of substances on the respiratory tract [2], and HUVEC cells are employed to analyze vascular toxicity [3]. Due to their high reproducibility, standardization, and availability, cell lines are widely used in preclinical toxicology studies to assess the effects of chemical compounds on cell viability and function [4].
Modern cell lines are divided into tumor and non-tumor, as well as into transformed and normal cell lines. They are widely used to study the processes of cell division, apoptosis, genetic changes, and metabolic pathways, as well as to assess the antitumor activity and toxicity of substances [5]. Currently, more than 6000 human cell lines are registered in international databases such as Cellosaurus (https://www.cellosaurus.org, accessed on 12 December 2025), covering a variety of tissues and organs, from skin and lungs to liver, kidneys, and the vascular system [6]. These include both tumor lines (e.g., HepG2, Caki-1) and normal (non-tumor) lines (e.g., NHDF, MRC-5, HUVEC). Such diversity allows the modeling of both systemic and organ-specific effects, including inflammation, metabolism, cellular senescence, and toxicity. It also makes the selection of an experimental model more flexible and increases the reliability of results in preclinical studies. ATCC (American Type Culture Collection) includes over 500 human cell lines that may be used in toxicological studies (https://www.atcc.org/, accessed on 12 December 2025).
A large volume of experimental data on the cytotoxicity of compounds against various cell lines has now accumulated. This enables the creation of predictive models, including Quantitative Structure–Activity Relationship (QSAR) models. QSAR models can significantly reduce both the time and cost of studies by providing toxicity predictions based on theoretical calculations when only the structural formula of a compound is known. The ChEMBL database [7] contains a significant body of information on experimental studies, including IC50 and GI50 values, making it an important and convenient resource for developing QSAR models [8]. PubChem also provides experimental cytotoxicity data for substances tested on different cell lines [9].
While traditional non-clinical testing relies predominantly on animal studies, regulatory agencies increasingly recognize computational approaches as complementary tools within integrated testing strategies. The ICH M7(R2) guideline of the International Council for Harmonization of Technical Requirements for Pharmaceuticals for Human Use (ICH) represents a landmark achievement, establishing the first internationally harmonized framework for the regulatory acceptance of QSAR predictions as an alternative to experimental testing in assessing bacterial mutagenicity of pharmaceutical impurities [10]. However, for organ-specific toxicity endpoints relevant to cell line-based assessments, no equivalent regulatory frameworks have been developed so far to formally endorse QSAR models as a replacement for in vivo studies. The OECD Guidance Document on QSAR Validation previously established five essential principles for model validation: defined endpoint, unambiguous algorithm, defined applicability domain, appropriate validation metrics, and mechanistic interpretation [11]. The recent OECD (Q)SAR Assessment Framework extends these principles with a systematic methodology for regulatory assessment within defined contexts of use [12]. Major regulatory initiatives explicitly support the development of New Approach Methodologies (NAMs). The FDA’s Predictive Toxicology Roadmap (2017) emphasizes the “context of use” as the foundation for qualification and regulatory acceptance [13]. The FDA/CDER 2020 commentary [14] acknowledges that improvements increasing clinical outcome predictivity are encouraged and needed, explicitly recognizing QSAR models as NAMs. The EMA NAMs Horizon Scanning Report [15] confirms that while no NAMs are currently qualified for regulatory use, such methodologies are “advancing to technology readiness levels sufficient for initial engagement with regulators.” The trajectory established by ICH M7(R2) for mutagenicity prediction provides a precedent suggesting that as experience accumulates with organ-specific toxicity QSAR models and validation datasets expand, similar regulatory frameworks may emerge for cell-based cytotoxicity predictions. The models presented in our study align with the 3Rs principles (Replacement, Reduction, Refinement) and OECD validation principles, supporting hazard identification, compound prioritization, and mechanistic investigation in drug development.
Recently, Feitosa and co-authors developed the Cyto-Safe web application with SAR models for predicting the cytotoxicity of substances on two cell lines, 3T3 and HEK 293 [16]. Earlier, we developed the freely available web applications CLC-Pred [17] and CLC-Pred 2.0 [18] with SAR models that made qualitative predictions of compound toxicity against hundreds of tumor and non-tumor human cell lines based on ChEMBL and PubChem data. Such web applications give a general idea of cytotoxic potential against cell lines but do not offer the quantitative assessments needed to compare activity levels. The estimation of IC50 and GI50 values is crucial for calculating the therapeutic interval and dosage of substances in preclinical and clinical studies and can serve as a characteristic used to select the most promising drug candidates.
Verma and Hansch developed some of the earliest QSAR models of podophyllotoxin derivatives for four cancer cell lines using physicochemical descriptors. The resulting models demonstrated a high degree of agreement with the experimental data (R² from 0.960 to 0.836) and a good predictive ability (Q² from 0.911 to 0.705) [19]. Q² is the cross-validated R², determined by a leave-one-out procedure on the training set. Earlier, we also demonstrated the development and implementation in the BC CLC Pred web application of QSAR models to predict IC50 and GI50 values for compounds tested on nine human breast cancer cell lines with a reasonable accuracy of prediction (mean R² and RMSE values calculated by 5-fold cross-validation were 0.599 and 0.679, respectively) [8].
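To make the reported statistics concrete, the following minimal sketch shows how R² and RMSE can be computed from observed and predicted values. It assumes the common definitions R² = 1 − SS_res/SS_tot and RMSE = square root of the mean squared error; the function names and the toy numbers are illustrative and are not taken from the cited studies. Under these definitions, applying the same R² formula to predictions made for held-out compounds (leave-one-out or 5-fold cross-validation) yields Q² or the cross-validated R².

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy pIC50-like values, invented purely for illustration.
observed = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 5.5, 7.5, 7.5]

print(f"R2 = {r_squared(observed, predicted):.3f}")
print(f"RMSE = {rmse(observed, predicted):.3f}")
```

Because RMSE is expressed in the same logarithmic units as the modeled values, an RMSE of about 0.68 corresponds to a typical error of roughly a factor of five in the underlying concentration.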
As most scientists lack the opportunity to test substance cytotoxicity on panels of human cell lines [8], a freely available in silico tool to predict quantitative values of cytotoxicity against different non-tumor and tumor human cell lines is in high demand. In this study, we created high-quality QSAR models using GUSAR software (version 2014) [8,20,21] for dozens of human cell lines, which have been implemented in the CLC-Pred 2.0 web application (https://way2drug.com/clc-pred/, accessed on 12 December 2025) [18].
3. Discussion
The ability to quantitatively predict IC50 and GI50 values for sets of non-tumor and tumor cell lines provides an opportunity to select the most promising drug candidates at the early stages of drug development. This is best illustrated by comparing the anticancer drug paclitaxel (Taxol), which has a broad spectrum of cytotoxic effects on both non-tumor and tumor cells, with the targeted anticancer drugs trametinib and dabrafenib. Trametinib, which is an MEK inhibitor, and dabrafenib, which is a BRAF inhibitor, are used to treat solid tumors, lymphomas, or multiple myeloma with the BRAF V600E driver mutation [65]. The pIC50 and pGI50 values predicted by the created QSAR models are presented in Table 6.
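For readers less familiar with the logarithmic scale, the sketch below converts between a concentration and its p-value. It assumes the conventional definition pIC50 = −log10(IC50 expressed in mol/L), and likewise for pGI50; this section does not state the units, so the convention and the function names are assumptions made for illustration only.

```python
import math


def p_value_from_nanomolar(concentration_nm):
    """Return -log10 of a concentration given in nM, after conversion to mol/L.

    Assumed convention: pIC50 = -log10(IC50 [mol/L]).
    """
    if concentration_nm <= 0:
        raise ValueError("concentration must be positive")
    # 1 nM = 1e-9 mol/L, so -log10(c_nM * 1e-9) = 9 - log10(c_nM).
    return 9.0 - math.log10(concentration_nm)


def nanomolar_from_p_value(p_value):
    """Inverse conversion: concentration in nM for a given pIC50 or pGI50."""
    return 10.0 ** (9.0 - p_value)


for ic50_nm in (10.0, 100.0, 1000.0):
    print(f"IC50 = {ic50_nm:7.1f} nM  ->  pIC50 = {p_value_from_nanomolar(ic50_nm):.2f}")
```

Under this convention, a predicted value above 7 corresponds to a concentration below 100 nM, and each additional unit reflects a tenfold increase in potency.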
Table 6 shows that the prediction results for paclitaxel include high values of pIC50 and pGI50 (greater than 7) for many non-tumor and tumor cell lines. This agrees with what is known about the general toxicity [66] and organ-specific toxicity of paclitaxel: gastrointestinal toxicity (esophagus, stomach, small intestine) [67], hepatotoxicity [68], endothelium toxicity [69], lung toxicity [70], skin toxicity [71], and immunotoxicity [72,73], all associated with its mechanism of action, namely polymerized microtubule accumulation and mitotic arrest. At the same time, the high values of pIC50 and pGI50 for some tumor cell lines reflect several main therapeutic applications of paclitaxel: melanoma, lung, and esophageal cancer [74]. Although the predicted cytotoxicity values for non-tumor cell lines are high, averaging 6.889, the average predicted value for tumor cell lines is higher still, at 7.502. The difference of 0.613 on the logarithmic scale (7.502 − 6.889) corresponds to a factor of about 4.1 on the linear scale. This means that paclitaxel is, on average, about four times more cytotoxic against tumor cells than against non-tumor ones, which reflects the existence of a therapeutic window (therapeutic index, TI = 4.1) for its clinical use. Hertz and co-authors also noted that paclitaxel is similar to many anticancer agents in having a relatively narrow therapeutic window [75].
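The conversion from a logarithmic difference to a fold difference can be written out explicitly. The expression below assumes the conventional definition of the p-scale as the negative decimal logarithm of the concentration, and it uses the symbol \bar{p} (introduced here for illustration) for the mean predicted pIC50 or pGI50 value of a group of cell lines. Because means of logarithms are compared, the resulting ratio is one of geometric-mean concentrations.

```latex
\[
\mathrm{TI} \;\approx\; 10^{\,\bar{p}_{\text{tumor}} - \bar{p}_{\text{non-tumor}}}
\;=\; 10^{\,7.502 - 6.889}
\;=\; 10^{\,0.613}
\;\approx\; 4.1
\]
```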
The prediction results for trametinib and dabrafenib indicate that they may be less toxic than paclitaxel. Their mean predicted cytotoxicity values for non-tumor cell lines were 5.604 and 5.328, respectively. Trametinib and dabrafenib are targeted therapy drugs, and they are considered less toxic than chemotherapy drugs such as paclitaxel [76,77]. The low predicted values of cytotoxicity for trametinib and dabrafenib on non-tumor cell lines correlate with the absence of dangerous organ-specific toxic effects of these drugs that may be related to cytotoxicity in these cell lines [78]. Nevertheless, predictions with cytotoxicity values higher than the average for non-tumor cell lines may be associated with some types of adverse reactions to these drugs. Skin-related toxic effects, hypertension, and diarrhea caused by trametinib [79] may be associated with the prediction of high cytotoxic action on skin A-431, endothelial HUVEC [80], and colon (COLO 205, HCT-8, SW-620) cell lines, respectively. Dabrafenib also causes cutaneous adverse reactions and diarrhea [81], which may be associated with the prediction results for skin A-431 and colon (COLO 205, HCT-8) cell lines, respectively. Moreover, Table 6 includes high predicted values of cytotoxicity for blood cells. Although less frequent and less severe than other side effects such as pyrexia and skin toxicity, hematologic adverse reactions such as anemia, leukopenia, and neutropenia for dabrafenib [82] and anemia for trametinib [83] have also been reported. The mean predicted cytotoxicity values of trametinib and dabrafenib for tumor cell lines were 6.241 and 6.119, respectively. These values indicate that the cytotoxicity of trametinib and dabrafenib on tumor cell lines is higher than on non-tumor cell lines. The difference between the average predicted cytotoxicity values of these drugs for non-tumor and tumor cell lines also indicates the existence of a therapeutic interval. It is larger for dabrafenib (6.119 − 5.328 = 0.791 on the logarithmic scale) than for trametinib (6.241 − 5.604 = 0.637 on the logarithmic scale). These logarithmic differences correspond to linear-scale factors of 6.2 and 4.3, respectively, which serve as estimates of the therapeutic indexes of dabrafenib and trametinib. This agrees with the data on the narrow therapeutic window for these drugs [84,85]. There are no strict criteria for TI. A TI ≥ 10 is considered preferable for selecting universal drugs, and a TI > 5 is a criterion for considering a drug candidate for further preclinical study [86]. Nevertheless, drugs with a low TI (<2), known as narrow therapeutic index drugs, are also used for some severe diseases [87].
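The arithmetic for the three drugs, together with the TI thresholds quoted above, can be reproduced with a few lines of code. This is a minimal sketch: the mean values are those given in the text, while the function names and the "intermediate" label for TI values between 2 and 5 are hypothetical additions for illustration and do not come from the cited criteria.

```python
def therapeutic_index(mean_p_tumor, mean_p_non_tumor):
    """Estimate TI as 10 raised to the difference of mean predicted p-values."""
    return 10.0 ** (mean_p_tumor - mean_p_non_tumor)


def ti_category(ti):
    """Map a TI estimate onto the thresholds quoted in the text.

    The 'intermediate' label is a hypothetical placeholder for values
    that fall between the quoted thresholds.
    """
    if ti >= 10:
        return "preferable (TI >= 10)"
    if ti > 5:
        return "candidate for further preclinical study (TI > 5)"
    if ti < 2:
        return "narrow therapeutic index (TI < 2)"
    return "intermediate (2 <= TI <= 5)"


# Mean predicted values (tumor, non-tumor) as reported in the text.
mean_predictions = {
    "paclitaxel": (7.502, 6.889),
    "trametinib": (6.241, 5.604),
    "dabrafenib": (6.119, 5.328),
}

for drug, (tumor, non_tumor) in mean_predictions.items():
    ti = therapeutic_index(tumor, non_tumor)
    print(f"{drug:<11} TI = {ti:.1f}  {ti_category(ti)}")
```

Run on the reported means, the estimates round to 4.1, 4.3, and 6.2, so only dabrafenib exceeds the TI > 5 threshold.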
This comparison of the predicted pIC50 and pGI50 values for paclitaxel, trametinib, and dabrafenib shows the usefulness of the created QSAR models for estimating the possible toxicity of drug candidates and their potential in terms of the presence of a therapeutic window. In the future, when data on new drugs become available, it would be advisable to conduct similar studies on a larger external set to assess more accurately whether the created QSAR models can be used to determine the therapeutic interval and toxicity of novel pharmaceutical agents.
Regarding the QSAR models themselves, the analysis revealed that the quality of QSAR predictions depends significantly on the cell line type. In particular, for some lines, models with high R² (>0.6) and low RMSE (<0.8) values in 5-fold cross-validation were obtained, indicating high accuracy and reproducibility of the predictions. This is typical, for example, of the HaCaT, GES-1, and HUVEC cell lines, which have proven successful in cytotoxicity studies and exhibit a stable phenotype in vitro.
Organ-specific characteristics also influence the cytotoxic response. For example, cell lines mimicking barrier tissues (skin-HaCaT, intestine-Caco-2, respiratory tract-Calu-3) demonstrated specific sensitivities to certain classes of compounds, which is important to consider when interpreting the results. Endothelial cells (HUVEC) also demonstrated highly reproducible cytotoxic effects, making them useful for assessing angiotoxicity. Liver cell lines (HepG2) have metabolic activity allowing for the impact of compound biotransformation to be considered; however, their predictive accuracy may vary depending on the specific compounds tested.
The comparison of non-tumor and tumor cell lines revealed some trends: in general, models built on non-tumor cells are characterized by higher stability; however, they are sometimes inferior in accuracy to models based on tumor lines, especially if the latter have a pronounced mitotic potential (e.g., Caki-1, A549). This is because tumor lines often have a simplified response to chemical exposure due to impaired regulation of apoptosis and the cell cycle, which facilitates the detection of cytotoxic effects. Non-tumor cell lines, in contrast, are closer to physiological norms, making their results more relevant for assessing potential toxicity in the body, but this requires a more precise modeling approach. Due to the narrow therapeutic index of certain drugs, it is crucial to have highly accurate QSAR models. However, not all QSAR models achieve high predictive performance. To address this limitation and enhance the reliability of the therapeutic index estimates, the following strategies can be employed:
Prioritizing high-accuracy QSAR models: Select prediction results from models with the highest accuracy, such as models for predicting IC50 values in non-tumor cell lines (HaCaT, HEK-293T, MCF-10A) and in tumor cell lines (COLO 205, HCT-8), and GI50 values in tumor cell lines (THP-1, U-937).
Leveraging models with medium accuracy but large training sets: Consider QSAR models with moderate predictive performance if they are built on extensive training datasets, e.g., for non-tumor cell lines (HEK-293, MRC-5) and tumor cell lines (A-431, THP-1, U-937). Such models typically cover a broader chemical space and exhibit greater robustness due to the diversity of the training data.
Excluding predictions beyond the applicability domain (AD): Avoid using predictions marked as “out of AD”. A prediction is marked this way in either of the following cases:
the query structure exhibits less than 70% similarity to any compound in the training set;
the predicted value deviates from the values of the three most similar compounds in the training set by more than the RMSE of the model.
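The two applicability-domain criteria above can be sketched as a simple check. This is an illustrative reading, not the implementation used in the web application: the similarity measure is not specified here, so similarities are assumed to be precomputed fractions between 0 and 1, and "deviates from the values of the three most similar compounds" is interpreted as deviation from their mean. The function and parameter names are hypothetical.

```python
def in_applicability_domain(predicted, neighbors, model_rmse, min_similarity=0.70):
    """Return True if a prediction passes both AD criteria.

    neighbors: list of (similarity, experimental_value) pairs for training-set
    compounds, with similarity expressed as a fraction between 0 and 1.
    Assumption: deviation is measured against the mean experimental value
    of the three most similar training compounds.
    """
    if not neighbors:
        return False
    ranked = sorted(neighbors, key=lambda pair: pair[0], reverse=True)

    # Criterion 1: the query must be at least 70% similar to some training compound.
    if ranked[0][0] < min_similarity:
        return False

    # Criterion 2: the prediction must lie within one RMSE of the nearest neighbors.
    nearest = ranked[:3]
    reference = sum(value for _, value in nearest) / len(nearest)
    return abs(predicted - reference) <= model_rmse


# Invented example: similarities and experimental pIC50 values of neighbors.
neighbors = [(0.92, 6.1), (0.85, 6.4), (0.80, 5.9), (0.60, 8.0)]
print(in_applicability_domain(6.3, neighbors, model_rmse=0.7))
print(in_applicability_domain(7.5, neighbors, model_rmse=0.7))
```

In the example, the first query lies close to its nearest neighbors and passes both criteria, whereas the second deviates from their mean by more than the model RMSE and would be treated as out of AD.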
In summary, although the accuracy of individual QSAR models may vary, their combined use can help to overcome the limitations of any one model and provide a more reliable estimation of the therapeutic index.
The results show that the quality of QSAR models depends on both the amount of experimental data and the cell line type. Immune system cell lines (THP-1, U-937) demonstrated better predictive power, possibly due to a more robust biological response and lower heterogeneity. However, for the lines with high biological heterogeneity (e.g., COLO 205 (pGI50) and A549), the models were less accurate.
Therefore, when developing and using QSAR models for cytotoxicity prediction, it is important to consider both the biological nature of the cell line and the specifics of its application. Combining tumor and non-tumor models provides a more comprehensive understanding of a compound’s toxicity profile and increases the reliability of in silico assessments.
Increasing the number of experimental IC50 and GI50 values for both active and inactive compounds will lead to the creation of new efficient and reliable QSAR models for a wide range of cell lines and will help to discover promising new drugs at an early stage of their development. The current implementation of GUSAR is based on the atomic neighborhood descriptors QNA and MNA, as well as several physicochemical descriptors. It also uses the original SCR (Self-Consistent Regression) and RBF–SCR (Radial Basis Function network using SCR) algorithms. Whether additional molecular descriptors and machine learning algorithms would improve the accuracy and predictivity of QSAR models for cytotoxicity prediction remains open to discussion. However, as the widespread use of QSAR has demonstrated, the current accuracy of the obtained models is primarily limited by the quantity and accuracy of the experimental data used for training and testing [20,88,89].