# Linear Regression QSAR Models for Polo-Like Kinase-1 Inhibitors

## Abstract

^{2}, and QuBiLs-MAS; these descriptor programs complement each other and improve the QSAR results. The best multivariable linear regression models are found with the replacement method variable subset selection technique. The balanced subsets method partitions the dataset into training, validation, and test sets. The proposed linear QSAR model improves on previously reported models by providing a simpler alternative structure-activity relationship.

## 1. Introduction

## 2. Materials and Methods

#### 2.1. Experimental Dataset

^{−1}, and compounds without reported bioactivities, the dataset consisted of 530 compounds with $I{C}_{50}$ values ranging from 0.8 to 145,000 nM and molecular weights ranging from 164.2 to 949.97 g mol$^{-1}$. The complete list of compounds studied here is provided in Table S1 as Supplementary Material.

#### 2.2. Structural Representation and Molecular Descriptors Calculation

^{2}) freeware [33], which generated 777 1D-2D structural variables with molecules in MDL sdf format.

^{2}, and QuBiLs-MAS we derived 26,761 non-conformational molecular descriptors with the intention of exploring the most relevant structural characteristics affecting the studied PLK1 bioactivity.

#### 2.3. Model Development

#### 2.3.1. Molecular Descriptors Selection

#### 2.3.2. Model Validation

#### 2.3.3. Applicability Domain

## 3. Results and Discussion

**171**, 1-{4-[(4-chlorophenyl)methoxy]-3-methoxyphenyl}-N-[(pyridin-4-yl)methyl]methanamine. After close inspection of this specific compound, it is easily concluded that the abnormal behavior can be completely attributed to the highly heterogeneous dataset being analyzed, involving molecular weights from 164.2 to 949.97 g mol$^{-1}$ and bioactivities from 0.8 to 145,000 nM.

- Two electrotopological state atom-type descriptors: mindssC, the minimum atom-type E-state: =C<; and maxHCsats, the maximum atom-type H E-state: H bonded to B, Si, P, Ge, As, Se, Sn, or Pb.
- A MACCS fingerprint descriptor: M66, the number of CC(C)(C)A fragments, where A is any valid periodic table element symbol.
- Three PubChem fingerprint descriptors: PC494, the presence of O=C-C:N fragment, where ‘:’ denotes bond aromaticity; PC534, the presence of S-C:C-O fragment; and PC686, the presence of O=C-C-C-C-O fragment.
- Two Klekota–Roth fingerprint descriptors: KR3577, the presence of SMARTS substructure Cc1cccc(C)c1NC=O; and KR4268, the presence of SMARTS substructure Nc1ccccc1O.

**516**. The Williams plot (standardized residuals as a function of the ${h}_{i}$ values) is provided in Figure 4.

**457**. Thus, the predicted ${\mathrm{log}}_{10}I{C}_{50}$ values for most of the test set compounds can be considered as reliable.
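
The leverage values ${h}_{i}$ plotted in a Williams plot can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the usual hat-matrix definition ${h}_{i}={x}_{i}^{T}{({X}^{T}X)}^{-1}{x}_{i}$ with an intercept column added to the descriptor matrix, and it uses the commonly quoted warning leverage ${h}^{*}=3(d+1)/N$, which may differ from the threshold used in this study. The function names and the toy one-descriptor data are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; descriptors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def leverages(train_descriptors, query_descriptors):
    """Leverage of each query compound relative to the training set.

    Assumption: an intercept column of ones is prepended to both matrices.
    """
    train = [[1.0] + [float(v) for v in row] for row in train_descriptors]
    query = [[1.0] + [float(v) for v in row] for row in query_descriptors]
    p = len(train[0])
    # X^T X built from the training compounds only
    xtx = [[sum(r[a] * r[b] for r in train) for b in range(p)] for a in range(p)]
    result = []
    for x in query:
        z = solve_linear_system(xtx, x)  # z = (X^T X)^{-1} x
        result.append(sum(xi * zi for xi, zi in zip(x, z)))
    return result


def warning_leverage(n_train, n_descriptors):
    """Common convention h* = 3(d + 1)/N (an assumption, see lead-in)."""
    return 3.0 * (n_descriptors + 1) / n_train


# Toy data (hypothetical): four training compounds, one descriptor.
train = [[0.0], [1.0], [2.0], [3.0]]
h_train = leverages(train, train)
h_query = leverages(train, [[10.0]])
h_star = warning_leverage(n_train=4, n_descriptors=1)
outside_domain = [h > h_star for h in h_query]
```

In this toy case the query compound lies far from the training descriptor range, so its leverage exceeds ${h}^{*}$ and its prediction would be flagged as an extrapolation.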

_{50} ≤ 1000 nM as highly active inhibitors and experimental $I{C}_{50}$ > 1000 nM as poorly active inhibitors. Then, the Cooper statistics [48] related to accuracy (A%), sensitivity (SE), and specificity (SP) and the Matthews correlation coefficient (MCC) can be calculated. The classification results for Equation (1) in the test set are acceptable: A% = 83%, SE = 0.73, SP = 0.95, and MCC = 0.69.
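
As an illustration of how these classification statistics follow from a regression model, the sketch below thresholds experimental and predicted ${\mathrm{log}}_{10}I{C}_{50}$ values and counts the resulting confusion matrix. It rests on two assumptions that are not stated in the text above: that $I{C}_{50}$ is expressed in nM, so the 1000 nM cutoff corresponds to ${\mathrm{log}}_{10}I{C}_{50}=3$, and that "highly active" is treated as the positive class. The function names and the toy values are hypothetical and are not data from this study.

```python
import math

# Assumption: IC50 in nM, so 1000 nM corresponds to log10(IC50) = 3.
THRESHOLD_LOG10_NM = 3.0


def is_highly_active(log10_ic50_nm):
    """Highly active (positive class, by assumption) if IC50 <= 1000 nM."""
    return log10_ic50_nm <= THRESHOLD_LOG10_NM


def cooper_statistics(experimental, predicted):
    """Accuracy (%), sensitivity, specificity, and MCC from log10(IC50) values."""
    tp = tn = fp = fn = 0
    for exp_value, pred_value in zip(experimental, predicted):
        actual = is_highly_active(exp_value)
        guess = is_highly_active(pred_value)
        if actual and guess:
            tp += 1
        elif actual and not guess:
            fn += 1
        elif not actual and guess:
            fp += 1
        else:
            tn += 1
    total = tp + tn + fp + fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "A%": 100.0 * (tp + tn) / total,
        "SE": tp / (tp + fn) if (tp + fn) else float("nan"),
        "SP": tn / (tn + fp) if (tn + fp) else float("nan"),
        # MCC is conventionally set to 0 when any marginal count is zero.
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Toy values (hypothetical, for illustration only).
experimental = [1.2, 2.5, 2.9, 3.4, 4.1, 5.0]
predicted = [1.5, 3.2, 2.7, 3.8, 2.8, 4.6]
stats = cooper_statistics(experimental, predicted)
```

For the toy values, two of three highly active and two of three poorly active compounds are classified correctly, so SE and SP are both 2/3 and MCC is 1/3.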

- Our proposed model performs both regression and classification.
- Dataset partitioning: three subsets (training, validation, and test) are considered instead of only two (training and test) as in [20], which is more convenient for analyzing the predictive performance of the model.
- Model size: fewer molecular descriptors are involved in the final selected model, i.e., 8 instead of 10–15. The principle of parsimony (Ockham’s razor) [49] is therefore satisfied, following the common practice of keeping the model’s dimension as small as possible.
- No energy or geometry optimization is performed on the inhibitor chemical structures. The conformation-independent QSAR approach considers only constitutional and topological representations for deriving the molecular descriptors.
- A simpler modeling methodology based on MLR analysis is applied in the present study.

## 4. Conclusions

_{50} values, being predicted as more active PLK1 inhibitors.

## Supplementary Materials

## Acknowledgments

## Conflicts of Interest

**Figure 1.** Some polo-like kinase-1 (PLK1) inhibitors involved in current clinical trials [5].

**Figure 2.** Predicted and experimental ${\mathrm{log}}_{10}I{C}_{50}$ values according to the quantitative structure-activity relationship (QSAR) of Equation (1).

**Table 1.** Molecular descriptors involved in the best linear regression quantitative structure-activity relationship (QSAR) models for polo-like kinase-1 (PLK1) inhibitors. The selected model appears in bold.

d | Descriptors | $R_{train}^{2}$ | $S_{train}$ | $R_{val}^{2}$ | $S_{val}$ | $R_{test}^{2}$ | $S_{test}$ |
---|---|---|---|---|---|---|---|
1 | Sub99 | 0.31 | 1.18 | 0.39 | 1.25 | 0.28 | 1.31 |
2 | PC534; AP170 | 0.49 | 1.02 | 0.56 | 1.08 | 0.52 | 1.06 |
3 | PC534; KR4261; AP170 | 0.52 | 0.99 | 0.68 | 0.95 | 0.57 | 0.98 |
4 | nHBAcc3; PC534; KR4261; AP170 | 0.57 | 0.94 | 0.71 | 0.90 | 0.62 | 0.93 |
5 | PC534; KR3577; KR4268; AP170; KRC3897 | 0.61 | 0.90 | 0.71 | 0.89 | 0.71 | 0.83 |
6 | maxHCsats; M66; PC534; KR3577; KR4268; KRC3897 | 0.64 | 0.87 | 0.74 | 0.85 | 0.69 | 0.84 |
7 | maxHCsats; M66; PC534; PC686; KR3577; KR4268; AP159 | 0.66 | 0.84 | 0.74 | 0.84 | 0.66 | 0.89 |
**8** | **mindssC; maxHCsats; M66; PC494; PC534; PC686; KR3577; KR4268** | **0.69** | **0.80** | **0.75** | **0.82** | **0.69** | **0.85** |
9 | mindssC; maxHCsats; M66; PC494; PC534; PC686; KR3577; KR4268; APC510 | 0.70 | 0.79 | 0.75 | 0.82 | 0.70 | 0.85 |
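
The parsimony argument can be made concrete with the values in Table 1. The sketch below picks the smallest model whose validation ${R}^{2}$ is within a tolerance of the best validation ${R}^{2}$. This rule, the tolerance values, and the function name are illustrative assumptions and are not the selection criterion stated by the authors; only the numbers are taken from Table 1.

```python
# (d, R2_train, R2_val, R2_test) copied from Table 1.
TABLE_1 = [
    (1, 0.31, 0.39, 0.28),
    (2, 0.49, 0.56, 0.52),
    (3, 0.52, 0.68, 0.57),
    (4, 0.57, 0.71, 0.62),
    (5, 0.61, 0.71, 0.71),
    (6, 0.64, 0.74, 0.69),
    (7, 0.66, 0.74, 0.66),
    (8, 0.69, 0.75, 0.69),
    (9, 0.70, 0.75, 0.70),
]


def smallest_model_within(rows, tolerance):
    """Smallest d whose validation R2 is within `tolerance` of the best one.

    Hypothetical rule used only to illustrate the parsimony principle.
    """
    best_val = max(r2_val for _, _, r2_val, _ in rows)
    for d, _, r2_val, _ in sorted(rows):
        if best_val - r2_val <= tolerance:
            return d
    raise ValueError("no rows supplied")


strict_choice = smallest_model_within(TABLE_1, tolerance=0.005)
loose_choice = smallest_model_within(TABLE_1, tolerance=0.02)
```

With the strict tolerance this rule returns d = 8, since the ninth descriptor does not raise the validation ${R}^{2}$ above 0.75; with the looser tolerance it returns d = 6, which shows how strongly the outcome depends on the chosen threshold.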

