Understanding the Molecular Basis of 5-HT4 Receptor Partial Agonists through 3D-QSAR Studies

Alzheimer’s disease (AD) is a neurodegenerative disorder with a high prevalence among senior citizens. Unfortunately, current pharmacotherapy offers only symptomatic relief and causes side effects such as bradycardia, nausea, and vomiting. Therefore, there is a pressing need for alternative treatments for this disorder. The 5-HT4 receptor is an attractive therapeutic target since it has a potential role in central and peripheral nervous system disorders such as AD, irritable bowel syndrome, and gastroparesis. In the present work, a quantitative structure-activity relationship analysis was carried out on a series of 62 compounds active at the 5-HT4 receptor. The structure-activity relationship was estimated using three-dimensional quantitative structure-activity relationship (3D-QSAR) techniques based on the molecular fields of these structures (force and Gaussian fields). The best force-field QSAR model achieved a coefficient of determination of R²training = 0.821 for the training set and R²test = 0.667 for the test set, while the Gaussian-field QSAR model achieved R²training = 0.898 and R²test = 0.695, respectively. The models were validated by leave-one-out cross-validation, which gave Q²LOO = 0.804 and Q²LOO = 0.886 for force- and Gaussian-field QSAR, respectively. Based on these results, novel 5-HT4 partial agonists with potential biological activity (pEC50 8.209–9.417 for force-field QSAR and 9.111–9.856 for Gaussian-field QSAR) were designed. In addition, the absorption, distribution, metabolism, excretion, and toxicity properties of the new analogues were analyzed. The results show that these new derivatives also have reasonable pharmacokinetics and drug-like properties. Our findings suggest novel routes for the design and development of new 5-HT4 partial agonists.


S2.1. Dataset Collection
A total of 62 partial agonists of the 5-HT4 receptor that showed promising potency were collected from the literature [1-3]. All compounds, with pEC50 values ranging from 5.64 to 10.0, were used in this study. All molecules were converted into 3D structures using OCHEM. The 3D structures were then processed with the OMEGA [4] module using the following parameters: (i) AM1_BCC force field, (ii) FixpKa from the QUACPAC package for all possible ionisation states at a given biological pH, (iii) one low-energy conformation per ligand. Force- and Gaussian-field 3D-QSAR calculations were performed for all the molecules. All the training and test set molecules, with their experimental and predicted EC50 values, are listed in Table 1.
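The pEC50 scale used for the dataset is the negative base-10 logarithm of the EC50 expressed in mol/L. The conversion can be sketched as follows; the function name and the example EC50 values are hypothetical and were chosen only to span the reported pEC50 range of 5.64 to 10.0.

```python
import math


def pec50_from_nanomolar(ec50_nm: float) -> float:
    """Return pEC50 = -log10(EC50 in mol/L) for an EC50 given in nM."""
    if ec50_nm <= 0:
        raise ValueError("EC50 must be positive")
    # 1 nM = 1e-9 mol/L, so -log10(ec50_nm * 1e-9) = 9 - log10(ec50_nm)
    return 9.0 - math.log10(ec50_nm)


# Hypothetical EC50 values in nM (not taken from the dataset)
for ec50_nm in (0.1, 10.0, 2290.0):
    print(f"{ec50_nm:8.1f} nM -> pEC50 = {pec50_from_nanomolar(ec50_nm):.2f}")
```

On this scale, a ten-fold increase in potency raises pEC50 by one unit.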

S2.2. Alignment
Alignment of the molecules is the most crucial input for generating 3D-QSAR models. The compound with the highest activity (53) was used as the template molecule. A shape-based alignment was applied to all conformers of each ligand. These alignments were carried out with the ROCS suite [5]. Finally, the best conformer of each ligand was selected by considering the electrostatic field of compound 53, as shown in Figure S1.

S2.3. Field-Based QSAR Model
3D-QSAR analysis using field-based methods was performed with the QSAR tool of the Schrödinger Suite. The 3D-QSAR method builds a model by relating the known activities to the molecular features of a set of aligned compounds. The steric and electrostatic fields around each ligand were calculated on a 3D grid using field-based 3D-QSAR. The force-field-based QSAR model (henceforward FFQSAR) is an alignment-dependent method in which molecular field interaction energy terms are correlated with biological activities/responses using multivariate statistical analyses. In the Gaussian-field 3D-QSAR model (henceforward GFQSAR), interaction energies are calculated using steric, electrostatic, hydrogen bond donor (HBD), and hydrogen bond acceptor (HBA) potential fields, which are described by Gaussian equations.
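As a rough illustration of what a Gaussian field value at a single grid point looks like, the sketch below sums a Gaussian term over the atoms of a ligand. It is a generic CoMSIA-style form and not the exact functional form used by the Schrödinger tool; the attenuation factor `alpha`, the per-atom weights, and the coordinates are placeholder assumptions.

```python
import math


def gaussian_field(grid_point, atoms, alpha=0.3):
    """Sum of w_i * exp(-alpha * r_i^2) over all atoms at one grid point.

    atoms: iterable of (x, y, z, weight). The weight stands in for a
    per-atom property (for example a steric or partial-charge term).
    alpha is an assumed attenuation factor.
    """
    gx, gy, gz = grid_point
    total = 0.0
    for x, y, z, weight in atoms:
        r2 = (gx - x) ** 2 + (gy - y) ** 2 + (gz - z) ** 2
        total += weight * math.exp(-alpha * r2)
    return total


# Hypothetical two-atom "ligand" probed at three points along the x axis
atoms = [(0.0, 0.0, 0.0, 1.0), (1.5, 0.0, 0.0, 0.5)]
for point in [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (6.0, 0.0, 0.0)]:
    print(point, round(gaussian_field(point, atoms), 4))
```

Unlike a Coulomb or Lennard-Jones term, the Gaussian form stays finite at an atom centre, so this sketch needs no energy cutoff.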
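The lattice and probe step sizes were adjusted automatically. Partial least squares (PLS) analysis was applied to construct the best model through the linear correlation of the FFQSAR and GFQSAR fields with pEC50 [1-3]. A cross-validation analysis was performed using the leave-one-out method, and the optimum number of components was identified from this cross-validation. The cross-validation and correlation coefficients (Q² and R², respectively) were calculated according to Equations (1) and (2):

\[ Q^2 = 1 - \frac{\sum_i \left( y_i^{\mathrm{pred}} - y_i^{\mathrm{obs}} \right)^2}{\sum_i \left( y_i^{\mathrm{obs}} - \bar{y}^{\mathrm{obs}} \right)^2} \tag{1} \]

\[ R^2 = \frac{\left[ \sum_i \left( y_i^{\mathrm{obs}} - \bar{y}^{\mathrm{obs}} \right) \left( y_i^{\mathrm{pred}} - \bar{y}^{\mathrm{pred}} \right) \right]^2}{\sum_i \left( y_i^{\mathrm{obs}} - \bar{y}^{\mathrm{obs}} \right)^2 \sum_i \left( y_i^{\mathrm{pred}} - \bar{y}^{\mathrm{pred}} \right)^2} \tag{2} \]

where $y_i^{\mathrm{pred}}$ and $y_i^{\mathrm{obs}}$ are the predicted and observed activity values, and $\bar{y}^{\mathrm{obs}}$ and $\bar{y}^{\mathrm{pred}}$ are the observed and predicted mean activity values of the training set, respectively. The term $\sum_i ( y_i^{\mathrm{pred}} - y_i^{\mathrm{obs}} )^2$ is the predictive residual sum of squares (PRESS). High Q² and R² values (Q² > 0.6, R² > 0.8) are regarded as evidence of the high predictive ability of the built model.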
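The leave-one-out procedure can be sketched in a few lines. To keep the example self-contained, ordinary least squares on a single descriptor is used here in place of PLS on the field descriptors, so the sketch illustrates only how Q² is accumulated (one minus PRESS over the sum of squares about the observed training mean) and not the actual model. The data and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for a single descriptor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def q2_loo(xs, ys):
    """Leave-one-out Q2: each compound is predicted by a model fitted without it."""
    mean_obs = sum(ys) / len(ys)
    press = 0.0
    for i in range(len(ys)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    total = sum((y - mean_obs) ** 2 for y in ys)
    return 1.0 - press / total


def r2_fit(xs, ys):
    """R2 of the model fitted on all compounds (squared Pearson correlation)."""
    slope, intercept = fit_line(xs, ys)
    preds = [slope * x + intercept for x in xs]
    n = len(ys)
    mean_o, mean_p = sum(ys) / n, sum(preds) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(ys, preds))
    var_o = sum((o - mean_o) ** 2 for o in ys)
    var_p = sum((p - mean_p) ** 2 for p in preds)
    return cov * cov / (var_o * var_p)


# Hypothetical descriptor values and pEC50 values
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
pec50 = [6.1, 6.9, 8.2, 8.8, 10.1]
print("R2 =", round(r2_fit(descriptor, pec50), 3))
print("Q2 =", round(q2_loo(descriptor, pec50), 3))
```

Because every compound is predicted by a model that never saw it, Q² cannot exceed the fitted R² for this kind of least-squares model, which is why it serves as the more conservative internal check.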

S2.4. Validation of the 3D-QSAR Model
A high Q² from internal validation only shows that the model describes the training set compounds well; it does not by itself indicate that the established models have high predictive ability. Therefore, external validation was indispensable. The predictive power of the 3D-QSAR models was validated by calculating the biological activities of compounds that were not included in the training set and were used as a test set.
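The predictive correlation coefficient R²test (threshold R²test > 0.6) [6], based on the test set, was calculated using Equation (3):

\[ R^2_{\mathrm{test}} = 1 - \frac{\sum_i \left( y_i^{\mathrm{pred(test)}} - y_i^{\mathrm{obs(test)}} \right)^2}{\sum_i \left( y_i^{\mathrm{obs(test)}} - \bar{y}^{\mathrm{training}} \right)^2} \tag{3} \]

The denominator is the sum of squared deviations between the biological activities of the test set molecules and the mean activity of the training set molecules. The numerator is the sum of squared deviations between the predicted and actual activities of the test set molecules.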
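The external R²test can be computed directly from the two sums of squared deviations described above. In the sketch below the function name and the activity values are hypothetical; note that the reference mean comes from the training set, not from the test set.

```python
def r2_test(obs_test, pred_test, mean_train):
    """External R2: 1 - (squared prediction errors) / (squared deviations
    of the observed test activities from the training-set mean)."""
    press = sum((p - o) ** 2 for o, p in zip(obs_test, pred_test))
    sd = sum((o - mean_train) ** 2 for o in obs_test)
    return 1.0 - press / sd


# Hypothetical pEC50 values for a small test set
observed = [6.0, 7.0, 8.0, 9.0]
predicted = [6.4, 6.8, 8.3, 8.5]
training_mean = 7.4
print(round(r2_test(observed, predicted, training_mean), 3))
```

A model that always predicts the training-set mean scores zero on this measure, and a model that predicts worse than that scores below zero.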
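The performance of the regression models constructed here was evaluated using the root mean squared error (RMSE) and the mean absolute error (MAE) (both should be close to zero), the residual sum of squares (RSS), and the concordance correlation coefficient (CCC; CCC > 0.85) of the training and validation sets [7]. These were calculated for the data set according to Equations (4) to (7):

\[ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i^{\mathrm{obs}} - y_i^{\mathrm{pred}} \right)^2} \tag{4} \]

\[ \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i^{\mathrm{obs}} - y_i^{\mathrm{pred}} \right| \tag{5} \]

\[ \mathrm{RSS} = \sum_{i=1}^{n} \left( y_i^{\mathrm{obs}} - y_i^{\mathrm{pred}} \right)^2 \tag{6} \]

\[ \mathrm{CCC} = \frac{2 \sum_{i=1}^{n} \left( y_i^{\mathrm{obs}} - \bar{y}^{\mathrm{obs}} \right) \left( y_i^{\mathrm{pred}} - \bar{y}^{\mathrm{pred}} \right)}{\sum_{i=1}^{n} \left( y_i^{\mathrm{obs}} - \bar{y}^{\mathrm{obs}} \right)^2 + \sum_{i=1}^{n} \left( y_i^{\mathrm{pred}} - \bar{y}^{\mathrm{pred}} \right)^2 + n \left( \bar{y}^{\mathrm{obs}} - \bar{y}^{\mathrm{pred}} \right)^2} \tag{7} \]

where n is the number of compounds in the set. As an additional validation of the predictive model on the test set, the following condition was checked: 0.85 < k < 1.15 or 0.85 < k' < 1.15. Here r²0 and r0'² are the squared correlation coefficients of determination for the regression lines through the origin between predicted (y) and observed (x) activities and vice versa, and k and k' are the slopes of the respective regression lines.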
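These error measures and the slopes of the regression lines through the origin can be sketched with the standard textbook definitions. The function names and data are hypothetical, and the assignment of k to the regression of predicted on observed values (with k' for the reverse) follows the order in the text and should be read as an assumption.

```python
import math


def rss(obs, pred):
    """Residual sum of squares."""
    return sum((o - p) ** 2 for o, p in zip(obs, pred))


def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(rss(obs, pred) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def ccc(obs, pred):
    """Concordance correlation coefficient."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    ss_o = sum((o - mean_o) ** 2 for o in obs)
    ss_p = sum((p - mean_p) ** 2 for p in pred)
    return 2 * cov / (ss_o + ss_p + n * (mean_o - mean_p) ** 2)


def origin_slopes(obs, pred):
    """Slopes of the least-squares lines forced through the origin."""
    sxy = sum(o * p for o, p in zip(obs, pred))
    k = sxy / sum(o * o for o in obs)         # pred = k * obs (assumed convention)
    k_prime = sxy / sum(p * p for p in pred)  # obs = k' * pred
    return k, k_prime


# Hypothetical observed and predicted pEC50 values
observed = [6.0, 7.0, 8.0, 9.0]
predicted = [6.4, 6.8, 8.3, 8.5]
k, k_prime = origin_slopes(observed, predicted)
print("RMSE", round(rmse(observed, predicted), 3))
print("MAE ", round(mae(observed, predicted), 3))
print("CCC ", round(ccc(observed, predicted), 3))
print("slopes within 0.85-1.15:", 0.85 < k < 1.15 or 0.85 < k_prime < 1.15)
```

Unlike the Pearson correlation, CCC also penalises a systematic offset between predicted and observed values, which is why a constant shift of one unit lowers it even when the correlation is perfect.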
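To further assess the models, the additional statistical validation parameters r²m and Δr²m were determined by Equations (13) and (14):

\[ r_m^2 = r^2 \left( 1 - \sqrt{r^2 - r_0^2} \right) \tag{13} \]

\[ \Delta r_m^2 = \left| r_m^2 - r_m'^2 \right| \tag{14} \]

where r'²m is obtained from Equation (13) with r0'² in place of r²0. An r²m value of more than 0.5 (r²m > 0.5) and Δr²m < 0.2 indicate good external predictability of the models.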
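The r²m and Δr²m parameters can be sketched with their commonly used definitions, r²m = r²(1 − √(r² − r²0)) and Δr²m = |r²m − r'²m|. In the code below, the absolute value inside the square root is a numerical safeguard added for this example, the choice of which axis order gives the unprimed quantity is an assumption, and the data are hypothetical.

```python
import math


def r_squared(x, y):
    """Squared Pearson correlation between x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def r0_squared(x, y):
    """Determination coefficient of the regression of y on x through the origin."""
    k = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    mean_y = sum(y) / len(y)
    ss_res = sum((b - k * a) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot


def rm_squared(x, y):
    """r2m = r2 * (1 - sqrt(|r2 - r0^2|)); abs() guards against rounding."""
    r2 = r_squared(x, y)
    return r2 * (1.0 - math.sqrt(abs(r2 - r0_squared(x, y))))


def rm_metrics(obs, pred):
    """Return (r2m, r'2m, delta r2m) for one set of compounds."""
    rm2 = rm_squared(obs, pred)      # predicted regressed on observed (assumed)
    rm2_rev = rm_squared(pred, obs)  # axes swapped
    return rm2, rm2_rev, abs(rm2 - rm2_rev)


# Hypothetical observed and predicted pEC50 values
observed = [6.0, 7.0, 8.0, 9.0]
predicted = [6.4, 6.8, 8.3, 8.5]
rm2, rm2_rev, delta = rm_metrics(observed, predicted)
print(round(rm2, 3), round(rm2_rev, 3), round(delta, 3))
```

Since Δr²m is an absolute difference, it does not depend on which of the two axis orders is labelled as the primed one.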
The internal and external validation parameters for GFQSAR meet the threshold values, which demonstrates the reliability of the model with the SEAD fields. In contrast, the concordance correlation coefficient (CCC) of FFQSAR is below the accepted threshold (0.815 for FFQSAR, against a threshold of > 0.85). Therefore, the design of new analogues will be based on the GFQSAR model.
R²test is the regression coefficient for the test set exclusively; r²0 and k are the correlation coefficient between the actual and predicted activities for the test set and the respective slope of regression; and r0'² and k' are the correlation coefficient between the predicted and actual activities for the test set and the respective slope of regression. r²m was defined in equation 11. Parameters are defined in the section "Validation of the QSAR model".

S2.5. Prediction of ADMET Properties
Drug candidates need good ADMET (Absorption, Distribution, Metabolism, Excretion and Toxicity) and drug-likeness profiles, so pharmacokinetic and drug-likeness parameters are estimated at an early stage of the drug discovery process [8].
In this work, the ADMET properties evaluated for the new candidates were human intestinal absorption, steady-state volume of distribution (VDss), hepatic metabolism, total clearance, AMES toxicity, hepatotoxicity, and skin sensitisation. These properties were predicted using pkCSM [9].
The drug-likeness of the new molecules was estimated using the Lipinski, Ghose, Veber and Egan rules, and their synthetic accessibility was assessed with the SwissADME web tool [10] (http://www.swissadme.ch). The SwissADME synthetic accessibility score rests mainly on the assumption that the frequency of molecular fragments in "actually" obtainable molecules correlates with the ease of synthesis. The score is normalised to range from 1 (very easy) to 10 (very difficult to synthesise).

Table S5. Lipophilicity calculated for the 10 compounds selected with SwissADME.