The large amount of data collected so far for G protein-coupled receptors (GPCRs) calls for machine learning (ML) approaches to fully exploit its potential. We compared our previous ML model, which uses gradient boosting to predict drug affinity and selectivity for a receptor subtype, with explicit information on ligand-receptor interactions obtained from induced-fit docking. Both methods have proved useful for predicting drug response. However, combining them successfully still requires assigning the ligands in the datasets to allosteric or orthosteric binding sites. Our ligand datasets included activities at two members of the secretin receptor family: GCGR and GLP-1R. Simultaneous activation of two or three receptors of this family by dual or triple agonists is not the kind of information typically included in compound databases. Moreover, a precise allosteric/orthosteric ligand assignment requires continuous updating as new structural and biological data become available. This data incompleteness remains the main obstacle for current ML methods applied to class B GPCR drug discovery. Even so, for these two class B receptors, our ligand-based ML model demonstrated high accuracy (5-fold cross-validation Q² > 0.63 and Q² > 0.67 for GLP-1R and GCGR, respectively). In addition, we annotated the ligands using recent cryogenic electron microscopy (cryo-EM) and X-ray crystallographic data on small-molecule complexes of GCGR and GLP-1R. As a result, we assigned the GLP-1R and GCGR actives deposited in ChEMBL to four small-molecule binding sites occupied by positive and negative allosteric modulators and a full agonist. The annotated compounds were added to our recently released repository of GPCR data.
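The Q² values reported above are cross-validated coefficients of determination. As a minimal sketch, assuming the common pooled definition Q² = 1 - PRESS/TSS computed on out-of-fold predictions (the exact definition behind the reported values is not stated here, and some pipelines instead average the per-fold R²), the Python code below shows how a 5-fold cross-validated Q² can be computed. The one-feature least-squares model `fit_linear` is a hypothetical stand-in for the gradient boosting regressor, and the contiguous, unshuffled folds are a simplifying assumption.

```python
from typing import Callable, List, Sequence

# A fitted model maps one descriptor value to a predicted activity.
Model = Callable[[float], float]


def k_fold_indices(n: int, k: int = 5) -> List[List[int]]:
    """Split indices 0..n-1 into k contiguous, near-equal folds (no shuffling)."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Hypothetical placeholder model: ordinary least-squares line y = a + b*x."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def cross_validated_q2(
    xs: Sequence[float],
    ys: Sequence[float],
    k: int = 5,
    fit: Callable[[Sequence[float], Sequence[float]], Model] = fit_linear,
) -> float:
    """Pooled Q² = 1 - PRESS / TSS over out-of-fold predictions."""
    n = len(ys)
    preds = [0.0] * n
    for fold in k_fold_indices(n, k):
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        # Fit only on the training folds, then predict the held-out fold.
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            preds[i] = model(xs[i])
    mean_y = sum(ys) / n
    press = sum((y - p) ** 2 for y, p in zip(ys, preds))  # prediction error sum of squares
    tss = sum((y - mean_y) ** 2 for y in ys)  # total sum of squares
    return 1.0 - press / tss


xs = [float(i) for i in range(10)]
print(cross_validated_q2(xs, [2 * x + 1 for x in xs]))  # perfectly linear data
```

A model that reproduces held-out data exactly scores Q² = 1, while one no better than predicting the training-fold mean scores near or below 0; under this definition, a Q² of 0.63 means roughly 63% of the variance in held-out activities is explained.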
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.