Open Access Article

Accurate Representation of Protein-Ligand Structural Diversity in the Protein Data Bank (PDB)

SBX Corp., Tokyo-to, Shinagawa-ku, Tokyo 141-0022, Japan
Discngine SAS, 75012 Paris, France
INSERM, UMR_S 1134, DSIMB, Univ Paris, INTS, Laboratoire d’Excellence GR-Ex, 75015 Paris, France
Author to whom correspondence should be addressed.
Int. J. Mol. Sci. 2020, 21(6), 2243;
Received: 28 January 2020 / Revised: 6 March 2020 / Accepted: 20 March 2020 / Published: 24 March 2020
The number of protein structures available in the Protein Data Bank (PDB) has increased considerably in recent years. This growth in structures and complexes has enabled numerous large-scale studies in various research areas, e.g., protein–protein and protein–DNA interactions, or drug discovery. While protein redundancy has typically been handled with a simple protein sequence identity threshold, the similarity of protein-ligand complexes should also be considered from a structural perspective. Protein-ligand duplicates in the PDB are widely known but have never been quantitatively assessed, as they are quite complex to analyze and compare. Here, we present a specific clustering of protein-ligand structures to avoid the bias found in different studies. The methodology is based on binding site superposition, combined with weighted Root Mean Square Deviation (RMSD) assessment and hierarchical clustering. Repeated structures of proteins of interest are highlighted, and only representative conformations are retained for a non-biased view of protein distribution. Three types of cases are described based on the number of distinct conformations identified for each complex. Defining these categories reduces the number of complexes 3.84-fold and offers more refined results than a protein sequence-based method. Widely distinct conformations were analyzed using normalized B-factors. Furthermore, a non-redundant dataset was generated for future molecular interaction analyses or virtual screening studies.
Keywords: protein-ligand complexes; dataset; clustering; structural alignment; refinement
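
As an illustration of the approach summarized in the abstract (weighted RMSD between superposed binding sites, followed by hierarchical clustering and selection of representatives), the following is a minimal sketch and not the authors' implementation. It assumes the binding sites are already superposed and share the same atom ordering. The per-atom weights, the average-linkage criterion, the 1.0 Å cutoff, the medoid rule for picking a representative, and all function names are assumptions made for the example; the abstract does not specify them.

```python
import math
from itertools import combinations


def weighted_rmsd(a, b, weights):
    """Weighted RMSD between two pre-superposed coordinate sets of equal length."""
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("coordinate sets and weights must have the same length")
    total = sum(w * sum((p - q) ** 2 for p, q in zip(pa, pb))
                for pa, pb, w in zip(a, b, weights))
    return math.sqrt(total / sum(weights))


def cluster_conformations(confs, weights, cutoff):
    """Average-linkage agglomerative clustering (assumed linkage).

    Returns a list of clusters, each a list of indices into `confs`.
    Merging stops once the closest pair of clusters is farther apart than `cutoff`.
    """
    n = len(confs)
    dist = {(i, j): weighted_rmsd(confs[i], confs[j], weights)
            for i, j in combinations(range(n), 2)}

    def linkage(c1, c2):
        # Average pairwise distance between members of two clusters.
        pairs = [dist[min(i, j), max(i, j)] for i in c1 for j in c2]
        return sum(pairs) / len(pairs)

    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        (x, y), d = min((((x, y), linkage(clusters[x], clusters[y]))
                         for x, y in combinations(range(len(clusters)), 2)),
                        key=lambda item: item[1])
        if d > cutoff:
            break
        # x < y is guaranteed by combinations, so deleting y keeps x valid.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


def representative(cluster, confs, weights):
    """Medoid: the member with the smallest summed distance to the others."""
    return min(cluster, key=lambda i: sum(
        weighted_rmsd(confs[i], confs[j], weights) for j in cluster))


# Toy data: three two-atom "binding sites" (hypothetical coordinates, in Å).
confs = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.1), (1.0, 0.0, 0.1)],   # near-duplicate of the first
    [(0.0, 3.0, 0.0), (1.0, 3.0, 0.0)],   # clearly distinct conformation
]
weights = [1.0, 0.5]  # hypothetical per-atom weights

clusters = cluster_conformations(confs, weights, cutoff=1.0)
print(clusters)  # prints [[0, 1], [2]]
print([representative(c, confs, weights) for c in clusters])
```

In this toy case the first two conformations differ by a weighted RMSD of 0.1 Å and are merged, while the third remains a separate cluster; keeping one representative per cluster is what removes the redundancy.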
MDPI and ACS Style

Shinada, N.K.; Schmidtke, P.; de Brevern, A.G. Accurate Representation of Protein-Ligand Structural Diversity in the Protein Data Bank (PDB). Int. J. Mol. Sci. 2020, 21, 2243.
